Differentially Private Bayesian Programming

Gian Pietro Farina (University at Buffalo, SUNY), Marco Gaboardi (University at Buffalo, SUNY), Emilio Jesús Gallego Arias (CRI Mines-ParisTech), Andy Gordon (Microsoft Research, Cambridge UK), Justin Hsu (University of Pennsylvania), Pierre-Yves Strub (IMDEA Software), Gilles Barthe (IMDEA Software)

arXiv:1605.00283v1 [cs.PL] 1 May 2016

Abstract


We present an expressive framework, called PrivInfer, for writing and verifying differentially private machine learning algorithms. Programs in PrivInfer are written in a rich functional probabilistic language with constructs for performing Bayesian inference. Differential privacy of programs is then established using a relational refinement type system, in which refinements on probability types are indexed by a metric on distributions. Our framework leverages recent developments in Bayesian inference, probabilistic programming languages, and relational refinement types. We demonstrate the expressiveness of PrivInfer by verifying privacy for several examples of private Bayesian inference.

1. Introduction

Differential privacy (Dwork et al. 2006) is emerging as a gold standard in data privacy. Its statistical guarantee ensures that the probability distribution on the outputs of a data analysis is almost the same as the distribution on outputs obtained from a hypothetical dataset that differs in one individual. A standard way to ensure differential privacy is to use mechanisms that perturb the data analysis by adding statistical noise. The magnitude and the shape of the noise must protect against the influence of any single individual on the result of the analysis, while ensuring that the algorithm still provides useful results. While differential privacy has many desirable features, two are especially relevant for our purposes: (1) differential privacy composes, i.e. we can often prove large programs differentially private by proving differential privacy for their subcomponents; (2) differential privacy works well on large datasets, where the presence or absence of an individual has limited impact. These two properties have led to the design of tools for differentially private data analysis. Many of these tools use programming language techniques to ensure that the resulting programs are indeed differentially private (McSherry 2009; Reed and Pierce 2010; Barthe et al. 2012; Gaboardi et al. 2013; Eigner and Maffei 2013; Barthe et al. 2013; Barthe and Olmedo 2013; Barthe et al. 2015b; Ebadi et al. 2015). Moreover, the fact that differential privacy works well with large datasets has encouraged interaction between the differential privacy community and the machine learning community on the design of privacy-preserving machine learning techniques, e.g. (Dwork et al. 2010; Chaudhuri et al. 2011; Hardt et al. 2012; Zhang et al. 2014).

At the same time, researchers in probabilistic programming are exploring programming languages as tools for machine learning. For example, Bayesian inference is a common task in machine learning. Informally, there are two steps. First, we fix a probabilistic model describing how the data is generated. For instance, we may believe that the heights of individuals follow a normal distribution centered at a certain average height µ, with standard deviation σ. We do not know what µ and σ are, but we have a distribution (the prior) over µ and σ representing our belief about what these values are. Then, we observe several pieces of data, perhaps from a scientific study we conduct. Given the model, the beliefs, and the observations, probabilistic inference produces a distribution over the parameters reflecting our updated beliefs. Probabilistic programming allows the data analyst to specify the model and the observations in the same format, as a specially crafted program. Several works have explored the design of programming languages that compute the updated beliefs efficiently, in order to produce efficient and usable tools for machine learning, e.g. (Lunn et al. 2000; Pfeffer 2001; Goodman et al. 2008; Minka et al. 2012; Gordon et al. 2014; Tolpin et al. 2015).

Recently, research in Bayesian inference and machine learning has turned to privacy-preserving Bayesian inference (Williams and McSherry 2010; Dimitrakakis et al. 2014; Zheng 2015; Zhang et al. 2016), where the observed data is private. Bayesian inference is a deterministic process, so releasing the posterior distribution directly would violate differential privacy. Instead, researchers have considered several approaches to get around this restriction and make Bayesian inference differentially private. Basic techniques add noise to the input data, or to the result of the data analysis, while more advanced techniques ensure differential privacy by taking samples from the posterior. All these approaches have different pros and cons, and data analysts may want to prefer one over another depending on the application. This diversity of approaches makes Bayesian inference an attractive target for verification tools for differential privacy.

In this work we present PrivInfer, a programming framework combining verification techniques for differential privacy with learning techniques for Bayesian inference in a functional setting. PrivInfer consists of two main components: a probabilistic functional language for Bayesian inference, and a relational higher-order type system that can verify differential privacy for programs written in this language. One of the delicate design choices in languages for Bayesian learning is how to make conditional distributions fit naturally in the language. The core idea of Bayesian learning is indeed to use conditional distributions to represent the beliefs updated after some observations. The solution that several languages have adopted is an observe statement explicitly representing conditioning on observed data. In the design of PrivInfer we follow a similar approach, extending the functional language PCF with an observe statement. Another delicate design choice is how to distinguish probabilistic expressions coming from Bayesian inference from probabilistic expressions coming from added noise. This is particularly important in our setting, because Bayesian inference is a deterministic process, while ensuring differential privacy requires injecting randomness. To achieve this distinction, PrivInfer distinguishes between two


kinds of distributions: symbolic distributions and actual distributions. In particular, we parametrize our language with an algorithm performing Bayesian inference that returns symbolic distributions, and with mechanisms ensuring differential privacy that return actual distributions. We demonstrate the expressivity of this approach by embedding in our language the spreadsheet language Tabular (Gordon et al. 2014), recently proposed for describing Bayesian inference in spreadsheets.

Differential privacy is a probabilistic 2-property, i.e. a property expressed over pairs of execution traces. Because of this it presents several challenges for formal reasoning. To address these challenges we use an approach based on approximate relational higher-order refinement type systems (Barthe et al. 2015b). We show how to extend this approach to deal with the constructions needed for Bayesian inference, like the observe construct and the distinction between symbolic and actual distributions. In addition, an important aspect of the verification of differential privacy is reasoning about the sensitivity of a data analysis. This measures the influence that two databases differing in one individual can have on the output. Calibrating noise to sensitivity ensures that the data analysis provides sufficient privacy. In Bayesian inference, the output of the computation is a distribution (often defined by a few numeric parameters) for which one can consider different measures. A simple approach is to use standard metrics (Euclidean, Hamming, etc.) to measure the distance between the parameters. Another, more principled, approach is to consider measures of distance between the distributions themselves, rather than between their parameters. To address these challenges, the type system of PrivInfer allows one to reason about the parameters of a distribution using standard metrics, but also about the distribution itself using f-divergences, a class of probabilistic measures including well-known examples like statistical distance (also known as total variation distance), Hellinger distance, and KL divergence. Having these diverse ways to reason about the sensitivity of a data analysis provides different opportunities to design differentially private algorithms with different levels of privacy and accuracy. Furthermore, our type system permits the study of f-divergences for probabilistic programs in general, even without differential privacy. To illustrate the different features of our approach we show how several basic Bayesian data analyses can be guaranteed differentially private in three different ways: by adding noise on the input, by adding noise on the output parameters with sensitivity measured using the Euclidean distance between the parameters, and finally by adding noise on the output distributions with sensitivity measured using f-divergences. This shows that PrivInfer can be used for a diverse set of Bayesian data analyses.

Summing up, the contributions of our work are:

• A probabilistic extension PCFp of PCF for Bayesian inference that serves as the language underlying our framework PrivInfer (§ 4). This integrates an observe statement and primitives for distinguishing symbolic distributions and actual distributions. We also provide an embedding of the language Tabular into the language of PrivInfer (§ 5).

• A higher-order approximate relational type system for reasoning about properties of two runs of programs from PrivInfer (§ 6). In particular, the type system permits reasoning about f-divergences between outputs. The f-divergences can be used to reason about differential privacy as well as about program sensitivity for Bayesian inference.

• We show on several examples how PrivInfer can be used to reason about differential privacy (§ 7). We explore three ways to guarantee differential privacy: by adding noise on the input, by adding noise on the output parameters based on Euclidean metrics, and by adding noise on the output parameters based on f-divergences.

In the next section we provide an informal introduction to our work. Then, in § 3 we provide the formal background on differential privacy and f-divergences.


2. Bayesian Inference and Motivations

Our work is motivated by Bayesian inference, a statistical method which takes a prior distribution Pr(ξ) over a parameter ξ and some observed data x, and produces the posterior distribution Pr(ξ | x), an updated version of the prior. Bayesian inference is based on Bayes' theorem, which gives a formula for the posterior distribution:

Pr(ξ | x) = Pr(x | ξ) · Pr(ξ) / Pr(x)

The expression Pr(x | ξ) is the likelihood of ξ when x is observed. This is a function L_x(ξ) of the parameter ξ for fixed data x, describing the probability of observing the data x given a specific value of the parameter ξ. Since the data x is considered fixed, the expression Pr(x) denotes a normalization constant ensuring that Pr(ξ | x) is a probability distribution. The choice of the prior reflects the prior knowledge or belief about the parameter ξ before any observation has been performed. For convenience, the prior and the likelihood are typically chosen in practice so that the posterior belongs to the same family of distributions as the prior; in this case the prior is said to be a conjugate prior for the likelihood. Using conjugate priors, besides being mathematically convenient in the derivations, ensures that Bayesian inference can be performed by a recursive process over the data. Bayesian inference can also estimate the parameters of the model in order to predict future events. For this we can use the posterior predictive distribution, computed by marginalizing the posterior distribution over the parameter:

Pr(x* | x) = ∫ Pr(x* | ξ) Pr(ξ | x) dξ

Our goal is to perform Bayesian inference under differential privacy. We give the formal definition of differential privacy in Definition 3.1; for the purpose of this section, the informal description from the previous section is enough. Differential privacy is a statistical guarantee requiring the answers of a data analysis to be statistically close when run on two adjacent databases, i.e. databases that differ in one individual. The notion of "statistically close" is measured by two parameters ε and δ. A way to achieve differential privacy is to add statistical noise, and we present several primitives for doing this in § 3. Here we use one of them, the exponential mechanism, denoted ExpMech_ε, which returns a possible output with probability proportional to a quality score function Q. The privacy and the utility of the mechanism depend on ε and on the sensitivity of the quality score function, i.e. a measure of how much the quality score can differ on two adjacent databases.

As a motivating example we consider a simple Bayesian task: learning the bias of a coin from some observations. To give it more context, we can think of the observations as medical records asserting whether patients from a sample population have a disease or not. We can perform Bayesian learning to establish how likely it is to have the disease in the population. We will show how to program this simple example and make it differentially private in PrivInfer. First, the input of this example is a set of binary observations describing whether any given patient has the disease. We consider this the private information that we want to protect. We also assume that the number of patients n is public and that the adjacency
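To make the conjugate-prior remark concrete, here is a minimal Python sketch (ours, not part of PrivInfer) of the beta-Bernoulli update used throughout this example: with a beta(a, b) prior and boolean observations, the posterior is again a beta distribution, updated one observation at a time.

# Conjugate beta-Bernoulli updating: a beta(a, b) prior with a Bernoulli
# likelihood yields a beta posterior, computed by the recursive process
# over the data mentioned above.
def beta_bernoulli_posterior(a, b, observations):
    for obs in observations:
        if obs:          # observing True increments the first parameter
            a += 1
        else:            # observing False increments the second
            b += 1
    return (a, b)

# Example: uniform prior beta(1, 1) and five observations.
print(beta_bernoulli_posterior(1.0, 1.0, [True, True, False, True, False]))
# -> (4.0, 3.0), i.e. the posterior beta(4, 3)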



condition for differential privacy states that two databases differ in the data of one patient. In our concrete case this means that two databases d, d′ are adjacent if all of their records are the same except for one record that is 0 in one database and 1 in the other, but we do not know which one is which. While in the abstract our problem can be described as estimating the bias of a coin, we need to be more precise and provide the exact model and the parameters that we want to estimate. We can incorporate our initial belief about the fairness of the coin using a prior distribution on the bias ξ given by a beta distribution. This is a distribution over [0, 1] with two parameters a and b, with probability density

beta(ξ | a, b) = ξ^(a−1) (1 − ξ)^(b−1) / B(a, b)

for a, b ∈ R⁺, where B denotes the beta function. Figure 1 reports the pdf of the beta distribution for different values of the parameters a, b; for high values of a and b, the confidence around the value (a − 1)/(a + b − 2) (the mode) increases. The likelihood is the probability that a Bernoulli distributed random variable, with parameter ξ given by the prior, is equal to the observed values. Using the notation of Tabular (Gordon et al. 2014) we can describe this model as the following spreadsheet program:

1  a       real  param          1.0
2  b       real  param          1.0
3  bias    real  param          Beta(a,b)
4  obs     real  output         Bernoulli(bias)
5  inferB  real  static output  infer.Bernoulli.Bias(bias)

Lines 1-3 of the above spreadsheet program give the prior; line 4 represents the likelihood, asserting that the observed values come from a Bernoulli distribution with the given bias. Finally, line 5 requires to infer the posterior bias. Using an informal notation (we omit in particular the monadic probabilistic constructions; a formal description of this example can be found in § 7) the program can be translated to the following functional program:

infer (observe (λr. bernoulli(r) = obs) beta(a, b))

This program uses a term infer representing an inference algorithm and an observe statement used to describe the model. This expression represents the posterior distribution, computed using Bayes' theorem with prior beta(a, b) and likelihood (λr. bernoulli(r) = obs). Now, we want to ensure differential privacy. We have several ways to proceed. A first natural idea is to perturb the input data using the exponential mechanism. The fact that differential privacy is closed under post-processing ensures that this is enough to obtain differential privacy for the whole program. This gives a program as follows:

infer (observe (λr. bernoulli(r) = ExpMech_ε Q obs) beta(a, b))

In the notation above we denoted by Q the scoring function. Since obs is a boolean, we can use a quality score function that gives score 1 to b if b = obs and 0 otherwise. This function has sensitivity 1, and so one achieves (ε, 0)-differential privacy. This is a very simple approach, but in some situations it can already be very useful (Williams and McSherry 2010). Another intuitive approach is to add noise on the output. In this case the output is the posterior, which is a beta(a′, b′) for some values a′, b′. Using again the exponential mechanism we can describe it as

ExpMech_ε Q (infer (observe (λr. bernoulli(r) = obs) beta(a, b)))


In this case, however, the exponential mechanism is not applied to booleans but to distributions; in particular, to distributions of the shape beta(a′, b′). So, a natural question is which Q we can use as quality score function and what its sensitivity is in this case. There are two natural choices. The first is to consider (a′, b′) as a vector and measure the distance in terms of some metric on vectors, e.g. the one given by the ℓ1 norm: d((a, b), (a′, b′)) = |a − a′| + |b − b′|. The second is to consider beta(a′, b′) as an actual distribution and use some statistical distance, e.g. the Hellinger distance ∆_H(beta(a, b), beta(a′, b′)). The two approaches give different sensitivities and have different utility. The utility also depends on the metric that we use to estimate it (Zhang et al. 2016). So, it is important that our system PrivInfer can prove privacy for both of them.

Figure 1: beta densities
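To compare the two choices concretely, here is a small Python sketch (ours, not part of PrivInfer) contrasting the ℓ1 distance on parameters with the Hellinger distance between the corresponding beta posteriors, using the standard closed form for the latter via the Bhattacharyya coefficient of two beta densities.

import math

def log_beta(a, b):
    # ln B(a, b) via log-gamma, for numerical stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def hellinger_beta(a1, b1, a2, b2):
    # Closed form: H^2 = 1 - B((a1+a2)/2, (b1+b2)/2) / sqrt(B(a1,b1) B(a2,b2))
    log_bc = log_beta((a1 + a2) / 2, (b1 + b2) / 2) \
             - 0.5 * (log_beta(a1, b1) + log_beta(a2, b2))
    return math.sqrt(max(0.0, 1.0 - math.exp(log_bc)))

# Posteriors from two adjacent datasets of 10 coin flips
# (one record flipped from True to False): beta(7, 5) vs beta(6, 6).
a1, b1, a2, b2 = 7, 5, 6, 6
l1 = abs(a1 - a2) + abs(b1 - b2)            # l1 distance on parameters: 2
print(l1, hellinger_beta(a1, b1, a2, b2))   # the Hellinger distance is much smaller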

3. Background

3.1 Probability and Distributions


In our work we consider discrete distributions. Following Dwork and Roth (2014) we use standard names for several continuous distributions, but we consider them to be approximate discrete versions of these distributions, up to arbitrary precision. We define the set D(A) of distributions over a set A as the set of functions µ : A → [0, 1] with discrete support(µ) = {x | µ x ≠ 0}, such that Σ_{x∈A} µ x = 1. In our language we consider only distributions over basic types; this guarantees that all our distributions are discrete (extending our framework to continuous distributions can be done following the approach by Sato (2016); see § 4). We use several basic distributions like uniform, bernoulli, normal, beta, etc. These are all standard distributions and we omit their definitions here. We also use some notation to describe distributions. For instance, given an element a ∈ A, we denote by 1_a the probability distribution that assigns all mass to the value a. We also denote by bind µ M the composition of a distribution µ over A with a function M that takes a value in A and returns a distribution over B.
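The notations 1_a and bind have direct executable analogues; a minimal Python sketch (ours), representing a discrete distribution as a dictionary from values to probabilities:

def dirac(a):
    # 1_a : all mass on the value a
    return {a: 1.0}

def bind(mu, M):
    # Compose a distribution mu over A with a kernel M : A -> distribution over B
    result = {}
    for x, px in mu.items():
        for y, py in M(x).items():
            result[y] = result.get(y, 0.0) + px * py
    return result

def bernoulli(r):
    return {True: r, False: 1.0 - r}

# Example: toss a fair coin, then a coin whose bias depends on the outcome.
print(bind(bernoulli(0.5), lambda b: bernoulli(0.9 if b else 0.1)))
# -> {True: 0.5, False: 0.5}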


3.2 Differential Privacy

Differential privacy is a strong, quantitative notion of statistical privacy proposed by Dwork et al. (2006). In the standard setting, we consider a program (sometimes called a mechanism) that takes a private database d as input, and produces a distribution over outputs. Intuitively, d represents a collection of data from different individuals. When two databases d, d′ are identical except for a





single individual's record, we say that d and d′ are adjacent (in our concrete examples we sometimes also consider as adjacent two databases that differ by at most one individual), and we write d Φ d′. Then, differential privacy states that the output distributions obtained by running the program on two adjacent databases should be statistically similar. More formally:

Definition 3.1 (Dwork et al. (2006)). Let ε, δ > 0 be two numeric parameters, let D be the set of databases, and let R be the set of possible outputs. A program M : D → D(R) satisfies (ε, δ)-differential privacy if

Pr[M(d) ∈ S] ≤ e^ε · Pr[M(d′) ∈ S] + δ

for all pairs of adjacent databases d, d′ ∈ D such that d Φ d′, and for every subset of outputs S ⊆ R.

As shown by Barthe et al. (2012), we can reformulate differential privacy using a specific statistical ε-distance ε-D:

Lemma 3.1. Let ε, δ ∈ R⁺. Let D be the set of databases, and let R be the set of possible outputs. A program M : D → D(R) satisfies (ε, δ)-differential privacy iff ε-D(M(d), M(d′)) ≤ δ for all adjacent databases d, d′, where

ε-D(µ1, µ2) ≡ max_{E ⊆ R} ( Pr_{x←µ1}[x ∈ E] − e^ε · Pr_{x←µ2}[x ∈ E] )

for µ1, µ2 ∈ D(R).

Differential privacy is an unusually robust notion of privacy. It degrades smoothly when private mechanisms are composed in sequence or in parallel, and it is preserved under any post-processing that does not depend on the private database. The following lemmas capture these properties:

Lemma 3.2 (Post-processing). Let M : D → D(R) be an (ε, δ)-differentially private program. Let N : R → D(R′) be an arbitrary randomized program. Then λd. bind (M d) N : D → D(R′) is (ε, δ)-differentially private.

Differential privacy enjoys several composition schemes; we report here one of the simplest and most used.

Lemma 3.3 (Composition). Let M1 : D → D(R1) and M2 : D → D(R2) be respectively (ε1, δ1)- and (ε2, δ2)-differentially private programs. Let M : D → D(R1 × R2) be the program defined as M(x) ≡ (M1(x), M2(x)). Then M is (ε1 + ε2, δ1 + δ2)-differentially private.

Accordingly, complex differentially private programs can be easily assembled from simpler private components, and researchers have proposed a staggering variety of private algorithms which we cannot hope to summarize here. (Interested readers can consult Dwork and Roth (2014) for a textbook treatment.) While these algorithms serve many different purposes, the vast majority are constructed from just three private operations, which we call primitives. These primitives offer different ways to create private mechanisms from non-private functions. Crucially, the function must satisfy the following sensitivity property:

Definition 3.2. Let k ∈ R⁺. Suppose f : A → B is a function, where A and B are equipped with distances d_A and d_B. Then f is k-sensitive if d_B(f a, f a′) ≤ k · d_A(a, a′) for every a, a′ ∈ A.

Intuitively, k-sensitivity bounds the effect of a small change in the input, a property that is similar in spirit to the differential privacy guarantee. With this property in hand, we can describe the three basic primitive operations in differential privacy, named after their noise distributions.

The Laplace mechanism. The first primitive is the standard way to construct a private version of a function that maps databases to numbers. Such functions are also called numeric queries, and are fundamental tools for statistical analysis. For instance, the function that computes the average age of all the individuals in a database is a numeric query. When the numeric query has bounded sensitivity, we can use the Laplace mechanism to guarantee differential privacy.

Definition 3.3. Let ε ∈ R⁺ and let f : D → R be a numeric query. Then the Laplace mechanism maps a database d ∈ D to f(d) + ν, where ν is a draw from the Laplace distribution with scale 1/ε. This distribution has the following probability density function:

Lap_{1/ε}(x) = (ε/2) · exp(−ε|x|)

If f is a k-sensitive function, then the Laplace mechanism is (kε, 0)-differentially private.

The Gaussian mechanism. The Gaussian mechanism is an alternative to the Laplace mechanism, adding Gaussian noise with an appropriate standard deviation to release a numeric query. Unlike the Laplace mechanism, the Gaussian mechanism does not satisfy (ε, 0)-privacy for any ε. However, it satisfies (ε, δ)-differential privacy for δ ∈ R⁺.

Definition 3.4. Let ε, δ ∈ R and let f : D → R be a numeric query. Then the Gaussian mechanism maps a database d ∈ D to f(d) + ν, where ν is a draw from the Gaussian distribution with standard deviation

σ(ε, δ) = √(2 ln(1.25/δ)) / ε

If f is a k-sensitive function with k < 1/ε, then the Gaussian mechanism is (kε, δ)-differentially private.

The Exponential mechanism. The first two primitives can make numeric queries private, but in many situations we may want to privately release a non-numeric value. To accomplish this goal, the typical tool is the Exponential mechanism (McSherry and Talwar 2007), our final primitive. This mechanism is parameterized by a set R, representing the range of possible outputs, and a quality score function q : D × R → R, assigning a real-valued score to each possible output given a database. The exponential mechanism releases an output r ∈ R with approximately the largest quality score on the private database. The level of privacy depends on the sensitivity of q in the database. Formally:

Definition 3.5 (McSherry and Talwar (2007)). Let ε ∈ R⁺. Let R be the set of outputs, and q : D × R → R be the quality score. Then the Exponential mechanism on database d ∈ D releases r ∈ R with probability proportional to

Pr[r] ∝ exp( ε · q(d, r) / 2 )

If q is a k-sensitive function in d for every fixed r ∈ R, then the exponential mechanism is (kε, 0)-differentially private.
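A small Python sketch (ours, not the PrivInfer primitives) of two of these mechanisms, with the exponential mechanism over a finite output range:

import math, random

def lap_mech(eps, value):
    # Laplace(1/eps) noise, sampled as the difference of two exponentials.
    return value + random.expovariate(eps) - random.expovariate(eps)

def exp_mech(eps, quality, db, outputs):
    # Sample r with probability proportional to exp(eps * q(db, r) / 2).
    weights = [math.exp(eps * quality(db, r) / 2.0) for r in outputs]
    u, acc = random.random() * sum(weights), 0.0
    for r, w in zip(outputs, weights):
        acc += w
        if u <= acc:
            return r
    return outputs[-1]

# Example: privately release one boolean observation (quality 1 on a match).
q = lambda d, r: 1.0 if d == r else 0.0
print(lap_mech(1.0, 41.5), exp_mech(1.0, q, True, [True, False]))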



3.3 f-divergences

As we have seen, differential privacy is closely related to function sensitivity. To verify differential privacy for the results of probabilistic inference, we will need to work with several notions of distance between distributions. These distances can be neatly described as f-divergences (Csiszár 1963), a rich class of metrics on probability distributions. Inspired by the definition of relative entropy, f-divergences are defined by a convex function f. Formally:

Definition 3.6 (Csiszár and Shields (2004)). Let f(x) be a convex function defined for x > 0, with f(1) = 0. Let µ1, µ2 be two distributions over A. Then, the f-divergence of µ1 from µ2, denoted ∆_f(µ1 | µ2), is defined as

∆_f(µ1 | µ2) = Σ_{a∈A} µ2(a) · f( µ1(a) / µ2(a) )

where we assume 0 · f(0/0) = 0 and 0 · f(a/0) = lim_{t→0} t · f(a/t) = a · lim_{u→∞} f(u)/u. If ∆_f(µ1 | µ2) ≤ δ we say that µ1 and µ2 are (f, δ)-close.

f-diverg.   f(x)               Simplified form
SD(x)       (1/2)|x − 1|       (1/2) Σ_{a∈A} |µ1(a) − µ2(a)|
HD(x)       (1/2)(√x − 1)²     (1/2) Σ_{a∈A} (√(µ1(a)) − √(µ2(a)))²
KL(x)       x ln(x) − x + 1    Σ_{a∈A} µ1(a) ln( µ1(a)/µ2(a) )
ε-D(x)      max(x − e^ε, 0)    Σ_{a∈A} max( µ1(a) − e^ε µ2(a), 0 )

Table 1: f-divergences for statistical distance (SD), Hellinger distance (HD), KL divergence (KL), and ε-distance (ε-D)

Examples of f-divergences include KL divergence, Hellinger distance, and total variation distance. Moreover, Barthe and Olmedo (2013) showed how the ε-distance of Lemma 3.1 can be seen as an f-divergence for differential privacy. These f-divergences are summarized in Table 1. Notice that some of the f-divergences in the table are not symmetric; in particular, this is the case for KL divergence and ε-distance, which we use to describe (ε, δ)-differential privacy. We denote by F the class of functions meeting the requirements of Definition 3.6. Not only do f-divergences measure useful statistical quantities, they also enjoy several properties that are useful for formal verification (e.g. see Csiszár and Shields (2004)). A property that is worth mentioning, and that will be used implicitly in our examples, is the following.

Theorem 3.1 (Data processing inequality). Let f ∈ F, let µ1, µ2 be two distributions over A, and let M be a function mapping values in A to distributions over B. Then we have:

∆_f(bind µ1 M, bind µ2 M) ≤ ∆_f(µ1, µ2)

Another important property for our framework is composition. As shown by Barthe and Olmedo (2013), we can compose f-divergences in an additive way. More specifically, they give the following definition.

Definition 3.7 (Barthe and Olmedo (2013)). Let f1, f2, f3 ∈ F. We say that (f1, f2) are f3-composable if and only if for every A, B, every two distributions µ1, µ2 over A, and every two functions M1, M2 mapping values in A to distributions over B we have

∆_{f3}(bind µ1 M1, bind µ2 M2) ≤ ∆_{f1}(µ1, µ2) + sup_v ∆_{f2}(M1 v, M2 v)

In particular, we have the following.

Lemma 3.4 (Barthe and Olmedo (2013)).
• (ε1-D, ε2-D) are (ε1 + ε2)-D composable.
• (SD, SD) are SD composable.
• (HD, HD) are HD composable.
• (KL, KL) are KL composable.

This form of composition will be internalized by the relational refinement type system that we will present in § 6.
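The simplified forms in Table 1 are directly computable for finite distributions; a Python sketch (ours), again representing distributions as dictionaries:

import math

def sd(mu1, mu2):
    # statistical (total variation) distance
    keys = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(a, 0.0) - mu2.get(a, 0.0)) for a in keys)

def hd(mu1, mu2):
    # Hellinger distance, as the f-divergence of Table 1
    keys = set(mu1) | set(mu2)
    return 0.5 * sum((math.sqrt(mu1.get(a, 0.0)) - math.sqrt(mu2.get(a, 0.0))) ** 2
                     for a in keys)

def eps_d(eps, mu1, mu2):
    # epsilon-distance characterizing (eps, delta)-differential privacy
    keys = set(mu1) | set(mu2)
    return sum(max(mu1.get(a, 0.0) - math.exp(eps) * mu2.get(a, 0.0), 0.0)
               for a in keys)

mu1, mu2 = {True: 0.6, False: 0.4}, {True: 0.5, False: 0.5}
print(sd(mu1, mu2), hd(mu1, mu2), eps_d(0.1, mu1, mu2))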

4. PrivInfer

The main components of PrivInfer are a language for expressing Bayesian inference models and a type system for reasoning in a relational way about programs of the language.

4.1 The language

The language underlying PrivInfer is a probabilistic programming extension of PCF that we call PCFp. Expressions of PCFp are defined by the following grammar:

e ::= x | c | e e | λx. e
    | letrec f x = e | case e with [d_i x_i ⇒ e_i]_i
    | return e | mlet x = e in e
    | observe x ⇒ e in e | infer(e) | ran(e)
    | bernoulli(e) | normal(e, e) | beta(e, e) | uniform()
    | lapMech(e, e) | gaussMech(e, e) | expMech(e, e, e)

where c represents a constant from a set C. We consider only expressions that are well typed using simple types of the form

τ ::= τ̃ | M[τ̃] | M[D[τ̃]] | D[τ̃] | τ → τ
τ̃ ::= • | B | N | R | R⁺ | R⁺∞ | [0, 1] | τ̃ list

where τ̃ ranges over basic types. As usual, a typing judgment is a judgment of the shape Γ ⊢ e : τ, where an environment Γ is an assignment of types to variables. The simple type system is given in Figure 2. The syntax and types of PCFp extend those of PCF with several constructors. Basic types include the unit type •, and types for booleans B and natural numbers N. We also have types for real numbers R, positive real numbers R⁺, positive real numbers extended with infinity R⁺∞, and real numbers in the unit interval [0, 1]. Finally, we have lists over basic types. Simple types combine basic types using arrow types, a probability monad M[τ̃] over the basic type τ̃, and a type D[τ̃] representing symbolic distributions over the basic type τ̃. The probability monad can also be over symbolic distributions. We consider only terminating programs, where termination is ensured by requiring an SN-guard in the LetRec rule; the SN-guard can be achieved, e.g., by using sized types or structural recursion. Probabilities (actual distributions) are encapsulated in the probabilistic monad M[τ̃], which can be manipulated by the let-binder mlet x = e1 in e2 and by the unit return e. Symbolic distributions are built using basic probabilistic primitives like bernoulli(e) for Bernoulli distributions, normal(e1, e2) for normal distributions, etc. These primitives are assigned the types described in Figure 3. For symbolic distributions we also assume an operation getParams to extract the parameters. We also have primitives lapMech(e1, e2), gaussMech(e1, e2), and expMech(e1, e2, e3), which provide implementations of the mechanisms ensuring differential privacy described in § 3.2.

uniform   : D[[0, 1]]
bernoulli : [0, 1] → D[B]
beta      : R⁺ × R⁺ → D[[0, 1]]
normal    : R × R⁺ → D[R]
lapMech   : R⁺ × R → M[R]
gaussMech : R⁺ × R → M[R]
expMech   : R × ((D, R) → R) × D → M[R]

Figure 3: Types of the primitive distributions and mechanisms.
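The distinction between symbolic distributions D[τ] and actual distributions M[τ] can be pictured concretely; a Python sketch (ours; the class and function names are illustrative, not PrivInfer's API), where a symbolic distribution is a constructor name plus parameters (supporting getParams) and an actual distribution is an explicit probability table, with ran mapping the former to the latter:

class Symbolic:
    # A symbolic distribution: a constructor name and its parameters.
    def __init__(self, name, *params):
        self.name, self.params = name, params
    def get_params(self):
        return self.params

def ran(d):
    # Turn a symbolic distribution into an actual (finite) distribution.
    if d.name == "bernoulli":
        (r,) = d.params
        return {True: r, False: 1.0 - r}
    raise NotImplementedError(d.name)

print(ran(Symbolic("bernoulli", 0.3)))   # -> {True: 0.3, False: 0.7}

The converse direction, infer, stands for an inference algorithm that synthesizes a symbolic distribution from an actual one; it is left abstract here, as in the language itself.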


Var:     x ∈ dom(Γ)  ⟹  Γ ⊢ x : Γ(x)

Fun:     Γ, x : τ ⊢ e : σ  ⟹  Γ ⊢ λx. e : τ → σ

App:     Γ ⊢ e1 : τ → σ   Γ ⊢ e2 : τ  ⟹  Γ ⊢ e1 e2 : σ

LetRec:  Γ, f : τ̃ → σ ⊢ λx. e : τ̃ → σ   SN-guard  ⟹  Γ ⊢ letrec f x = e : τ̃ → σ

Case:    Γ ⊢ e : τ̃ list   Γ ⊢ e1 : σ   Γ, x : τ̃, xs : τ̃ list ⊢ e2 : σ  ⟹  Γ ⊢ case e with [[] ⇒ e1, x :: xs ⇒ e2] : σ

UnitM:   Γ ⊢ e : T   ⊢ M[T]  ⟹  Γ ⊢ return e : M[T]

BindM:   Γ ⊢ e1 : M[T1]   Γ, x : T1 ⊢ e2 : M[T2]  ⟹  Γ ⊢ mlet x = e1 in e2 : M[T2]

Observe: Γ ⊢ e1 : M[τ̃]   Γ, x : τ̃ ⊢ e2 : M[B]  ⟹  Γ ⊢ observe x ⇒ e2 in e1 : M[τ̃]

Ran:     Γ ⊢ e : D[τ̃]  ⟹  Γ ⊢ ran(e) : M[τ̃]

Infer:   Γ ⊢ e : M[τ̃]  ⟹  Γ ⊢ infer(e) : D[τ̃]

Figure 2: PCFp type system

Finally, we have three special constructs for representing learning. The primitive observe x ⇒ e1 in e2 can be used to describe conditional distributions. This is a functional version of a similar primitive used in languages like Fun (Gordon et al. 2013) and others. In PCFp this primitive takes two arguments, a prior e2 and a predicate e1 over x. The intended semantics is the one provided by Bayes' theorem: it filters the prior by means of the observation provided by e1 and renormalizes the obtained distribution (see § 4.2 for more details). The primitives infer(e) and ran(e) are used to transform symbolic distributions into actual distributions and vice versa. In particular, infer(e) is the main component performing probabilistic inference. We will sometimes need to distinguish expressions that use different sets of variables; in these cases we write PCFp(X) for the set of expressions using variables in X.

4.2 Denotational Semantics

The semantics of PCFp is largely standard. Basic types are interpreted in the corresponding sets, e.g. J•K = {•}, JBK = {true, false}, JNK = {0, 1, 2, ...}, etc. As usual, arrow types Jτ → σK are interpreted as the set of functions JτK → JσK. A monadic type M[τ] for τ ∈ {τ̃, D[τ̃]} is interpreted as the set of discrete probabilities over τ, i.e.:

JM[τ]K = { µ : JτK → R⁺ | supp(µ) discrete ∧ Σ_{x∈JτK} µ x = 1 }

Types of the shape D[τ̃] are interpreted as sets of symbolic representations of distributions, parametrized by values. As an example, D[B] is interpreted as:

JD[B]K = { bernoulli(v) | v ∈ [0, 1] }

The interpretation of expressions is given as usual under a valuation θ, which is a finite map from variables to values in the interpretation of types. We say that θ validates an environment Γ if for every x : τ ∈ Γ we have θ(x) ∈ JτK. For most expressions the interpretation is standard; we detail the less standard cases in Figure 4. Probabilistic expressions are interpreted as discrete probabilities. In particular, Jreturn eK_θ is defined as the Dirac distribution returning JeK_θ with probability one. The mlet-binder mlet x = e1 in e2 composes probabilities. The expression observe x ⇒ t in u filters the distribution JuK_θ using the predicate x ⇒ t and rescales it to obtain a distribution. Observe is the key component for conditional distributions and for updating a prior using Bayes' theorem. The semantics of infer relies on a given algorithm AlgInf for inference (in this work we consider only exact inference, leaving approximate inference for future work; we also consider only programs with a well-defined semantics, e.g. we do not consider programs that observe events with probability 0). We leave the algorithm unspecified because it is not central to our verification task. We also restrict our attention to inference algorithms that never fail: they can always synthesize a symbolic distribution. To avoid this restriction one could use a maybe monad to deal with possible failures; we omit this to focus on the verification of differential privacy and to keep our system simpler. Symbolic distributions are just syntactic constructions, and this is reflected in their interpretation. As an example, we give in Figure 4 the interpretation Jbernoulli(e)K_θ. The operator ran turns a symbolic distribution into an actual distribution. Its semantics is defined by cases on the given symbolic distribution; Figure 4 gives its interpretation for the case where the given symbolic distribution is bernoulli(e). The cases for the other symbolic distributions are similar. The soundness of the semantics can be stated as follows.

Lemma 4.1. If Γ ⊢ e : τ and θ validates Γ, then JeK_θ ∈ JτK.

JΓ ⊢ return e : M[τ]K_θ = 1_{JeK_θ}

JΓ ⊢ mlet x = e1 in e2 : M[σ]K_θ = d ↦ Σ_{g∈JτK} ( Je1K_θ(g) × Je2K_{θ[x↦g]}(d) )

JΓ ⊢ observe x ⇒ t in u : M[τ]K_θ = d ↦ ( JuK_θ(d) × JtK_{θ[x↦d]}(true) ) / Σ_{g∈JτK} ( JuK_θ(g) × JtK_{θ[x↦g]}(true) )

JΓ ⊢ infer e : D[τ]K_θ = v if AlgInf e = v, fail otherwise

JΓ ⊢ bernoulli(e) : D[B]K_θ = bernoulli(JeK_θ)

JΓ ⊢ ran(bernoulli(e)) : M[B]K_θ = d ↦ JeK_θ if d = true, 1 − JeK_θ otherwise

Figure 4: Interpretation of PCFp expressions (selected rules).
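The interpretation of observe in Figure 4 is executable for finite distributions; a Python sketch (ours), filtering a prior by the probability that the predicate returns true and renormalizing:

def observe(pred, prior):
    # pred maps a value to a distribution over booleans; prior is a finite
    # distribution. Reweight by the probability of observing True, then
    # renormalize (undefined if the observation has probability zero).
    weighted = {d: p * pred(d).get(True, 0.0) for d, p in prior.items()}
    z = sum(weighted.values())
    return {d: w / z for d, w in weighted.items()}

def bernoulli(r):
    return {True: r, False: 1.0 - r}

# A discretized prior over a coin's bias, updated by observing one True.
prior = {0.25: 1/3, 0.5: 1/3, 0.75: 1/3}
print(observe(lambda r: bernoulli(r), prior))
# -> posterior proportional to r: {0.25: 1/6, 0.5: 1/3, 0.75: 1/2}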

5. From Tabular to PrivInfer

In this section we show how a modern language for Bayesian inference can be used to write programs that can be verified in PrivInfer. We consider the language Tabular (Gordon et al. 2014), a spreadsheet-like probabilistic programming language for machine learning. It provides a handy interface for writing Bayesian networks as spreadsheets, and it integrates several algorithms for exact and approximate inference. Tabular comprises two parts: a language for describing relational schemas, and a language for annotating a relational schema with expressions denoting probabilistic models. Using these two languages one can express a rich set of Bayesian models as spreadsheets. These models are then compiled to Infer.NET (Minka et al. 2012) and run. The language for relational schemas is quite general; its core can be described by the following grammar:

σ ::= [t_i ↦ τ_i]^{i∈1..n}
τ ::= [c_i ↦ a_i]^{i∈1..m}
a ::= ℓ(V)
V ::= ? | s | [V_0, ..., V_{n−1}]
ℓ ::= inst | static



Here, a database σ is a record of tables. A table is a record of attributes, where every attribute is a value tagged with a level ℓ. The level specifies whether the attribute is a static attribute or a whole column. An attribute can also be empty (?), denoting that the data is missing; Tabular will then predict the value from the learned posterior predictive distribution. Once one has a relational schema, Tabular permits adding annotations describing a Bayesian model. The description language is quite rich, but we can consider it an extension of the following:

E, F ::= e | g(E1, ..., En) | D[e1, ..., en](F1, ..., Fp)
       | if E then F1 else F2 | infer.D[e1, ..., en].y(x)

Here g denotes a primitive expression of arity n and D a distribution whose probability mass (or density) function has arity n. The expression infer.D[e1, ..., em].y(x) denotes the parameter y of the distribution D inferred from the prior distribution specified by x. Finally, we have a conditional statement if E then F1 else F2 that can be used to build more complex models. The control flow of this language is quite simple, but the compilation of a relational schema generates a richer (though still quite simple) control flow containing loops. The following is a simple example of how a typical annotated Tabular program looks:

1  hMean    real  param          0.0
2  hVar     real  param          1.0
3  mean     real  param          Gaussian(hMean,hVar)
4  kv       real  param          1.0
5  Heights  real  output         Gaussian(mean,kv)
6  inferM   real  static output  infer.Gaussian.Mean(mean)

This program is also associated with a table of data containing one column named Heights whose elements have type real; we omit this table here. Lines 1-5 in the spreadsheet describe a possible model of the data in Heights. In particular, line 5 says that data points in Heights are generated using a Gaussian distribution with parameters mean and kv. The parameter kv is given as the constant value 1.0 in line 4. The parameter mean is instead described by another Gaussian distribution, this time with parameters hMean and hVar of value 0.0 and 1.0, respectively. Parameters of a distribution that is itself used as a parameter are often called hyper-parameters. This model asserts that the data is generated by a Gaussian whose mean comes from a Gaussian distribution with hyper-parameters hMean and hVar, and whose variance is 1.0. To map this model to the standard terminology of Bayesian inference, we notice that the Gaussian distribution with hyper-parameters hMean and hVar is our prior distribution. The likelihood is described by the code in lines 1-5, and the realization of the likelihood is given by the data points in Heights corresponding to the observations. Finally, the code in line 6 requires to infer the posterior mean distribution. This is a Gaussian distribution with two parameters hMean' and hVar', which are updated versions of the parameters in the prior. Concretely, when a Tabular program is actually run, a cell linked to line 6 will hold the value of the posterior. Internally, this is computed by Tabular using one of the inference algorithms provided by Infer.NET, by matching the model with the realizations of the values in Heights.

We have implemented a simple compiler that translates a given Tabular program into PrivInfer. Instead of giving the formal translation, we present it on the previous spreadsheet as an example. The code generated by the translation is the following:

1.  let hMean = 0 in
2.  let hVar = 1 in
3.  let mean = normal(hMean, hVar) in
4.  let kv = 1 in
5.  let rec learnMean heights prior =
6.    match heights with
7.    | [] -> prior
8.    | h::hxs ->
9.      learnMean hxs (observe
10.       (fun (x:real) -> mlet m = normal(x, kv)
11.       in munit(m=h)) prior)
12. in inferM = infer(learnMean heights mean)

Here lines 1-2 and 4 initialize the parameter kv and the hyper-parameters hMean and hVar, while line 3 generates the needed prior. This part is a direct step-by-step translation of the same code in the spreadsheet. The interesting part is how we translate line 5 of the spreadsheet. Tabular itself, when compiled to Infer.NET, generates an observation for each datum using a foreach constructor. Unsurprisingly, we translate this with a let rec that structurally recurses on the list of observed values. For each of these we build a predicate (lines 10-11), which is the one given by line 5 of the spreadsheet with some renaming and additional monadic constructions. Once we have this code we pass it to the infer algorithm and store the result in inferM; this is generated by the translation of line 6 of the spreadsheet.
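The translated learnMean has a direct executable analogue over a discretized prior; a Python sketch (ours), folding observe over the data exactly as the letrec above does:

import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def observe(likelihood, prior):
    # Reweight a finite prior by the likelihood of the datum and renormalize.
    weighted = {m: p * likelihood(m) for m, p in prior.items()}
    z = sum(weighted.values())
    return {m: w / z for m, w in weighted.items()}

def learn_mean(heights, prior, kv=1.0):
    # Structural recursion over the observations, as in the code above.
    for h in heights:
        prior = observe(lambda m: normal_pdf(h, m, kv), prior)
    return prior

# Discretized prior over the mean, then three observations.
prior = {m / 2: 1 / 9 for m in range(-4, 5)}    # means -2.0, -1.5, ..., 2.0
print(learn_mean([1.7, 1.5, 1.8], prior))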

6. Relational Type System

6.1 Relational Typing

To reason about differential privacy as well as about f-divergences we consider a higher-order relational refinement type system, following the approach proposed by Barthe et al. (2015b).


We distinguish two sets of variables: X_R (relational) and X_P (plain). Associated with every relational variable x ∈ X_R we have a left instance x◁ and a right instance x▷. We write X_R^◁▷ for ∪_{x∈X_R} {x◁, x▷} and X^◁▷ for X_R^◁▷ ∪ X_P. The set of PrivInfer expressions E is the set of expressions defined over plain variables, i.e. expressions in PCFp(X_P). The set of PrivInfer relational expressions E^◁▷ is the set of expressions defined over plain and relational variables, i.e. expressions in PCFp(X^◁▷), where only non-relational variables can be bound. The sets of relational types T = {T, U, ...} and assertions A = {φ, ψ, ...} are defined by the following grammars:

T, U ∈ T ::= τ̃ | M_{f,δ}[{x :: τ̃ | φ}] | M_{f,δ}[{x :: D[τ̃] | φ}] | D[τ̃]
           | Π(x :: T). T | {x :: T | φ}

φ, ψ ∈ A ::= ∆_f^D(e^◁▷, e^◁▷) ≤ δ | f ∈ F | e^◁▷ = e^◁▷ | e^◁▷ ≤ e^◁▷
           | C(φ1, ..., φn) | Q(x : τ). φ (x ∈ X_P) | Q(x :: T). φ (x ∈ X_R)

where f, δ, e ∈ E^◁▷, Q ∈ {∀, ∃}, and C stands for a boolean connective in {⊤/0, ⊥/0, ¬/1, ∨/2, ∧/2, ⇒/2}. Relational types extend simple types by means of relational refinements of the shape {x :: T | φ}. This is a refinement type that uses a relational assertion φ stating some relational property that the inhabitants of the type have to satisfy. Relational assertions are first-order formulas over some basic predicates: ∆_f^D(e1^◁▷, e2^◁▷) ≤ δ, asserting a bound on a specific f-divergence; f ∈ F, asserting that f is a convex function meeting the requirements of Definition 3.6; and e1^◁▷ = e2^◁▷ and e1^◁▷ ≤ e2^◁▷ for equality and inequality of relational expressions, respectively. Moreover, in relational assertions we have two kinds of quantifiers, for plain and relational variables, respectively. Relational types also refine the probability monad. This now has the shape M_{f,δ}[{x :: T | φ}], for T ∈ {τ̃, D[τ̃]}, and corresponds to a polymonad (Hicks et al. 2014) or parametric effect monad (Katsumata 2014), where f and δ are parameters used to reason about f-divergences. Relational expressions can appear in relational refinements and in the parameters of the probabilistic monad, so the usual arrow type constructor is replaced by the dependent type constructor Π(x :: T). S.

Before introducing the typing rules of PrivInfer we need some notation. A relational environment G is a finite sequence of bindings (x :: T) such that each variable x is never bound twice and only relational variables from X_R are bound. We write G(x) for the type of x in G. We denote by |·| the type erasure from relational types to simple types, extended to environments. We use the notation ‖·‖ for the following map from relational environments to plain environments: ‖G‖ contains x◁ : |G(x)| and x▷ : |G(x)| for every x ∈ dom(G). For example, given a relational binding (x :: T), we have ‖(x :: T)‖ = x◁ : |T|, x▷ : |T|.

We can now present our relational type system. PrivInfer proves typing judgments of the form G ⊢ e1 ∼ e2 :: T. We use G ⊢ e :: T as a shorthand for G ⊢ e ∼ e :: T. Several of the typing rules of PrivInfer come from the system proposed by Barthe et al. (2015b); we report some of them in Figure 6. Notice in particular that we have two kinds of rules: one-sided and two-sided. The latter are used to prove judgments of the shape G ⊢ e1 ∼ e2 :: T for e1 and e2 not necessarily with the same syntactic structure. Among these rules we also include a subtyping rule, using the subtyping relation described in Figure 5. We present the rules that are specific to PrivInfer in Figure 7. The rules UnitM and BindM correspond to the unit and the composition of the probabilistic polymonad. These are similar to the usual rules for the unit and composition of monads, but additionally they require the indices to be well behaved. In particular, (f1, f2) are required to be f3-composable to guarantee that composition is respected. The rules Infer and Ran are similar to their simply typed versions, but they also transfer the information about the f-divergence from the indices to the refinements and vice versa. The rule Observe requires that the subexpressions are well typed relationally, and it further requires the validity of the assertion

‖G‖ ⊢ ∆_f(observe x◁ ⇒ e′◁ in e◁, observe x▷ ⇒ e′▷ in e▷) ≤ δ″

for the δ″ that can then be used as a bound for the f-divergence in the conclusion. This assertion may be surprising at first, since it does not consider the bounds δ and δ′ for the subexpressions but instead requires directly a bound for the conclusion; it is not really compositional. The reason for this presentation is that a generic bound in terms of δ and δ′ would often be too large to say anything useful about the conclusion. In fact, estimating bounds on the f-divergence of conditional distributions is a hard task, often addressed only in terms of prior perturbations (Dey and Birmiwal 1994). So, instead, we prefer to postpone the task of verifying a bound to the concrete application, where a tighter bound can be found with some calculation. We will see some uses of this rule in the examples in § 7.

S-L:  G ⊢ {x :: T | φ}  ⟹  G ⊢ {x :: T | φ} ≼ T

S-R:  G ⊢ T ≼ U   ‖G, x :: U‖ ⊢ φ   ∀θ. θ ⊨ G, x :: T ⇒ JφK_θ  ⟹  G ⊢ T ≼ {x :: U | φ}

S-M:  G ⊢ T ≼ U   ‖G‖ ⊢ δ_i : R⁺   ‖G‖ ⊢ f_i ∈ F   ∀θ. θ ⊨ G, x :: T ⇒ Jf1 ≤ f2 ∧ δ1 ≤ δ2 < ∞K_θ  ⟹  G ⊢ M_{f1,δ1}[T] ≼ M_{f2,δ2}[U]

S-P:  G ⊢ T2 ≼ T1   G, x :: T2 ⊢ U1 ≼ U2  ⟹  G ⊢ Π(x :: T1). U1 ≼ Π(x :: T2). U2

Figure 5: Relational Subtyping (selected rules)

6.2 Relational Interpretation

We now want to give a relational interpretation of relational types, so that we can prove the soundness of the relational type system of PrivInfer. Before doing this we need to introduce an important component of our interpretation, the notion of (f, δ)-lifting of a relation, inspired by the relational lifting of f-divergences by Barthe and Olmedo (2013).

Definition 6.1 ((f, δ)-Lifting of a relation Ψ). Let Ψ ⊆ T1 × T2, let f be a convex function providing an f-divergence, and let δ ∈ R⁺. Then µ1 ∈ M[T1] and µ2 ∈ M[T2] are in the (f, δ)-lifting of Ψ, denoted L_{(f,δ)}(Ψ), iff there exist two distributions µ_L, µ_R ∈ M[T1 × T2] such that

1. µ_i(a, b) > 0 implies (a, b) ∈ Ψ, for i ∈ {L, R},
2. π1 µ_L = µ1 and π2 µ_R = µ2, and
3. ∆_f(µ_L, µ_R) ≤ δ,

where π1 µ = λx. Σ_y µ(x, y) and π2 µ = λy. Σ_x µ(x, y). We call the distributions µ_L and µ_R the left and right witnesses for the lifting, respectively.

This notion of lifting is used to give a relational interpretation of monadic types. We say that a valuation θ validates a relational environment G, denoted θ ⊨ G, if θ ⊨ ‖G‖ and for all x ∈ dom(G), (x◁θ, x▷θ) ∈ LG(x)M_θ. We give in Figure 8 the relational interpretation


Var:      x ∈ dom(G)  ⟹  G ⊢ x :: G(x)

Abs:      G, x :: T ⊢ e :: U  ⟹  G ⊢ λx. e :: Π(x :: T). U

App:      G ⊢ e1 :: Π(x :: T). U   G ⊢ e2 :: T  ⟹  G ⊢ e1 e2 :: U{x ↦ e2}

LetRec:   G, f :: Π(x :: T). U ⊢ λx. e :: Π(x :: T). U   G ⊢ Π(x :: T). U   SN-guard  ⟹  G ⊢ letrec f x = e :: Π(x :: T). U

Sub:      G ⊢ e :: T   G ⊢ T ≼ U  ⟹  G ⊢ e :: U

Case:     ∀θ. θ ⊨ G ⇒ J(|e|◁ = []) ⇔ (|e|▷ = [])K_θ   G ⊢ e :: τ̃ list   G ⊢ e1 :: T   G, x :: τ̃, y :: τ̃ list, {|e|◁ = x◁ :: y◁ ∧ |e|▷ = x▷ :: y▷} ⊢ e2 :: T  ⟹  G ⊢ case e with [[] ⇒ e1 | x :: y ⇒ e2] :: T

ARedLeft: e1 → e1′   G ⊢ e1′ ∼ e2 :: T  ⟹  G ⊢ e1 ∼ e2 :: T

ACase:    |G| ⊢ e′ : |T|   G ⊢ T   |G| ⊢ e : τ̃ list   G, {|e|◁ = []} ⊢ e1 ∼ e′ :: T   G, x :: τ̃, y :: τ̃ list, {|e|◁ = x◁ :: y◁} ⊢ e2 ∼ e′ :: T  ⟹  G ⊢ case e with [[] ⇒ e1 | x :: y ⇒ e2] ∼ e′ :: T

Figure 6: Basic Relational Typing (Selected Rules)

UnitM:   G ⊢ e :: T   ‖G‖ ⊢ f ∈ F   ‖G‖ ⊢ δ : R⁺  ⟹  G ⊢ return e :: M_{f,δ}[T]

BindM:   G ⊢ e1 :: M_{f1,δ1}[T1]   (f1, f2) are f3-composable   G ⊢ M_{f2,δ2}[T2]   G, x :: T1 ⊢ e2 :: M_{f2,δ2}[T2]  ⟹  G ⊢ mlet x = e1 in e2 :: M_{f3,δ1+δ2}[T2]

Infer:   G ⊢ e :: M_{f,δ}[{x :: τ̃ | x◁ = x▷}]  ⟹  G ⊢ infer(e) :: {x :: D[τ̃] | ∆_f^D(x◁, x▷) ≤ δ}

Ran:     G ⊢ e :: {x :: D[τ̃] | ∆_f^D(x◁, x▷) ≤ δ}  ⟹  G ⊢ ran(e) :: M_{f,δ}[{x :: τ̃ | x◁ = x▷}]

Observe: G ⊢ e :: M_{f,δ}[{y :: τ̃ | y◁ = y▷}]   G, x : τ̃ ⊢ e′ :: M_{f,δ′}[{y :: B | y◁ = y▷}]   ‖G‖ ⊢ ∆_f(observe x◁ ⇒ e′◁ in e◁, observe x▷ ⇒ e′▷ in e▷) ≤ δ″  ⟹  G ⊢ observe x ⇒ e′ in e :: M_{f,δ″}[{y :: τ̃ | y◁ = y▷}]

Figure 7: PrivInfer Relational Typing Rules

JφK_θ ∈ {⊤, ⊥} of an assertion φ with respect to a valuation θ ⊨ G. This is quite standard, except for the interpretation of quantifiers binding relational variables, where we need to extend the valuation θ with left and right components. Notice that we interpret the assertion ∆_f^D(e1^◁▷, e2^◁▷) ≤ δ with the corresponding f-divergence. We give in Figure 9 the relational interpretation LTM_θ of a relational type T with respect to a valuation θ ⊨ ‖G‖. This corresponds to pairs of values in the standard interpretation of PCFp expressions. To define this interpretation we use both the interpretation of relational assertions given in Figure 8 and the definition of lifting given in Definition 6.1. The interpretation of relational assertions is used in the interpretation of relational refinement types, while the lifting is used to give an interpretation to the probabilistic polymonad. Notice that the relational interpretation of a type D[τ̃] is just the set of pairs of values in the standard interpretation of D[τ̃]; this can then be restricted by using relational refinement types. We can then prove that the relational refinement type system is sound with respect to the relational interpretation of types.

Theorem 6.1 (Soundness). If G ⊢ e1 ∼ e2 :: T, then for every valuation θ ⊨ G we have (Je1K_θ, Je2K_θ) ∈ LTM_θ.

The soundness theorem gives us a concrete way to reason about f-divergences.

Corollary 6.1 (f-divergence). If ⊢ e :: M_{f,δ}[{y :: τ | y◁ = y▷}], then for every (µ1, µ2) ∈ JeK we have ∆_f(µ1, µ2) ≤ δ.

Moreover, thanks to the characterization of differential privacy in terms of f-divergences given by Barthe and Olmedo (2013), we can refine the previous result to show that PrivInfer accurately models differential privacy.

Corollary 6.2 (Differential Privacy). If ⊢ e :: {x :: σ | Φ} → M_{ε-D,δ}[{y :: τ | y◁ = y▷}], then JeK is (ε, δ)-differentially private w.r.t. the adjacency relation JΦK.

Notice that here we use only the lifting of the relation =. Recent works by Barthe et al. (2015a) and Barthe et al. (2016a) have shown how a notion of lifting similar to the one we study here relates to more general notions of coupling, and how the lifting of different relations can be used to prove differential privacy. We leave exploring this direction for future work.

7. Examples

In this section we show how to use PrivInfer to guarantee differential privacy for Bayesian learning by adding noise on the input, noise on the output using the ℓ1 norm, and noise on the output using f-divergences. We show some of these approaches on three classical examples from Bayesian learning: learning the bias of a coin from some observations (as discussed in § 2), learning the mean of a Gaussian, and Bayesian linear regression. In all the examples we use pseudocode that can easily be mapped to the language presented in § 4.

JC(φ1, ..., φn)K_θ = C(Jφ1K_θ, ..., JφnK_θ)

Jf ∈ FK_θ = JfK_θ ∈ F

J∆_f^D(e1^◁▷, e2^◁▷) ≤ δK_θ = ∆_{JfK_θ}(Je1^◁▷K_θ, Je2^◁▷K_θ) ≤ JδK_θ

Je1^◁▷ = e2^◁▷K_θ = Je1^◁▷K_θ = Je2^◁▷K_θ

Je1^◁▷ ≤ e2^◁▷K_θ = Je1^◁▷K_θ ≤ Je2^◁▷K_θ

J∀(x : τ). φK_θ = ∧_{d∈JτK} JφK_{θ[x↦d]}

J∀(x :: T). φK_θ = ∧_{(d1,d2)∈LTM_θ} JφK_{θ[x◁↦d1][x▷↦d2]}

J∃(x : τ). φK_θ = ∨_{d∈JτK} JφK_{θ[x↦d]}

J∃(x :: T). φK_θ = ∨_{(d1,d2)∈LTM_θ} JφK_{θ[x◁↦d1][x▷↦d2]}

where C stands for a boolean connective.

Figure 8: Relational interpretation of assertions

d1, d2 ∈ Jτ̃K  ⟹  (d1, d2) ∈ Lτ̃M_θ

(d1, d2) ∈ LTM_θ   JφK_{θ[x◁↦d1][x▷↦d2]}  ⟹  (d1, d2) ∈ L{x :: T | φ}M_θ

f1, f2 ∈ J|T| → |U|K   ∀(d1, d2) ∈ LTM_θ. (f1(d1), f2(d2)) ∈ LUM_{θ[x◁↦d1][x▷↦d2]}  ⟹  (f1, f2) ∈ LΠ(x :: T). UM_θ

d1, d2 ∈ JD[τ̃]K  ⟹  (d1, d2) ∈ LD[τ̃]M_θ

µ1, µ2 ∈ JM[|T|]K   L_{f,δ}(LTM_θ) µ1 µ2  ⟹  (µ1, µ2) ∈ LM_{f,δ}[T]M_θ

Figure 9: Relational interpretation of types

7.1

which guarantees us that the result is  differentially private. Notice that the result type is a polymonadic type over Drr0, 1ss. This because we are releasing the symbolic distribution.

Input perturbation

Input perturbation: Beta Learning Let’s start by revisiting the task of inferring the parameter of a Bernoulli distributed random variable given a sequence of private observations. We consider two lists of booleans with the same length in the adjacency relation Φ iff they differ in the value of at most one entry. We want to ensure differential privacy by perturbing the input. A natural way to do this, since the observations are boolean value is by using the exponential mechanism. We can then learn the bias from the perturbed data the parameter. The post-processing property of differential privacy ensures that we will learn the parameter in a private way. Let’s start by considering the quality score function for the exponential mechanism. A natural choice is to consider a function score:boolÑboolÑ{0,1} mapping equal booleans to 1 and different booleans to 0. Remember that the intended reading is that one of the boolean is the one to which we want to give a quality score, while the other is the one provided by the observation. The sensitivity of score is 1. Using this score function we can then create a general function for adding noise to the list of input: 1. 2. 3. 4.

Input perturbation: Dirichlet Distribution Now we can consider a generalization of the previous example. Consider a die with k faces and suppose the result of rolling it n times is the sequence ℓ = [l_1, l_2, ..., l_n]. From these observations we want to learn the probability associated with each face of the die. The likelihood of a specific sequence of independent draws can be described by a multinomial distribution, whose probability mass function is:

Pr[l_1, ..., l_n] = ∏_{i=1}^{k} p_i^{∑_{j=1}^{n} I(l_j = i)}

where p_i is the probability of rolling the i-th face of the die, and where the indicator function I(·) is 1 if its argument is a true predicate and 0 otherwise. The prior and the posterior can be described by the Dirichlet distribution, which is a family of conjugate priors for multinomial distributions. The Dirichlet distribution is a generalization of the beta distribution and is defined on the (k − 1)-dimensional simplex


S^k = {(x_1, ..., x_k) ∈ R^k | ∑_{i=1}^{k} x_i = 1}

Its probability density function is

(1 / β(v⃗)) ∏_{i=1}^{k} x_i^{v_i}


where β is the generalized beta function and v⃗ = ⟨v_1, ..., v_k⟩ are the parameters. Similarly to the case of the toss of a coin and the beta distribution, in the Dirichlet case the posterior given by Bayes' theorem differs from the prior in that each observation is added to the corresponding parameter. The goal of this example is to show how to privately learn the posterior distribution over p⃗ = ⟨p_1, ..., p_k⟩ from a list of observations ℓ. We consider two lists ℓ_1, ℓ_2 of values to be in the adjacency relation if they differ in at most one entry. We can use a score function similar to the one of the previous example, mapping equal faces to 1 and different faces to 0; in particular, the sensitivity is still 1. Notice that we may also consider other score functions depending on the definition of accuracy we are interested in, e.g. we could give a positive score to faces that are close to the rolled one. To implement this example in PrivInfer we need to slightly extend the grammar of types: we introduce a family of finite types [k] for every k, corresponding to the set of values {1, ..., k}. In our concrete example we will use the case k = 3. We can use a function addNoise similar to the one of the previous example but generalized to multiple parameters. In our case it would have the following type:

{l :: [3] list | l◁ Φ l▷} → {ε :: R⁺ | =} → M_{ε-D,0} {b :: [3] list | =}

The program implementing the model would then be the following:

1. let rec learnP dbn prior = match dbn with
2. | [] → prior
3. | d::dbs → observe
4.    (fun r, s → mlet z = ran multinomial(r,s)
5.     in munit (d=z))
6.    (learnP dbs prior)

As expected, we could type this program as:

{l :: [3] list | =} → M_{SD,0} {x :: [0,1]² | =} → M_{SD,0} {x :: [0,1]² | =}

and so we would get a differentially private computation. Notice that the predicate used by observe takes as input only 2 reals, instead of 3: this is because r and s lie on a 2-dimensional simplex, so the third component is automatically determined.
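A Python sketch of this generalization, under the same caveats as before (the helpers exp_mech_face and dirichlet_update are ours), perturbs each roll with the exponential mechanism over the k faces and then performs the conjugate Dirichlet update:

import math
import random

def exp_mech_face(eps, y, k=3):
    # Exponential mechanism over {1, ..., k} with score(f, y) = 1 iff f == y;
    # the score has sensitivity 1, as in the boolean case.
    faces = list(range(1, k + 1))
    weights = [math.exp(eps * (1.0 if f == y else 0.0) / 2) for f in faces]
    return random.choices(faces, weights=weights)[0]

def dirichlet_update(obs, alpha):
    # Conjugate Dirichlet update: each observed face increments its parameter.
    posterior = list(alpha)
    for o in obs:
        posterior[o - 1] += 1.0
    return posterior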

Input perturbation: Normal Learning An example similar to the previous one is learning the mean of a Gaussian distribution with known variance kv from a list of real-valued observations, for instance some medical parameters like the LDL level of each patient. We consider two lists of reals of the same length to be adjacent when they differ in at most one position and the ℓ1 distance between the two differing elements is bounded by 1. To perturb the input we may now want to use a different mechanism: for example the Gaussian mechanism, which may give reasonable results if we expect the data to come from a normal distribution. Also in this case the sensitivity is 1. The addNoise function is very similar to the one we used in the previous example:

1. let rec addNoise db eps delta = match db with
2. | [] → munit ([])
3. | y::yl → mlet yn = (gaussMech (sigma eps delta) y) in
4.    mlet yln = (addNoise yl eps delta) in munit (yn::yln)

The two differences are that now we also take delta as input and that in line 3, instead of the score function, we have a function sigma computing the variance as in Definition 3.4. The inference function becomes instead the following:

1. let rec learnMean dbn prior = match dbn with
2. | [] → prior
3. | d::dbs → observe (fun (r: real) →
4.    mlet z = ran normal(r, kv) in munit (d=z))
5.    (learnMean dbs prior) in
6. let main db hMean hVar eps delta =
7.    mlet noisyDB = (addNoise db eps delta) in
8.    munit(infer (learnMean noisyDB
9.      (ran (normal(hMean,hVar)))))

Composing them we get the following type, guaranteeing (ε, δ)-differential privacy:

{l :: R list | l◁ Φ l▷} → {a :: R⁺ | =} → {b :: R⁺ | =} → {ε :: R⁺ | =} → {δ :: R⁺ | =} → M_{ε-D,δ} {d :: D[R] | =}
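A Python sketch of this pipeline could look as follows; the calibration in gauss_sigma is the standard analytic one for the Gaussian mechanism (as in Dwork and Roth 2014) and merely stands in for Definition 3.4, which is not reproduced in this section and may differ:

import math
import random

def gauss_sigma(eps, delta, sensitivity=1.0):
    # Standard Gaussian-mechanism calibration: sqrt(2 ln(1.25/delta)) * s / eps.
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / eps

def add_noise(db, eps, delta):
    s = gauss_sigma(eps, delta)
    return [y + random.gauss(0.0, s) for y in db]

def learn_mean(db, h_mean, h_var, kv):
    # Conjugate posterior for the mean of a normal with known variance kv,
    # treating h_var and kv as variances.
    n = len(db)
    post_var = 1.0 / (1.0 / h_var + n / kv)
    post_mean = post_var * (h_mean / h_var + sum(db) / kv)
    return post_mean, post_var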

Input perturbation: Linear Regression We now consider a classical Bayesian example: linear regression. The interested reader can find out more about this example in the book by Bishop (2006). For simplicity we will consider the case of a first-degree polynomial in R². Moreover, we assume the data consist of a list of real numbers y_i, each with an associated, given, increasing x_i. We assume each y_i to be generated by the following model: y_i = w_1 x_i + w_0 + γ̂, where γ̂ is a normally distributed random variable with mean 0 and known variance kv, representing a noisy component. We set a joint prior over w = (w_0, w_1), namely a normal distribution with parameters initially set to hMw ∈ R² and hVw ∈ R⁴, and we learn the posterior distribution over w. For this example we introduce a new symbolic distribution denoting the bivariate normal distribution, whose type is bivariate-normal(e, e) : R² × R⁴ → D[R × R]. The first argument of the bivariate normal distribution is the mean, a two-component real vector, while the second one denotes the covariance matrix, in this case a 4-element real vector. Again, we ensure differential privacy by perturbing the input list using the Gaussian mechanism with 1 as sensitivity. We consider the same adjacency relation Φ as in the previous example. The function addNoise is the same as in the previous example.

1. let rec linearLearn db prior = match db with
2. | [] → prior
3. | y::yl → observe (fun ((a,b):(real*real)) →
4.    mlet z = ran (normal (x_y*a+b) kv) in munit (y=z))
5.    (linearLearn yl prior) in
6. let main db hMw hVw eps delta =
7.    mlet dbN = (addNoise db eps delta) in
8.    munit (infer (linearLearn dbN
9.      (ran (bivariate-normal(hMw, hVw)))))

The code is very similar to the one of the previous example; the main difference is in the likelihood at line 4. For simplicity we assume the recursion to be on the list of values y_i and that for each y we have an associated x_y. This program can be typed with the following type, guaranteeing (ε, δ)-differential privacy:

{l :: R list | l◁ Φ l▷} → {a :: R² | =} → {b :: R⁴ | =} → {ε :: R⁺ | =} → {δ :: R⁺ | =} → M_{ε-D,δ} {d :: D[R × R] | =}
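For reference, the conjugate computation that infer performs for this model can be sketched in Python with numpy, using the standard Bayesian linear regression formulas (Bishop 2006); linear_posterior is our name, and the prior covariance is passed as a 2×2 matrix rather than the flattened R⁴ vector used above:

import numpy as np

def linear_posterior(xs, ys, mu0, Sigma0, kv):
    # Posterior over w = (w0, w1) for y = w1 * x + w0 + noise, noise ~ N(0, kv),
    # with prior w ~ N(mu0, Sigma0).
    X = np.column_stack([np.ones_like(xs), xs])   # design matrix with rows (1, x_i)
    Sigma0_inv = np.linalg.inv(Sigma0)
    Sigma_post = np.linalg.inv(Sigma0_inv + X.T @ X / kv)
    mu_post = Sigma_post @ (Sigma0_inv @ mu0 + X.T @ np.asarray(ys) / kv)
    return mu_post, Sigma_post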

7.2 Noise on Output with ℓ1-norm

We now present examples where the privacy guarantee is achieved by adding noise on the output. To do this we need to compute the sensitivity of the program; in contrast, in the previous section the sensitivity was evident because it was computed directly on the input. As discussed before, we can compute the sensitivity with respect to different metrics. Here we consider the sensitivity computed over the ℓ1-norm on the parameters of the posterior distribution.



Output parameters perturbation: Beta Learning The main difference with the example in the previous section is that here we add Laplace noise to the parameters of the posterior:


1. let main db a b eps =
2.    let d = infer (learnBias db (ran beta(a,b))) in
3.    let (aP, bP) = getParams d in
4.    mlet aPn = lapMech(eps, aP) in
5.    mlet bPn = lapMech(eps, bP) in
6.    munit beta(aPn, bPn)


In line 2 we use the function learnBias from the previous section, while in lines 4 and 5 we add Laplace noise. The formal sensitivity analysis is based on the fact that the posterior parameters are the counts of true and false in the data added, respectively, to the parameters of the prior. This reasoning is performed at each step of observe. We can then prove that the ℓ1-norm sensitivity of the whole program is 2, and type the program with a type guaranteeing 2ε-differential privacy:

{l :: B list | l◁ Φ l▷} → {a :: R⁺ | =} → {b :: R⁺ | =} → {ε :: R⁺ | =} → M_{2ε-D,0} {d :: D[[0,1]] | =}
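A Python sketch of this output-perturbation scheme (lap_mech and private_beta_posterior are hypothetical helpers, not PrivInfer primitives); each count-valued parameter changes by at most 1 on Φ-adjacent inputs, and the two releases compose:

import numpy as np

def lap_mech(eps, value, sensitivity=1.0):
    # Laplace mechanism: add noise with scale sensitivity / eps.
    return value + np.random.laplace(0.0, sensitivity / eps)

def private_beta_posterior(db, a, b, eps):
    # Exact Beta posterior, then Laplace noise on each parameter,
    # giving 2*eps-differential privacy overall.
    t = sum(db)
    a_post, b_post = a + t, b + len(db) - t
    return lap_mech(eps, a_post), lap_mech(eps, b_post)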


Output parameters perturbation: Dirichlet Learning This example too differs from the Dirichlet example of the previous section in that we add noise after the inference has been performed. Similarly to the 2-dimensional case, the ℓ1-norm sensitivity of the inference process is 2: even if we have k classes of observations, a change in one row can always affect at most 2 counts, each by 1. The program in this case would be the following:


1. let main db a b c eps =
2.    let d = infer (learnP db (ran dirichlet(a,b,c))) in
3.    let (aP, bP, cP) = getParams d in
4.    mlet aPn = lapMech(eps, aP) in
5.    mlet bPn = lapMech(eps, bP) in
6.    mlet cPn = lapMech(eps, cP) in
7.    munit dirichlet(aPn, bPn, cPn)


to which we can assign the type:


{l :: [3] list | l◁ Φ l▷} → {a :: R⁺ | =} → {b :: R⁺ | =} → {c :: R⁺ | =} → {ε :: R⁺ | =} → M_{2ε-D,0} {d :: D[[0,1]²] | =}

which provides as guarantee 2ε-differential privacy.

Output parameters perturbation: Normal Learning For this example we use the same adjacency relation as in the example with noise on the input, where in particular the number of observations n is public knowledge. The code is very similar to the previous one:

1. let main db hM hV kV eps =
2.    let mDistr = infer (learnMean db (ran normal(hM,hV))) in
3.    let mean = getMean mDistr in
4.    mlet meanN = lapMech(eps/s, mean) in
5.    let d = normal(meanN, uk) in munit(d)

where uk = (1/hV² + n/kv²)⁻¹. Notice that we only add noise to the posterior mean parameter and not to the posterior variance parameter, since the latter doesn't depend on the data but only on public information. The difficulty in verifying this example lies in the sensitivity analysis: by some calculation the sensitivity can be bounded by s = hV/(kv + hV), where kv is the known variance of the Gaussian distribution whose mean we are learning and hV is the prior variance over the mean. We use this bound in line 4 when we add noise with the Laplace mechanism, and with it we can type the program with type

{l :: R list | l◁ Φ l▷} → {hM :: R | =} → {hV :: R⁺ | =} → {kv :: R⁺ | =} → {ε :: R⁺ | =} → M_{sε-D,0} {d :: D[R] | =}
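A Python sketch of this computation (treating hV and kv as variances; the uk formula above squares them, which suggests a standard-deviation parametrization, so this sketch is only an approximation of the intended model):

import numpy as np

def private_posterior_mean(db, h_mean, h_var, kv, eps):
    # Only the posterior mean is perturbed: the posterior variance uk depends
    # only on public quantities (n, h_var, kv).
    n = len(db)
    uk = 1.0 / (1.0 / h_var + n / kv)
    post_mean = uk * (h_mean / h_var + sum(db) / kv)
    s = h_var / (kv + h_var)                      # sensitivity bound from the text
    return post_mean + np.random.laplace(0.0, s / eps), uk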

7.3 Noise on Output using f-divergences

We now turn to the approach of calibrating the sensitivity according to f-divergences. We will consider once again the example of privately learning the distribution over the parameter of a Bernoulli distribution but, differently from the previous section, we will add noise to the output of the inference algorithm using the exponential mechanism with a score function based on an f-divergence. So, we perturb the output distribution and not its parameters. We will use the Hellinger distance as a metric over the output space of our differentially private program, but any other f-divergence could also be used. The quality score function for the exponential mechanism can be given a type of the shape:

{l :: B list | l◁ Φ l▷} → {d :: D[τ] | =} → {r :: R | |r◁ − r▷| ≤ ρ}

where the bound ρ expresses its sensitivity. Now we can use as a score function the following program:

score (db, prior) out = -(H (infer (learnBias db prior)) out)

This quality score uses a function H computing the Hellinger distance between the result of the inference and a potential output, in order to assign the latter a score. The closer out is to the actual posterior (in Hellinger distance), the higher the score. If we use the exponential mechanism with this score we achieve our goal of using the Hellinger distance to "calibrate the noise". Indeed we have a program:

let main prior obs eps = expMech score (prior, obs) eps

to which we can assign the type:

M_{HD,0} {x :: [0,1] | =} → {ℓ :: B list | ℓ◁ Φ ℓ▷} → {ε :: R⁺ | =} → M_{ρε-D,0} {d :: D[[0,1]] | =}

Concretely, to achieve this we can proceed by considering first the code for learnBias:

1. let rec learnBias db prior = match db with
2. | [] → prior
3. | d::dbs → mlet rec = (learnBias dbs prior) in observe
4.    (fun r → mlet z = ran bernoulli(r) in munit (d=z)) rec

To obtain a bound for the whole learnBias we first need to bound the difference in Hellinger distance that two distinct observations can generate. This is described by the following lemma.

Lemma 7.1. Let d_1, d_2 : B with d_1 Φ d_2. Let a, b ∈ R⁺. Let Pr(ξ) = Beta(a, b). Then ∆_HD(Pr(ξ | d_1), Pr(ξ | d_2)) ≤ √(1 − π/4) = ρ.

Using this lemma we can type the observe statement with the bound ρ. We still need to propagate this bound to the whole learnBias. We can do this by using the adjacency relation, which imposes at most one difference in the observations, and the data processing inequality (Theorem 3.1), which guarantees that for equal observations the Hellinger distance cannot increase. Summing up, using the lemma above, the adjacency assumption and the data processing inequality, we can give to learnBias the following type:

{l :: B list | l◁ Φ l▷} → M_{HD,0} {x :: [0,1] | =} → M_{HD,ρ} {x :: [0,1] | =}

This ensures that, starting from the same prior and observing l_1 and l_2 in the two runs with l_1 Φ l_2, we obtain two beta distributions at distance at most ρ.


Using some additional refinement for infer and H we can guarantee that score has the intended type, and so that overall this program is (ρε, 0)-differentially private. The reasoning above is not limited to the Hellinger distance. For instance, the following lemma:


Lemma 7.2. Let d_1, d_2 : B with d_1 Φ d_2. Let a, b ∈ R⁺. Let Pr(ξ) = Beta(a, b). Then ∆_SD(Pr(ξ | d_1), Pr(ξ | d_2)) ≤ √(2(1 − π/4)) = ζ.


gives a type in terms of statistical distance:

{l :: B list | l◁ Φ l▷} → M_{SD,0} {x :: [0,1] | =} → M_{SD,ζ} {x :: [0,1] | =}

The choice of which metric to use is ultimately left to the user. This example also extends easily to the Dirichlet case: indeed, Lemma 7.1 can be generalized to arbitrary Dirichlet distributions.

Lemma 7.3. Let k ∈ N_{≥2} and d_1, d_2 : [k] list with d_1 Φ d_2. Let a_1, a_2, ..., a_k ∈ R⁺. Let Pr(ξ) = Dirichlet(a_1, a_2, ..., a_k). Then ∆_HD(Pr(ξ | d_1), Pr(ξ | d_2)) ≤ √(1 − π/4) = ρ.

Using this lemma we can assign to the following program:

1. let rec learnP db prior = match db with
2. | [] → prior
3. | d::dbs → mlet rec = (learnP dbs prior) in observe
4.    (fun r s → mlet z = ran multinomial(r,s) in
5.     munit (d=z)) rec

the type:

{l :: [3] list | l◁ Φ l▷} → M_{HD,0} {x :: [0,1]² | =} → M_{HD,ρ} {x :: [0,1]² | =}


Similarly to the previous example, we can now add noise to the output of the inference process using the sensitivity with respect to the Hellinger distance, and obtain a (ρε, 0)-differential privacy guarantee.
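The Hellinger distance between two beta distributions has a closed form through the Bhattacharyya coefficient, which makes the bound of Lemma 7.1 easy to check numerically. A sketch in Python (hellinger_beta is our helper; it relies on scipy):

import math
from scipy.special import betaln

def hellinger_beta(a1, b1, a2, b2):
    # Bhattacharyya coefficient of Beta(a1,b1) and Beta(a2,b2):
    # B((a1+a2)/2, (b1+b2)/2) / sqrt(B(a1,b1) * B(a2,b2)), computed in log space.
    log_bc = betaln((a1 + a2) / 2.0, (b1 + b2) / 2.0) \
             - 0.5 * (betaln(a1, b1) + betaln(a2, b2))
    return math.sqrt(1.0 - math.exp(log_bc))

# Lemma 7.1 numerically: posteriors after observing d1 = true versus d2 = false.
a, b = 1.0, 1.0
rho = math.sqrt(1.0 - math.pi / 4.0)
assert hellinger_beta(a + 1, b, a, b + 1) <= rho + 1e-12

At a = b = 1 the distance equals ρ exactly, so the bound in the lemma is tight.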

8. Related work

Differential privacy and Bayesian inference. Our system targets programs from the combination of differential privacy and Bayesian inference. Both of these topics are active areas of research, and their intersection is an especially popular research direction today. We briefly summarize the most well-known work, and refer interested readers to surveys for a more detailed development (Dwork and Roth (2014) for differential privacy, Bishop (2006) for Bayesian inference). Blum et al. (2005) and Dwork et al. (2006) proposed differential privacy, a worst-case notion of statistical privacy, in a pair of groundbreaking papers, initiating intense research interest in developing differentially private algorithms. The original works propose the Laplace and Gaussian mechanisms that we use, while the seminal paper of McSherry and Talwar (2007) introduces the exponential mechanism. Recently, researchers have investigated how to guarantee differential privacy when performing Bayesian inference, a foundational technique in machine learning. Roughly speaking, works in the literature have explored three different approaches to guaranteeing differential privacy when the samples are private data. First, we may add noise directly to the samples, and then perform inference as usual (Williams and McSherry 2010). Second, we may perform inference on the private data, then add noise to the parameters themselves (Zhang et al. 2016). This approach requires bounding the sensitivity of the output parameters when we change a single data sample, relying on specific properties of the model and the prior distribution. The final approach involves no noise during inference, but outputs samples from the posterior rather than the entire posterior distribution (Dimitrakakis et al. 2014; Zhang et al. 2016; Zheng 2015). This last approach is highly specific to the model and prior, and our system does not yet handle this method of achieving privacy.

Formal verification for differential privacy. In parallel with the development of private algorithms, researchers in formal verification have proposed a wide variety of techniques for verifying differential privacy. For a comprehensive discussion, interested readers can consult the recent survey by Barthe et al. (2016b). Many of these techniques rely on the composition properties of privacy, though there are some exceptions (Barthe et al. 2016a). Briefly, the first systems were based on runtime verification of privacy (McSherry 2009), while the first systems for static verification used linear type systems (Reed and Pierce 2010; Gaboardi et al. 2013). There is also extensive work on relational program logics for differential privacy (Barthe et al. 2012, 2013; Barthe and Olmedo 2013), and on techniques for verifying privacy in standard Hoare logic using product programs (Barthe et al. 2014b). None of these techniques has been applied to verifying differential privacy of Bayesian inference. Our system is most closely related to HOARe², a relational refinement type system recently proposed by Barthe et al. (2015b). This system has been used for verifying differential privacy of algorithms, and more general relational properties like incentive compatibility from the field of mechanism design. We extend this approach to also address Bayesian inference, which was not supported by HOARe².

Probabilistic programming. Research in probabilistic programming emerged in the 60s and 70s and is nowadays a very active research area. There is a long tradition of formal verification of probabilistic programs, from early works by Ramshaw (1979); Sharir et al. (1984); Kozen (1985) to more recent works by Morgan et al. (1996); McIver and Morgan (2005); Katoen et al. (2010); Gretz et al. (2013); Barthe et al. (2014a); Gretz et al. (2015). Our work can be seen as a further contribution in this direction. Among these works, the most related are Barthe et al. (2014a) and Gretz et al. (2015). The former presents a relational type system for verifying cryptographic protocols; our work can be seen as an extension of theirs to Bayesian inference and differential privacy. Gretz et al. (2015) extend the probabilistic predicate transformer approach of pGCL (McIver and Morgan 2005) to account for conditional distributions and observe. Our approach differs from theirs in that we consider a functional language while their language is imperative, in that we use relational reasoning, and in our focus on differential privacy. Also relevant to our work is the research in probabilistic programming for machine learning and statistics, which has been very active in recent years. Many probabilistic programming languages have been designed for these applications, including WinBUGS (Lunn et al. 2000), IBAL (Pfeffer 2001), Church (Goodman et al. 2008), Infer.NET (Minka et al. 2012), Tabular (Gordon et al. 2014), Anglican (Tolpin et al. 2015), and Dr. Bayes (Toronto et al. 2015). We translated Tabular (Gordon et al. 2014) into our framework, but other languages could be translated in a similar way: our goal is not to provide a new language, but to propose a framework in which one can reason about differential privacy for such languages. Another related work is the one by Adams and Jacobs (2015), proposing a type theory for Bayesian inference. While technically their work is very different from ours, it shares the same goal of providing reasoning principles for Bayesian inference.


9. Conclusion

We have presented PrivInfer, a type-based framework for differentially private Bayesian inference. Our framework allows one to write data analyses as functional programs for Bayesian inference and to add noise to them in different ways, using different metrics. Moreover, our framework supports reasoning about general f-divergences for Bayesian inference. Future directions include exploring the use of this approach to guarantee robustness for Bayesian inference and other machine learning techniques (Dey and Birmiwal 1994), ensuring differential privacy using conditions over the prior and the likelihood similar to the ones studied by Zheng (2015); Zhang et al. (2016), and investigating further uses of f-divergences for improving the utility of differentially private Bayesian learning. On the programming language side it would also be interesting to extend our framework to continuous distributions following the approach by Sato (2016). We believe that the intersection of programming languages, machine learning, and differential privacy holds many exciting results in store.

References

R. Adams and B. Jacobs. A type theory for probabilistic and Bayesian reasoning. ArXiv e-prints, Nov. 2015.

G. Barthe and F. Olmedo. Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs. In ICALP 2013, Part II, pages 49–60, 2013. doi: 10.1007/978-3-642-39212-2_8.

G. Barthe, B. Köpf, F. Olmedo, and S. Zanella-Béguelin. Probabilistic relational reasoning for differential privacy. In POPL, 2012.

G. Barthe, G. Danezis, B. Grégoire, C. Kunz, and S. Zanella-Béguelin. Verified computational differential privacy with applications to smart metering. In CSF, pages 287–301, 2013.

G. Barthe, C. Fournet, B. Grégoire, P.-Y. Strub, N. Swamy, and S. Zanella-Béguelin. Probabilistic relational verification for cryptographic implementations. In POPL, pages 193–206, 2014a.

G. Barthe, M. Gaboardi, E. J. Gallego Arias, J. Hsu, C. Kunz, and P.-Y. Strub. Proving differential privacy in Hoare logic. In CSF, 2014b. URL http://arxiv.org/abs/1407.2988.

G. Barthe, T. Espitau, B. Grégoire, J. Hsu, L. Stefanesco, and P.-Y. Strub. Relational reasoning via probabilistic coupling. In LPAR, volume 9450, pages 387–401, 2015a. URL http://arxiv.org/abs/1509.03476.

G. Barthe, M. Gaboardi, E. J. Gallego Arias, J. Hsu, A. Roth, and P.-Y. Strub. Higher-order approximate relational refinement types for mechanism design and differential privacy. In POPL 2015, pages 55–68, 2015b. doi: 10.1145/2676726.2677000.

G. Barthe, M. Gaboardi, B. Grégoire, J. Hsu, and P.-Y. Strub. Proving differential privacy via probabilistic couplings. 2016a. URL http://arxiv.org/abs/1601.05047.

G. Barthe, M. Gaboardi, J. Hsu, and B. Pierce. Programming language techniques for differential privacy. ACM SIGLOG News, 3(1):34–53, Feb. 2016b. doi: 10.1145/2893582.2893591.

C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, Secaucus, NJ, USA, 2006. ISBN 0387310738.

A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In PODS 2005, pages 128–138, 2005. doi: 10.1145/1065167.1065184.

K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. JMLR, 12, 2011.

I. Csiszár. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl., 8:85–108, 1963.

I. Csiszár and P. Shields. Information theory and statistics: A tutorial. Foundations and Trends in Communications and Information Theory, 1(4):417–528, 2004. doi: 10.1561/0100000004.

D. K. Dey and L. R. Birmiwal. Robust Bayesian analysis using divergence measures. Statistics & Probability Letters, 20(4):287–294, 1994. doi: 10.1016/0167-7152(94)90016-7.

C. Dimitrakakis, B. Nelson, A. Mitrokotsa, and B. I. P. Rubinstein. Robust and private Bayesian inference. In ALT 2014, pages 291–305, 2014. doi: 10.1007/978-3-319-11662-4_21.

C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014. doi: 10.1561/0400000042.

C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC 2006, pages 265–284, 2006. doi: 10.1007/11681878_14.

C. Dwork, G. N. Rothblum, and S. P. Vadhan. Boosting and differential privacy. In FOCS 2010. IEEE, 2010. doi: 10.1109/FOCS.2010.12.

H. Ebadi, D. Sands, and G. Schneider. Differential privacy: Now it's getting personal. In POPL 2015, pages 69–81, 2015.

F. Eigner and M. Maffei. Differential privacy by typing in security protocols. In CSF, pages 272–286. IEEE, 2013.

M. Gaboardi, A. Haeberlen, J. Hsu, A. Narayan, and B. C. Pierce. Linear dependent types for differential privacy. In POPL, pages 357–370. ACM, 2013.

N. D. Goodman, V. K. Mansinghka, D. M. Roy, K. Bonawitz, and J. B. Tenenbaum. Church: a language for generative models. In UAI, pages 220–229, 2008.

A. D. Gordon, M. Aizatulin, J. Borgström, G. Claret, T. Graepel, A. V. Nori, S. K. Rajamani, and C. V. Russo. A model-learner pattern for Bayesian reasoning. In POPL '13, pages 403–416, 2013. doi: 10.1145/2429069.2429119.

A. D. Gordon, T. Graepel, N. Rolland, C. V. Russo, J. Borgström, and J. Guiver. Tabular: a schema-driven probabilistic programming language. In POPL '14, pages 321–334, 2014. doi: 10.1145/2535838.2535850.

F. Gretz, J.-P. Katoen, and A. McIver. Prinsys: on a quest for probabilistic loop invariants. In QEST, pages 193–208, 2013.

F. Gretz, N. Jansen, B. L. Kaminski, J.-P. Katoen, A. McIver, and F. Olmedo. Conditioning in probabilistic programming. In MFPS, 2015.

M. Hardt, K. Ligett, and F. McSherry. A simple and practical algorithm for differentially private data release. In NIPS 25, pages 2348–2356, 2012.

M. Hicks, G. M. Bierman, N. Guts, D. Leijen, and N. Swamy. Polymonadic programming. In MSFP 2014, volume 153 of EPTCS, pages 79–99, 2014.

J.-P. Katoen, A. McIver, L. Meinicke, and C. Morgan. Linear-invariant generation for probabilistic programs. In SAS, volume 6337 of LNCS, pages 390–406. Springer, 2010.

S. Katsumata. Parametric effect monads and semantics of effect systems. In POPL '14, pages 633–646. ACM, 2014.

D. Kozen. A probabilistic PDL. J. Comput. Syst. Sci., 30(2):162–178, 1985.

D. J. Lunn, A. Thomas, N. Best, and D. Spiegelhalter. WinBUGS, a Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10(4):325–337, 2000. doi: 10.1023/A:1008929526011.

A. McIver and C. Morgan. Abstraction, Refinement, and Proof for Probabilistic Systems. Monographs in Computer Science. Springer, 2005.

F. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD 2009, pages 19–30, 2009. doi: 10.1145/1559845.1559850.

F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103, 2007. doi: 10.1109/FOCS.2007.41.

T. Minka, J. Winn, J. Guiver, and D. Knowles. Infer.NET 2.5, 2012. MSR. http://research.microsoft.com/infernet.

C. Morgan, A. McIver, and K. Seidel. Probabilistic predicate transformers. ACM Transactions on Programming Languages and Systems, 18(3):325–353, 1996.

A. Pfeffer. IBAL: A probabilistic rational programming language. In IJCAI 2001, pages 733–740, 2001.

L. H. Ramshaw. Formalizing the Analysis of Algorithms. PhD thesis, Stanford University, 1979.

J. Reed and B. C. Pierce. Distance makes the types grow stronger: A calculus for differential privacy. In ICFP '10. ACM, 2010.

T. Sato. Approximate relational Hoare logic for continuous random samplings. ArXiv e-prints, Mar. 2016.

M. Sharir, A. Pnueli, and S. Hart. Verification of probabilistic programs. SIAM Journal on Computing, 13(2):292–314, 1984. doi: 10.1137/0213021.

D. Tolpin, J. van de Meent, and F. Wood. Probabilistic programming in Anglican. In ECML PKDD 2015, Part III, pages 308–311, 2015. doi: 10.1007/978-3-319-23461-8_36.

N. Toronto, J. McCarthy, and D. Van Horn. Running probabilistic programs backwards. In ESOP 2015, pages 53–79, 2015. doi: 10.1007/978-3-662-46669-8_3.

O. Williams and F. McSherry. Probabilistic inference and differential privacy. In NIPS 23, pages 2451–2459, 2010. URL http://papers.nips.cc/paper/3897-probabilistic-inference-and-differential-privacy.

J. Zhang, G. Cormode, C. M. Procopiuc, D. Srivastava, and X. Xiao. PrivBayes: Private data release via Bayesian networks. In SIGMOD '14, pages 1423–1434. ACM, 2014.

Z. Zhang, B. I. P. Rubinstein, and C. Dimitrakakis. On the differential privacy of Bayesian inference. In AAAI-16, 2016. To appear.

S. Zheng. The differential privacy of Bayesian inference. Bachelor's thesis, Harvard College, 2015. URL http://nrs.harvard.edu/urn-3:HUL.InstRepos:14398533.
