An Optimal Algorithm for Monte Carlo Estimation

Paul Dagum*    Richard Karp†    Michael Luby‡    Sheldon Ross§

Abstract

A typical approach to estimating an unknown quantity μ_Z is to design an experiment that produces a random variable Z distributed in [0,1] with E[Z] = μ_Z, run this experiment independently a number of times, and use the average of the outcomes as the estimate. In this paper, we consider the case when no a priori information about Z is known except that it is distributed in [0,1]. We describe an approximation algorithm AA which, given ε and δ, when running independent experiments with respect to any Z, produces an estimate that is within a factor 1 + ε of μ_Z with probability at least 1 − δ. We prove that the expected number of experiments run by AA (which depends on Z) is optimal to within a constant factor for every Z. An announcement of these results appears in P. Dagum, R. Karp, M. Luby, S. Ross, "An optimal algorithm for Monte-Carlo estimation (extended abstract)", Proceedings of the Thirty-sixth IEEE Symposium on Foundations of Computer Science, 1995, pp. 142-149 [3].

* Section on Medical Informatics, Stanford University School of Medicine, Stanford, CA 94305-5479.

Research supported in part by National Science Foundation operating grant IRI-93-11950. email: [email protected]
† International Computer Science Institute, Berkeley, CA 94704. email: [email protected]
‡ International Computer Science Institute, Berkeley, CA, and Computer Science Division, University of California at Berkeley, Berkeley, CA. Research supported in part by National Science Foundation operating grants CCR-9304722 and NCR-9416101, United States-Israel Binational Science Foundation grant No. 92-00226, and ESPRIT BR Grant EC-US 030. email: [email protected]
§ Department of Industrial Engineering and Operations Research, University of California at Berkeley, Berkeley, CA. Research supported in part by operating grant DMS-9401834. email: [email protected]


Key Words: stopping rule, approximation algorithm, Monte Carlo estimation
AMS Classifications: sequential estimation, stochastic approximation
Abbreviated Title: Optimal Monte Carlo Estimation


1 Introduction

The choice of experiment, or experimental design, forms an important aspect of statistics. One of the simplest design problems is the problem of deciding when to stop sampling. For example, suppose Z₁, Z₂, … are independently and identically distributed according to Z in the interval [0,1] with mean μ_Z. From Bernstein's inequality, we know that if N is fixed proportional to ln(1/δ)/ε² and S = Z₁ + ⋯ + Z_N, then with probability at least 1 − δ, S/N approximates μ_Z with absolute error ε. Often, however, μ_Z is small, and a good absolute error estimate of μ_Z is typically a poor relative error approximation of μ_Z. We say μ̃_Z is an (ε, δ)-approximation of μ_Z if

Pr[μ_Z(1 − ε) ≤ μ̃_Z ≤ μ_Z(1 + ε)] ≥ 1 − δ.

In engineering and computer science applications we often desire an (ε, δ)-approximation of μ_Z in problems where exact computation of μ_Z is NP-hard. For example, many researchers have devoted substantial effort to the important and difficult problem of approximating the permanent of 0-1 valued matrices [1, 4, 5, 9, 10, 13, 14]. Researchers have also used (ε, δ)-approximations to tackle many other difficult problems, such as approximating probabilistic inference in Bayesian networks [6], approximating the volume of convex bodies [7], solving the Ising model of statistical mechanics [11], solving for network reliability in planar multiterminal networks [15, 16], approximating the number of solutions to a DNF formula [17] or, more generally, to a GF[2] formula [18], and approximating the number of Eulerian orientations of a graph [19].

Define λ = (e − 2) ≈ .72,

Υ = 4λ ln(2/δ)/ε²,

and let σ_Z² denote the variance of Z. Define

ρ_Z = max{σ_Z², εμ_Z}.
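For concreteness, these two quantities can be computed directly. The following is a small sketch (the function and variable names are ours, not from the paper):

```python
from math import log, e

LAMBDA = e - 2.0  # lambda = (e - 2) ~ .72

def upsilon(eps, delta):
    # Upsilon = 4 * lambda * ln(2/delta) / eps^2
    return 4.0 * LAMBDA * log(2.0 / delta) / eps**2

def rho(var_z, mu_z, eps):
    # rho_Z = max{sigma_Z^2, eps * mu_Z}
    return max(var_z, eps * mu_z)
```

For example, with ε = 0.1 and δ = 0.05, Υ is roughly 10³; the sample sizes discussed below scale linearly in it.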


We first prove a slight generalization of the Zero-One Estimator Theorem [12, 15, 16, 17]. The new theorem, the Generalized Zero-One Estimator Theorem, proves that if

N = Υ · ρ_Z/μ_Z²    (1)

then S/N is an (ε, δ)-approximation of μ_Z. To apply the Generalized Zero-One Estimator Theorem we require the values of the unknown quantities ρ_Z and μ_Z. Researchers circumvent this problem by computing an a priori upper bound on ρ_Z/μ_Z² and using it in place of ρ_Z/μ_Z² to determine a value for N in Equation (1). An a priori upper bound that is close to ρ_Z/μ_Z² is often very difficult to obtain, and a poor bound leads to a prohibitively large value of N.

To avoid the problem encountered with the Generalized Zero-One Estimator Theorem, we use the outcomes of previous experiments to decide when to stop iterating. This approach is known as sequential analysis and originated with the work of Wald on statistical decision theory [22]. Related research has applied sequential analysis to specific Monte Carlo approximation problems such as estimating the number of points in a union of sets [17] and estimating the number of self-avoiding walks [20]. In other related work, Dyer et al. describe a stopping rule based algorithm that provides an upper bound estimate on μ_Z [8]. With probability 1 − δ, the estimate is at most (1 + ε)μ_Z, but the estimate can be arbitrarily smaller than μ_Z in the challenging case when μ_Z is small.

We first describe an approximation algorithm based on a simple stopping rule. Using the stopping rule, the approximation algorithm outputs an (ε, δ)-approximation of μ_Z after an expected number of experiments proportional to Υ/μ_Z. The variance of the random variable Z is maximized subject to a fixed mean μ_Z if Z takes on value 1 with probability μ_Z and 0 with probability 1 − μ_Z. In this case, σ_Z² = μ_Z(1 − μ_Z) ≤ μ_Z, and the expected number of experiments run by the stopping-rule based algorithm is within a constant factor of optimal. In general, however, σ_Z² can be significantly smaller than μ_Z, and for small values of σ_Z² the stopping-rule based algorithm performs 1/ε times as many experiments as the optimal number.
We describe a more powerful algorithm, the AA algorithm, that on inputs ε, δ, and independently and identically distributed outcomes Z₁, Z₂, … generated from any random variable Z distributed in [0,1], outputs an (ε, δ)-approximation of μ_Z after an expected number of experiments proportional to Υ · ρ_Z/μ_Z². Unlike the simple stopping-rule based algorithm, we prove that for all Z, AA runs the optimal number of experiments to within a constant factor. Specifically, we prove that if BB is any algorithm that produces an (ε, δ)-approximation of μ_Z using the inputs ε, δ, and Z₁, Z₂, …, then BB runs an expected number of experiments proportional to at least Υ · ρ_Z/μ_Z². (Canetti, Even and Goldreich prove the related lower bound Ω(ln(1/δ)/ε²) on the number of experiments required to approximate μ_Z with absolute error ε with probability at least 1 − δ [2].) Thus we show that for any random variable Z, AA runs an expected number of experiments that is within a constant factor of the minimum expected number.

The AA algorithm is a general method for optimally using the outcomes of Monte Carlo experiments for approximation; that is, to within a constant factor, the algorithm uses the minimum possible number of experiments to output an (ε, δ)-approximation on each problem instance. Thus, AA provides substantial computational savings in applications that employ a poor upper bound on ρ_Z/μ_Z². For example, the best known a priori bound on ρ_Z/μ_Z² for the problem of approximating the permanent of a size-n matrix is superpolynomial in n [13]. Yet, for many problem instances of size n, the number of experiments run by AA is significantly smaller than this bound. Other examples exist where the bounds are also extremely loose for many typical problem instances [7, 10, 11]. In all those applications, we expect AA to provide substantial computational savings, possibly rendering problems that were intractable, because of the poor upper bounds on ρ_Z/μ_Z², amenable to efficient approximation.

2 Approximation Algorithm

In Subsection 2.1, we describe a stopping rule algorithm for estimating μ_Z. This algorithm is used in the first step of the approximation algorithm AA that we describe in Subsection 2.2.


2.1 Stopping Rule Algorithm

Let Z be a random variable distributed in the interval [0,1] with mean μ_Z. Let Z₁, Z₂, … be independently and identically distributed according to Z.

Stopping Rule Algorithm

Input Parameters: (ε, δ) with 0 < ε < 1, δ > 0.
Let Υ₁ = 1 + (1 + ε)Υ.
Initialize N ← 0, S ← 0.
While S < Υ₁ do: N ← N + 1; S ← S + Z_N.
Output: μ̃_Z ← Υ₁/N.

Stopping Rule Theorem: Let Z be a random variable distributed in [0,1] with μ_Z = E[Z] > 0. Let μ̃_Z be the estimate produced and let N_Z be the number of experiments that the Stopping Rule Algorithm runs with respect to Z on input ε and δ. Then,

(1) Pr[μ_Z(1 − ε) ≤ μ̃_Z ≤ μ_Z(1 + ε)] > 1 − δ,

(2) E[N_Z] < (Υ₁ + 1)/μ_Z.

The proof of this theorem can be found in Section 5.
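The Stopping Rule Algorithm translates almost line for line into code. Below is a minimal sketch, assuming `sample()` returns one independent outcome of Z in [0,1] (all names are ours):

```python
from math import log, e
import random

def stopping_rule_estimate(sample, eps, delta):
    """Run experiments until the running sum S reaches
    Upsilon_1 = 1 + (1 + eps) * Upsilon, where Upsilon = 4*lam*ln(2/delta)/eps^2;
    return the estimate Upsilon_1/N and the number of experiments N."""
    lam = e - 2.0
    upsilon = 4.0 * lam * log(2.0 / delta) / eps**2
    upsilon1 = 1.0 + (1.0 + eps) * upsilon
    n, s = 0, 0.0
    while s < upsilon1:
        n += 1
        s += sample()  # one experiment: an independent draw of Z in [0,1]
    return upsilon1 / n, n

# Example: estimate the mean of a Bernoulli(0.3) variable to within 10%
# relative error with probability at least 0.95.
random.seed(0)
est, n = stopping_rule_estimate(lambda: float(random.random() < 0.3), 0.1, 0.05)
```

Note that the number of experiments adapts to μ_Z: the smaller the mean, the longer the loop runs, in line with the bound E[N_Z] = O(Υ/μ_Z).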

2.2 Approximation Algorithm AA

The (ε, δ)-approximation algorithm AA consists of three main steps. The first step uses the stopping rule based algorithm to produce an estimate μ̂_Z that is within a constant factor of μ_Z with probability at least 1 − δ/3. The second step uses the value of μ̂_Z to set the number of experiments to run in order to produce an estimate ρ̂_Z that is within a constant factor of ρ_Z with probability at least 1 − δ/3. The third step uses the values of μ̂_Z and ρ̂_Z produced in the first two steps to set the number of experiments and runs this number of experiments to produce an (ε, δ)-approximation μ̃_Z of μ_Z.

Let Z be a random variable distributed in the interval [0,1] with mean μ_Z and variance σ_Z². Let Z₁, Z₂, … and Z′₁, Z′₂, … denote two sets of random variables independently and identically distributed according to Z.

Approximation Algorithm AA

Input Parameters: (ε, δ) with 0 < ε ≤ 1 and 0 < δ ≤ 1.
Let Υ₂ = 2(1 + √ε)(1 + 2√ε)(1 + ln(3/2)/ln(2/δ)) · Υ. (Note that Υ₂ ≈ 2Υ for small ε and δ.)
Step 1: Run the Stopping Rule Algorithm using Z₁, Z₂, … with input parameters min{1/2, √ε} and δ/3. This produces an estimate μ̂_Z of μ_Z.
Step 2: Set N = Υ₂ · ε/μ̂_Z and initialize S ← 0. For i = 1, …, N do: S ← S + (Z′_{2i−1} − Z′_{2i})²/2. Set ρ̂_Z ← max{S/N, εμ̂_Z}.
Step 3: Set N = Υ₂ · ρ̂_Z/μ̂_Z² and initialize S ← 0. For i = 1, …, N do: S ← S + Z_i. Set μ̃_Z ← S/N.
Output: μ̃_Z.

AA Theorem: Let Z be any random variable distributed in [0,1], let μ_Z = E[Z] > 0 be the mean of Z, σ_Z² be the variance of Z, and ρ_Z = max{σ_Z², εμ_Z}. Let μ̃_Z be the approximation produced by AA and let N_Z be the number of experiments run by AA with respect to Z on input parameters ε and δ. Then,

(1) Pr[μ_Z(1 − ε) ≤ μ̃_Z ≤ μ_Z(1 + ε)] ≥ 1 − δ,

(2) there is a universal constant c′ such that Pr[N_Z ≥ c′ · Υ · ρ_Z/μ_Z²] ≤ δ,

(3) there is a universal constant c′ such that E[N_Z] ≤ c′ · Υ · ρ_Z/μ_Z².

We prove the AA Theorem in Section 6.
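The three steps of AA can be sketched as follows. This is our paraphrase of the pseudocode above: `sample` and `sample2` stand in for draws from the two independent sequences Z₁, Z₂, … and Z′₁, Z′₂, …, and rounding the sample sizes up to integers is our choice:

```python
from math import log, sqrt, ceil, e
import random

def aa_estimate(sample, sample2, eps, delta):
    """(eps, delta)-approximation of mu_Z following the three steps of AA."""
    lam = e - 2.0
    upsilon = 4.0 * lam * log(2.0 / delta) / eps**2
    upsilon2 = (2.0 * (1 + sqrt(eps)) * (1 + 2 * sqrt(eps))
                * (1 + log(1.5) / log(2.0 / delta)) * upsilon)

    # Step 1: rough estimate of mu_Z via the Stopping Rule Algorithm with
    # input parameters (min{1/2, sqrt(eps)}, delta/3); note 2/(delta/3) = 6/delta.
    eps1 = min(0.5, sqrt(eps))
    u1 = 1.0 + (1.0 + eps1) * (4.0 * lam * log(6.0 / delta) / eps1**2)
    n, s = 0, 0.0
    while s < u1:
        n += 1
        s += sample()
    mu_hat = u1 / n

    # Step 2: estimate rho_Z = max{sigma_Z^2, eps*mu_Z} from paired samples.
    n2 = int(ceil(upsilon2 * eps / mu_hat))
    s2 = sum((sample2() - sample2()) ** 2 / 2.0 for _ in range(n2))
    rho_hat = max(s2 / n2, eps * mu_hat)

    # Step 3: final estimate with sample size Upsilon_2 * rho_hat / mu_hat^2.
    n3 = int(ceil(upsilon2 * rho_hat / mu_hat**2))
    return sum(sample() for _ in range(n3)) / n3

# Example: Bernoulli(0.3) outcomes with eps = 0.2, delta = 0.1.
random.seed(1)
draw = lambda: float(random.random() < 0.3)
est = aa_estimate(draw, draw, 0.2, 0.1)
```

When σ_Z² is small relative to μ_Z, the sample size chosen in Step 3 is correspondingly small, which is exactly the savings the theorem quantifies.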

3 Lower Bound

Algorithm AA is able to produce a good estimate of μ_Z using no a priori information about Z. An interesting question is the inherent number of experiments needed to produce an (ε, δ)-approximation of μ_Z. In this section, we state a lower bound on the number of experiments needed by any (ε, δ)-approximation algorithm to estimate μ_Z when there is no a priori information about Z. This lower bound shows that, to within a constant factor, AA runs the minimum number of experiments for every random variable Z.

To formalize the lower bound, we introduce the following natural model. Let BB be any algorithm that on input (ε, δ) works as follows with respect to Z. Let Z₁, Z₂, … denote random variables independently and identically distributed according to Z with values in the interval [0,1]. BB runs an experiment, and on the Nth run BB receives the value Z_N. The measure of the running time of BB is the number of experiments it runs, i.e., the time for all other computations performed by BB is not counted in its running time. BB is allowed to use any criterion it wants to decide when to stop running experiments and produce an estimate, and in particular BB can use the outcomes of all previous experiments. The estimate that BB produces when it stops can be any function of the outcomes of the experiments it has run up to that point. The requirement on BB is that it produces an (ε, δ)-approximation of μ_Z for any Z.

This model captures the situation where the algorithm can only gather information about Z through running random experiments, and where the algorithm has no a priori knowledge about the value of μ_Z before starting. This is a reasonable pair of assumptions for practical situations. It turns out that the assumption about a priori knowledge can be substantially relaxed: the algorithm may know a priori that the outcomes are being generated according to some known random variable Z or to some closely related random variable Z′, and still the lower bound on the number of experiments applies. Note that the approximation algorithm AA fits into this model, and thus the expected number of experiments it runs with respect to Z is minimal for all Z to within a constant factor among all such approximation algorithms.

Lower Bound Theorem: Let BB be any algorithm that works as described above on input (ε, δ). Let Z be a random variable distributed in [0,1], let μ_Z be the mean of Z, σ_Z² be the variance of Z, and ρ_Z = max{σ_Z², εμ_Z}. Let μ̃_Z be the approximation produced by BB and let N_Z be the number of experiments run by BB with respect to Z. Suppose that

BB has the following properties:

(1) For all Z with μ_Z > 0, E[N_Z] < ∞.

(2) For all Z with μ_Z > 0, Pr[μ_Z(1 − ε) ≤ μ̃_Z ≤ μ_Z(1 + ε)] > 1 − δ.

Then, there is a universal constant c > 0 such that for all Z, E[N_Z] ≥ c · Υ · ρ_Z/μ_Z².

We prove this theorem in Section 7.

4 Preliminaries for the Proofs

We begin with some notation that is used hereafter. Let ℓ₀ = 0 and for k > 0 let

ℓ_k = Σ_{i=1}^{k} (Z_i − μ_Z).    (2)

For fixed α, β ≥ 0, we define the random variables

ℓ_k⁺ = ℓ_k − α − βk,    (3)

ℓ_k⁻ = −ℓ_k − α − βk.    (4)


The main lemma we use to prove the first part of the Stopping Rule Theorem provides bounds on the probabilities that the random variables ℓ_k⁺ and ℓ_k⁻ are greater than zero. We first form the sequences of random variables e^{dℓ₀⁺}, e^{dℓ₁⁺}, … and e^{dℓ₀⁻}, e^{dℓ₁⁻}, … for any real valued d. We prove that these sequences are supermartingales when 0 ≤ d ≤ 1 and β ≥ dλσ_Z², i.e., for all k > 0,

E[e^{dℓ_k⁺} | e^{dℓ_{k−1}⁺}, …, e^{dℓ₀⁺}] ≤ e^{dℓ_{k−1}⁺},

and similarly,

E[e^{dℓ_k⁻} | e^{dℓ_{k−1}⁻}, …, e^{dℓ₀⁻}] ≤ e^{dℓ_{k−1}⁻}.

We then use properties of supermartingales to bound the probabilities that the random variables ℓ_k⁺ and ℓ_k⁻ are greater than zero. For these and subsequent proofs, we use the following two inequalities:

Inequality 4.1: For all α, e^α ≥ 1 + α.

Inequality 4.2: Let λ = (e − 2) ≈ .72. For all α with |α| ≤ 1,

1 + α + α²/(2 + λ) ≤ e^α ≤ 1 + α + λα².

Lemma 4.3: For |d| ≤ 1, E[e^{dZ}] ≤ e^{dμ_Z + λd²σ_Z²}.

Proof: Observe that E[e^{dZ}] = e^{dμ_Z} E[e^{d(Z − μ_Z)}]. But from Inequality 4.2,

e^{d(Z − μ_Z)} ≤ 1 + d(Z − μ_Z) + λd²(Z − μ_Z)².

Taking expectations and applying Inequality 4.1 completes the proof. □

Lemma 4.4: For 0 ≤ d ≤ 1, and for β ≥ dλσ_Z², the sequences of random variables e^{dℓ₀⁺}, e^{dℓ₁⁺}, … and e^{dℓ₀⁻}, e^{dℓ₁⁻}, … form supermartingales.

Proof: For k ≥ 1,

e^{dℓ_k⁺} = e^{dℓ_{k−1}⁺} · e^{−dβ} · e^{d(ℓ_k − ℓ_{k−1})},

and thus,

E[e^{dℓ_k⁺} | e^{dℓ_{k−1}⁺}, …, e^{dℓ₀⁺}] = e^{dℓ_{k−1}⁺} · e^{−dβ} · E[e^{d(ℓ_k − ℓ_{k−1})}].

Similarly, for k ≥ 1,

E[e^{dℓ_k⁻} | e^{dℓ_{k−1}⁻}, …, e^{dℓ₀⁻}] = e^{dℓ_{k−1}⁻} · e^{−dβ} · E[e^{−d(ℓ_k − ℓ_{k−1})}].

But ℓ_k − ℓ_{k−1} = Z_k − μ_Z, and thus from Lemma 4.3,

E[e^{d(ℓ_k − ℓ_{k−1})}] ≤ e^{λd²σ_Z²}

and

E[e^{−d(ℓ_k − ℓ_{k−1})}] ≤ e^{λd²σ_Z²}.

Thus, for β ≥ dλσ_Z²,

E[e^{dℓ_k⁺} | e^{dℓ_{k−1}⁺}, …, e^{dℓ₀⁺}] ≤ e^{dℓ_{k−1}⁺} · e^{d(dλσ_Z² − β)} ≤ e^{dℓ_{k−1}⁺},

and

E[e^{dℓ_k⁻} | e^{dℓ_{k−1}⁻}, …, e^{dℓ₀⁻}] ≤ e^{dℓ_{k−1}⁻} · e^{d(dλσ_Z² − β)} ≤ e^{dℓ_{k−1}⁻}. □

Lemma 4.5 follows directly from the properties of conditional expectations and of supermartingales.

Lemma 4.5: If X₀, …, X_k is a supermartingale, then for all 0 ≤ i ≤ k, E[X_i | X₀] ≤ X₀. □

We next prove Lemma 4.6. This lemma is the key to the proof of the first part of the Stopping Rule Theorem. In addition, from this lemma we easily prove a slightly more general version of the Zero-One Estimator Theorem.

Lemma 4.6: For any fixed N > 0, and for any 0 ≤ α ≤ 2λρ_Z,

Pr[ℓ_N/N ≥ α] ≤ e^{−Nα²/(4λρ_Z)},    (5)

Pr[ℓ_N/N ≤ −α] ≤ e^{−Nα²/(4λρ_Z)}.    (6)

Proof: Recall the definitions of ℓ_N⁺ and ℓ_N⁻ from Equations (3) and (4); take the parameters there to be αN (in place of α) and β = 0, so that ℓ_N⁺ = ℓ_N − αN and ℓ_N⁻ = −ℓ_N − αN. Then the left-hand side of Equation (5) is equivalent to Pr[ℓ_N⁺ ≥ 0] and the left-hand side of Equation (6) is equivalent to Pr[ℓ_N⁻ ≥ 0].

Let α′ = αN/2 and β′ = α/2. For 0 ≤ i ≤ N, let ℓ_i′⁺ = ℓ_i − α′ − β′i and ℓ_i′⁻ = −ℓ_i − α′ − β′i. Thus, ℓ_N′⁺ = ℓ_N⁺ and ℓ_N′⁻ = ℓ_N⁻. We now give the remainder of the proof of Equation (5), using ℓ_N′⁺, and omit the remainder of the analogous proof of Equation (6), which uses ℓ_N′⁻ in place of ℓ_N′⁺. For any N > 0,

Pr[ℓ_N′⁺ ≥ 0] = Pr[e^{dℓ_N′⁺} ≥ 1] ≤ E[e^{dℓ_N′⁺}].

Set d = β′/(λρ_Z) = α/(2λρ_Z). Note that α ≤ 2λρ_Z implies that d ≤ 1. Note also that since ρ_Z ≥ σ_Z², β′ ≥ dλσ_Z². Thus, by Lemma 4.4, e^{dℓ₀′⁺}, …, e^{dℓ_N′⁺} is a supermartingale. Thus, by Lemma 4.5,

E[e^{dℓ_N′⁺} | e^{dℓ₀′⁺}] ≤ e^{dℓ₀′⁺} = e^{−dα′} = e^{−Nα²/(4λρ_Z)}.

Since e^{dℓ₀′⁺} is a constant,

E[e^{dℓ_N′⁺} | e^{dℓ₀′⁺}] = E[e^{dℓ_N′⁺}],

completing the proof of Equation (5). □

We use Lemma 4.6 to generalize the Zero-One Estimator Theorem [17] from {0,1}-valued random variables to random variables in the interval [0,1].

Generalized Zero-One Estimator Theorem: Let Z₁, Z₂, …, Z_N denote random variables independent and identically distributed according to Z. If ε < 1 and

N = 4λ ln(2/δ) · ρ_Z/(εμ_Z)²,

then

Pr[(1 − ε)μ_Z ≤ Σ_{i=1}^{N} Z_i/N ≤ (1 + ε)μ_Z] > 1 − δ.

Proof: The proof follows directly from Lemma 4.6, using α = εμ_Z, noting that εμ_Z ≤ 2λρ_Z and that N(εμ_Z)²/(4λρ_Z) = ln(2/δ). □
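When a priori bounds on ρ_Z and μ_Z are available, the theorem yields a fixed sample size directly. A small sketch (our naming, with `rho_bound` an upper bound on ρ_Z and `mu_bound` a lower bound on μ_Z):

```python
from math import log, ceil, e
import random

def zero_one_sample_size(eps, delta, rho_bound, mu_bound):
    # N = 4 * lambda * ln(2/delta) * rho_Z / (eps * mu_Z)^2, evaluated
    # at the a priori bounds rho_bound >= rho_Z and mu_bound <= mu_Z.
    lam = e - 2.0
    return int(ceil(4.0 * lam * log(2.0 / delta) * rho_bound
                    / (eps * mu_bound)**2))

def fixed_n_estimate(sample, n):
    # Average of n independent experiments.
    return sum(sample() for _ in range(n)) / n

# Example: Bernoulli(0.5), so sigma_Z^2 = 0.25 and rho_Z = 0.25.
n = zero_one_sample_size(0.1, 0.05, 0.25, 0.5)
random.seed(2)
est = fixed_n_estimate(lambda: float(random.random() < 0.5), n)
```

A loose `rho_bound` or `mu_bound` inflates N directly, which is precisely the weakness of the fixed-sample approach that the sequential algorithms of this paper avoid.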


5 Proof of the Stopping Rule Theorem

We next prove the Stopping Rule Theorem. The first part of the proof also follows directly from Lemma 4.6. Recall that λ = (e − 2) ≈ .72 and Υ₁ = 1 + (1 + ε)Υ = 1 + 4λ(1 + ε) ln(2/δ)/ε².

Stopping Rule Theorem: Let Z be a random variable distributed in [0,1] with μ_Z = E[Z] > 0. Let μ̃_Z be the estimate produced and let N_Z be the number of experiments that the Stopping Rule Algorithm runs with respect to Z on input ε and δ. Then,

(1) Pr[μ_Z(1 − ε) ≤ μ̃_Z ≤ μ_Z(1 + ε)] > 1 − δ,

(2) E[N_Z] < (Υ₁ + 1)/μ_Z.

Proof of Part (1): Recall that μ̃_Z = Υ₁/N_Z. It suffices to show that

Pr[N_Z < Υ₁/(μ_Z(1 + ε))] + Pr[N_Z > Υ₁/(μ_Z(1 − ε))] ≤ δ.

We first show that Pr[N_Z < Υ₁/(μ_Z(1 + ε))] ≤ δ/2. Let L = ⌊Υ₁/(μ_Z(1 + ε))⌋. Assuming that μ_Z(1 + ε) ≤ 1, the definition of Υ₁ and L implies that

L ≥ 4λ ln(2/δ)/(ε²μ_Z).    (7)

Since N_Z is an integer, N_Z < Υ₁/(μ_Z(1 + ε)) if and only if N_Z ≤ L. But N_Z ≤ L if and only if S_L ≥ Υ₁. Thus,

Pr[N_Z < Υ₁/(μ_Z(1 + ε))] = Pr[N_Z ≤ L] = Pr[S_L ≥ Υ₁].

Let α = Υ₁/L − μ_Z. Then,

Pr[S_L ≥ Υ₁] = Pr[S_L − μ_Z L − αL ≥ 0] = Pr[ℓ_L/L ≥ α].

Noting that εμ_Z ≤ α ≤ 2λρ_Z, Lemma 4.6 implies that

Pr[ℓ_L/L ≥ α] ≤ e^{−Lα²/(4λρ_Z)} ≤ e^{−L(εμ_Z)²/(4λρ_Z)}.

Using Inequality (7) and noting that ρ_Z ≤ μ_Z, it follows that this probability is at most δ/2.

The proof that Pr[N_Z > Υ₁/(μ_Z(1 − ε))] ≤ δ/2 is similar. □

Proof of Part (2): The random variable N_Z is the stopping time such that

Υ₁ ≤ S_{N_Z} < Υ₁ + 1.

Using Wald's Equation [22] and E[N_Z] < ∞, it follows that

E[S_{N_Z}] = E[N_Z] · μ_Z,

and thus,

Υ₁/μ_Z ≤ E[N_Z] < (Υ₁ + 1)/μ_Z. □

Similar to the proof of the first part of the Stopping Rule Theorem, we can show that

Pr[N_Z > (1 + ε)Υ₁/μ_Z] ≤ δ/2,    (8)

and therefore with probability at least 1 − δ/2 we require at most (1 + ε)Υ₁/μ_Z experiments to generate an approximation. The following lemma is used in the proof of the AA Theorem in Section 6.

Stopping Rule Lemma:

(1) E[1/μ̃_Z] = O(1/μ_Z).

(2) E[1/μ̃_Z²] = O(1/μ_Z²).

Proof of the Stopping Rule Lemma: Part (1), E[1/μ̃_Z] = O(1/μ_Z), follows directly from Part (2) of the Stopping Rule Theorem and the definition of N_Z, since 1/μ̃_Z = N_Z/Υ₁. Part (2), E[1/μ̃_Z²] = O(1/μ_Z²), can be easily proved based on the ideas used in the proof of Part (2) of the Stopping Rule Theorem. □

6 Proof of the AA Theorem

AA Theorem: Let Z be any random variable distributed in [0,1], let μ_Z = E[Z] > 0 be the mean of Z, σ_Z² be the variance of Z, and ρ_Z = max{σ_Z², εμ_Z}. Let μ̃_Z be the approximation produced by AA and let N_Z be the number of experiments run by AA with respect to Z on input parameters ε and δ. Then,

(1) Pr[μ_Z(1 − ε) ≤ μ̃_Z ≤ μ_Z(1 + ε)] ≥ 1 − δ,

(2) there is a universal constant c′ such that Pr[N_Z ≥ c′ · Υ · ρ_Z/μ_Z²] ≤ δ,

(3) there is a universal constant c′ such that E[N_Z] ≤ c′ · Υ · ρ_Z/μ_Z².

Proof of Part (1): From the Stopping Rule Theorem, after Step (1) of AA, μ_Z(1 − √ε) ≤ μ̂_Z ≤ μ_Z(1 + √ε) holds with probability at least 1 − δ/3. We show next that if μ_Z(1 − √ε) ≤ μ̂_Z ≤ μ_Z(1 + √ε), then in Step (2) the choice of Υ₂ guarantees that ρ̂_Z ≥ ρ_Z/2. Thus, after Steps (1) and (2), Υ₂ · ρ̂_Z/μ̂_Z² ≥ (1 + ln(3/2)/ln(2/δ)) · Υ · ρ_Z/μ_Z² with probability at least 1 − δ/3. But by the Generalized Zero-One Estimator Theorem, for N ≥ (1 + ln(3/2)/ln(2/δ)) · Υ · ρ_Z/μ_Z², Step (3) guarantees that the output μ̃_Z of AA satisfies Pr[μ_Z(1 − ε) ≤ μ̃_Z ≤ μ_Z(1 + ε)] ≥ 1 − 2δ/3.

For all i, let Ẑ_i = (Z′_{2i−1} − Z′_{2i})²/2, and observe that E[Ẑ_i] = σ_Z². First assume that σ_Z² ≥ εμ_Z, and hence ρ_Z = σ_Z². If σ_Z² ≥ 2(1 − √ε)εμ_Z, then from the Generalized Zero-One Estimator Theorem, after at most (2/(1 − √ε))(1 + ln(3/2)/ln(2/δ)) · Υ · ε/μ_Z ≤ Υ₂ · ε/μ̂_Z experiments, σ_Z²/2 ≤ S/N ≤ 3σ_Z²/2 with probability at least 1 − δ/3, and thus ρ̂_Z ≥ ρ_Z/2. If εμ_Z ≤ σ_Z² ≤ 2(1 − √ε)εμ_Z, then εμ_Z ≥ σ_Z²/(2(1 − √ε)), and therefore ρ̂_Z ≥ εμ̂_Z ≥ ρ_Z/2. Next, assume that σ_Z² ≤ εμ_Z, and thus ρ_Z = εμ_Z. But Steps (1) and (2) guarantee that ρ̂_Z ≥ εμ̂_Z ≥ εμ_Z(1 − √ε), with probability at least 1 − δ/3. □

Proof of Part (2):

AA may fail to terminate after O(Υ · ρ_Z/μ_Z²) experiments either because Step (1) failed, with probability at most δ/2, to produce an estimate μ̂_Z such that μ_Z(1 − √ε) ≤ μ̂_Z ≤ μ_Z(1 + √ε), or because in Step (2), for σ_Z² ≥ 2(1 − √ε)εμ_Z, ρ̂_Z = S/N and S/N is not O(ρ_Z), with probability at most δ/2. But Equation (8) guarantees that Step (1) of AA terminates after O(Υ · ρ_Z/μ_Z²) experiments with probability at least 1 − δ/2. In addition, we can show, similarly to Lemma 4.6, that if σ_Z² ≤ 2εμ_Z then

Pr[S/N ≥ 4ρ_Z] ≤ e^{−Nεμ_Z/2}.

Thus, for N ≥ Υ₂ · ε/μ_Z, we have that Pr[S/N ≥ 4ρ_Z] ≤ δ/2. □

Proof of Part (3): Observe that from the Stopping Rule Theorem, the expected number of experiments in Step (1) is O(ln(1/δ)/(εμ_Z)). From the Stopping Rule Lemma, the expected number of experiments in Step (2) is O(ln(1/δ)/(εμ_Z)). Finally, in Step (3) observe that E[ρ̂_Z/μ̂_Z²] = E[ρ̂_Z] · E[1/μ̂_Z²], since ρ̂_Z and μ̂_Z are computed from disjoint sets of independently and identically distributed random variables. From the Stopping Rule Lemma, E[1/μ̂_Z²] is O(1/μ_Z²). Furthermore, observe that E[ρ̂_Z] ≤ E[S/N] + εE[μ̂_Z]. But E[S/N] = σ_Z² and E[μ̂_Z] = O(μ_Z). Thus, if σ_Z² ≥ εμ_Z then ρ_Z = σ_Z² and E[ρ̂_Z] = O(σ_Z²) = O(ρ_Z). If σ_Z² ≤ εμ_Z then ρ_Z = εμ_Z and E[ρ̂_Z] = O(εμ_Z) = O(ρ_Z). Thus, the expected number of experiments in Step (3) is O(ln(1/δ) · ρ_Z/(εμ_Z)²). □

7 Proof of Lower Bound Theorem

Lower Bound Theorem: Let BB be any algorithm that works as described above on input (ε, δ). Let Z be a random variable distributed in [0,1], let μ_Z be the mean of Z, σ_Z² be the variance of Z, and ρ_Z = max{σ_Z², εμ_Z}. Let μ̃_Z be the approximation produced by BB and let N_Z be the number of experiments run by BB with respect to Z. Suppose that BB has the following properties:

(1) For all Z with μ_Z > 0, E[N_Z] < ∞.

(2) For all Z with μ_Z > 0, Pr[μ_Z(1 − ε) ≤ μ̃_Z ≤ μ_Z(1 + ε)] > 1 − δ.

Then, there is a universal constant c > 0 such that for all Z, E[N_Z] ≥ c · Υ · ρ_Z/μ_Z².

Let f_Z(x) and f_{Z′}(x) denote two given distinct probability mass (or, in the continuous case, density) functions. Let Z₁, Z₂, … denote independent and identically distributed random variables with probability density f(x). Let H_Z denote the hypothesis f = f_Z and let H_{Z′} denote the hypothesis f = f_{Z′}. Let α denote the probability that we reject H_Z under f_Z, and let β denote the probability that we accept H_Z under f_{Z′}. The sequential probability ratio test minimizes the expected sample size under both H_Z and H_{Z′} among all tests with the same error probabilities α and β. Theorem 7.1 states the result for the sequential probability ratio test. We prove the result for completeness, although similar proofs exist [21].

Theorem 7.1: If T is the stopping time of any test of H_Z against H_{Z′} with error probabilities α and β, and E_Z[T], E_{Z′}[T] < ∞, then

E_Z[T] ≥ (1/(−ω_Z)) · [α ln(α/(1 − β)) + (1 − α) ln((1 − α)/β)]

and

E_{Z′}[T] ≥ (1/ω_{Z′}) · [(1 − β) ln((1 − β)/α) + β ln(β/(1 − α))],

where ω_Z = E_Z[ln(f_{Z′}(x)/f_Z(x))] and ω_{Z′} = E_{Z′}[ln(f_{Z′}(x)/f_Z(x))].

Proof: For the independent and identically distributed random variables Z₁, Z₂, …, let λ_k(Z_k) = ln(f_{Z′}(Z_k)/f_Z(Z_k)). Define ℓ_k⁺ = λ_k + ⋯ + λ₁ and ℓ_k⁻ = −ℓ_k⁺. For stopping time T, we get from Wald's first identity

E_Z[ℓ_T⁺] = E_Z[T] · E_Z[λ₁] and E_{Z′}[ℓ_T⁻] = −E_{Z′}[T] · E_{Z′}[λ₁].

Next, let Γ denote the space of all inputs on which the test rejects H_Z, and let Γᶜ denote its complement. Thus, by definition, we require that Pr_Z[Γ] = α and Pr_Z[Γᶜ] = 1 − α. Similarly, we require that Pr_{Z′}[Γ] = 1 − β and Pr_{Z′}[Γᶜ] = β. From the properties of expectations, we can show that

E_Z[ℓ_T⁺] = E_Z[ℓ_T⁺ | Γ] · Pr_Z[Γ] + E_Z[ℓ_T⁺ | Γᶜ] · Pr_Z[Γᶜ],

and we can decompose E_{Z′}[ℓ_T⁻] similarly. Let η = E_Z[ℓ_T⁺ | Γ] and observe that from Inequality 4.1, E_Z[e^{ℓ_T⁺ − η} | Γ] ≥ 1. Thus,

η ≤ ln E_Z[e^{ℓ_T⁺} | Γ].

But

E_Z[e^{ℓ_T⁺} | Γ] = E_Z[e^{ℓ_T⁺} I_Γ]/Pr_Z[Γ],

where I_Γ denotes the characteristic function for the set Γ. Thus, since

e^{ℓ_T⁺} = Π_{i=1}^{T} f_{Z′}(Z_i)/f_Z(Z_i),

we can show that

E_Z[e^{ℓ_T⁺} I_Γ] = Pr_{Z′}[Γ],

and finally,

E_Z[ℓ_T⁺ | Γ] ≤ ln E_Z[e^{ℓ_T⁺} | Γ] = ln((1 − β)/α).

Similarly, we can show that

E_Z[ℓ_T⁺ | Γᶜ] ≤ ln(β/(1 − α)), E_{Z′}[ℓ_T⁻ | Γ] ≤ ln(α/(1 − β)), and E_{Z′}[ℓ_T⁻ | Γᶜ] ≤ ln((1 − α)/β).

Thus,

−E_Z[T] · E_Z[λ₁] ≥ α ln(α/(1 − β)) + (1 − α) ln((1 − α)/β),

and we prove the first part of the theorem. Similarly,

E_{Z′}[T] · E_{Z′}[λ₁] ≥ (1 − β) ln((1 − β)/α) + β ln(β/(1 − α))

proves the second part of the theorem. □

Corollary 7.2: If T is the stopping time of any test of H_Z against H_{Z′} with error probabilities α and β such that α + β = δ, then

E_Z[T] ≥ (1/(−ω_Z)) · (1 − δ) ln((2 − δ)/δ)

and

E_{Z′}[T] ≥ (1/ω_{Z′}) · (1 − δ) ln((2 − δ)/δ).

Proof: Subject to α + β = δ, the expression

α ln(α/(1 − β)) + (1 − α) ln((1 − α)/β)

achieves its minimum at α = β = δ/2. Substitution of α = β = δ/2 completes the proof. □
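As a numeric sanity check, the bound of the corollary is easy to evaluate. Here `omega` stands for ω_Z = E_Z[ln(f_{Z′}(x)/f_Z(x))]; the code and names are ours:

```python
from math import log

def sprt_lower_bound(delta, omega):
    # (1 - delta) * ln((2 - delta)/delta) / |omega|: a lower bound on the
    # expected sample size of any test with total error alpha + beta = delta.
    return (1.0 - delta) * log((2.0 - delta) / delta) / abs(omega)

# Example: total error delta = 0.1 with |omega_Z| = 0.01 forces on the
# order of a few hundred experiments.
bound = sprt_lower_bound(0.1, -0.01)
```

The closer the two hypotheses (the smaller |ω_Z|), the larger the forced sample size, which is how the lemmas below turn closeness of f_Z and f_{Z′} into the lower bound on E[N_Z].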


Lemma 7.5 proves the Lower Bound Theorem for σ_Z² ≥ εμ_Z. We begin with some definitions. Let ∆ = Z − μ_Z and, for any 0 ≤ d ≤ 1, let κ = E_Z[e^{d∆}]. Define γ = d∆ − ln κ and

f_{Z′}(x) = f_Z(x) · e^{γ(x)}.

Lemma 7.3: 1 + d²σ_Z²/(2 + λ) ≤ κ ≤ 1 + λd²σ_Z².

Proof: Since κ = E_Z[e^{d∆}], we use Inequality 4.2 to show that

1 + d∆ + d²∆²/(2 + λ) ≤ e^{d∆} ≤ 1 + d∆ + λd²∆².

Taking expectations (note that E_Z[∆] = 0 and E_Z[∆²] = σ_Z²) completes the proof. □

Lemma 7.4: 2dσ_Z²/((2 + λ)κ) ≤ E_{Z′}[∆] ≤ 2λdσ_Z²/κ.

Proof: Since E_{Z′}[∆] = κ⁻¹κ′, where κ′ denotes the derivative of κ with respect to d, the proof follows directly from Lemma 7.3. □

Lemma 7.5: If σ_Z² ≥ εμ_Z then, for ε < 1,

E[N_Z] ≥ (1 − δ)(1 − ε)² ln((2 − δ)/δ) · σ_Z²/(4λ(2 + λ)²(εμ_Z)²).

Proof: Let T denote the stopping time of any test of H_Z against H_{Z′}. Note that E_{Z′}[Z] − E_Z[Z] = E_{Z′}[Z] − μ_Z = E_{Z′}[∆]. If we set d = εμ_Z/σ_Z², then, by Lemmas 7.3 and 7.4, E_{Z′}[Z] − μ_Z > εμ_Z/(2 + λ). Thus, to test H_Z against H_{Z′}, we can use BB with input ε′ such that μ_Z(1 + ε′) ≤ μ_Z(1 + ε/(2 + λ))(1 − ε′) < μ_{Z′}(1 − ε′). Solving for ε′, we obtain ε′ = ε/(2(2 + λ) + ε). But Corollary 7.2 gives a lower bound on the expected number of experiments E[N_Z] run by BB with respect to Z. We observe that

−ω_Z = −E_Z[γ] = ln κ ≤ λd²σ_Z²,

where the inequality follows from Lemma 7.3. We let d = εμ_Z/σ_Z² and substitute ε = 2(2 + λ)ε′/(1 − ε′) to complete the proof. □

We now prove the Lower Bound Theorem so that it holds also when σ_Z² < εμ_Z. We define the density

f_{Z′}(x) = f_Z(x) · (1 − εμ_Z) + εμ_Z.

Lemma 7.6: If μ_Z ≤ 1/4 then E_{Z′}[Z] − μ_Z ≥ εμ_Z/4.

Proof: Observe that E_{Z′}[Z] = (1 − εμ_Z)E_Z[Z] + εμ_Z/2, and therefore E_{Z′}[Z] − μ_Z = εμ_Z(1/2 − μ_Z) ≥ εμ_Z/4. □

Lemma 7.7: −E_Z[ln(f_{Z′}(x)/f_Z(x))] ≤ εμ_Z/(1 − εμ_Z).

Proof: Observe that

−ln(f_{Z′}(x)/f_Z(x)) ≤ −ln(1 − εμ_Z) ≤ εμ_Z/(1 − εμ_Z).

Taking expectations completes the proof. □

Lemma 7.8: If μ_Z ≤ 1/4 then E[N_Z] ≥ (1 − δ)(1 − ε) ln((2 − δ)/δ)/(16εμ_Z).

Proof: Let T denote the stopping time of any test of H_Z against H_{Z′}. From Lemma 7.6, and since μ_Z ≤ 1/4, E_{Z′}[Z] − E_Z[Z] ≥ εμ_Z/4. Thus, to test H_Z against H_{Z′}, we can use BB with input ε′ such that μ_Z(1 + ε′) ≤ μ_Z(1 + ε/4)(1 − ε′) < μ_{Z′}(1 − ε′). Solving for ε′, we obtain ε′ = ε/(8 + ε). But Corollary 7.2 gives a lower bound on the expected number of experiments E[N_Z] run by BB with respect to Z. Next observe that, by Lemma 7.7, −ω_Z in Corollary 7.2 is at most 2εμ_Z. Substitution of ε = 8ε′/(1 − ε′) completes the proof. □

Proof of Lower Bound Theorem: The theorem follows from Lemma 7.5 and Lemma 7.8. □

References

[1] A. Broder, "How hard is it to marry at random? (On the approximation of the permanent)", Proceedings of the Eighteenth ACM Symposium on Theory of Computing, 1986, pp. 50-58. Erratum in Proceedings of the Twentieth ACM Symposium on Theory of Computing, 1988, p. 551.

[2] R. Canetti, G. Even, O. Goldreich, "Lower bounds for sampling algorithms for estimating the average", Technical Report #789, Department of Computer Science, Technion, Nov. 1993.

[3] P. Dagum, R. Karp, M. Luby, S. Ross, "An optimal algorithm for Monte-Carlo estimation (extended abstract)", Proceedings of the Thirty-sixth IEEE Symposium on Foundations of Computer Science, 1995, pp. 142-149.

[4] P. Dagum, M. Luby, M. Mihail, U. Vazirani, "Polytopes, permanents and graphs with large factors", Proceedings of the Twenty-ninth IEEE Symposium on Foundations of Computer Science, 1988, pp. 412-421.

[5] P. Dagum, M. Luby, "Approximating the permanent of graphs with large factors", Theoretical Computer Science, Part A, Vol. 102, 1992, pp. 283-305.

[6] P. Dagum, M. Luby, "An optimal approximation algorithm for Bayesian inference", Artificial Intelligence Journal, 1997, in press.

[7] M. Dyer, A. Frieze, R. Kannan, "A random polynomial time algorithm for approximating the volume of convex bodies", Proceedings of the Twenty-first ACM Symposium on Theory of Computing, 1989, pp. 375-381.

[8] M. Dyer, A. Frieze, R. Kannan, A. Kapoor, L. Perkovic, U. Vazirani, "A mildly exponential time algorithm for approximating the number of solutions to a multidimensional knapsack problem", Combinatorics, Probability and Computing, Vol. 2, 1993, pp. 271-284.

[9] M. Jerrum, "An analysis of a Monte Carlo algorithm for estimating the permanent", Proceedings of the Third Conference on Integer Programming and Combinatorial Optimization, 1993, pp. 171-182.

[10] M. Jerrum, A. Sinclair, "Conductance and the rapid mixing property for Markov chains: the approximation of the permanent resolved", Proceedings of the Twentieth ACM Symposium on Theory of Computing, 1988, pp. 235-243.

[11] M. Jerrum, A. Sinclair, "Polynomial-time approximation algorithms for the Ising model", SIAM Journal on Computing, Vol. 22, 1993.


[12] M. Jerrum, L. Valiant, V. Vazirani, "Random generation of combinatorial structures from a uniform distribution", Theoretical Computer Science, Vol. 43, 1986, pp. 169-188.

[13] M. Jerrum, U. Vazirani, "A mildly exponential approximation algorithm for the permanent", Proceedings of the Thirty-third IEEE Symposium on Foundations of Computer Science, 1992, pp. 320-326.

[14] N. Karmarkar, R. Karp, R. Lipton, L. Lovasz, M. Luby, "A Monte-Carlo algorithm for estimating the permanent", SIAM Journal on Computing, Vol. 22, No. 2, 1993, pp. 284-293.

[15] R. Karp, M. Luby, "Monte Carlo algorithms for planar multiterminal network reliability problems", Journal of Complexity, Vol. 2, 1985, pp. 45-64.

[16] R. Karp, M. Luby, "Monte Carlo algorithms for enumeration and reliability problems", Proceedings of the Twenty-fourth IEEE Symposium on Foundations of Computer Science, 1983, pp. 56-64.

[17] R. Karp, M. Luby, N. Madras, "Monte Carlo approximation algorithms for enumeration problems", Journal of Algorithms, Vol. 10, 1989, pp. 429-448.

[18] M. Karpinski, M. Luby, "Approximating the number of solutions to a GF[2] formula", Proceedings of the Second ACM/SIAM Symposium on Discrete Algorithms, 1991, and Journal of Algorithms, Vol. 14, No. 2, March 1993, pp. 280-287.

[19] M. Mihail, P. Winkler, "On the number of Eulerian orientations of a graph", Proceedings of the Third ACM/SIAM Symposium on Discrete Algorithms, 1992, pp. 138-145.

[20] D. Randall, A. Sinclair, "Testable algorithms for self-avoiding walks", Proceedings of the Fifth ACM/SIAM Symposium on Discrete Algorithms, 1994, pp. 593-602.

[21] D. Siegmund, Sequential Analysis, Springer-Verlag, New York, 1985.

[22] A. Wald, Sequential Analysis, Wiley, New York, 1947.