Environmental and Ecological Statistics 8, 213–236, 2001

Estimating prevalence using composites

SILVESTRE COLÓN, GANAPATI P. PATIL and CHARLES TAILLIE

Center for Statistical Ecology and Environmental Statistics, Department of Statistics, The Pennsylvania State University, 421 Thomas Building, University Park, PA 16802, USA

Received June 1998; Revised December 2000

We are interested in estimating the fraction of a population that possesses a certain trait, such as the presence of a chemical contaminant in a lake. A composite sample drawn from a population has the trait in question whenever one or more of the individual samples making up the composite has the trait. Let the true fraction of the population that is contaminated be p. Classical estimators of p, such as the MLE and the jackknife, have been shown to be biased. In this study, we introduce a new shrinking estimator which can be used when doing composite sampling. The properties of this estimator are investigated and compared with those of the MLE and the jackknife.

Keywords: Burrows estimator, composite sampling, jackknife estimator, shrinking estimator

1352-8505 © 2001 Kluwer Academic Publishers

1. Introduction

The problem studied in this paper is that of the estimation of prevalence. Consider a population whose members may or may not have a certain trait. A good example is the presence or absence of a chemical contaminant. Let p be the proportion with the trait and k the composite size. If the presence and absence of the trait are coded as 1 and 0, respectively, then p is the population mean, and at first glance it might appear that we should estimate p by using the average of the individual responses. Here, however, the composite response is not the mean of the individual responses. Instead, the composite is free of the trait exactly when each individual sample is free of the trait. Thus, the composite response is the maximum of the individual responses. Because of this nonlinearity, the maximum likelihood estimator of p, based upon n composites, is biased and the bias cannot be completely removed.

Let k be the composite sample size. Then the information per composite is not proportional to k. Asymptotically, for large n, the information increases with k, reaches a maximum at k = k_opt, and then falls as k increases beyond k_opt. (For finite sample sizes, Burrows (1987) has found that, in some instances, the reciprocal of the mean square error can be a multimodal function of k.) For k > k_opt, the drop-off in information can be precipitous, even to the point where it is better to make n measurements on individual sampling units rather than n measurements on composites of size k. Thus, in designing a composite sampling study, it is wise to ensure that k ≤ k_opt, and our analytical investigations will be limited to this range.


This paper examines three estimators of prevalence based on composite samples:

1. The maximum likelihood estimator (MLE).
2. The jackknife estimator.
3. A class of shrinking estimators which shrink the MLE toward zero.

It turns out that the MLE is positively biased, and shrinking can reduce the bias as well as simultaneously reduce the mean square error (MSE) of estimation. The trick is not to shrink too much, so we consider a family of estimators with two adjustable parameters, denoted by b and c, that regulate the degree of shrinking. A particular member of the family, with

$$b = c = \frac{k-1}{2k},$$

has been previously proposed by Burrows (1987). The parameter c has only a small effect on the performance of the shrinking estimator and first appears in the third order terms of the asymptotic expansions of the bias and MSE in powers of 1/n. Actually, c was included only so that the Burrows estimator would be a special case. It will be shown that the parameter b has a greater impact upon performance. It appears in the lowest order term (1/n) of the bias and in the second order term (1/n²) of the MSE. This suggests two criteria for choosing b. First, one may try to reduce the magnitude of the bias, which leads to Burrows' choice, b = (k−1)/(2k), and completely eliminates the lowest order term in the bias, uniformly in p. Second, one may try to minimize the second order term in the MSE (the lowest order term equals the asymptotic variance and is the same for all the estimators). No single choice of b will minimize uniformly in p. A compromise choice of b is proposed (the 75 rule) that works well over a broad range of p, provided k = k_opt.

The above considerations are based on the asymptotic expansions, so it becomes necessary to assess the performance of the estimators for finite sample sizes n. Analytical results are not feasible, but the random variables involved are all transformations of binomially distributed variates, so the moments can be computed "exactly" without recourse to simulations. The computational algorithms for the small sample moments of the jackknife estimator require some rather intricate derivations, which will be described below.

2. Large sample approximations

Let p be the prevalence of the trait for the individual samples, k be the composite sample size, and π = π_k be the prevalence of the trait across all possible composites of size k. Let n be the number of composites. Since the trait is absent from the composite if and only if it is absent from all k individual samples, the parameters p and π are related by the formula

$$\pi = 1 - (1-p)^k, \tag{1}$$

or

$$p = 1 - (1-\pi)^{1/k} \equiv H(\pi).$$

We call H(·) the prevalence transformation. Fig. 1 shows the graph of H for different values of k.

Figure 1. The prevalence transformation H(π) for k = 1, 2, 4, 8.

From the figure, we see that when k > 1 the prevalence transformation is monotone increasing, convex, and highly nonlinear when π is large. For future reference, we note that

$$H^{(1)}(\pi) = \frac{1}{k}(1-\pi)^{\frac{1}{k}-1} > 0, \qquad H^{(2)}(\pi) = \frac{k-1}{k^2}(1-\pi)^{\frac{1}{k}-2} > 0.$$
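As a quick numerical illustration of Equation (1) and its inverse, the following is a minimal sketch of ours (the function names are our own, not from the paper):

```python
def composite_prevalence(p: float, k: int) -> float:
    """pi = 1 - (1 - p)^k: probability that a composite of k individual samples shows the trait."""
    return 1.0 - (1.0 - p) ** k

def H(pi: float, k: int) -> float:
    """Prevalence transformation: recovers the individual prevalence p from the composite prevalence pi."""
    return 1.0 - (1.0 - pi) ** (1.0 / k)

p, k = 0.05, 8
pi = composite_prevalence(p, k)
print(round(pi, 4), round(H(pi, k), 4))  # 0.3366 0.05: H inverts Equation (1)
```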

2.1 Bias and MSE of the maximum likelihood estimator

The MLE for π is the sample proportion π̂, so the MLE for p is

$$\hat{p} = H(\hat{\pi}).$$

Since H is nonlinear, the MLE p̂ is biased, and the bias can be substantial for finite sample sizes. Using the Taylor series expansion, we have

$$\hat{p} = H(\pi) + (\hat{\pi}-\pi)H^{(1)}(\pi) + (\hat{\pi}-\pi)^2 H^{(2)}(\pi)/2 + \cdots.$$

But p = H(π), so this gives

$$\hat{p} - p = (\hat{\pi}-\pi)H^{(1)}(\pi) + (\hat{\pi}-\pi)^2 H^{(2)}(\pi)/2 + \cdots.$$

Taking expectations, the bias of the MLE is given by

$$\mathrm{Bias}(\hat{p}) = \sum_{i=1}^{\infty} \gamma_i \,\frac{H^{(i)}(\pi)}{i!},$$

where $\gamma_i = E(\hat{\pi}-\pi)^i$ and $H^{(i)} = d^i H(\pi)/d\pi^i$. We may rearrange terms to obtain the asymptotic expansion in powers of 1/n,

$$\mathrm{Bias}(\hat{p}) = \gamma_2 \frac{H^{(2)}(\pi)}{2} + \left[\gamma_3 \frac{H^{(3)}(\pi)}{3!} + 3\gamma_2^2 \frac{H^{(4)}(\pi)}{4!}\right] + O\!\left(\frac{1}{n^3}\right),$$

where

$$\gamma_1 = 0, \qquad \gamma_2 = \frac{\pi(1-\pi)}{n}, \qquad \gamma_3 = \frac{\pi(1-\pi)(1-2\pi)}{n^2}.$$

After simplification we have

$$\mathrm{Bias}(\hat{p}) = \frac{d_1}{n} + \frac{d_2}{n^2} + O\!\left(\frac{1}{n^3}\right),$$

where

$$d_1 = \frac{k-1}{2k^2}\,\frac{1-(1-p)^k}{(1-p)^{k-1}},$$

and

$$d_2 = \frac{(k-1)(2k-1)(1-\pi)^{\frac{1}{k}-2}\,\pi\,(4k - 3\pi + k\pi)}{24k^4}.$$
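The bias approximation is straightforward to evaluate numerically; here is a minimal sketch of ours that codes d₁ and d₂ exactly as displayed (the function name is our own):

```python
def mle_bias_approx(p: float, k: int, n: int) -> float:
    """Second-order bias approximation d1/n + d2/n^2 for the MLE, from the displayed formulas."""
    pi = 1.0 - (1.0 - p) ** k
    d1 = (k - 1) / (2 * k**2) * (1 - (1 - p) ** k) / (1 - p) ** (k - 1)
    d2 = ((k - 1) * (2 * k - 1) * (1 - pi) ** (1 / k - 2)
          * pi * (4 * k - 3 * pi + k * pi) / (24 * k**4))
    return d1 / n + d2 / n**2
```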

The mean square error (MSE) is given by

$$\mathrm{MSE}(\hat{p}) = \gamma_2 \big(H^{(1)}(\pi)\big)^2 + \gamma_3 H^{(1)}(\pi)H^{(2)}(\pi) + 3\gamma_2^2\left[\frac{1}{4}\big(H^{(2)}(\pi)\big)^2 + \frac{1}{3}H^{(1)}(\pi)H^{(3)}(\pi)\right] + O\!\left(\frac{1}{n^3}\right).$$

Observe that MSE(p̂) can be expressed in the form

$$\mathrm{MSE}(\hat{p}) = \frac{A}{n} + \frac{B}{n^2} + O\!\left(\frac{1}{n^3}\right),$$

where

$$A = \frac{1-(1-p)^k}{k^2 (1-p)^{k-2}},$$

and

$$B = \frac{(k-1)(1-\pi)^{\frac{2}{k}-2}\,\pi\,\big(k(4+3\pi) - 7\pi\big)}{4k^4}.$$

It is interesting to observe that the asymptotic variance of p̂ due to compositing is

$$\mathrm{var}_c(\hat{p}) = \big[H^{(1)}(\pi)\big]^2 \mathrm{var}(\hat{\pi}) = \big[H^{(1)}(\pi)\big]^2 \frac{\pi(1-\pi)}{n_c},$$

where n = n_c is the number of composites analyzed. In the individual sampling case, the variance is

$$\mathrm{var}_i(\hat{p}) = \frac{p(1-p)}{n_i},$$

where n_i is the number of individuals analyzed. If measurement, rather than sample acquisition, is the primary cost factor, the relative cost of the two sampling procedures can be measured by the ratio of the numbers of samples needed to achieve the same variance under the two designs. So, by the above equations, the asymptotic relative cost of compositing compared with individual sampling is

$$RC = \frac{n_c}{n_i} = \frac{1-(1-p)^k}{k^2\, p\,(1-p)^{k-1}}.$$

From this equation, we note that RC < 1 indicates n_c < n_i, in favor of compositing over individual sampling. In addition, observe that the relative cost satisfies the following:

* RC → 1/k as p → 0, and
* RC → ∞ as p → 1 (unless k = 1).

Therefore, neither sampling design is uniformly better than the other: compositing tends to be better for small p, while individual sampling is better for large p. In Fig. 2, we have plotted the asymptotic relative cost RC versus p for various values of k. From the lower envelope of these curves, we can determine the optimal composite sample size k_opt. In practice, one has to determine an appropriate value of k by using some preliminary assessment of p. If this preliminary assessment of p is too small, then k will be overestimated and the achieved performance of compositing may be seriously degraded, perhaps to the point of being worse than individual sampling. Therefore, in the preliminary design stage, it is better to overestimate p and underestimate k. Table 1 shows the values of p for which the optimal value of k makes a transition from k_opt = k to k_opt = k + 1. When the optimal value k_opt is used, we observe that π_k ≈ 0.80, which is close to the region of high bias due to the nonlinearity, or curvature, of H (see Fig. 1). From the table, we also observe that k_opt gets large very fast as p → 0. In fact, k_opt ≈ 1.594/p when p is small.

Figure 2. The asymptotic cost of compositing relative to individual sampling as a function of the true prevalence p for k = 1, 2, 3, 4, 20, 100.


Table 1. Values of p = p_k where the optimal composite sample size makes a transition from k_opt = k to k_opt = k + 1. When p is slightly larger than p_k, the optimal composite sample size is k; when p is slightly smaller than p_k, the optimal composite sample size is k + 1. The composite prevalence π_k = H⁻¹(p_k) corresponding to p_k is also tabulated.

k      p_k      π_k
1      0.667    0.667
2      0.475    0.725
3      0.367    0.747
4      0.299    0.759
5      0.252    0.766
6      0.218    0.771
7      0.192    0.774
8      0.171    0.777
9      0.155    0.779
50     0.031    0.794
100    0.016    0.795
500    0.003    0.796

Often k_opt may be too large for practical use, in contrast with the Dorfman (1943) procedure, where $k_{opt} = O(1/\sqrt{p})$ as p → 0. In these cases, a suboptimal but more practical choice of k < k_opt can still give large performance gains over individual sampling (see Fig. 2).

2.2 Shrinking estimator

For k > 1, H is monotone increasing, so we can reduce the bias in p̂ by shrinking π̂ toward zero before applying the prevalence transformation. Since H(π) is flatter on the left, this should also reduce the variance and the MSE. This suggests that we estimate p as H(π̂*), where π̂* = aπ̂ with 0 ≤ a ≤ 1. See Fig. 3.

Figure 3. Shrinking prevalence transformation H(aπ) for k = 2 and for different a.
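As a concrete sketch (ours; the function name and argument names are our own), the shrinking estimate for a given b and c is computed from the observed composite responses as follows:

```python
def shrinking_estimate(positives: int, n: int, k: int, b: float, c: float = 0.0) -> float:
    """Shrink the sample proportion pi_hat toward zero, then apply the prevalence transformation."""
    pi_hat = positives / n
    a = 1.0 - b / (n + c)
    return 1.0 - (1.0 - a * pi_hat) ** (1.0 / k)

# With the Burrows choice b = c = (k - 1)/(2k):
k, n = 4, 25
b = (k - 1) / (2 * k)
print(shrinking_estimate(positives=18, n=n, k=k, b=b, c=b))
```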


Since the bias goes to zero as 1/n, the constant a has to depend upon the sample size n. In particular, we must have a → 1 as n → ∞. A natural choice is

$$a = 1 - \frac{b}{n+c} = 1 - \frac{b}{n} + \frac{bc}{n^2} + O\!\left(\frac{1}{n^3}\right).$$

We can express $\hat{p}^* = H(\hat{\pi}^*) = H(a\hat{\pi})$ using the Taylor series expansion as

$$\hat{p}^* = H(\pi) + (a\hat{\pi}-\pi)H^{(1)}(\pi) + (a\hat{\pi}-\pi)^2 H^{(2)}(\pi)/2 + \cdots.$$

If we let $t_i = E(a\hat{\pi}-\pi)^i$, we can express the bias as

$$\mathrm{Bias}(\hat{p}^*) = \sum_{i=1}^{\infty} t_i\,\frac{H^{(i)}(\pi)}{i!}.$$

Similarly, expanding $\mathrm{MSE} = E\big[(\hat{p}^* - H(\pi))^2\big]$ gives

$$\mathrm{MSE}(\hat{p}^*) = \sum_{i=1}^{\infty} t_{2i}\left(\frac{H^{(i)}(\pi)}{i!}\right)^2 + 2\sum_{i > j} t_{i+j}\,\frac{H^{(i)}(\pi)}{i!}\,\frac{H^{(j)}(\pi)}{j!}.$$

Putting $a = 1 - \frac{b}{n+c} = 1 - \frac{b}{n} + \frac{bc}{n^2} + O(\frac{1}{n^3})$, we have

$$t_1 = \left(-\frac{b}{n} + \frac{bc}{n^2}\right)\pi + O\!\left(\frac{1}{n^3}\right),$$

$$t_2 = \left(1 - \frac{2b}{n}\right)\gamma_2 + \frac{b^2}{n^2}\pi^2 + O\!\left(\frac{1}{n^3}\right),$$

$$t_3 = \gamma_3 - \frac{3b}{n}\gamma_2\pi + O\!\left(\frac{1}{n^3}\right),$$

$$t_4 = 3\gamma_2^2 + O\!\left(\frac{1}{n^3}\right).$$

So the bias for $\hat{p}^*$ can be written as $\mathrm{Bias}(\hat{p}^*) = \frac{A}{n} + \frac{B}{n^2} + O(\frac{1}{n^3})$, where

$$A = \frac{\pi(1-\pi)^{\frac{1}{k}-1}}{k}\left[\frac{1}{2}\left(1-\frac{1}{k}\right) - b\right] = \frac{1-(1-p)^k}{k\,(1-p)^{k-1}}\left[\frac{1}{2}\left(1-\frac{1}{k}\right) - b\right],$$

and

$$B = \frac{\pi(1-\pi)^{\frac{1}{k}-2}}{24k^4}\Big[-3\pi + k\big(4 - 2(6b-5)\pi\big) - 3k^2\big(4 + 3\pi + 4b^2\pi - 4b(\pi+2)\big) + 2k^3\big(4 + \pi + 6b^2\pi - 12b(1 - c + c\pi)\big)\Big].$$

Similarly, the mean square error can be written as

$$\mathrm{MSE}(\hat{p}^*) = \frac{A_m}{n} + \frac{B_m}{n^2} + O\!\left(\frac{1}{n^3}\right),$$

where

$$A_m = \frac{\pi(1-\pi)^{\frac{2}{k}-1}}{k^2} = \frac{1-(1-p)^k}{k^2(1-p)^{k-2}},$$

and

$$B_m = \frac{\pi(1-\pi)^{\frac{2}{k}-2}}{4k^4}\Big[7\pi + 2k\big((6b-5)\pi - 2\big) + k^2\big(4 + 3\pi + 4b^2\pi - 4b(2+\pi)\big)\Big].$$

We see that the lowest order term in the bias can be eliminated by the choice b = (k−1)/(2k). In applications, however, the MSE may be more important than the bias, and we may want to know how we can reduce the MSE. The lowest order term in the MSE is the asymptotic variance, and there is nothing we can do to reduce this. The second order term depends on b but not on c and is quadratic in b, so we can easily minimize it to obtain

$$b_{opt} = \frac{1}{\pi} + \frac{1}{2} - \frac{3}{2k}.$$

Unfortunately, this expression depends upon π, so there is no uniformly best choice of b. From earlier results, if the optimal k is used, then π_k ≈ 0.80 (0.75 < π_k < 0.8). This is a region of potentially high bias due to the curvature of the prevalence transformation H. We can study the effect of the value of b on the MSE by examining three rules for choosing b:

* 75 rule: b₇₅ = 11/6 − 3/(2k).
* 80 rule: b₈₀ = 7/4 − 3/(2k).
* Eliminate bias rule (Eb): b_Eb = (k−1)/(2k).

Figure 4. 75 rule, 80 rule, and eliminate bias rule (Eb).


The 75 rule and the 80 rule are obtained by putting π = 0.75 and π = 0.80 in the above expression for b_opt. From Fig. 4, we observe that the 75 rule and 80 rule shrink the estimator more than the Eb rule does. This may introduce a negative bias, but it may also reduce the variance and thereby reduce the MSE. To see how the different rules affect the MSE, we compare the ratio of the second order term to the first order term in the Taylor expansion of the MSE. This ratio, divided by the sample size, is the percentage increase in the MSE due to the second order term relative to the first order term (asymptotic variance). Thus,

$$\mathrm{ratio} = \frac{B_m}{A_m}.$$

Note that

$$\mathrm{MSE}/\mathrm{asyvar} = 1 + \frac{\mathrm{ratio}}{n} + O\!\left(\frac{1}{n^2}\right),$$

where asyvar is the asymptotic variance. Also note that a negative ratio is possible and desirable. From Fig. 5, we see that there is no practical difference between the 75 rule and the 80 rule. Both of these yield a smaller MSE than does the eliminate bias rule. So we compare the 75 rule with the eliminate bias rule by defining

$$\mathrm{MSEjump} = \mathrm{ratio}(\text{Eb rule}) - \mathrm{ratio}(\text{75 rule}).$$

When MSEjump is divided by the sample size and then multiplied by 100, the result is the percentage increase in the MSE that results from using the eliminate bias rule instead of the 75 rule. In Fig. 6, MSEjump is plotted versus p for different values of k. Note that the vertical axis values should be multiplied by 100/n in order to get the percentage increase in MSE. Therefore, if we are willing to accept some bias (in the form of underestimation on average), then the MSE can be reduced by choosing the 75 rule, that is, selecting b = b₇₅ = 11/6 − 3/(2k). For n = 100 and k < k_opt, the reduction in the MSE is between 2 and 5%.

Figure 5. Ratio comparison for the 75 rule, 80 rule, and eliminate bias rule.
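The three rules are easy to compare numerically by evaluating ratio = B_m/A_m for each choice of b; a sketch of ours that directly codes the displayed formulas:

```python
def ratio(p: float, k: int, b: float) -> float:
    """ratio = Bm/Am: second-order inflation of the MSE for shrinkage parameter b."""
    pi = 1.0 - (1.0 - p) ** k
    Am = pi * (1 - pi) ** (2 / k - 1) / k**2
    Bm = (pi * (1 - pi) ** (2 / k - 2) / (4 * k**4)
          * (7 * pi + 2 * k * ((6 * b - 5) * pi - 2)
             + k**2 * (4 + 3 * pi + 4 * b**2 * pi - 4 * b * (2 + pi))))
    return Bm / Am

k, p = 4, 0.3  # near the k = 4 transition point of Table 1
for name, b in [("75", 11/6 - 3/(2*k)), ("80", 7/4 - 3/(2*k)), ("Eb", (k - 1)/(2*k))]:
    print(name, ratio(p, k, b))  # the 75 and 80 rules give smaller (negative) ratios
```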


Figure 6. MSEjump versus p.

We can investigate which values of c are better when the 75 rule is used by defining

$$\mathrm{nratio} = \frac{\mathrm{MSE}_3}{\mathrm{MSE}_1},$$

where MSE₁ and MSE₃ are the first and third order terms in the expansion of MSE[SHRINKING], respectively. Thus, nratio75 is the nratio when b = b₇₅ is used. It can be shown that

$$\mathrm{nratio75} = R + S \cdot c,$$

where R is given by

$$R = \frac{1}{54k^4(1-\pi)^2}\Big(54k(\pi-2)\pi + 9\pi^2 - 30k^2\big(3 - 19\pi + 13\pi^2\big) + 2k^3\big(81 - 396\pi + 289\pi^2\big) + k^4\big({-66} + 318\pi - 245\pi^2\big)\Big)$$

and S is given by

$$S = \frac{(9 - 11k)(4\pi - 3)}{9k(1-\pi)}.$$

Note that nratio75 is linear in c. We pick out the slope to determine its sign. Letting nratio75Slope be the slope of nratio75 and using the fact that π = 1 − (1−p)^k, in Fig. 7 we plot the slope versus p for various values of k.

Figure 7. nratio75Slope versus p for various k.

From Fig. 7, we see that the slope vanishes in the general vicinity of the optimal p corresponding to k. For p close to this optimal prevalence, it will not make much difference how we choose c because its coefficient is close to zero. The range of practical interest is 0 < p < p_opt. In this range the coefficient of c is positive, so the MSE can be reduced by making c small. Although c could be negative as long as n + c is positive, it seems unnatural to use a negative c. Thus, examination of the third order term suggests that c = 0 is a reasonable choice when the 75 rule is adopted.

2.3 Burrows estimator

The Burrows estimator is $\hat{p}_b = H(\hat{\pi}_b)$ with

$$\hat{\pi}_b = \frac{n}{n+b}\,\hat{\pi} = \left(1 - \frac{b}{n+b}\right)\hat{\pi}, \qquad b = \frac{k-1}{2k},$$

so that it is a special case of the shrinking estimator family with c = b. Observe that the first order term of the asymptotic expansion of the bias is eliminated, so the bias becomes

$$\mathrm{Bias}(\hat{p}_b) = \frac{B_b}{n^2} + O\!\left(\frac{1}{n^3}\right),$$

where

$$B_b = \frac{(1-k^2)(1-\pi)^{\frac{1}{k}-2}\,\pi\,(\pi-2)}{24k^3}.$$

The mean square error is

$$\mathrm{MSE}(\hat{p}_b) = \frac{A_{mb}}{n} + \frac{B_{mb}}{n^2} + O\!\left(\frac{1}{n^3}\right),$$

where

$$A_{mb} = \frac{\pi(1-\pi)^{\frac{2}{k}-1}}{k^2}, \qquad B_{mb} = \frac{(k-1)^2(1-\pi)^{\frac{2}{k}-2}\,\pi^2}{2k^4}.$$

Burrows (1987) has shown that the bias and the mean squared error of this estimator are smaller than those of the maximum likelihood estimator.
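The Burrows estimator itself is obtained from the shrinking sketch above by setting b = c = (k−1)/(2k); its second-order coefficients can be evaluated directly from the displayed formulas. A sketch of ours:

```python
def burrows_second_order(p: float, k: int):
    """Second-order coefficients Bb (bias) and Bmb (MSE) for the Burrows estimator."""
    pi = 1.0 - (1.0 - p) ** k
    Bb = (1 - k**2) * (1 - pi) ** (1 / k - 2) * pi * (pi - 2) / (24 * k**3)
    Bmb = (k - 1) ** 2 * (1 - pi) ** (2 / k - 2) * pi**2 / (2 * k**4)
    return Bb, Bmb
```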

2.4 Jackknife estimator

The jackknife MLE of p is

$$\hat{p}_{jack} = \frac{1}{n}\sum_{e=1}^{n}\big[n\hat{p} - (n-1)\hat{p}_e\big],$$

where $\hat{p}_e$ is the MLE of p when observation number e is deleted from the data set. Let $h_i = H^{(i)}(\pi)/i!$, $i = 1, 2, 3, \ldots$. The Taylor expansion of $\hat{p}_{jack}$ about p is

$$\hat{p}_{jack} - p = \frac{1}{n}\sum_{e=1}^{n}\Big(n(\hat{p}-p) - (n-1)(\hat{p}_e - p)\Big) = n(\hat{p}-p) - \frac{n-1}{n}\sum_{e=1}^{n}(\hat{p}_e - p).$$

But

$$\hat{p} - p = \sum_{i=1}^{\infty} h_i (\hat{\pi}-\pi)^i \qquad\text{and}\qquad \hat{p}_e - p = \sum_{i=1}^{\infty} h_i (\hat{\pi}_e-\pi)^i,$$

where $\hat{\pi}_e$ is the MLE of π when observation number e is deleted from the data set. Thus,

$$\hat{p}_{jack} - p = \sum_{i=1}^{\infty} h_i \left(n(\hat{\pi}-\pi)^i - \frac{n-1}{n}\sum_{e=1}^{n}(\hat{\pi}_e-\pi)^i\right).$$

Taking expectations gives the bias of the jackknife estimator as

$$\mathrm{Bias}[jack] = \sum_{i=1}^{\infty} h_i\big(nM_{n,i} - (n-1)M_{n-1,i}\big),$$

where $M_{n,i} = E(\hat{\pi}-\pi)^i$ is the ith central moment of the sample proportion for the binomial distribution with parameters n and π. To obtain the mean square error of the jackknife estimator, we need to square the above expansion of $\hat{p}_{jack} - p$. So, let us write

$$T_i = n(\hat{\pi}-\pi)^i - \frac{n-1}{n}\sum_{e=1}^{n}(\hat{\pi}_e-\pi)^i,$$

which gives

$$\hat{p}_{jack} - p = \sum_{i=1}^{\infty} h_i T_i.$$

Squaring and taking expectations gives

$$\mathrm{MSE}[jack] = E\big[(\hat{p}_{jack} - p)^2\big] = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} h_i h_j\, E[T_i T_j]. \tag{2}$$

But

$$T_i T_j = n^2(\hat{\pi}-\pi)^{i+j} - (n-1)\sum_{e'=1}^{n}(\hat{\pi}-\pi)^i(\hat{\pi}_{e'}-\pi)^j - (n-1)\sum_{e=1}^{n}(\hat{\pi}_e-\pi)^i(\hat{\pi}-\pi)^j + \left(\frac{n-1}{n}\right)^2\sum_{e=1}^{n}\sum_{e'=1}^{n}(\hat{\pi}_e-\pi)^i(\hat{\pi}_{e'}-\pi)^j. \tag{3}$$

We break the last term into two sums, according as e = e' or e ≠ e'. The last term then becomes

$$\left(\frac{n-1}{n}\right)^2\sum_{e=1}^{n}(\hat{\pi}_e-\pi)^{i+j} + \left(\frac{n-1}{n}\right)^2\sum_{e,e'=1,\; e\neq e'}(\hat{\pi}_e-\pi)^i(\hat{\pi}_{e'}-\pi)^j.$$

Now, let

$$M_{n,i} = E(\hat{\pi}-\pi)^i,$$

$$M^{(1)}_{n,i,j} = E\big[(\hat{\pi}_e-\pi)^i(\hat{\pi}-\pi)^j\big],$$

$$M^{(2)}_{n,i,j} = E\big[(\hat{\pi}_e-\pi)^i(\hat{\pi}_{e'}-\pi)^j\big],$$

where e ≠ e'. Taking the expectation of both sides of Equation (3) gives

$$E[T_i T_j] = n^2 M_{n,i+j} - n(n-1)\big(M^{(1)}_{n,i,j} + M^{(1)}_{n,j,i}\big) + \frac{(n-1)^2}{n}\, M_{n-1,i+j} + \frac{(n-1)^3}{n}\, M^{(2)}_{n,i,j}. \tag{4}$$

Finally, we need to compute $M^{(1)}$ and $M^{(2)}$. But

$$\hat{\pi} = \frac{x_1 + \cdots + x_n}{n} = \pi + \frac{(x_1 - \pi) + \cdots + (x_n - \pi)}{n},$$

so that

$$\hat{\pi} - \pi = \frac{n-1}{n}(\hat{\pi}_e - \pi) + \frac{x_e - \pi}{n}.$$

Note that $\hat{\pi}_e - \pi$ is independent of $x_e - \pi$. Applying the binomial theorem,

$$(\hat{\pi} - \pi)^j = \frac{1}{n^j}\sum_{m=0}^{j}\binom{j}{m}(n-1)^m(\hat{\pi}_e - \pi)^m(x_e - \pi)^{j-m}.$$

Multiplying by $(\hat{\pi}_e - \pi)^i$ and taking expectations gives

$$M^{(1)}_{n,i,j} = \frac{1}{n^j}\sum_{m=0}^{j}\binom{j}{m}(n-1)^m M_{n-1,m+i}\,M_{1,j-m}. \tag{5}$$

Next, we look at $M^{(2)}_{n,i,j}$. Now

$$\hat{\pi}_e - \pi = \frac{n-2}{n-1}(\hat{\pi}_{e,e'} - \pi) + \frac{x_{e'} - \pi}{n-1}, \qquad \hat{\pi}_{e'} - \pi = \frac{n-2}{n-1}(\hat{\pi}_{e,e'} - \pi) + \frac{x_e - \pi}{n-1},$$

where $\hat{\pi}_{e,e'}$ is the MLE of π with observation numbers e and e' (with e ≠ e') deleted. Therefore,

$$(\hat{\pi}_e - \pi)^i(\hat{\pi}_{e'} - \pi)^j = \frac{1}{(n-1)^{i+j}}\sum_{l=0}^{i}\sum_{m=0}^{j}\binom{i}{l}\binom{j}{m}(n-2)^{l+m}(\hat{\pi}_{e,e'} - \pi)^{l+m}(x_{e'} - \pi)^{i-l}(x_e - \pi)^{j-m}.$$

Taking expectations gives

$$M^{(2)}_{n,i,j} = \frac{1}{(n-1)^{i+j}}\sum_{l=0}^{i}\sum_{m=0}^{j}\binom{i}{l}\binom{j}{m}(n-2)^{l+m} M_{n-2,l+m}\,M_{1,i-l}\,M_{1,j-m}. \tag{6}$$

Equations (2), (4), (5), and (6) give the formal expression for MSE[jack] in terms of the central moments $M_{n,i}$, $M_{n-1,i}$, $M_{n-2,i}$, $M_{1,i}$ of the binomial distribution. Next we need to study the order of these expressions in 1/n to determine how many terms have to be carried along.

Since there are no adjustable parameters in the jackknifing estimator, it should be adequate to obtain MSE[jack] to within $O(1/n^3)$. Note that three of the four terms in Equation (4) have multiplicative factors that are $O(n^2)$. Therefore the corresponding moments must be computed to $O(1/n^5)$.

Theorem 1. $E[T_i T_j]$ is $O(1/n^3)$ if $i + j > 8$.

The proof of this theorem is a consequence of Equations (4)–(6) and the following lemma.

Lemma.

(a) $M_{n,i} = O(1/n^t)$ where $t = \mathrm{ceil}(i/2)$. Here, ceil[x] rounds x upward to an integer.

(b) $M^{(1)}_{n,i,j} = O(1/n^5)$ if $i + j > 8$. More specifically, the mth term in Equation (5) is $O(1/n^5)$ if $m < 2j + i - 8$; thus the sum in (5) can be started at $m = \max(0,\, 2j + i - 8)$.

(c) $M^{(2)}_{n,i,j} = O(1/n^5)$ if $i + j > 8$. More specifically, the (l, m) term in Equation (6) is $O(1/n^5)$ if $l + m < 2(i + j) - 8$. Thus, the double sum in (6) can be computed as

$$\sum_{l=0}^{i}\ \sum_{m=\max(0,\,2(i+j)-8-l)}^{j}.$$

Proof: (a) Straightforward.

(b) The relevant factor in the mth term of Equation (5) is

$$\frac{(n-1)^m}{n^j}\, M_{n-1,m+i}. \tag{7}$$

By (a), $M_{n-1,m+i}$ is $O\big(1/n^{\mathrm{ceil}((m+i)/2)}\big)$, so (7) has order $1/n^d$ with

$$d = j + \mathrm{ceil}\!\left(\frac{m+i}{2}\right) - m.$$

Write $\mathrm{ceil}\big(\frac{m+i}{2}\big) = \frac{m+i}{2} + e$, where $e = e_{m+i}$ is 0 if m + i is even and 1/2 if m + i is odd. Then

$$d = j + \frac{i}{2} + e - \frac{m}{2}.$$

Thus (7) is $O(1/n^5)$ if $d > 4$, i.e., if

$$j + \frac{i}{2} + e - \frac{m}{2} > 4,$$

which will be true if

$$j + \frac{i}{2} - \frac{m}{2} > 4. \tag{8}$$

But the last will be true for all terms in (5) if it is true for $m = 0, 1, \ldots, j$, and this will be so if it is true for the largest m, i.e., for $m = j$. Thus (5) is $O(1/n^5)$ if

$$j + \frac{i}{2} - \frac{j}{2} > 4,$$

or $i + j > 8$.

Also, we see that we can ignore those m satisfying (8) when we compute the sum in (5). Thus, we can start the sum at $m = 2j + i - 8 = j - \big(8 - (i+j)\big)$. (c) Similar reasoning. □

Efron (1982) has shown that the bias of the jackknife estimator can be written as

$$\mathrm{Bias}(\hat{p}_{jack}) = \frac{B_j}{n^2} + O\!\left(\frac{1}{n^3}\right),$$

where

$$B_j = \frac{(1-k)(2k-1)(1-\pi)^{\frac{1}{k}-2}\,\pi\,(4k - 3\pi + k\pi)}{24k^4}.$$

Observe that the jackknife estimator eliminates the first order bias from an estimator, as is well known. Also, the mean square error of the jackknife estimator has the form

$$\mathrm{MSE}(\hat{p}_{jack}) = \frac{A_{mj}}{n} + \frac{B_{mj}}{n^2} + O\!\left(\frac{1}{n^3}\right),$$

where

$$A_{mj} = \frac{\pi(1-\pi)^{\frac{2}{k}-1}}{k^2}, \qquad B_{mj} = \frac{(k-1)^2(1-\pi)^{\frac{2}{k}-2}\,\pi^2}{2k^4}.$$

2.5 Comparisons For all the three estimators, we ®nd that the bias and the mean square error have the forms   A B 1 bias ˆ ‡ 2 ‡ O 3 ; n n n   Am B m 1 ‡ 2 ‡O 3 ; MSE ˆ n n n where Am is the same for all three estimators (Am =n is the asymptotic variance). For the bias expansion, the shrinking estimator and the jackknife estimator have different values of B, but the form of B does not involve the constant c of the shrinking estimator. Also B

Estimating prevalence using composites

229

Figure 8. rterm2 versus p.

depends upon p and neither the shrinking estimator nor jackkni®ng is uniformly better than the other in minimizing B. Observe that the asymptotic bias expansion, for the Burrows and the jackknife estimator both eliminate the ®rst order term. Therefore it is useful to compare the second terms by de®ning rterm2 ˆ

B : p

When rterm2 is divided by n2 and then multiplied by 100, the result is the relative ( percent) bias due to second term. From Fig. 8, we have that the rterm2 for the jackknife is always negative and rterm2 for Burrows estimator is always positive. But, with a sample size as small as n ˆ 10 and a sensible choice of k, the contribution to the bias is only 0.10% to 0.50% for Burrows estimator and 0.20% to 4.5% for the jackknife estimator. Thus, the jackknife estimator tends to underestimate and the Burrows estimator tends to overestimate p, at least asymptotically. It is interesting to observe that the MSE expression for both the jackknife estimator and Burrows estimator are equal up to the second order term in the asymptotic expansion. So one would expect that when n is large enough, the MSE for both estimators would be similar. In the next section, we study the behavior of these estimators for small sample sizes.
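The two second-order bias terms can be compared numerically using $B_b$ from Section 2.3 and $B_j$ above; a sketch of ours (function name is our own):

```python
def rterm2_burrows_jack(p: float, k: int):
    """Second-order relative bias terms B/p for the Burrows and jackknife estimators."""
    pi = 1.0 - (1.0 - p) ** k
    Bb = (1 - k**2) * (1 - pi) ** (1 / k - 2) * pi * (pi - 2) / (24 * k**3)
    Bj = ((1 - k) * (2 * k - 1) * (1 - pi) ** (1 / k - 2)
          * pi * (4 * k - 3 * pi + k * pi) / (24 * k**4))
    return Bb / p, Bj / p  # positive for Burrows, negative for the jackknife
```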

3. Exact small sample computation of bias and MSE

3.1 Maximum likelihood estimator

We already have $\hat{p} = H(\hat{\pi}) = H(y/n)$, where $y \sim \mathrm{Binomial}(n, \pi)$. Therefore the bias and the mean square error can be computed as

$$\mathrm{bias(MLE)} = E[\hat{p} - p] = \sum_{y=0}^{n}\left[H\!\left(\frac{y}{n}\right) - p\right] b(y; n, \pi),$$

where $b(y; n, \pi) = \binom{n}{y}\pi^y(1-\pi)^{n-y}$. Similarly,

$$\mathrm{MSE(MLE)} = E\big[(\hat{p} - p)^2\big] = \sum_{y=0}^{n}\left[H\!\left(\frac{y}{n}\right) - p\right]^2 b(y; n, \pi).$$
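These finite sums are immediate to code; a minimal sketch of ours (the function name is our own):

```python
from math import comb

def exact_bias_mse_mle(p: float, k: int, n: int):
    """Exact bias and MSE of the MLE p_hat = H(y/n), with y ~ Binomial(n, pi)."""
    pi = 1.0 - (1.0 - p) ** k
    H = lambda x: 1.0 - (1.0 - x) ** (1.0 / k)
    bias = mse = 0.0
    for y in range(n + 1):
        w = comb(n, y) * pi**y * (1 - pi) ** (n - y)  # b(y; n, pi)
        d = H(y / n) - p
        bias += w * d
        mse += w * d * d
    return bias, mse

print(exact_bias_mse_mle(p=0.05, k=8, n=20))
```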

3.2 Shrinking estimator

We have $a = 1 - \frac{b}{n+c}$, where b and c are given. Recall that the shrinking estimator is defined by $\hat{p}_s = H(a\,y/n)$. Then the bias and the mean square error in the shrinking case are

$$\mathrm{bias(SHRINKING)} = E[\hat{p}_s - p] = \sum_{y=0}^{n}\left[H\!\left(a\frac{y}{n}\right) - p\right] b(y; n, \pi)$$

and

$$\mathrm{MSE(SHRINKING)} = E\big[(\hat{p}_s - p)^2\big] = \sum_{y=0}^{n}\left[H\!\left(a\frac{y}{n}\right) - p\right]^2 b(y; n, \pi).$$

3.3 Jackknife estimator

We have seen earlier that

$$\hat{p}_{jack} - p = n(\hat{p} - p) - \frac{n-1}{n}\sum_{e=1}^{n}(\hat{p}_e - p),$$

where $\hat{p}_e$ is the MLE when observation number e is deleted from the data set. Thus, the bias can be computed as

$$\mathrm{bias}(jack) = n\,\mathrm{bias(MLE;}\ n) - (n-1)\,\mathrm{bias(MLE;}\ n-1).$$
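Although this section focuses on exact moments, the jackknife point estimate itself has a closed form for composite data: deleting one of the y positive composites leaves y − 1 positives out of n − 1, and deleting one of the n − y negatives leaves y out of n − 1. A sketch of ours (function names are our own):

```python
def mle(y: int, n: int, k: int) -> float:
    """MLE p_hat = H(y/n)."""
    return 1.0 - (1.0 - y / n) ** (1.0 / k)

def jackknife_estimate(y: int, n: int, k: int) -> float:
    """Average of the n pseudo-values n*p_hat - (n-1)*p_hat_e over leave-one-out deletions."""
    full = mle(y, n, k)
    total = 0.0
    if y > 0:       # deleting a positive composite: y-1 positives out of n-1
        total += y * (n * full - (n - 1) * mle(y - 1, n - 1, k))
    if y < n:       # deleting a negative composite: y positives out of n-1
        total += (n - y) * (n * full - (n - 1) * mle(y, n - 1, k))
    return total / n

print(jackknife_estimate(y=18, n=25, k=4))
```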

For the mean square error computation, we have

$$(\hat{p}_{jack} - p)^2 = n^2(\hat{p}-p)^2 - 2(n-1)\sum_{e=1}^{n}(\hat{p}_e - p)(\hat{p} - p) + \left(\frac{n-1}{n}\right)^2\sum_{e=1}^{n}\sum_{e'=1,\, e'\neq e}^{n}(\hat{p}_e - p)(\hat{p}_{e'} - p) + \left(\frac{n-1}{n}\right)^2\sum_{e=1}^{n}(\hat{p}_e - p)^2.$$

The mean square error for the jackknife then becomes

$$\mathrm{MSE}(jack) = n^2\,\mathrm{MSE1} + \frac{(n-1)^2}{n}\,\mathrm{MSE2} - 2n(n-1)\,\mathrm{MSE3} + \frac{(n-1)^3}{n}\,\mathrm{MSE4},$$

where

MSE1 = MSE(MLE; n),
MSE2 = MSE(MLE; n−1),
MSE3 = E[(p̂_e − p)(p̂ − p)],
MSE4 = E[(p̂_e − p)(p̂_{e'} − p)], with e ≠ e'.

We need to calculate MSE3 and MSE4. We have

$$\hat{p} = H\!\left(\frac{y_1 + \cdots + y_n}{n}\right) = H\!\left(\frac{\hat{y}_e + y_e}{n}\right),$$

where $\hat{y}_e = (y_1 + \cdots + y_n) - y_e$, and $\hat{p}_e = H\!\big(\frac{\hat{y}_e}{n-1}\big)$. We use iterated conditional expectation and the fact that $\hat{y}_e$ and $y_e$ are independent to obtain

$$E[(\hat{p}_e - p)(\hat{p} - p)] = E\big[E[U \mid \hat{y}_e]\big],$$

where $U = \big[H\big(\frac{\hat{y}_e}{n-1}\big) - p\big]\big[H\big(\frac{\hat{y}_e + y_e}{n}\big) - p\big]$. Since $y_e \sim \mathrm{Binomial}(1, \pi)$,

$$E[U \mid \hat{y}_e] = \left[H\!\left(\frac{\hat{y}_e}{n-1}\right) - p\right]\left[(1-\pi)H\!\left(\frac{\hat{y}_e}{n}\right) + \pi H\!\left(\frac{\hat{y}_e + 1}{n}\right) - p\right],$$

and

$$E[(\hat{p}_e - p)(\hat{p} - p)] = \sum_{y=0}^{n-1}\left[H\!\left(\frac{y}{n-1}\right) - p\right]\left[(1-\pi)H\!\left(\frac{y}{n}\right) + \pi H\!\left(\frac{y+1}{n}\right) - p\right] b(y; n-1, \pi).$$

Similarly, $\hat{p}_e = H\!\big(\frac{\hat{y}_{e,e'} + y_{e'}}{n-1}\big)$ and $\hat{p}_{e'} = H\!\big(\frac{\hat{y}_{e,e'} + y_e}{n-1}\big)$, where $\hat{y}_{e,e'} = y_1 + \cdots + y_n - y_e - y_{e'}$. Thus,

$$E[(\hat{p}_e - p)(\hat{p}_{e'} - p)] = E\big[E[U \mid \hat{y}_{e,e'}]\big],$$

where $U = \big[H\big(\frac{\hat{y}_{e,e'} + y_{e'}}{n-1}\big) - p\big]\big[H\big(\frac{\hat{y}_{e,e'} + y_e}{n-1}\big) - p\big]$. Since $\hat{y}_{e,e'}$, $y_e$, $y_{e'}$ are independent, we have

$$E[U \mid \hat{y}_{e,e'}] = \left[(1-\pi)H\!\left(\frac{\hat{y}_{e,e'}}{n-1}\right) + \pi H\!\left(\frac{\hat{y}_{e,e'} + 1}{n-1}\right) - p\right]^2.$$

Thus,

$$E[(\hat{p}_e - p)(\hat{p}_{e'} - p)] = \sum_{y=0}^{n-2}\left[(1-\pi)H\!\left(\frac{y}{n-1}\right) + \pi H\!\left(\frac{y+1}{n-1}\right) - p\right]^2 b(y; n-2, \pi).$$
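Assembling MSE1 through MSE4 from the sums above gives the exact jackknife MSE; a self-contained sketch of ours (function names are our own):

```python
from math import comb

def exact_mse_jackknife(p: float, k: int, n: int) -> float:
    """Exact MSE of the jackknife estimator via the MSE1-MSE4 decomposition."""
    pi = 1.0 - (1.0 - p) ** k
    H = lambda x: 1.0 - (1.0 - x) ** (1.0 / k)
    b = lambda y, m: comb(m, y) * pi**y * (1 - pi) ** (m - y)

    def mse_mle(m: int) -> float:  # exact MSE of the MLE with m composites
        return sum(b(y, m) * (H(y / m) - p) ** 2 for y in range(m + 1))

    mse1, mse2 = mse_mle(n), mse_mle(n - 1)
    mse3 = sum(b(y, n - 1) * (H(y / (n - 1)) - p)
               * ((1 - pi) * H(y / n) + pi * H((y + 1) / n) - p)
               for y in range(n))                     # y = 0, ..., n-1
    mse4 = sum(b(y, n - 2)
               * ((1 - pi) * H(y / (n - 1)) + pi * H((y + 1) / (n - 1)) - p) ** 2
               for y in range(n - 1))                 # y = 0, ..., n-2
    return (n**2 * mse1 + (n - 1)**2 / n * mse2
            - 2 * n * (n - 1) * mse3 + (n - 1)**3 / n * mse4)
```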

Figure 9. Relative bias expressed as a percent (BIASper) for small samples n = i + 2 (3 ≤ n ≤ 20) with π = 0.75.

Figure 10. Relative bias expressed as a percent (BIASper) for n = i + 14 (15 ≤ n ≤ 80) with π = 0.75.

Figure 11. MSEs for the MLE, Burrows, 75 rule, and jackknife estimators for n = i + 2 (3 ≤ n ≤ 80) with π = 0.75.

Figure 12. MSEper of the estimators with π = 0.75 and n = i + 2 (3 ≤ n ≤ 20).

3.4 Comparisons for small sample size

The relative bias (expressed as a percent) of an estimator of p is given by

$$\mathrm{BIASper} = 100 \times \frac{E(\mathrm{estimator}) - p}{p}.$$

In Figs. 9–14, it is important to read the captions, because the horizontal axis of the figure often does not represent the sample size n. The horizontal axis in Figs. 9–14 is denoted by i, and the relationship between n and i is always described in the caption. Looking at Fig. 9, it is clear that for n less than 20 the Burrows estimator is more effective at bias reduction than the jackknife, the 75 rule, and the MLE.

Figure 13. MSEper of the estimators with π = 0.75 and n = i + 29 (30 ≤ n ≤ 80).

Figure 14. DMSEper for π = 0.70 (left) and π = 0.80 (right) with n = i + 2 (3 ≤ n ≤ 100). Negative values of DMSEper indicate that the 75 rule is more effective than the Burrows estimator in reducing MSE.

In Fig. 10, there are four plots which, for a given k, depict the bias of the four estimators for larger sample sizes. In each plot, we can see that for n greater than 20 the performance of the jackknife and the Burrows estimator in bias reduction is almost identical and always better than the 75 rule and the MLE. Fig. 11 shows how the MSE is affected by sample size. As n becomes large, the MSEs of all four estimators converge to the asymptotic variance. In order to distinguish among the estimators, we define

$$\mathrm{MSEper} = 100\, n\, \frac{\mathrm{MSE(estimator)} - \mathrm{asyvar}}{\mathrm{asyvar}},$$

where asyvar is the asymptotic variance. The quantity MSEper is the relative difference, per observation, between the MSE and the asymptotic variance. In Fig. 12, we observe for small samples (n < 20) that the MLE and jackknife estimators have large values of MSEper, indicating that they are ineffective in reducing the MSE. Fig. 13 shows that MSEper for the jackknife and Burrows estimators are approximately equal for n greater than 50. This is consistent with our earlier results showing that these two estimators have the same MSE to second order in n. We can compare MSEper for the Burrows and 75 rules by defining

$$\mathrm{DMSEper} = \mathrm{MSEper}(\text{75 rule}) - \mathrm{MSEper}(\text{Burrows}).$$

Note that

$$\mathrm{DMSEper} = 100\, n\, \frac{\mathrm{MSE}(\text{75 rule}) - \mathrm{MSE}(\text{Burrows})}{\mathrm{asyvar}}.$$

In Fig. 14, there are 8 plots. The four plots on the left represent DMSEper for π = 0.70 and the four plots on the right represent the same thing for π = 0.80. These values of π were chosen because for k = k_opt we expect π_k to be between 0.7 and 0.8. In all 8 plots, we see that for n greater than 10 the 75 rule achieves a 10 to 30% reduction in MSE per observation compared with the Burrows estimator. The anomalous behavior of DMSEper for very small sample sizes indicates that the asymptotic theory leading to the 75 rule is inapplicable in this range.

Notes

Prepared with partial support from the Statistical Analysis and Computing Branch, Environmental Statistics and Information Division, Office of Policy, Planning, and Evaluation, United States Environmental Protection Agency, Washington, DC, under Cooperative Agreement Number CR-825506-01-0. The contents have not been subjected to Agency review and therefore do not necessarily reflect the views of the Agency, and no official endorsement should be inferred.

References

Burrows, P.M. (1987) Improved estimation of pathogen transmission rates by group testing. Phytopathology, 77(2), 363–65.
Dorfman, R. (1943) The detection of defective members of large populations. Annals of Mathematical Statistics, 14, 436–40.
Efron, B.E. (1982) The Jackknife, the Bootstrap and Other Resampling Plans, SIAM, Philadelphia, PA.
Johnson, N.L. and Kotz, S. (1969) Distributions in Statistics: Discrete Distributions, Wiley, New York.
Rhode, C.A. (1976) Composite sampling. Biometrics, 32, 273–82.
Wolfram, S. (1990) Mathematica, Addison-Wesley, Redwood City, CA.


Biographical Sketches

Silvestre Colón is an Instructor in the Department of Mathematics at the University of Puerto Rico. G.P. Patil is Distinguished Professor of Mathematical Statistics and Director of the Center for Statistical Ecology and Environmental Statistics in the Department of Statistics at the Pennsylvania State University. C. Taillie is Senior Research Associate in the Center for Statistical Ecology and Environmental Statistics in the Department of Statistics at the Pennsylvania State University.