Journal of Applied Statistics, Vol. 31, No. 1, 49–59, January 2004

Sample Size and Power Calculation in Comparing Diagnostic Accuracy of Biomarkers with Pooled Assessments

AIYI LIU*, ENRIQUE F. SCHISTERMAN* & ERIC TEOH**

*National Institute of Child Health and Human Development, USA, **University of Alabama at Birmingham, USA

(Received September 2002; accepted July 2003)

ABSTRACT When testing of a biomarker is costly, pooling of samples becomes a useful and efficient alternative (Faraggi et al., 2003). In this paper, we develop procedures for sample size and power calculations for planning a study comparing the accuracy of biomarkers in the diagnosis of diseases with pooled samples. Explicit formulas are derived for several important pooling strategies. The effects of pooling samples on sample size and power of the test are also discussed.

KEY WORDS: Area under the curve, diagnostic biomarkers, sensitivity, specificity, receiver operating characteristic curves

Introduction

Receiver operating characteristic (ROC) curves have been shown to be a useful tool for evaluating the effectiveness of biomarkers in the diagnosis of diseases (Reiser & Faraggi, 1997; Su & Liu, 1993; Wieand et al., 1989). Suppose X and Y are two continuous random variables representing the values of a diagnostic biomarker in a diseased group and a non-diseased group, with distributions F and G, respectively. The diagnostic test is considered positive if the value of the biomarker is greater than a cut-off value c. The ROC curve associated with the biomarker is obtained by plotting the biomarker's sensitivity (true positive rate) $\bar F(c)$ against one minus its specificity (false positive rate) $\bar G(c)$, where $\bar F = 1 - F$ and $\bar G = 1 - G$, as c takes all possible values. The area under the ROC curve is then used as a measure of the usefulness of the biomarker in discriminating between diseased and non-diseased subjects. When values of the biomarker are observed from non-diseased and diseased samples, the ROC curve and the area under it can be estimated by both parametric and non-parametric methods (see, for example, Hanley & McNeil, 1982; Bamber, 1975; Reiser & Guttman, 1986; Wieand et al., 1989; Mee, 1990; and Zou et al., 1997).

Correspondence Address: E. F. Schisterman, Division of Epidemiology, Statistics and Prevention Research, National Institute of Child Health and Human Development, 6100 Executive Blvd, Room 7B05, Bethesda, MD 20852, USA.

0266-4763 Print/1360-0532 Online/04/010049-11 © 2004 US Federal Government DOI: 10.1080/0266476032000148948


A Liu et al.

In many situations, biomarkers are developed in laboratory settings. Evaluating their effectiveness in the early detection and prevention of chronic and acute diseases, however, can become very costly when they are applied to human subjects. Such high cost may limit the number of tests that can be performed on the biomarkers, while the numbers of available diseased and non-diseased samples may be relatively large. In this situation, Faraggi et al. (2003) proposed using pooled blood samples to evaluate the area under the ROC curve of a new biomarker. Liu & Schisterman (2003) later developed test procedures for comparing the effectiveness of biomarkers when samples are pooled. Sample size and power calculations are important for planning studies involving hypothesis testing. In the context of diagnostic testing, several authors have investigated methods for calculating power and sample size when comparing the areas under two or more ROC curves with tests performed on individual subjects (Obuchowski & McClish, 1997; Obuchowski, 1998). Formulas for computing sample size and power, however, have not been developed for the case in which pooled samples are used to evaluate the effectiveness of biomarkers. In this paper, we derive formulas for sample size and power for planning studies that compare the accuracy of biomarkers in the diagnosis of diseases with pooled samples. In the next section, we propose a test statistic and derive its asymptotic distribution. In the third section, we obtain formulas for power and sample size and then discuss two important special cases. The effects of pooling on sample size and power are discussed in the fourth section. Some discussion is given in the final section.

A Test Statistic

Suppose blood samples from diseased and non-diseased subjects are randomly grouped into sets, as discussed by Faraggi et al. (2003). The blood samples in a set are pooled together and the value of the biomarker is measured on these pools. Since the measurements are generally per unit of volume, we assume that the measurement for a pooled set is the average of the individual biomarker values in that set. This type of assumption is reasonable for stable biomarkers for some diseases (Weinberg & Umbach, 1999). For simplicity, we consider two biomarkers, denoted as marker $i$ ($i = 1, 2$), whose values follow normal distributions $X_i \sim N(\mu_{iN}, \sigma_{iN}^2)$ for non-diseased individuals and $Y_i \sim N(\mu_{iD}, \sigma_{iD}^2)$ for diseased individuals. The Xs are assumed to be independent of the Ys. The correlation coefficient is denoted as $\rho_N$ between $X_1$ and $X_2$, and $\rho_D$ between $Y_1$ and $Y_2$. The two pairs of observations, $(X_1, X_2)$ from a non-diseased individual and $(Y_1, Y_2)$ from a diseased individual, are not observed; instead, averages of biomarker values over several individuals are recorded. The area under the ROC curve with respect to biomarker $i$ is



$$A_i = \Phi\left(\frac{\delta_i}{\eta_i}\right), \qquad i = 1, 2 \tag{1}$$

Comparing Diagnostic Accuracy of Biomarkers


where, throughout, $\Phi$ and $\phi$ denote the standard normal distribution and density functions, respectively, and

$$\delta_i = \mu_{iD} - \mu_{iN}, \qquad \eta_i^2 = \sigma_{iN}^2 + \sigma_{iD}^2 \tag{2}$$

Such a re-parameterization reduces the number of parameters and retains the independence of the parameter estimates, which makes it easier to compute the correlation between the two estimated areas (see also the Appendix). The null hypothesis of interest is $H_0: \Delta = A_1 - A_2 = 0$. For simplicity, we consider only the two-sided alternative $H_a: \Delta \neq 0$.

For each biomarker $i$ ($i = 1, 2$), let $X_{ij}$, $j = 1, \ldots, n_i$, be the observed value of the $j$th pool, of size $p_{ij}$, formed from the $N$ non-diseased individuals, and let $Y_{ik}$, $k = 1, \ldots, m_i$, be the observed value of the $k$th pool, of size $q_{ik}$, formed from the $M$ diseased individuals. (Hence, by definition, $N = \sum_{j=1}^{n_i} p_{ij}$ and $M = \sum_{k=1}^{m_i} q_{ik}$ for each $i$.) Then

$$X_{ij} \sim N\left(\mu_{iN}, \frac{1}{p_{ij}}\,\sigma_{iN}^2\right),\ j = 1, \ldots, n_i; \qquad Y_{ik} \sim N\left(\mu_{iD}, \frac{1}{q_{ik}}\,\sigma_{iD}^2\right),\ k = 1, \ldots, m_i \tag{3}$$
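As a minimal sketch of equations (1)–(3), written in Python with only the standard library (`binormal_auc` and `pooled_measurement` are hypothetical names, not part of the authors' S-Plus code), the binormal area follows directly from the four distribution parameters, and a short simulation illustrates that the average of $p$ independent $N(\mu, \sigma^2)$ values has variance $\sigma^2/p$, which is the additivity assumption behind equation (3).

```python
import random
from statistics import NormalDist, fmean, pvariance

STD_NORMAL = NormalDist()  # provides Phi (cdf) and phi (pdf)


def binormal_auc(mu_n, sd_n, mu_d, sd_d):
    """Equations (1)-(2): A = Phi(delta / eta), delta = mu_D - mu_N, eta^2 = sd_N^2 + sd_D^2."""
    delta = mu_d - mu_n
    eta = (sd_n ** 2 + sd_d ** 2) ** 0.5
    return STD_NORMAL.cdf(delta / eta)


def pooled_measurement(rng, mu, sd, pool_size):
    """One pooled value under the additivity assumption: the mean of pool_size individual values."""
    return fmean(rng.gauss(mu, sd) for _ in range(pool_size))


if __name__ == "__main__":
    print(binormal_auc(0.0, 1.0, 1.0, 1.0))  # Phi(1 / sqrt(2)), roughly 0.76
    rng = random.Random(2004)
    pools = [pooled_measurement(rng, 0.0, 2.0, 4) for _ in range(20000)]
    print(pvariance(pools))  # should be near sd^2 / p = 4 / 4 = 1
```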

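From standard linear model theory (Rao, 1973), the means and variances are estimated by

$$\hat\mu_{iN} = \frac{1}{N}\sum_{j=1}^{n_i} p_{ij} X_{ij}, \qquad \hat\sigma_{iN}^2 = \frac{1}{n_i}\sum_{j=1}^{n_i} p_{ij}\left(X_{ij} - \hat\mu_{iN}\right)^2 \tag{4}$$

and

$$\hat\mu_{iD} = \frac{1}{M}\sum_{k=1}^{m_i} q_{ik} Y_{ik}, \qquad \hat\sigma_{iD}^2 = \frac{1}{m_i}\sum_{k=1}^{m_i} q_{ik}\left(Y_{ik} - \hat\mu_{iD}\right)^2 \tag{5}$$

yielding an estimate of $A_i$,

$$\hat A_i = \Phi\left(\frac{\hat\delta_i}{\hat\eta_i}\right) \tag{6}$$

where

$$\hat\delta_i = \hat\mu_{iD} - \hat\mu_{iN}, \qquad \hat\eta_i^2 = \hat\sigma_{iN}^2 + \hat\sigma_{iD}^2 \tag{7}$$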
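The estimators in equations (4)–(7) are pool-size-weighted moments. The Python sketch below assumes each group is supplied as a list of pooled measurements together with a list of the corresponding pool sizes; the helper names `pooled_mle` and `auc_hat` are hypothetical.

```python
from statistics import NormalDist


def pooled_mle(values, pool_sizes):
    """Equations (4)-(5): weighted mean and variance estimates from pooled data.

    values[j] is the measurement on pool j and pool_sizes[j] its size.
    The variance estimate divides by the number of pools, not by the number of individuals.
    """
    if len(values) != len(pool_sizes) or not values:
        raise ValueError("need one pool size per pooled measurement")
    total = sum(pool_sizes)  # N (or M): individuals contributing to the pools
    mu_hat = sum(p * x for p, x in zip(pool_sizes, values)) / total
    sigma2_hat = sum(p * (x - mu_hat) ** 2 for p, x in zip(pool_sizes, values)) / len(values)
    return mu_hat, sigma2_hat


def auc_hat(healthy, healthy_sizes, diseased, diseased_sizes):
    """Equations (6)-(7): plug-in estimate A_hat = Phi(delta_hat / eta_hat)."""
    mu_n, s2_n = pooled_mle(healthy, healthy_sizes)
    mu_d, s2_d = pooled_mle(diseased, diseased_sizes)
    delta_hat = mu_d - mu_n
    eta_hat = (s2_n + s2_d) ** 0.5
    return NormalDist().cdf(delta_hat / eta_hat)
```

For example, `pooled_mle([1.0, 3.0], [2, 2])` returns `(2.0, 2.0)`: the mean is (2·1 + 2·3)/4, and the variance divides the weighted sum of squares by the two pools.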

Define $\hat\Delta = \hat A_1 - \hat A_2$ and

$$S^2 = \operatorname{var}(\hat\Delta) = \operatorname{var}(\hat A_1) + \operatorname{var}(\hat A_2) - 2\operatorname{cov}(\hat A_1, \hat A_2) \tag{8}$$

Then, asymptotically, $\hat\Delta \sim N(\Delta, S^2)$. $H_0$ is hence rejected at significance level $\alpha$ if $|\hat\Delta / S| > Z_{1-\alpha/2}$, where, for any $c \in [0, 1]$, $\Phi(Z_c) = c$. Note that in the planning stage of a study, we assume that $S$ is known or can be 'guessed' from previous data. For data analysis at the end of the study, however, $S$ will be estimated from the data collected during the course of the study.
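A sketch of the decision rule just described, assuming $S$ has already been obtained (at the planning stage from assumed parameter values, at the analysis stage from estimates); the function name `auc_difference_test` is hypothetical.

```python
from statistics import NormalDist


def auc_difference_test(a1_hat, a2_hat, s, alpha=0.05):
    """Two-sided test of H0: A1 = A2 based on z = (A1_hat - A2_hat) / S.

    Returns (z, reject), rejecting when |z| exceeds Z_{1 - alpha/2}.
    """
    if s <= 0:
        raise ValueError("the standard error S must be positive")
    z = (a1_hat - a2_hat) / s
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1-alpha/2}, about 1.96 for alpha = 0.05
    return z, abs(z) > critical
```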


Power and Sample Size Calculation

In planning a comparative study of biomarkers, a sufficient number of samples is required so that the statistical test has the desired power to detect a specified difference in the biomarkers' diagnostic accuracy. For studies involving pooled assessments, the power and sample size depend on how the samples are pooled. Thus, we need to pre-specify the sequences of pool sizes $p_{ij}$ and $q_{ik}$. One popular pooling strategy is to use a common size for all sets of pooled samples (see the two special cases below). Let $\Delta = \Delta_1$ be the smallest meaningful difference between the ROC areas of the two biomarkers. Then $\Delta_1$; $N$, the number of non-diseased samples; $M$, the number of diseased samples; and $\beta$, the power of the test given in the previous section, satisfy the equation

$$S = \frac{\Delta_1}{Z_{1-\alpha/2} + Z_\beta} \tag{9}$$

where $S$ is given by equation (8). Therefore, given any three of the four design parameters $N$, $M$, $\Delta_1$ and $\beta$, the fourth can be determined by solving the above equation. In particular, the power of the test is given by

$$\beta = \Phi\left(\frac{\Delta_1}{S} - Z_{1-\alpha/2}\right) \tag{9'}$$
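Equations (9) and (9′) are straightforward to evaluate once $S$ is available. This Python sketch (hypothetical function names) computes the power for a given $S$ and, conversely, the largest $S$ that still attains a target power; $N$ and $M$ would then be chosen so that the design's $S$ does not exceed that value.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_from_se(delta1, s, alpha=0.05):
    """Equation (9'): beta = Phi(Delta_1 / S - Z_{1-alpha/2}); the opposite tail is ignored."""
    return STD_NORMAL.cdf(abs(delta1) / s - STD_NORMAL.inv_cdf(1 - alpha / 2))


def required_se(delta1, power=0.8, alpha=0.05):
    """Equation (9) solved for S: S = Delta_1 / (Z_{1-alpha/2} + Z_beta)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return abs(delta1) / (z_alpha + z_beta)
```

The two functions are inverses of each other: `power_from_se(0.1, required_se(0.1, 0.8))` returns 0.8 up to rounding.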

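It remains to find $S$, the standard error of $\hat\Delta$. Below we present an expression for $S$ in terms of the means, variances and correlation coefficients of the two normal distributions. A detailed derivation of the expression is given in the Appendix (see also Liu & Schisterman, 2003). Let $c_{jk}$ and $d_{jk}$ be the numbers of non-diseased and diseased individuals, respectively, common to the $j$th pool for marker 1 and the $k$th pool for marker 2. Then, asymptotically,

$$
\begin{aligned}
S^2 = {} & \sum_{i=1}^{2} \frac{1}{\eta_i^2}\,\phi^2\!\left(\frac{\delta_i}{\eta_i}\right)\left[\frac{\sigma_{iN}^2}{N} + \frac{\sigma_{iD}^2}{M} + \frac{\delta_i^2}{2\eta_i^4}\left(\frac{n_i - 1}{n_i^2}\,\sigma_{iN}^4 + \frac{m_i - 1}{m_i^2}\,\sigma_{iD}^4\right)\right] \\
& - \frac{2}{\eta_1\eta_2}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right)\Bigg[\frac{\rho_N\sigma_{1N}\sigma_{2N}}{N} + \frac{\rho_D\sigma_{1D}\sigma_{2D}}{M} \\
& \qquad + \frac{\delta_1\delta_2}{2\eta_1^2\eta_2^2}\Bigg\{\frac{\sigma_{1N}^2\sigma_{2N}^2\rho_N^2}{n_1 n_2}\Bigg(\sum_{j=1}^{n_1}\sum_{k=1}^{n_2}\frac{c_{jk}^2}{p_{1j}p_{2k}} - 1\Bigg) + \frac{\sigma_{1D}^2\sigma_{2D}^2\rho_D^2}{m_1 m_2}\Bigg(\sum_{j=1}^{m_1}\sum_{k=1}^{m_2}\frac{d_{jk}^2}{q_{1j}q_{2k}} - 1\Bigg)\Bigg\}\Bigg]
\end{aligned}
\tag{10}
$$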

For the proof, see the Appendix. For pre-specified values of $\mu_{1N}$, $\mu_{2N}$, $\mu_{1D}$, $\mu_{2D}$, $\sigma_{1N}$, $\sigma_{2N}$, $\sigma_{1D}$, $\sigma_{2D}$, $\rho_N$ and $\rho_D$ (these parameters determine the difference $\Delta_1$ between the two areas), the sample sizes $N$ and $M$, or the power $\beta$, can then be obtained from equations (9) and (10). We discuss two important study designs below.
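Equation (10) can be evaluated mechanically once the pool sizes and the overlap counts $c_{jk}$ and $d_{jk}$ are written down. The Python sketch below is one possible transcription (the `Marker` container and the function names are hypothetical); it returns $S$, which can then be used in equation (9′). Two sanity checks follow from the formula itself: identical markers measured on the same pools with $\rho_N = \rho_D = 1$ give $S \approx 0$, and uncorrelated markers give $S^2 = \operatorname{var}(\hat A_1) + \operatorname{var}(\hat A_2)$.

```python
from dataclasses import dataclass
from statistics import NormalDist

PHI = NormalDist()


@dataclass
class Marker:
    """Binormal parameters of one biomarker (hypothetical container)."""
    mu_n: float  # mean in non-diseased individuals
    sd_n: float  # standard deviation in non-diseased individuals
    mu_d: float  # mean in diseased individuals
    sd_d: float  # standard deviation in diseased individuals


def _var_auc(m, big_n, big_m, n_i, m_i):
    """First line of equation (10) for one marker: asymptotic var(A_hat_i)."""
    delta, eta2 = m.mu_d - m.mu_n, m.sd_n ** 2 + m.sd_d ** 2
    dens = PHI.pdf(delta / eta2 ** 0.5)
    return dens ** 2 / eta2 * (
        m.sd_n ** 2 / big_n + m.sd_d ** 2 / big_m
        + delta ** 2 / (2 * eta2 ** 2)
        * ((n_i - 1) / n_i ** 2 * m.sd_n ** 4 + (m_i - 1) / m_i ** 2 * m.sd_d ** 4))


def _overlap_sum(shared, sizes1, sizes2):
    """Double sum of shared[j][k]^2 / (sizes1[j] * sizes2[k])."""
    return sum(shared[j][k] ** 2 / (sizes1[j] * sizes2[k])
               for j in range(len(sizes1)) for k in range(len(sizes2)))


def se_difference(m1, m2, rho_n, rho_d, p1, p2, q1, q2, c, d):
    """Asymptotic standard error S of A1_hat - A2_hat, following equation (10).

    p1, p2 (q1, q2): pool sizes of markers 1 and 2 in the non-diseased (diseased) group.
    c[j][k] (d[j][k]): individuals shared by pool j of marker 1 and pool k of marker 2.
    """
    big_n, big_m = sum(p1), sum(q1)
    if sum(p2) != big_n or sum(q2) != big_m:
        raise ValueError("both markers must cover the same N and M individuals")
    var1 = _var_auc(m1, big_n, big_m, len(p1), len(q1))
    var2 = _var_auc(m2, big_n, big_m, len(p2), len(q2))
    d1, d2 = m1.mu_d - m1.mu_n, m2.mu_d - m2.mu_n
    e1sq, e2sq = m1.sd_n ** 2 + m1.sd_d ** 2, m2.sd_n ** 2 + m2.sd_d ** 2
    e1, e2 = e1sq ** 0.5, e2sq ** 0.5
    cov = PHI.pdf(d1 / e1) * PHI.pdf(d2 / e2) / (e1 * e2) * (
        rho_n * m1.sd_n * m2.sd_n / big_n + rho_d * m1.sd_d * m2.sd_d / big_m
        + d1 * d2 / (2 * e1sq * e2sq) * (
            (m1.sd_n * m2.sd_n * rho_n) ** 2 / (len(p1) * len(p2)) * (_overlap_sum(c, p1, p2) - 1)
            + (m1.sd_d * m2.sd_d * rho_d) ** 2 / (len(q1) * len(q2)) * (_overlap_sum(d, q1, q2) - 1)))
    # max() guards against tiny negative values caused by floating-point rounding
    return max(var1 + var2 - 2 * cov, 0.0) ** 0.5
```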


Biomarkers Tested on the Same Pooled Samples with Equal Size

Suppose averaged measurements are collected for the two markers from the same pools, each pool containing the same number $p$ of individuals from the healthy population and $q$ from the diseased population. Let $n$ be the number of pools for the healthy population and $m$ that for the diseased population. Then $n_1 = n_2 = n$, $p_{1j} = p_{2k} = p$, $N = np$, and $c_{jk} = p$ if $j = k$ and 0 if $j \neq k$. Similarly, $m_1 = m_2 = m$, $q_{1j} = q_{2k} = q$, $M = mq$, and $d_{jk} = q$ if $j = k$ and 0 if $j \neq k$. For this design, equation (10) reduces to

$$
\begin{aligned}
S^2 = {} & \sum_{i=1}^{2} \frac{1}{\eta_i^2}\,\phi^2\!\left(\frac{\delta_i}{\eta_i}\right)\left[\frac{\sigma_{iN}^2}{pn} + \frac{\sigma_{iD}^2}{qm} + \frac{\delta_i^2}{2\eta_i^4}\left(\frac{n - 1}{n^2}\,\sigma_{iN}^4 + \frac{m - 1}{m^2}\,\sigma_{iD}^4\right)\right] \\
& - \frac{2}{\eta_1\eta_2}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right)\left[\frac{\rho_N\sigma_{1N}\sigma_{2N}}{pn} + \frac{\rho_D\sigma_{1D}\sigma_{2D}}{qm} + \frac{\delta_1\delta_2}{2\eta_1^2\eta_2^2}\left(\frac{(n-1)\,\sigma_{1N}^2\sigma_{2N}^2\rho_N^2}{n^2} + \frac{(m-1)\,\sigma_{1D}^2\sigma_{2D}^2\rho_D^2}{m^2}\right)\right]
\end{aligned}
\tag{11}
$$

The sample sizes $N$ and $M$ for this design can then be obtained by solving equation (9) together with equation (11). In general, this involves finding the positive root of a quadratic equation; if both roots are positive, some judgement is needed. Note that for relatively large $n$ and $m$, $n - 1 \approx n$ and $m - 1 \approx m$. Approximate sample sizes can then be obtained by replacing $n - 1$ and $m - 1$ by $n$ and $m$, respectively, in equation (11). This yields closed forms for the sample sizes. We have





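$$M = \frac{(Z_{1-\alpha/2} + Z_\beta)^2}{\Delta_1^2}\left[\frac{1}{\lambda}\left(v_N + p\,w_N\right) + v_D + q\,w_D\right] \tag{12}$$

where $\lambda = N/M$ is the ratio of the numbers of non-diseased and diseased samples (so that $N = \lambda M$), and

$$v_N = \sum_{i=1}^{2}\frac{1}{\eta_i^2}\,\phi^2\!\left(\frac{\delta_i}{\eta_i}\right)\left(\sigma_{iN}^2 + \frac{p\,\delta_i^2\sigma_{iN}^4}{2\eta_i^4}\right) - \frac{2}{\eta_1\eta_2}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right)\rho_N\sigma_{1N}\sigma_{2N}$$

$$v_D = \sum_{i=1}^{2}\frac{1}{\eta_i^2}\,\phi^2\!\left(\frac{\delta_i}{\eta_i}\right)\left(\sigma_{iD}^2 + \frac{q\,\delta_i^2\sigma_{iD}^4}{2\eta_i^4}\right) - \frac{2}{\eta_1\eta_2}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right)\rho_D\sigma_{1D}\sigma_{2D}$$

$$w_N = -\frac{\delta_1\delta_2\,\sigma_{1N}^2\sigma_{2N}^2\rho_N^2}{\eta_1^3\eta_2^3}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right), \qquad w_D = -\frac{\delta_1\delta_2\,\sigma_{1D}^2\sigma_{2D}^2\rho_D^2}{\eta_1^3\eta_2^3}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right)$$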
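A Python sketch of the closed form (12) follows (the function name is hypothetical). It takes $\Delta_1$ to be the difference $A_1 - A_2$ implied by the supplied parameters, as in the text, and returns unrounded values of $N$ and $M$.

```python
from statistics import NormalDist

PHI = NormalDist()


def sample_size_same_pools(marker1, marker2, rho_n, rho_d, p, q,
                           ratio=1.0, alpha=0.05, power=0.8):
    """Approximate (N, M) from equation (12), which replaces n - 1 by n and m - 1 by m.

    marker1, marker2: tuples (mu_N, sd_N, mu_D, sd_D); p, q: common pool sizes;
    ratio: lambda = N / M. Delta_1 is the AUC difference implied by the parameters.
    """
    info = []
    for mu_n, sd_n, mu_d, sd_d in (marker1, marker2):
        delta, eta2 = mu_d - mu_n, sd_n ** 2 + sd_d ** 2
        eta = eta2 ** 0.5
        info.append({"delta": delta, "eta2": eta2, "eta": eta,
                     "pdf": PHI.pdf(delta / eta), "auc": PHI.cdf(delta / eta),
                     "sd_n": sd_n, "sd_d": sd_d})
    a, b = info
    delta1 = a["auc"] - b["auc"]
    if delta1 == 0:
        raise ValueError("the parameters imply equal areas; equation (12) is undefined")
    cross = a["pdf"] * b["pdf"] / (a["eta"] * b["eta"])  # phi_1 phi_2 / (eta_1 eta_2)

    def v_term(sd_key, rho, pool):
        # v_N (sd_key "sd_n", pool p) or v_D (sd_key "sd_d", pool q) in equation (12)
        total = sum(mk["pdf"] ** 2 / mk["eta2"]
                    * (mk[sd_key] ** 2
                       + pool * mk["delta"] ** 2 * mk[sd_key] ** 4 / (2 * mk["eta2"] ** 2))
                    for mk in info)
        return total - 2 * cross * rho * a[sd_key] * b[sd_key]

    def w_term(sd_key, rho):
        # w_N or w_D: cross / (eta_1^2 eta_2^2) equals phi_1 phi_2 / (eta_1^3 eta_2^3)
        return (-a["delta"] * b["delta"] * (a[sd_key] * b[sd_key] * rho) ** 2
                * cross / (a["eta2"] * b["eta2"]))

    z = PHI.inv_cdf(1 - alpha / 2) + PHI.inv_cdf(power)
    big_m = z ** 2 / delta1 ** 2 * (
        (v_term("sd_n", rho_n, p) + p * w_term("sd_n", rho_n)) / ratio
        + v_term("sd_d", rho_d, q) + q * w_term("sd_d", rho_d))
    return ratio * big_m, big_m
```

Because equation (12) is an approximation, one would in practice round $N$ and $M$ up to multiples of $p$ and $q$ and could check the resulting power against the exact equation (11).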

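One Biomarker Tested on Pooled Samples and the Other on Individual Samples

Suppose biomarker 1 is tested on $n$ sets of pooled healthy samples, each of size $p$, and on $m$ sets of pooled diseased samples, each of size $q$. Marker 2 is tested separately on each of the $N = np$ healthy samples and the $M = mq$ diseased samples. Then, for the healthy population, $n_1 = n$, $n_2 = N$, $p_{1j} = p$, $p_{2k} = 1$, and $c_{jk} = 1$ if $(j-1)p + 1 \le k \le jp$ and 0 otherwise; for the diseased population, $m_1 = m$, $m_2 = M$, $q_{1j} = q$, $q_{2k} = 1$, and $d_{jk} = 1$ if $(j-1)q + 1 \le k \le jq$ and 0 otherwise. For this design, equation (10) becomes

$$
\begin{aligned}
S^2 = {} & \sum_{i=1}^{2} \frac{1}{\eta_i^2}\,\phi^2\!\left(\frac{\delta_i}{\eta_i}\right)\left[\frac{\sigma_{iN}^2}{N} + \frac{\sigma_{iD}^2}{M} + \frac{\delta_i^2}{2\eta_i^4}\left(\frac{n_i - 1}{n_i^2}\,\sigma_{iN}^4 + \frac{m_i - 1}{m_i^2}\,\sigma_{iD}^4\right)\right] \\
& - \frac{2}{\eta_1\eta_2}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right)\left[\frac{\rho_N\sigma_{1N}\sigma_{2N}}{N} + \frac{\rho_D\sigma_{1D}\sigma_{2D}}{M} + \frac{\delta_1\delta_2}{2\eta_1^2\eta_2^2}\left(\frac{(n-1)\,\sigma_{1N}^2\sigma_{2N}^2\rho_N^2}{nN} + \frac{(m-1)\,\sigma_{1D}^2\sigma_{2D}^2\rho_D^2}{mM}\right)\right]
\end{aligned}
\tag{13}
$$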
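The overlap structure of this design can be written out explicitly. The Python sketch below (hypothetical helper names) builds $c_{jk}$ with 0-based indices and evaluates the double sum appearing in equation (10); it equals $n$ both here and in the same-pools design, which is how the factor $(n-1)/(nN)$ in equation (13) and the factor $(n-1)/n^2$ in equation (11) arise.

```python
def design2_overlap(n_pools, pool_size):
    """c_jk when marker 1 is measured on n pools of size p and marker 2 on each of the
    N = n p individuals. With 0-based indices, pool j holds individuals j*p, ..., (j+1)*p - 1,
    which is the paper's 1-based condition (j - 1)p + 1 <= k <= jp."""
    big_n = n_pools * pool_size
    return [[1 if j * pool_size <= k < (j + 1) * pool_size else 0 for k in range(big_n)]
            for j in range(n_pools)]


def overlap_sum(shared, sizes1, sizes2):
    """Double sum of c_jk^2 / (p_1j p_2k) appearing in equation (10)."""
    return sum(shared[j][k] ** 2 / (sizes1[j] * sizes2[k])
               for j in range(len(sizes1)) for k in range(len(sizes2)))


if __name__ == "__main__":
    n, p = 5, 3
    c = design2_overlap(n, p)
    total = overlap_sum(c, [p] * n, [1] * (n * p))
    # (total - 1) / (n_1 n_2) with n_1 = n, n_2 = N = n p gives (n - 1) / (n N)
    print(total, (total - 1) / (n * n * p), (n - 1) / (n * n * p))
```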

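Finding the sample size again involves solving a quadratic equation. Similarly to equation (12), an approximate sample size is given by

$$M = \frac{(Z_{1-\alpha/2} + Z_\beta)^2}{\Delta_1^2}\left[\frac{1}{\lambda}\left(v_N + w_N\right) + v_D + w_D\right] \tag{14}$$

where $\lambda$, $v_N$, $v_D$, $w_N$ and $w_D$ are defined as in equation (12), except that in $v_N$ and $v_D$ the pool sizes $p$ and $q$ multiplying the $i = 2$ terms are replaced by 1, because marker 2 is measured on individual samples (in equation (13), $n_2 = N$ and $m_2 = M$, so $(n_2 - 1)/n_2^2 \approx 1/N$ and $(m_2 - 1)/m_2^2 \approx 1/M$).

Effects of Pooling on Power and Sample Size

We mentioned earlier that pooling of samples is appealing when the number of assays is limited by cost or other reasons. In this section, we show that, for a fixed number of assays, designs with pooled samples increase the power of the test compared with the design that tests the biomarkers on individual samples. We assume that the biomarkers can only be tested on $n$ assays for healthy samples and $m$ assays for diseased samples. We call the design that tests the two biomarkers on each of the $n$ healthy and $m$ diseased individuals the naive design. It can be viewed as a special case of the designs in the previous subsections. From equation (11) we have, for the naive design,

$$
\begin{aligned}
S^2 = {} & \sum_{i=1}^{2} \frac{1}{\eta_i^2}\,\phi^2\!\left(\frac{\delta_i}{\eta_i}\right)\left[\frac{\sigma_{iN}^2}{n} + \frac{\sigma_{iD}^2}{m} + \frac{\delta_i^2}{2\eta_i^4}\left(\frac{n - 1}{n^2}\,\sigma_{iN}^4 + \frac{m - 1}{m^2}\,\sigma_{iD}^4\right)\right] \\
& - \frac{2}{\eta_1\eta_2}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right)\left[\frac{\rho_N\sigma_{1N}\sigma_{2N}}{n} + \frac{\rho_D\sigma_{1D}\sigma_{2D}}{m} + \frac{\delta_1\delta_2}{4\eta_1^2\eta_2^2}\left(\frac{2(n-1)\,\sigma_{1N}^2\sigma_{2N}^2\rho_N^2}{n^2} + \frac{2(m-1)\,\sigma_{1D}^2\sigma_{2D}^2\rho_D^2}{m^2}\right)\right]
\end{aligned}
\tag{15}
$$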


We compare the naive design with the equal-pool-size design given earlier. For simplicity, we assume that the values of the two biomarkers for healthy individuals $(X_1, X_2)$ and for diseased individuals $(Y_1, Y_2)$ follow bivariate normal distributions with $E(X_1) = E(X_2) = 0$, $E(Y_1) = \mu_{1D}$, $E(Y_2) = \mu_{2D}$, $\operatorname{var}(X_1) = \operatorname{var}(X_2) = 1$, $\operatorname{var}(Y_2) = \theta^2\operatorname{var}(Y_1)$, and $\operatorname{corr}(X_1, X_2) = \operatorname{corr}(Y_1, Y_2) = \rho$. Values of 0.65, 0.75 and 0.85 for the area $A_2$ with respect to the second biomarker $(X_2, Y_2)$ are compared with a fixed value of 0.6 for the area $A_1$ with respect to the first biomarker $(X_1, Y_1)$. The pooling strategy used here assumes equal numbers of healthy and diseased individuals and a constant pool size $p$. For $\theta^2 = 0.5, 1.0$ and $\rho = 0.5, 0.75$, Table 1 presents the power of the test for the two designs, computed from the asymptotic power formula (9′) with $S$ given by equations (11) and (15). The level of statistical significance is set at 0.05. The table shows that designs with pooled samples have higher power than the naive design, which is not surprising because more individuals contribute to the pooled designs. As expected, the power of the test increases as the pool size increases. We also notice that the power increases as the correlation coefficient $\rho$ increases, indicating that more highly correlated measurements generally yield higher power. To evaluate the accuracy of the asymptotic power formula, we also present the power of the test based on simulations. Note that for given $\theta$, $A_1$ and $A_2$, $\mu_{1D}$ and $\mu_{2D}$ can be computed using equation (1). Using the built-in S-Plus function rmvnorm, 5000 data sets were generated from bivariate normal distributions with the desired parameters, under both the null hypothesis of equal areas and the alternative hypothesis with the selected difference in areas. The power of the test is computed as the proportion of data sets in which $|\hat\Delta|$ exceeds the corresponding critical value, where $\hat\Delta$ is defined as in equation (8). Critical values are determined so that the rejection rate is 0.05 when the null hypothesis is true.
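The formula-based part of this comparison can be sketched in Python as follows (hypothetical function names). One assumption is needed that the text does not state: only $\operatorname{var}(Y_2) = \theta^2\operatorname{var}(Y_1)$ is given, so the sketch takes $\operatorname{var}(Y_1) = 1$. Its numbers are therefore illustrative and need not reproduce the 'formula' columns of Table 1 exactly.

```python
from statistics import NormalDist

PHI = NormalDist()


def design1_se(markers, rho, n, p):
    """S from equation (11) with n = m pools per group, p = q, and rho_N = rho_D = rho.

    markers: two tuples (mu_N, sd_N, mu_D, sd_D). With p = 1 this is equation (15).
    """
    info, var_sum = [], 0.0
    for mu_n, sd_n, mu_d, sd_d in markers:
        delta, eta2 = mu_d - mu_n, sd_n ** 2 + sd_d ** 2
        dens = PHI.pdf(delta / eta2 ** 0.5)
        var_sum += dens ** 2 / eta2 * (
            (sd_n ** 2 + sd_d ** 2) / (p * n)
            + delta ** 2 / (2 * eta2 ** 2) * (n - 1) / n ** 2 * (sd_n ** 4 + sd_d ** 4))
        info.append((delta, eta2, dens, sd_n, sd_d))
    (d1, e1, f1, s1n, s1d), (d2, e2, f2, s2n, s2d) = info  # e1, e2 hold eta_i^2
    cov = f1 * f2 / (e1 * e2) ** 0.5 * (
        rho * (s1n * s2n + s1d * s2d) / (p * n)
        + d1 * d2 / (2 * e1 * e2) * (n - 1) / n ** 2
        * rho ** 2 * ((s1n * s2n) ** 2 + (s1d * s2d) ** 2))
    return max(var_sum - 2 * cov, 0.0) ** 0.5


def table1_power(a2, theta2, rho, n, p, a1=0.6, alpha=0.05):
    """Power (9') in the Table 1 setting. ASSUMPTION: var(Y1) = 1, so var(Y2) = theta2."""
    mu1d = PHI.inv_cdf(a1) * 2 ** 0.5               # solves Phi(mu / sqrt(1 + 1)) = A1
    mu2d = PHI.inv_cdf(a2) * (1 + theta2) ** 0.5    # solves Phi(mu / sqrt(1 + theta2)) = A2
    markers = ((0.0, 1.0, mu1d, 1.0), (0.0, 1.0, mu2d, theta2 ** 0.5))
    s = design1_se(markers, rho, n, p)
    return PHI.cdf(abs(a2 - a1) / s - PHI.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    for pool in (1, 2, 3, 4):  # pool = 1 is the naive design
        print(pool, round(table1_power(0.65, 1.0, 0.5, 20, pool), 4))
```

With $n$ fixed, the pool size only shrinks the $1/(pn)$ terms of equation (11), so in this sketch the power rises monotonically with $p$, matching the qualitative pattern in Table 1.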
Values of the simulated power of the test are also presented in Table 1. Overall, the asymptotic formulas for power and sample size developed in this paper are quite accurate: in most cases, the simulated power is very close to the power computed from the asymptotic formula. The formula tends to be slightly conservative, in that it usually yields a power somewhat lower than the simulated power. This should not be a concern, however, for relatively large studies.

Table 1. Effects of pooling samples on power and sample size (A1 = 0.6, A2 = A)

n = number of assays (pools) per group, with equal numbers of healthy and diseased pools; p = pool size; *p = 1 is the naive design. Sim = simulated power; Formula = asymptotic power from equation (9′). Significance level 0.05.

| A | n | p | Sim (θ² = 0.5, ρ = 0.50) | Formula (θ² = 0.5, ρ = 0.50) | Sim (θ² = 0.5, ρ = 0.75) | Formula (θ² = 0.5, ρ = 0.75) | Sim (θ² = 1.0, ρ = 0.50) | Formula (θ² = 1.0, ρ = 0.50) | Sim (θ² = 1.0, ρ = 0.75) | Formula (θ² = 1.0, ρ = 0.75) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.65 | 20 | 1* | 0.0860 | 0.0822 | 0.1266 | 0.1196 | 0.0786 | 0.0823 | 0.1142 | 0.1211 |
| 0.65 | 20 | 2 | 0.1182 | 0.1198 | 0.1878 | 0.1855 | 0.1182 | 0.1192 | 0.1814 | 0.1853 |
| 0.65 | 20 | 3 | 0.1860 | 0.1527 | 0.2574 | 0.2415 | 0.1522 | 0.1510 | 0.2628 | 0.2379 |
| 0.65 | 20 | 4 | 0.1686 | 0.1823 | 0.3030 | 0.2894 | 0.1920 | 0.1824 | 0.3128 | 0.2894 |
| 0.65 | 50 | 1* | 0.1402 | 0.1445 | 0.2350 | 0.2348 | 0.1336 | 0.1448 | 0.2346 | 0.2382 |
| 0.65 | 50 | 2 | 0.2228 | 0.2349 | 0.4040 | 0.3899 | 0.2554 | 0.2334 | 0.4414 | 0.3891 |
| 0.65 | 50 | 3 | 0.3096 | 0.3140 | 0.5382 | 0.5093 | 0.3278 | 0.3096 | 0.5584 | 0.5015 |
| 0.65 | 50 | 4 | 0.4188 | 0.3825 | 0.6518 | 0.5996 | 0.4144 | 0.3746 | 0.6516 | 0.5850 |
| 0.75 | 20 | 1* | 0.3776 | 0.4162 | 0.6392 | 0.6406 | 0.3936 | 0.4119 | 0.6506 | 0.6356 |
| 0.75 | 20 | 2 | 0.6496 | 0.6459 | 0.8978 | 0.8562 | 0.6518 | 0.6317 | 0.9018 | 0.8401 |
| 0.75 | 20 | 3 | 0.8170 | 0.7749 | 0.9682 | 0.9321 | 0.8000 | 0.7548 | 0.9752 | 0.9150 |
| 0.75 | 20 | 4 | 0.8906 | 0.8489 | 0.9900 | 0.9631 | 0.8878 | 0.8489 | 0.9944 | 0.9631 |
| 0.75 | 50 | 1* | 0.7690 | 0.7879 | 0.9668 | 0.9553 | 0.7604 | 0.7825 | 0.9690 | 0.9530 |
| 0.75 | 50 | 2 | 0.9766 | 0.9571 | 0.9994 | 0.9974 | 0.9720 | 0.9512 | 0.9996 | 0.9964 |
| 0.75 | 50 | 3 | 0.9960 | 0.9896 | 1.0000 | 0.9997 | 0.9960 | 0.9864 | 1.0000 | 0.9995 |
| 0.75 | 50 | 4 | 0.9992 | 0.9969 | 1.0000 | 1.0000 | 0.9992 | 0.9953 | 1.0000 | 0.9999 |
| 0.85 | 20 | 1* | 0.8198 | 0.8545 | 0.9752 | 0.9622 | 0.8182 | 0.8462 | 0.9750 | 0.9575 |
| 0.85 | 20 | 2 | 0.9782 | 0.9768 | 0.9996 | 0.9977 | 0.9756 | 0.9712 | 0.9998 | 0.9964 |
| 0.85 | 20 | 3 | 0.9976 | 0.9949 | 1.0000 | 0.9997 | 0.9970 | 0.9923 | 1.0000 | 0.9994 |
| 0.85 | 20 | 4 | 0.9996 | 0.9985 | 1.0000 | 1.0000 | 0.9996 | 0.9985 | 1.0000 | 0.9999 |
| 0.85 | 50 | 1* | 0.9970 | 0.9974 | 1.0000 | 1.0000 | 0.9972 | 0.9969 | 1.0000 | 0.9999 |
| 0.85 | 50 | 2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 0.85 | 50 | 3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 0.85 | 50 | 4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |

Discussion

Budget constraints are a common problem in biomedical research and can hamper the ability of investigators to evaluate a research hypothesis. In those circumstances, pooling blood samples is a good alternative for hypothesis testing. In this paper, we introduced procedures to calculate sample size and power, which can be used in the design of biomarker studies that compare areas under ROC curves with pooled samples. We evaluated these formulas in a simulation study, which showed that, for a fixed number of assays, pooling blood samples can greatly increase statistical power. The asymptotic formula appears to be accurate, although it slightly underestimates the power compared with the simulated data. We also provided sample size and power formulas for different pooling strategies (i.e. balanced and unbalanced). A reasonable but untestable assumption about pooling is the additive property of blood samples. Future research is needed to evaluate the robustness of these formulas to departures from this assumption. A computer algorithm in S-Plus is available from the authors upon request.

Acknowledgements

The authors of the paper contributed equally. The paper was completed during the third author's internship at NICHD in the summer of 2002.

References

Bamber, D. C. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristic graph, Journal of Mathematical Psychology, 12, pp. 387–415.
Faraggi, D., Reiser, B. & Schisterman, E. F. (2003) ROC curve analysis for biomarkers based on pooled assessments, Statistics in Medicine, 15, pp. 2515–2527.
Hanley, J. A. & McNeil, B. J. (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143, pp. 29–36.
Liu, A. & Schisterman, E. F. (2003) Comparison of diagnostic accuracy of biomarkers with pooled assessments, Biometrical Journal, 45, pp. 631–644.
Mee, R. W. (1990) Confidence intervals for probabilities and tolerance regions based on a generalization of the Mann-Whitney statistic, Journal of the American Statistical Association, 85, pp. 793–800.
Obuchowski, N. A. (1998) Sample size considerations in studies of test accuracy, Statistical Methods in Medical Research, 7, pp. 371–392.
Obuchowski, N. A. & McClish, D. K. (1997) Sample size determination for diagnostic accuracy studies involving binormal ROC curve parameters, Statistics in Medicine, 16, pp. 1529–1542.
Rao, C. R. (1973) Linear Statistical Inference and its Applications, 2nd edn (New York: Wiley).
Reiser, B. & Faraggi, D. (1997) Confidence intervals for the generalized ROC criterion, Biometrics, 53, pp. 644–652.
Reiser, B. & Guttman, I. (1986) Statistical inference for Pr(Y < X): the normal case, Technometrics, 28, pp. 253–257.
Su, J. Q. & Liu, J. S. (1993) Linear combinations of multiple diagnostic markers, Journal of the American Statistical Association, 88, pp. 1350–1355.
Weinberg, C. R. & Umbach, D. M. (1999) Using pooled exposure assessment to improve efficiency in case-control studies, Biometrics, 55, pp. 718–726.
Wieand, S., Gail, M. H., James, B. R. & James, K. L. (1989) A family of non-parametric statistics for comparing diagnostic markers with paired or unpaired data, Biometrika, 76, pp. 585–592.
Zou, K. H., Hall, W. J. & Shapiro, D. E. (1997) Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests, Statistics in Medicine, 16, pp. 2143–2156.


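Appendix. Proof of Equation (10)

Here we provide a detailed derivation of equation (10). Consider the estimate $\hat A_i$ in equation (6). Applying a first-order Taylor expansion at $(\delta_i, \eta_i^2)$, we obtain

$$\hat A_i - A_i \doteq \frac{1}{\eta_i}\,\phi\!\left(\frac{\delta_i}{\eta_i}\right)\left[\hat\delta_i - \delta_i - \frac{\delta_i}{2\eta_i^2}\left(\hat\eta_i^2 - \eta_i^2\right)\right] \tag{A1}$$

Note that $\hat\delta_i$ and $\hat\eta_i^2$ are independent. It follows that

$$
\begin{aligned}
\operatorname{var}(\hat A_i) &= \frac{1}{\eta_i^2}\,\phi^2\!\left(\frac{\delta_i}{\eta_i}\right)\left[\operatorname{var}(\hat\delta_i) + \frac{\delta_i^2}{4\eta_i^4}\operatorname{var}(\hat\eta_i^2)\right] \\
&= \frac{1}{\eta_i^2}\,\phi^2\!\left(\frac{\delta_i}{\eta_i}\right)\left[\operatorname{var}(\hat\mu_{iN}) + \operatorname{var}(\hat\mu_{iD}) + \frac{\delta_i^2}{4\eta_i^4}\left(\operatorname{var}(\hat\sigma_{iN}^2) + \operatorname{var}(\hat\sigma_{iD}^2)\right)\right]
\end{aligned}
\tag{A2}
$$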

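Since

$$\hat\mu_{iN} \sim N\left(\mu_{iN}, \frac{1}{N}\,\sigma_{iN}^2\right), \qquad \frac{n_i\hat\sigma_{iN}^2}{\sigma_{iN}^2} \sim \chi^2_{n_i - 1}, \qquad \hat\mu_{iD} \sim N\left(\mu_{iD}, \frac{1}{M}\,\sigma_{iD}^2\right), \qquad \frac{m_i\hat\sigma_{iD}^2}{\sigma_{iD}^2} \sim \chi^2_{m_i - 1},$$

we have

$$\operatorname{var}(\hat A_i) = \frac{1}{\eta_i^2}\,\phi^2\!\left(\frac{\delta_i}{\eta_i}\right)\left[\frac{\sigma_{iN}^2}{N} + \frac{\sigma_{iD}^2}{M} + \frac{\delta_i^2}{2\eta_i^4}\left(\frac{n_i - 1}{n_i^2}\,\sigma_{iN}^4 + \frac{m_i - 1}{m_i^2}\,\sigma_{iD}^4\right)\right]$$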
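The variance expression just derived can be checked by Monte Carlo. The Python sketch below uses assumed illustrative settings (equal pool sizes, 100 pools per group, pools of size 2, pooled values drawn directly from the normal distributions in equation (3)); `pop_var`, `delta_method_var` and `simulated_var` are hypothetical names. Because the result is a first-order approximation, agreement is expected only up to simulation error and a small O(1/n) discrepancy.

```python
import random
from statistics import NormalDist, fmean

PHI = NormalDist()


def pop_var(values):
    """Divide-by-n variance, as in equations (4)-(5)."""
    centre = fmean(values)
    return fmean((v - centre) ** 2 for v in values)


def delta_method_var(mu_n, sd_n, mu_d, sd_d, n, m, p, q):
    """Asymptotic var(A_hat) derived above, for equal pool sizes (N = n p, M = m q)."""
    delta, eta2 = mu_d - mu_n, sd_n ** 2 + sd_d ** 2
    dens = PHI.pdf(delta / eta2 ** 0.5)
    return dens ** 2 / eta2 * (
        sd_n ** 2 / (n * p) + sd_d ** 2 / (m * q)
        + delta ** 2 / (2 * eta2 ** 2)
        * ((n - 1) / n ** 2 * sd_n ** 4 + (m - 1) / m ** 2 * sd_d ** 4))


def simulated_var(mu_n, sd_n, mu_d, sd_d, n, m, p, q, reps=2000, seed=7):
    """Monte Carlo variance of A_hat; pooled values are drawn as N(mu, sd^2 / pool size)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        xs = [rng.gauss(mu_n, sd_n / p ** 0.5) for _ in range(n)]
        ys = [rng.gauss(mu_d, sd_d / q ** 0.5) for _ in range(m)]
        # With equal pool sizes, (4)-(5) reduce to the plain mean of the pooled values
        # and p (or q) times their divide-by-n variance.
        s2_n, s2_d = p * pop_var(xs), q * pop_var(ys)
        estimates.append(PHI.cdf((fmean(ys) - fmean(xs)) / (s2_n + s2_d) ** 0.5))
    return pop_var(estimates)


if __name__ == "__main__":
    args = (0.0, 1.0, 1.0, 1.0, 100, 100, 2, 2)
    print(delta_method_var(*args), simulated_var(*args))
```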



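It remains to find $\operatorname{cov}(\hat A_1, \hat A_2)$. Again utilizing (A1), we have

$$\operatorname{cov}(\hat A_1, \hat A_2) = \frac{1}{\eta_1\eta_2}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right)\left[\operatorname{cov}(\hat\delta_1, \hat\delta_2) + \frac{\delta_1\delta_2}{4\eta_1^2\eta_2^2}\operatorname{cov}(\hat\eta_1^2, \hat\eta_2^2)\right] \tag{A3}$$

$$
\begin{aligned}
\operatorname{cov}(\hat A_1, \hat A_2) = {} & \frac{1}{\eta_1\eta_2}\,\phi\!\left(\frac{\delta_1}{\eta_1}\right)\phi\!\left(\frac{\delta_2}{\eta_2}\right)\Big[\operatorname{cov}(\hat\mu_{1N}, \hat\mu_{2N}) + \operatorname{cov}(\hat\mu_{1D}, \hat\mu_{2D}) \\
& + \frac{\delta_1\delta_2}{4\eta_1^2\eta_2^2}\left(\operatorname{cov}(\hat\sigma_{1N}^2, \hat\sigma_{2N}^2) + \operatorname{cov}(\hat\sigma_{1D}^2, \hat\sigma_{2D}^2)\right)\Big]
\end{aligned}
\tag{A4}
$$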

Let $c_{jk}$ and $d_{jk}$ be defined as in equation (10); that is, there are $c_{jk}$ individuals that appear in both $X_{1j}$ and $X_{2k}$, and $d_{jk}$ individuals that appear in both $Y_{1j}$ and $Y_{2k}$. These indices determine the correlation between two pooled assessments of the two biomarkers. We have


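$$\operatorname{cov}(X_{1j}, X_{2k}) = \frac{c_{jk}}{p_{1j}p_{2k}}\,\rho_N\sigma_{1N}\sigma_{2N}, \qquad \operatorname{cov}(Y_{1j}, Y_{2k}) = \frac{d_{jk}}{q_{1j}q_{2k}}\,\rho_D\sigma_{1D}\sigma_{2D} \tag{A5}$$

yielding

$$\operatorname{cov}(\hat\mu_{1N}, \hat\mu_{2N}) = \frac{1}{N}\,\sigma_{1N}\sigma_{2N}\rho_N, \qquad \operatorname{cov}(\hat\mu_{1D}, \hat\mu_{2D}) = \frac{1}{M}\,\sigma_{1D}\sigma_{2D}\rho_D \tag{A6}$$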

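Note that

$$\hat\sigma_{iN}^2 = \frac{1}{n_i}\sum_{j=1}^{n_i} p_{ij}\left(X_{ij} - \mu_{iN}\right)^2 - \frac{1}{n_i N}\left[\sum_{j=1}^{n_i} p_{ij}\left(X_{ij} - \mu_{iN}\right)\right]^2 \tag{A7}$$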

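Using the fact that if $Z_1$ and $Z_2$ are jointly normally distributed with zero means, unit variances and correlation coefficient $\rho$, then $\operatorname{cov}(Z_1, Z_2^2) = 0$ and $\operatorname{cov}(Z_1^2, Z_2^2) = 2\rho^2$, we obtain

$$\operatorname{cov}(\hat\sigma_{1N}^2, \hat\sigma_{2N}^2) = \frac{2\sigma_{1N}^2\sigma_{2N}^2\rho_N^2}{n_1 n_2}\left(\sum_{j=1}^{n_1}\sum_{k=1}^{n_2}\frac{c_{jk}^2}{p_{1j}p_{2k}} - 1\right) \tag{A8}$$

and, similarly,

$$\operatorname{cov}(\hat\sigma_{1D}^2, \hat\sigma_{2D}^2) = \frac{2\sigma_{1D}^2\sigma_{2D}^2\rho_D^2}{m_1 m_2}\left(\sum_{j=1}^{m_1}\sum_{k=1}^{m_2}\frac{d_{jk}^2}{q_{1j}q_{2k}} - 1\right) \tag{A9}$$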

Substituting these terms in equation (A4) yields a formula for $\operatorname{cov}(\hat A_1, \hat A_2)$, which, together with the expression for $\operatorname{var}(\hat A_i)$ obtained above, yields equation (10).
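Equation (A8) can also be checked by simulation. The Python sketch below uses the second design of the paper (marker 1 on pools, marker 2 on individuals), for which the double sum equals $n$, with assumed illustrative settings (unit variances, $\rho = 0.8$, 20 pools of size 3); the function names are hypothetical. The formula and the Monte Carlo covariance of the two variance estimators of equation (4) should agree up to simulation error.

```python
import random


def a8_cov_design2(rho, n_pools, pool_size):
    """Equation (A8) with unit variances for the second design, where the double sum is n:
    cov(sigma1_hat^2, sigma2_hat^2) = 2 rho^2 (n - 1) / (n N)."""
    big_n = n_pools * pool_size
    return 2 * rho ** 2 * (n_pools - 1) / (n_pools * big_n)


def simulated_cov_design2(rho, n_pools, pool_size, reps=4000, seed=11):
    """Monte Carlo covariance of the two variance estimators of equation (4)."""
    rng = random.Random(seed)
    big_n = n_pools * pool_size
    s1_vals, s2_vals = [], []
    for _ in range(reps):
        x1, x2 = [], []
        for _ in range(big_n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x1.append(z1)
            x2.append(rho * z1 + (1 - rho ** 2) ** 0.5 * z2)  # corr(x1, x2) = rho
        # marker 1: average consecutive blocks of pool_size individuals
        pools = [sum(x1[j * pool_size:(j + 1) * pool_size]) / pool_size for j in range(n_pools)]
        mu1, mu2 = sum(pools) / n_pools, sum(x2) / big_n
        s1_vals.append(sum(pool_size * (v - mu1) ** 2 for v in pools) / n_pools)
        s2_vals.append(sum((v - mu2) ** 2 for v in x2) / big_n)  # marker 2: individual values
    m1, m2 = sum(s1_vals) / reps, sum(s2_vals) / reps
    return sum((a - m1) * (b - m2) for a, b in zip(s1_vals, s2_vals)) / (reps - 1)


if __name__ == "__main__":
    print(a8_cov_design2(0.8, 20, 3), simulated_cov_design2(0.8, 20, 3))
```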