Biometrika (2000), 87, 3, pp. 619–632 © 2000 Biometrika Trust Printed in Great Britain

Nonparametric estimation of heterogeneity variance for the standardised difference used in meta-analysis

BY UWE MALZAHN, DANKMAR BÖHNING
Department of Epidemiology, Institute for Social Medicine, Free University of Berlin, Fabeckstr. 60–62, 14195 Berlin, Germany

AND HEINZ HOLLING
Institute of Psychology, University of Münster, Fliednerstraße 21, 48149 Münster, Germany

S The standardised difference has become a frequently used measure of the effect of interest in meta-analysis. Given population heterogeneity, we propose a nonparametric moment estimator for the heterogeneity variance in the corresponding random effects model. The advantages of this approach are threefold. First, it recognises that the specific variances for the individual studies themselves are given in the form of estimates. Secondly, the simple structure of a moment estimator leads to numerical, closed-form expressions. Thirdly, the new estimator appears to behave better than other known estimators. Simulation-free, exact comparisons of the new estimator with the Hedges estimator are provided in terms of bias, variance and mean squared error. Furthermore, by means of a simulation study with unbalanced study sizes we compared the new estimator both with the Hedges estimator and the DerSimonian–Laird estimator. Some key words: Moment estimator; Noncentral t-distribution; Population heterogeneity; Random effects model.

1. I In many scientific fields of social science and medical science, the standardised difference has become a frequently used measure of the effect of interest. This might be the effect of a new therapy compared with a standard therapy or the effect of gender in career achievement. Recently, interest has developed in investigating small or vague effects by means of meta-analysis. Various problems arise in meta-analysis, such as comparability of studies in terms of different methods and procedures for collecting and measuring values, the choice of patients or trial participants, definition and scaling of the variables to measure, and so on. These aspects relate and lead inevitably to the investigation of problems of heterogeneity. If we consider the standardised difference as our effect measure, there exists an overall mean for this effect which is the mean m of the distribution G, say. This G distribution is called the heterogeneity distribution, and expresses population heterogeneity. The individual study values h for this effect are random, being drawn from i the distribution G. In this model, the mean m is often ‘the parameter of interest’. The G

620

U. M, D. B   H. H

uncertainty in the h is expressly modelled through G, and is thus included in inferential i statements about m . G As parameter heterogeneity leads to an increase in the variance of the overall estimator, statistical inference has to be adjusted for the additional variance, usually called the heterogeneity variance. Depending on the assumptions of the underlying model, several methods and concepts can be used to estimate the variance of G, the heterogeneity variance t2. These incur maximum likelihood estimation assuming a normal distribution as heterogeneity distribution, as in Hardy & Thompson (1996) or Biggerstaff & Tweedie (1997), refined maximum likelihood methods for models including possible publication bias (Hedges & Vevea, 1996), and Bayesian approaches (Abrams & Sanso, 1998; Larose & Dey, 1997). In this contribution the focus is on a different method. In § 5, a nonparametric estimator for the heterogeneity variance is developed which is based on a moment approach. The properties of this estimator will be investigated in § 6. Some suggestions for bias corrections will also be given. In § 8 the estimator is compared to two conventional estimators, the Hedges estimator and the DerSimonian–Laird estimator, both analytically and through a simulation study. 2. T   For i=1, . . . , k we consider sets of independent random variables X , . . . , X ~N( mX, s2 ), Y , . . . , Y ~N( mY, s2 ) i1 ini i X,Y;i i1 imi i X,Y;i corresponding respectively to the treatment and control groups. Note that we assume equal variances of the response variables in the two groups within each study. The parameter of interest is the standardised difference of the mean values h=(mX−mY)/s . We X,Y assume that for the study specific value h =( mX −mY )/s there is an estimator h@ for i i i X,Y;i i the ith study. Furthermore, we assume that these estimators are calculated in the usual manner: X 9 −Y9 i h@ = i i s i with X W ni X , and analogously for Y9 . 
9 i =n−1 i i j=1 ij The variance in the ith study is assumed to be estimated as the pooled sample variance:

A

B

1 {(n −1)s2 +(m −1)s2 }. (1) s2 = i X,i i Y,i i n +m −2 i i Within the context of the random effects model the study specific values h are interpreted i themselves as realisations of a random variable h: h~G, with E (h)=m , var (h)=t2. G G G 3. B      We define q )(n m )/(n +m ) and N )n +m −2. Throughout the relevant literature, i i i i i i i i it is usually supposed that the s2 are known for each study, but for many practical X,Y;i problems this assumption appears problematic (Biggerstaff & Tweedie, 1997). Here, we take s@ 2 =s2 as an estimator and include its variability in the derivation of our estimator. X,Y;i i In the model introduced in § 2, we use the random variables V )(q1/2/s )(X 9 −Y9 ) and i i X,Y;i i i W )(N /s2 )s2 . Then, independently, i i X,Y;i i V ~N(q1/2 h , 1), W ~x2 . (2) i i i i Ni


It follows from (2) that
\[ Z_i=\frac{V_i}{(W_i/N_i)^{1/2}}=\frac{q_i^{1/2}(\bar X_i-\bar Y_i)}{s_i} \]
is noncentral $t$-distributed with $N_i$ degrees of freedom and noncentrality parameter $\theta_iq_i^{1/2}$: $Z_i\sim t_{N_i}(\theta_iq_i^{1/2})$. Consequently, we have that $E(Z_i)=H(N_i/2)\,q_i^{1/2}\theta_i$ and
\[ \mathrm{var}(Z_i)=\frac{N_i}{N_i-2}+\left[\frac{N_i}{N_i-2}-\{H(N_i/2)\}^2\right]q_i\theta_i^2, \tag{3} \]
with $H(x)=x^{1/2}\,\Gamma(x-\tfrac12)/\Gamma(x)$. By analogy with Hedges & Olkin (1985, p. 81), the bias-corrected version of the estimator of the standardised difference is given as
\[ d_i=\{H(N_i/2)\}^{-1}\hat\theta_i=Z_i/\{H(N_i/2)\,q_i^{1/2}\}. \]
We have for fixed $\theta_i$ in study $i$ that
\[ \mu_i(\theta_i):=E(d_i\mid\theta_i)=\theta_i, \tag{4} \]
\[ \sigma_i^2(\theta_i):=\mathrm{var}(d_i\mid\theta_i)=\{H(N_i/2)\}^{-2}\frac{N_i}{q_i(N_i-2)}+\left[\{H(N_i/2)\}^{-2}\frac{N_i}{N_i-2}-1\right]\theta_i^2. \tag{5} \]
For $N$ in the set of positive integers greater than 3 we have that $H(N/2)>1$ for all $N$, $H(N/2)$ is monotonically decreasing in $N$, $\lim_{N\to\infty}H(N/2)=1$, H(N/2)10, N2 and H(N/2)100. Also
\[ 1-\{H(N/2)\}^2\,\frac{N-2}{N}\approx\frac{8N^2-N+2}{16N(N-1)^2}>0 \tag{6} \]
for all $N>0$; $N=n+m-2$ in our application.

As already mentioned above, we consider the 'effect size' parameter $\theta$ in turn as heterogeneous, having a distribution $G$. Recall that $\tau^2$ is the heterogeneity variance. The model with fixed effects, in which there is a unique true value for the parameter, is included as a special case with $\tau^2=0$ and represents homogeneity. In the next section, we consider the model introduced under (2)–(5) in more general terms, interpreting the study parameter $\theta_i$ for each study $i=1,\dots,k$ as a realisation of the random variable $\theta\sim G$ with density or, in the discrete case, probability mass function $g(\cdot)$.
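The function $H$ and the approximation in (6) are easy to evaluate numerically. The following is a minimal sketch, not part of the original derivation: the helper names `H`, `bias_corrected_d`, `K_exact` and `K_approx` are illustrative assumptions, and the gamma ratio is computed on the log scale purely for numerical stability.

```python
import math


def H(x: float) -> float:
    """H(x) = x^(1/2) * Gamma(x - 1/2) / Gamma(x), computed via log-gamma."""
    return math.sqrt(x) * math.exp(math.lgamma(x - 0.5) - math.lgamma(x))


def bias_corrected_d(theta_hat: float, n: int, m: int) -> float:
    """Bias-corrected standardised difference d = theta_hat / H(N/2), N = n + m - 2."""
    N = n + m - 2
    return theta_hat / H(N / 2)


def K_exact(N: int) -> float:
    """Left-hand side of (6): 1 - H(N/2)^2 * (N - 2) / N."""
    return 1.0 - H(N / 2) ** 2 * (N - 2) / N


def K_approx(N: int) -> float:
    """Right-hand side of (6): (8N^2 - N + 2) / (16 N (N - 1)^2)."""
    return (8 * N * N - N + 2) / (16 * N * (N - 1) ** 2)


if __name__ == "__main__":
    # Tabulate H(N/2) and both sides of (6) for a few degrees of freedom.
    for N in (4, 10, 18, 50, 100):
        print(N, round(H(N / 2), 4), round(K_exact(N), 5), round(K_approx(N), 5))
```

For two groups of ten observations ($N=18$) the correction factor is roughly 1·044, so the uncorrected estimator overstates the standardised difference by about 4%.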

4. A
We have to distinguish between the conditional distribution of the random variable $d$ given a fixed study-specific parameter value $\theta$ and the distribution of the theoretical parameter $\theta$ in the population of study parameters. Thus, the unconditional probability density function for the estimator $d=\{H(N/2)\}^{-1}(\bar X-\bar Y)/s$ is given by $f(d)=\int f(d\mid\theta)g(\theta)\,d\theta$ with the mean value
\[ \mu_D:=\int d\,f(d)\,dd=\mu_G. \tag{7} \]

U. M, D. B   H. H

622

We can decompose the variance of the estimator as
\[ \mathrm{var}(d)=\int\mathrm{var}(d\mid\theta)g(\theta)\,d\theta+\int\{\mu(\theta)-\mu_D\}^2g(\theta)\,d\theta=\int\sigma^2(\theta)g(\theta)\,d\theta+\tau^2. \tag{8} \]
The first part is the mean variance expected to occur within the studies, and the second part is just the heterogeneity variance.

5. NONPARAMETRIC ESTIMATION OF $\tau^2$
We would like both to avoid any parametric assumption about $G$ and to be able to estimate a functional, the variance, of this heterogeneity distribution without needing to estimate the distribution itself, by taking advantage of (8). From (5), with
\[ E_G(\theta^2)=\mathrm{var}_G(\theta)+\{E_G(\theta)\}^2=\tau^2+\mu_G^2, \]
it follows that
\[ \int\sigma^2(\theta)g(\theta)\,d\theta=\frac{N}{\{H(N/2)\}^2q(N-2)}+\left[\frac{N}{\{H(N/2)\}^2(N-2)}-1\right](\tau^2+\mu_G^2). \tag{9} \]
Subsequently we will denote this average conditional variance $E_G\{\mathrm{var}(d\mid\theta)\}$ by $\nu^2$. Combining (8) and (9) and rearranging, we obtain
\[ \tau^2=\{H(N/2)\}^2\,\frac{N-2}{N}\,\mathrm{var}(d)-\frac1q-\left[1-\frac{N-2}{N}\{H(N/2)\}^2\right]\mu_G^2. \tag{10} \]
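As a quick numerical check of the rearrangement, one can compute $\mathrm{var}(d)$ from (8) and (9) for chosen values of $N$, $q$, $\mu_G$ and $\tau^2$, and confirm that (10) returns the same $\tau^2$. The sketch below does this; the function names and the parameter values are illustrative assumptions only.

```python
import math


def H(x: float) -> float:
    """H(x) = x^(1/2) * Gamma(x - 1/2) / Gamma(x)."""
    return math.sqrt(x) * math.exp(math.lgamma(x - 0.5) - math.lgamma(x))


def var_d(N: int, q: float, mu_G: float, tau2: float) -> float:
    """Unconditional variance of d: nu^2 from (9) plus tau^2, as in (8)."""
    h2 = H(N / 2) ** 2
    nu2 = N / (h2 * q * (N - 2)) + (N / (h2 * (N - 2)) - 1.0) * (tau2 + mu_G ** 2)
    return nu2 + tau2


def tau2_from_moments(N: int, q: float, variance_d: float, mu_G: float) -> float:
    """Right-hand side of (10): recover tau^2 from var(d) and mu_G."""
    h2 = H(N / 2) ** 2
    return h2 * (N - 2) / N * variance_d - 1.0 / q - (1.0 - (N - 2) / N * h2) * mu_G ** 2


if __name__ == "__main__":
    # Hypothetical setting: n = m = 10, so N = 18 and q = 5.
    v = var_d(18, 5.0, 0.5, 0.25)
    print(v, tau2_from_moments(18, 5.0, v, 0.5))  # second value recovers tau^2
```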

Equation (10) will motivate a nonparametric estimator $\hat\tau^2$. Recall that for each study number $i$ the quantities $N_i$ and $q_i$ are known. To estimate the first term of (10), it seems reasonable to use a modified version of the usual empirical variance of the study estimators, having regard to the different numbers of degrees of freedom, $N_i$. We can estimate the mean value of the effect estimator in the overall population by $\hat\mu_D=(1/k)\sum_{i=1}^kd_i$ or, if given estimates $\hat\nu_i^2$ of the study-specific variances for $d_i$, the pooled estimator $\hat\mu_D=\sum\hat\nu_i^{-2}d_i/\sum\hat\nu_i^{-2}$. In the fixed effects model for known study-specific variances, the pooled mean is the best unbiased linear estimator for the first moment, and should be used if the data indicate at most a small heterogeneity variance. In the case of large heterogeneity, the arithmetic mean should be preferred because the 'true' weights within the pooled estimator are poorly estimated by noniterative procedures. Finally in (10) we estimate $[1-\{(N-2)/N\}\{H(N/2)\}^2]\mu_G^2$ by the mean value of the corresponding study-specific realisations. As a result of (4) we have that $\mu_G=\mu_D$, and from (6) it follows that
\[ K_i:=1-\{H(N_i/2)\}^2\,\frac{N_i-2}{N_i}>0\quad(i=1,\dots,k), \tag{11} \]
leading to nonparametric estimation of the heterogeneity variance by
\[ \hat\tau^2=\frac{1}{k-1}\sum_{i=1}^k(1-K_i)(d_i-\hat\mu_D)^2-\frac1k\sum_{i=1}^k\frac{1}{q_i}-\frac1k\sum_{i=1}^kK_id_i^2. \tag{12} \]
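A minimal sketch of how (12) might be computed from the bias-corrected differences $d_i$ and the group sizes is given below. The function name `tau2_new` and its `weights` argument are assumptions made for illustration: passing no weights uses the arithmetic mean for $\hat\mu_D$, while passing weights uses the corresponding pooled mean.

```python
import math
from typing import Optional, Sequence


def H(x: float) -> float:
    """H(x) = x^(1/2) * Gamma(x - 1/2) / Gamma(x)."""
    return math.sqrt(x) * math.exp(math.lgamma(x - 0.5) - math.lgamma(x))


def tau2_new(d: Sequence[float], n: Sequence[int], m: Sequence[int],
             weights: Optional[Sequence[float]] = None) -> float:
    """Moment estimator (12); may be negative since no truncation is applied."""
    k = len(d)
    N = [ni + mi - 2 for ni, mi in zip(n, m)]
    q = [ni * mi / (ni + mi) for ni, mi in zip(n, m)]
    K = [1.0 - H(Ni / 2) ** 2 * (Ni - 2) / Ni for Ni in N]  # K_i from (11)
    if weights is None:
        mu_hat = sum(d) / k  # arithmetic mean
    else:
        mu_hat = sum(w * di for w, di in zip(weights, d)) / sum(weights)  # pooled mean
    between = sum((1.0 - Ki) * (di - mu_hat) ** 2 for Ki, di in zip(K, d)) / (k - 1)
    sampling = sum(1.0 / qi for qi in q) / k
    correction = sum(Ki * di * di for Ki, di in zip(K, d)) / k
    return between - sampling - correction


if __name__ == "__main__":
    # Hypothetical data: four studies with ten subjects per group.
    print(tau2_new([-1.0, 0.0, 1.0, 2.0], [10] * 4, [10] * 4))
```

When all $d_i$ coincide the first sum vanishes and the estimate is negative, which illustrates why a nonnegativity constraint is needed in practice.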


If we use the approximation $H(N_i/2)\approx1$, for $i=1,\dots,k$, an obvious procedure even in the case of moderately large individual studies, we obtain
\[ \hat\tau^2=\frac{1}{k-1}\sum_{i=1}^k\frac{(N_i-2)(d_i-\hat\mu_D)^2}{N_i}-\frac1k\sum_{i=1}^k\frac{1}{q_i}-\frac2k\sum_{i=1}^k\frac{d_i^2}{N_i}. \tag{13} \]
However, unless specified otherwise, our considerations refer to the accurate version (12).

Here we note that it is possible to generalise the principle of derivation for the estimator (12). For a specific problem the critical issue is to derive analogues of expressions (4) and (5). We therefore need explicit expressions for the conditional expectation $\mu_i(\theta_i)$ and the conditional variance $\sigma_i^2(\theta_i)$ of the study estimators. We can apply our principle provided that $E\{\sigma^2(\theta)\}$ has a structure allowing a separation of $\tau^2$, as in the case of data for the standardised mortality ratio or proportion data; see Böhning (1999, § 4.1) and an as yet unpublished report by D. Böhning and J. Sarol.

6. T   $\hat\tau^2$
6·1. Mean value
Lengthy but elementary calculations yield the mean value of $\hat\tau^2$ in (12). If we write
\[ q_{(k)}:=(q_1,\dots,q_k)^T,\qquad N_{(k)}:=(N_1,\dots,N_k)^T, \]
then
\[ E(\hat\tau^2)=B_k(N_{(k)},q_{(k)})+C_k(N_{(k)},q_{(k)})\mu_G^2+D_k(N_{(k)},q_{(k)})\tau^2, \tag{14} \]
where $B_k$, $C_k$ and $D_k$ are coefficients depending on $k$, $N_{(k)}$ and $q_{(k)}$. Here and in subsequent arguments, we use the relations
\[ E(d_i^l)=q_i^{-l/2}\{H(N_i/2)\}^{-l}E(Z_i^l) \]
with $E(Z_i^l)=E_G\{E(Z_i^l\mid\theta_i)\}$, for $l\in\mathbb{N}$. In particular,
\[ E_G\{E(Z_i\mid\theta_i)\}=q_i^{1/2}H(N_i/2)E_G(\theta_i)=q_i^{1/2}H(N_i/2)\mu_G, \]
\[ E_G\{E(Z_i^2\mid\theta_i)\}=\frac{N_i}{N_i-2}+\frac{q_iN_i}{N_i-2}E_G(\theta_i^2)=\frac{N_i}{N_i-2}+\frac{q_iN_i}{N_i-2}(\tau^2+\mu_G^2). \tag{15} \]
If the explicit form of the coefficients above is known, we know also the structure of the corresponding unbiased estimator $\tilde\tau^2$, where
\[ \tilde\tau^2:=\frac{1}{D_k}(\hat\tau^2-B_k-C_k\mu_G^2). \tag{16} \]
Since $\mu_G=\mu_D$ we estimate $\mu_G^2$ by means of $\hat\mu_G^2=\hat\mu_D^2=\{(1/k)\sum d_i\}^2$, or by the square of the pooled estimator described in § 5, and obtain the modified proposal
\[ \tilde\tau^2=\frac{1}{D_k}\{\hat\tau^2-B_k-C_k(\hat\mu_D)^2\}. \tag{17} \]
Unfortunately, $B_k$, $C_k$ and $D_k$ in (14) are of a complex structure. However, if we take $H(N_i/2)\approx1$, for $i=1,\dots,k$, and consider (13), then we obtain
\[ E(\hat\tau^2)=\breve B_k(N_{(k)},q_{(k)})+\breve C_k(N_{(k)},q_{(k)})\mu_G^2+\breve D_k(N_{(k)},q_{(k)})\tau^2, \]


with

q A B r A B A Bq A Brq A Br A B q A B r A B A Bq A Brq A Br A B

1 1 N k Ni −2 k 1 k i BB = ∑ −(k−1) , ∑ + ∑ k (k−1)k2 q (k−1)k2 q (N −2) N i i=1 i=1 i i=1 i i 1 k−2 1 1 k N 1 k N −2 i i B = + + C ∑ ∑ k k−1 k k−1 k N −2 k N i i i=1 i=1 N 1 k N −2 2 1 k 1 k k 1 i i − ∑ − +∑ , ∑ −∑ k N k2 2 N −2 N1/2 N i i i=1 i i=1 i=1 i=1 i 1 k−2 1 1 k N 1 k N −2 i i B = + + D ∑ ∑ k k−1 k k−1 k N −2 k N i i i=1 i=1 1 k N i − . (18) ∑ k2 N −2 i i=1

This means that we need essentially to calculate six sums. Exact calculations for simple situations with $N_i\equiv N$ and simulation studies for general, more realistic situations indicate that the bias of $\hat\tau^2$ in (12) may be neglected within a broad range of first four moments corresponding to possible heterogeneity distributions $G$. Therefore, the more expensive, bias-corrected version (17) is not used. Note that any estimator $\hat\tau^2$ taking on negative values is an inadmissible estimator of $\tau^2$; see for instance Lehmann & Casella (1998, p. 323). Within the simulations and exact moment calculations in § 8 we therefore use the truncated version of the estimator,
\[ \hat\tau^2_{\mathrm{tr}}:=\max\{0,\hat\tau^2\}. \tag{19} \]

6·2. Note on calculation of the variance
Consider the estimators introduced in (12) and (13). First we note that in principle we can calculate the moments of $d_i$ using (15) for $l=1,2,\dots$, provided the corresponding moments of $G$ exist. For $N_i>4$ this leads to
\[ E(d_i^3)=\breve H(N_i/2)\,(3q_i^{-1}\mu_G+\mu_G^{(3)}),\qquad E(d_i^4)=\{H(N_i/2)\}^{-4}\frac{N_i^2}{(N_i-2)(N_i-4)}\,(3q_i^{-2}+6q_i^{-1}\mu_G^2+6q_i^{-1}\tau^2+\mu_G^{(4)}), \tag{20} \]
where $\mu_G^{(l)}=E_G(\theta^l)$, for $l\in\mathbb{N}$, $\mu_G=\mu_G^{(1)}$, and $\breve H(x)=\{\Gamma(x)\}^2\{\Gamma(x-\tfrac32)\}^{-2}(x-\tfrac32)^{-3}$. We can then calculate $\mathrm{var}(\hat\tau^2)$, particularly for unequal $N_i$, where exact calculations are very unwieldy.

7. THE HEDGES ESTIMATOR AND THE DERSIMONIAN–LAIRD ESTIMATOR
Obvious competitors of our estimator in (12) are those of Hedges (Hedges & Olkin, 1985, Ch. 9) and DerSimonian & Laird (1986). The Hedges estimator is
\[ \hat\tau^2_{\mathrm{He}}:=\frac{1}{k-1}\sum_{i=1}^k(d_i-\bar d)^2-\frac1k\sum_{i=1}^k\frac{1}{q_i}-\frac1k\sum_{i=1}^k\breve K_id_i^2, \tag{21} \]
where $\breve K_i=(8N_i^2-N_i+2)/\{16N_i(N_i-1)^2\}$.
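For comparison with (12), the Hedges estimator (21) can be sketched in the same way. The function name `tau2_hedges` is an illustrative assumption; the estimator always uses the arithmetic mean $\bar d$ and the closed-form $\breve K_i$, so no gamma functions are needed.

```python
from typing import Sequence


def tau2_hedges(d: Sequence[float], n: Sequence[int], m: Sequence[int]) -> float:
    """Hedges estimator (21), without truncation at zero."""
    k = len(d)
    N = [ni + mi - 2 for ni, mi in zip(n, m)]
    q = [ni * mi / (ni + mi) for ni, mi in zip(n, m)]
    # Closed-form coefficients (8 N^2 - N + 2) / {16 N (N - 1)^2}.
    K_breve = [(8 * Ni * Ni - Ni + 2) / (16 * Ni * (Ni - 1) ** 2) for Ni in N]
    d_bar = sum(d) / k
    s2 = sum((di - d_bar) ** 2 for di in d) / (k - 1)  # empirical variance of the d_i
    sampling = sum(1.0 / qi for qi in q) / k
    correction = sum(Ki * di * di for Ki, di in zip(K_breve, d)) / k
    return s2 - sampling - correction


if __name__ == "__main__":
    # Hypothetical data: four studies with ten subjects per group.
    print(tau2_hedges([-1.0, 0.0, 1.0, 2.0], [10] * 4, [10] * 4))
```

The only structural difference from (12) is that the squared deviations are not down-weighted by $1-K_i$.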


The non-truncated DerSimonian–Laird estimator is
\[ \tilde\tau^2_{\mathrm{DS}}:=\frac{\sum_{i=1}^k\nu_i^{-2}(d_i-\tilde\mu_D)^2-(k-1)}{S_1-(S_2/S_1)}, \tag{22} \]
where $S_l=\sum w_i^l$, for $l=1,2$, and $\tilde\mu_D=(\sum w_id_i)/(\sum w_i)$, the pooled mean, where $w_i=\nu_i^{-2}$. One normally assumes that the study-specific $\nu_i^2$ are known, in which case $\tilde\tau^2_{\mathrm{DS}}$ is unbiased. However, our approach, which we think more realistic, is that we have to use estimates, $\hat\nu_i^2$, for the $\nu_i^2$, and to include this fact in the modelling. Thus, for a fair comparison we must use $\hat\nu_i^2$ in (22). For our problem, $\sigma_i^2(\theta_i):=\mathrm{var}(d_i\mid\theta_i)$ is given by (5).

As in § 5, we denote by $\nu_i^2$ the mean $E_G\{\mathrm{var}(d_i\mid\theta_i)\}$ of the conditional variance with respect to the heterogeneity distribution; see (9). By $\sigma_i^2$ we denote the unconditional variance; see (8). The best unbiased linear estimator for the overall mean $\mu_G=\mu_D$ is obtained with weights $w_i=\sigma_i^{-2}$, and if the conditional variance does not depend on $\theta_i$ then $\sigma_i^2(\theta_i)\equiv\nu_i^2$. If additionally $\nu_i^2$ is known, it is usual to substitute $\nu_i^{-2}$ for $\sigma_i^{-2}$ in the estimator for $\mu_D$, $\hat\mu_{D;0}$, say, and the $S_l$. We then obtain an estimator $\hat\tau_0^2$ and adjusted weights $w_i^*=(\nu_i^2+\hat\tau_0^2)^{-1}$ to create $\hat\mu_{D;1}$. This procedure motivates an iterative algorithm, but we do not pursue this here.

We note that neither the formula for the conditional variance nor the formula for the mean of the conditional variance with respect to the heterogeneity distribution is practicable. In data analysis the $\theta_i$ and the first two moments of the heterogeneity distribution are all unknown. An obvious estimator for $\nu_i^2$, see (5) and (9), is
\[ \hat\nu_i^2=H_i^{-2}\frac{N_i}{q_i(N_i-2)}+\left(H_i^{-2}\frac{N_i}{N_i-2}-1\right)d_i^2, \tag{23} \]
where $H_i=H(N_i/2)$. Thus, our practical version of the non-truncated DerSimonian–Laird estimator is
\[ \hat\tau^2_{\mathrm{DS}}=\frac{\sum_{i=1}^k(\hat\nu_i^2)^{-1}(d_i-\hat\mu_D)^2-(k-1)}{\hat S_1-(\hat S_2/\hat S_1)}, \tag{24} \]
where
\[ \hat S_l=\sum_{i=1}^k(\hat\nu_i^{-2})^l,\qquad \hat\mu_D=\left(\sum_{i=1}^k\hat\nu_i^{-2}d_i\right)\Big/\left(\sum_{i=1}^k\hat\nu_i^{-2}\right). \tag{25} \]
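The practical DerSimonian–Laird version (23)–(25) can be sketched as follows. The names `nu2_hat` and `tau2_ds` are illustrative assumptions; the code simply plugs the estimated variances (23) into the weights of (24) and (25).

```python
import math
from typing import Sequence


def H(x: float) -> float:
    """H(x) = x^(1/2) * Gamma(x - 1/2) / Gamma(x)."""
    return math.sqrt(x) * math.exp(math.lgamma(x - 0.5) - math.lgamma(x))


def nu2_hat(d_i: float, n_i: int, m_i: int) -> float:
    """Estimated study variance (23)."""
    N = n_i + m_i - 2
    q = n_i * m_i / (n_i + m_i)
    c = N / (H(N / 2) ** 2 * (N - 2))  # H_i^{-2} N_i / (N_i - 2)
    return c / q + (c - 1.0) * d_i * d_i


def tau2_ds(d: Sequence[float], n: Sequence[int], m: Sequence[int]) -> float:
    """Non-truncated DerSimonian-Laird estimator (24) with estimated weights."""
    k = len(d)
    w = [1.0 / nu2_hat(di, ni, mi) for di, ni, mi in zip(d, n, m)]
    S1 = sum(w)
    S2 = sum(wi * wi for wi in w)
    mu_hat = sum(wi * di for wi, di in zip(w, d)) / S1  # pooled mean (25)
    Q = sum(wi * (di - mu_hat) ** 2 for wi, di in zip(w, d))
    return (Q - (k - 1)) / (S1 - S2 / S1)


if __name__ == "__main__":
    # Hypothetical data: four studies with ten subjects per group.
    print(tau2_ds([-1.0, 0.0, 1.0, 2.0], [10] * 4, [10] * 4))
```

Because $\hat\nu_i^2$ grows with $d_i^2$, studies with extreme effects receive small weights in both the numerator and the denominator of (24).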

8. C
8·1. An example: The effect of aminophylline on pulmonary function
The example refers to data from a meta-analysis on the effect of aminophylline in severe acute asthma based on 13 studies involving spirometry measurements; see Littenberg (1988) and Petitti (1994, Ch. 8). This example falls well within our framework because the spirometry measures reported were not the same measures in each of the studies. The reported mean differences were therefore standardised according to (1); see Table 1, which contains the data relevant to our calculations. The fifth column contains the commonly used rough estimates $\tilde w_i=\tilde\nu_i^{-2}=2(N_i+2)/(8+d_i^2)$ for the weights, needed to calculate $\hat\mu_D^{(p)}$, the pooled mean of the $d_i$; see Hedges (1982) and Rosenthal & Rubin (1982). The sixth column lists the weights $\hat w_i=\hat\nu_i^{-2}$, where the $\hat\nu_i^2$ are given in (23). Thus, $\tilde\nu_i^2$ and $\hat\nu_i^2$ are two competing estimators of $\nu_i^2$.

Table 1. Data from 13 studies on the effect of aminophylline on pulmonary function (Littenberg, 1988)

Study number   $n_i+m_i$   $\bar X_i-\bar Y_i$   $d_i$    $\tilde w_i$   $\hat w_i$
     1            20          −0·3268           −0·43      4·89         4·7103
     2            50         −12·8000           −0·04     12·50        12·3609
     3            48          −0·546            −0·84     11·03        10·8248
     4            24          −0·7014           −1·67      4·45         4·1346
     5            29          −0·2266           −1·03      6·40         6·1565
     6            20         −40·9700           −2·41      2·90         2·5517
     7            23          −0·0496           −0·08      5·75         5·5841
     8            13          28·6000            0·26      3·22         3·0188
     9            23           6·1530            2·93      2·77         2·4446
    10            51           3·2130            0·51     12·35        12·1817
    11            61           0·3600            0·72     14·32        14·1343
    12            66           0·0201            0·03     16·50        16·3666
    13            40          −0·0116           −0·02     10·00         9·8610
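The two weight columns of Table 1 can be recomputed from the study totals and the $d_i$. The sketch below assumes that the rough weight takes the form $2(n_i+m_i)/(8+d_i^2)$, which is the form that reproduces the tabulated $\tilde w_i$, and that totals are split as $n_i=m_i$ for even totals and $m_i=n_i+1$ for odd totals, as stated in the text that follows. All function names are illustrative.

```python
import math
from typing import Tuple


def H(x: float) -> float:
    """H(x) = x^(1/2) * Gamma(x - 1/2) / Gamma(x)."""
    return math.sqrt(x) * math.exp(math.lgamma(x - 0.5) - math.lgamma(x))


def split_total(total: int) -> Tuple[int, int]:
    """Assumed group sizes: n = m for even totals, m = n + 1 for odd totals."""
    n = total // 2
    return n, total - n


def rough_weight(d_i: float, total: int) -> float:
    """Rough weight, assumed here to be 2 (n + m) / (8 + d^2)."""
    return 2.0 * total / (8.0 + d_i * d_i)


def refined_weight(d_i: float, total: int) -> float:
    """Weight 1 / nu_hat^2 with nu_hat^2 taken from (23)."""
    n, m = split_total(total)
    N = n + m - 2
    q = n * m / (n + m)
    c = N / (H(N / 2) ** 2 * (N - 2))
    return 1.0 / (c / q + (c - 1.0) * d_i * d_i)


if __name__ == "__main__":
    # Studies 1, 2 and 8 of Table 1: (n_i + m_i, d_i).
    for total, d_i in [(20, -0.43), (50, -0.04), (13, 0.26)]:
        print(total, d_i, round(rough_weight(d_i, total), 2), round(refined_weight(d_i, total), 3))
```

Small discrepancies in the last digits relative to Table 1 are to be expected, since the tabulated $d_i$ are rounded to two decimals.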

For these data, it follows that $\bar d=-0{\cdot}1592$, $\tilde\mu_D^{(p)}=-0{\cdot}070$, $\hat\mu_D^{(p)}=-0{\cdot}0654$, $\widehat{\mathrm{var}}_a(d)=1{\cdot}6383$, $\widetilde{\mathrm{var}}_p(d)=1{\cdot}6469$, $\widehat{\mathrm{var}}_p(d)=1{\cdot}6478$, and as estimates for the heterogeneity variance we obtain $\hat\tau_a^2=1{\cdot}4544$, $\hat\tau_p^2(\tilde w)=1{\cdot}4626$, $\hat\tau_p^2(\hat w)=1{\cdot}4635$, $\hat\tau^2_{\mathrm{He}}=1{\cdot}4403$, $\hat\tau^2_{\mathrm{DS}}(\tilde w)=0{\cdot}6956$ and $\hat\tau^2_{\mathrm{DS}}(\hat w)=0{\cdot}6481$. Since only the study size margins $n_i+m_i$ were available, we made the assumption $n_i=m_i$ for even totals and $m_i=n_i+1$ for odd totals in order to calculate the $q_i$. Here, $\widehat{\mathrm{var}}_a(d)$, respectively $\widetilde{\mathrm{var}}_p(d)$ and $\widehat{\mathrm{var}}_p(d)$, denote the estimates for the total variance of the effect size using the arithmetic and the pooled mean, the latter with weights $\tilde w_i$ or $\hat w_i$; and $\hat\tau_a^2$, $\hat\tau_p^2(\tilde w)$ and $\hat\tau_p^2(\hat w)$ denote the corresponding values of the estimator (12).

The large values of $\hat\tau_a^2$, $\hat\tau_p^2$, $\hat\tau^2_{\mathrm{He}}$ and $\hat\tau^2_{\mathrm{DS}}$ relative to the total variance of the $d_i$ indicate a heterogeneity variance greater than zero. Formal tests will depend on both the model used and the special structure of the estimator; we do not pursue this here. In any case no mistake is made by assuming the random effects model for the standardised difference, which in this case measures the effect of aminophylline on pulmonary function, and estimating the heterogeneity variance. For small sample sizes in particular, the commonly used estimates $\tilde w_i$ are merely crude estimates for $\{\mathrm{var}(d_i\mid\theta_i)\}^{-1}$. The analysis was therefore also conducted by means of the estimates for the study variances given in (23). We consider four methods, all using the pooled mean:
Method 1: The estimator (12) for $\tau^2$ with $\tilde\nu_i^2$ as estimator for $\nu_i^2$.
Method 2: The DerSimonian–Laird estimator for $\tau^2$ with $\tilde\nu_i^2$.
Method 3: The estimator (12) for $\tau^2$ with $\hat\nu_i^2$ from (23) as estimator for $\nu_i^2$.
Method 4: The DerSimonian–Laird estimator for $\tau^2$ with $\hat\nu_i^2$.

It is standard (DerSimonian & Laird, 1986; Cooper & Hedges, 1994, pp. 308–18; Cochran, 1983) to use the point estimate for $\tau^2$ when estimating the overall mean $\mu_G$, that is $\hat\mu_D^{(\mathrm{adj})}=(\sum w_i^*d_i)/(\sum w_i^*)$ with $w_i^*=(\hat\tau^2+\nu_i^2)^{-1}$, if necessary substituting an estimate for $\nu_i^2$. If the study variances $\nu_i^2$ are known, this is the weighted least squares estimator of $\mu_G$. The above four methods generate four versions of $\hat\tau^2$ and $\hat\mu_D^{(\mathrm{adj})}$. It is also common to neglect the variability of the estimator $w_i^*$ and to approximate the variance of $\hat\mu_D^{(\mathrm{adj})}$ by $\widehat{\mathrm{var}}^*:=(\sum w_i^*)^{-1}$. If we also assume that the estimator $\hat\mu_D^{(\mathrm{adj})}$ is normally distributed, then $\hat\mu_D^{(\mathrm{adj})}\pm c_{1-(\alpha/2)}(\widehat{\mathrm{var}}^*)^{1/2}$ supplies a $100(1-\alpha)\%$ confidence interval for $\mu_G=\mu_D$, where $c_{1-(\alpha/2)}$ denotes the $\{1-(\alpha/2)\}$-quantile of the standard normal distribution. Simulation calculations show that, in the case of our application, this normality assumption is justifiable. Table 2 summarises results.

Table 2. Point estimates and confidence intervals for the aminophylline data

Method   $\hat\mu_D^{(p)}$   $\hat\mu_D^{(\mathrm{adj})}$   $\widehat{\mathrm{var}}^*$   95% confidence interval   $\hat\tau^2$
  1         −0·0700              −0·1596                     0·1254             (−0·8536, +0·5344)        1·4626
  2         −0·0700              −0·1575                     0·0660             (−0·6610, +0·3460)        0·6956
  3         −0·0654              −0·1600                     0·1262             (−0·8562, +0·5362)        1·4635
  4         −0·0654              −0·1572                     0·0630             (−0·6492, +0·3348)        0·6481
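The adjusted pooled mean, its approximate variance and the normal-theory interval described above can be sketched as follows. The function names are illustrative assumptions; applied to the tabulated $\hat\mu_D^{(\mathrm{adj})}$ and $\widehat{\mathrm{var}}^*$ of Method 1, the interval agrees with Table 2 up to rounding.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def adjusted_mean(d: Sequence[float], nu2: Sequence[float], tau2: float) -> Tuple[float, float]:
    """Pooled mean with weights w*_i = 1 / (tau2 + nu2_i) and variance 1 / sum(w*)."""
    w_star = [1.0 / (tau2 + v) for v in nu2]
    total = sum(w_star)
    mu_adj = sum(w * di for w, di in zip(w_star, d)) / total
    return mu_adj, 1.0 / total


def normal_ci(mu_adj: float, var_star: float, alpha: float = 0.05) -> Tuple[float, float]:
    """mu_adj -/+ c * sqrt(var_star), with c the (1 - alpha/2) standard normal quantile."""
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = c * math.sqrt(var_star)
    return mu_adj - half_width, mu_adj + half_width


if __name__ == "__main__":
    # Method 1 of Table 2: adjusted mean -0.1596 and variance 0.1254.
    print(normal_ci(-0.1596, 0.1254))
```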

Since the estimators $\tilde\nu_i^2$ and $\hat\nu_i^2$ lead to very similar estimates for the $\nu_i^2$, in each case the resulting point estimates and confidence intervals for the overall mean are very similar. Returning to Table 1, note that the estimates of $\tau^2$ vary with the method used. In this case $\hat\tau^2$ […]

Varying the study size. We chose $k=10$, $M_1=M_2=0{\cdot}5$, so that $\tau^2=0{\cdot}25$, $M_3=0{\cdot}25$ and $M_4=0{\cdot}625$. This corresponds to the heterogeneity distribution $N(0{\cdot}5,0{\cdot}25)$. We varied the study size by taking $n_i=m_i\equiv n$, with $n=5,6,\dots,10,12,\dots,20,25,\dots,40,50,60,80,100$ or 150. For this scheme, we have the following results; see Fig. 1(a), (b).
(I) We have that $|b_{\mathrm{new}}|$, $|b_{\mathrm{He}}|$, $v_{\mathrm{new}}$ and $v_{\mathrm{He}}$ are monotonically decreasing in $n$.
(II) The relative improvement obtained by our estimator is most evident for small $n$, between 5 and 10, that is $N$ between 8 and 18. The differences between $\hat\tau^2_{\mathrm{He}}$ and $\hat\tau^2_{\mathrm{new}}$ become smaller, but still discernible, for large $n$.


(III) For all $n$ examined, $m_{\mathrm{He}}>m_{\mathrm{new}}$.
(IV) Statements (I) and (II) are also valid if we compare the truncated estimators. For all $n$ considered, $m_{\mathrm{He,tr}}/m_{\mathrm{new,tr}}>1$.

[Figure 1: two panels, (a) and (b), each plotting the mean squared error against the size of the groups, $n=m$, for the Hedges estimator and the new proposal.]

Fig. 1. Exact comparisons: the dependence of the mean squared error on group size $n$: (a) from $n=5$ to $n=20$, (b) from $n=25$ to $n=100$.

Varying the number of studies. Note first from (26) that the mean and therefore the bias of our estimator do not depend on $k$; the same applies to the Hedges estimator. We fixed $n_i=m_i\equiv n=30$, and the $M_l$ and therefore $\tau^2$ as above. We took $k=5,6,\dots,10,12,\dots,20,25$ or 30. In addition to the general remark above we want to mention the following results. As we increased $k$ from 5 to 6, $v_{\mathrm{He}}$ and $v_{\mathrm{new}}$ both increased but then decreased monotonically for increasing $k$. For all $k$, $v_{\mathrm{He}}>v_{\mathrm{new}}$, but not by much, with $v_{\mathrm{He}}/v_{\mathrm{new}}$ within the range from 1·1667, for $k=6$, to 1·1845, for $k=30$.

8·3. A simulation study comparing the new estimator with the Hedges and DerSimonian–Laird estimators
Recall that, for real data analysis with estimated weights, both the numerator and the denominator of $\hat\tau^2_{\mathrm{DS}}$ are stochastic, so that $\hat\tau^2_{\mathrm{DS}}$ is no longer unbiased. In the model introduced in §§ 3–4 we can calculate $E(d_i^2)=E_G\{E(d_i^2\mid\theta_i)\}$ and consequently an explicit formula for the bias of $\hat\nu_i^2$:
\[ \mathrm{Bias}(\hat\nu_i^2)=H_i^{-2}q_i^{-1}\frac{N_i}{N_i-2}\left(H_i^{-2}\frac{N_i}{N_i-2}-1\right)+\left(H_i^{-2}\frac{N_i}{N_i-2}-1\right)^2(\mu_G^2+\tau^2) \tag{28} \]
\[ \approx 2q_i^{-1}\frac{(4N_i-7)(N_i-1)}{(4N_i-1)^2(N_i-2)}+4\,\frac{(N_i-1)^2}{(4N_i-1)^2(N_i-2)^2}(\mu_G^2+\tau^2) \tag{29} \]
for moderate or large $N_i$. This expression is clearly positive for $N_i>2$, that is for $n_i+m_i>4$, and increases with $\mu_G^2$ and $\tau^2$. This means that the $\hat w_i$ underestimate the true weights although a convexity argument yields $E(\hat w_i^{-1})>\{E(\hat w_i)\}^{-1}$. It appears that $\hat\tau^2_{\mathrm{DS}}$ in (24) has a negative bias that increases strongly in magnitude with the actual $\tau^2$. As expected, this bias vanishes in our simulation study if the $\hat w_i$ are replaced by the true $w_i$.

In our simulations we generated a meta-analysis from a set of $k=15$ fictitious studies. We chose unequal study sizes, in contrast to § 8·2: $N_i$ varied from 11 to 64, and the ratio $n_i/m_i$ from $11/29$ to $35/13$. The mean of the heterogeneity distribution was $\mu_G=0{\cdot}5$. We conducted the study for $\tau^2=0{\cdot}09$, $\tau^2=0{\cdot}25$, $\tau^2=1{\cdot}0$ and $\tau^2=4{\cdot}0$, as well as for $\tau^2=0{\cdot}0$ to see what happens in the case of homogeneity. The following procedure was iterated $B=400$ times. For $i=1,\dots,k$ we generated $\theta_i$ independently according to $N(\mu_G,\tau^2)$, and from each $\theta_i$ we generated $d_i$ according to $H_iq_i^{1/2}d_i\sim t_{N_i}(q_i^{1/2}\theta_i)$. Then we calculated the estimates $\hat\tau^2_{\mathrm{new}}$, $\hat\tau^2_{\mathrm{He}}$ and $\hat\tau^2_{\mathrm{DS}}$ from $d_1,\dots,d_k$. From the $B$ iterations we calculated the sample means, biases, variances and mean squared errors for the three estimators.

In all cases the new estimator had a smaller mean squared error than that of the Hedges estimator, although the absolute value of the bias of the Hedges estimator was usually smaller than that of the new estimator. Use of the arithmetic mean of the $d_i$ led to a greater absolute value of bias and a smaller variance compared to the versions with the pooled mean. The most interesting aspect was that the DerSimonian–Laird estimator had an extremely small variance compared to the other estimators; the absolute value of the bias was evidently greater, and increased dramatically with $\tau^2$. The values of $\hat\tau^2_{\mathrm{new}}$ and $\hat\tau^2_{\mathrm{He}}$ were only slightly influenced by the kind of weights used, because the weights enter in the structure of $\hat\tau^2_{\mathrm{new}}$ and $\hat\tau^2_{\mathrm{He}}$ only via the calculation of the pooled mean of the $d_i$. Consequently if the arithmetic mean of the $d_i$ is used they do not enter at all.
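The data-generating step of the simulation can be sketched with the standard library alone, using the representation (2): a noncentral $t$ variate is a shifted standard normal divided by the square root of an independent chi-squared variate over its degrees of freedom. The function names, the seed and the study sizes below are hypothetical; the actual sizes used in the study are not listed in the text.

```python
import math
import random
from typing import List, Sequence, Tuple


def H(x: float) -> float:
    """H(x) = x^(1/2) * Gamma(x - 1/2) / Gamma(x)."""
    return math.sqrt(x) * math.exp(math.lgamma(x - 0.5) - math.lgamma(x))


def draw_d(theta: float, n: int, m: int, rng: random.Random) -> float:
    """Draw one bias-corrected d given theta, via Z = V / sqrt(W / N) as in (2)."""
    N = n + m - 2
    q = n * m / (n + m)
    V = rng.gauss(math.sqrt(q) * theta, 1.0)  # V ~ N(q^(1/2) theta, 1)
    W = rng.gammavariate(N / 2, 2.0)          # chi-squared with N degrees of freedom
    Z = V / math.sqrt(W / N)                  # noncentral t with N degrees of freedom
    return Z / (H(N / 2) * math.sqrt(q))


def draw_meta_analysis(sizes: Sequence[Tuple[int, int]], mu_G: float, tau2: float,
                       rng: random.Random) -> List[float]:
    """Draw theta_i ~ N(mu_G, tau2) for every study, then d_i given theta_i."""
    tau = math.sqrt(tau2)
    return [draw_d(rng.gauss(mu_G, tau), n, m, rng) for n, m in sizes]


if __name__ == "__main__":
    rng = random.Random(2000)
    sizes = [(11, 29), (20, 20), (35, 13)]  # hypothetical (n_i, m_i) pairs
    B = 400
    replicates = [draw_meta_analysis(sizes, 0.5, 0.25, rng) for _ in range(B)]
    print(sum(sum(r) for r in replicates) / (B * len(sizes)))  # should be near mu_G
```

Each replicate can then be passed to the estimators under comparison, and the sample bias, variance and mean squared error accumulated over the $B$ replicates.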
The low empirical variance of $\hat\tau^2_{\mathrm{DS}}$ is caused mainly by an uncontrollable enlargement of the denominator by the application of biased quantities within the structure of the estimator. This also underlies the underestimation of the true heterogeneity variance, which causes the absolute value of the bias to grow rapidly for large $\tau^2$. To throw some light on this, consider the properties of the $\hat w_i=(\hat\nu_i^2)^{-1}$ from (23). These properties depend on $N_i$ and $q_i$ as well as on the actual values of $\mu_G$ and $\tau^2$. It appears that the distribution of the $\hat w_i$ is highly skewed, with $\hat w_i$ having a coefficient of variation that is small for small $N_i$ and large $\tau^2$. For $\hat\nu_i^2=(\hat w_i)^{-1}$ we can show that
\[ E(\hat\nu_i^2)=H_i^{-4}\frac{N_i^2}{(N_i-2)^2}\,q_i^{-1}+H_i^{-2}\frac{N_i}{N_i-2}\left(H_i^{-2}\frac{N_i}{N_i-2}-1\right)(\mu_G^2+\tau^2), \tag{30} \]
\[ \mathrm{var}(\hat\nu_i^2)=H_i^{-4}\frac{N_i^2}{(N_i-2)^2}\left(H_i^{-2}\frac{N_i}{N_i-2}-1\right)^2\left[2\,\frac{N_i-1}{N_i-4}\,q_i^{-1}\{q_i^{-1}+2(\mu_G^2+\tau^2)\}+\frac{N_i-2}{N_i-4}\,\mu_G^{(4)}-(\mu_G^2+\tau^2)^2\right], \tag{31} \]
where $\mu_G^{(4)}=E_G(\theta^4)$. We conclude that the denominator of the DerSimonian–Laird estimator is biased, and the bias strongly depends on $\tau^2$.
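The relation between (9), (28), (29) and (30) can be checked numerically: the exact bias (28) should equal the mean (30) minus the true average variance (9), and the approximation (29) should be close to (28) for larger $N_i$. The sketch below does this for illustrative parameter values; the function names are assumptions.

```python
import math


def H(x: float) -> float:
    """H(x) = x^(1/2) * Gamma(x - 1/2) / Gamma(x)."""
    return math.sqrt(x) * math.exp(math.lgamma(x - 0.5) - math.lgamma(x))


def c_factor(N: int) -> float:
    """Recurring factor H(N/2)^{-2} * N / (N - 2)."""
    return N / (H(N / 2) ** 2 * (N - 2))


def nu2_true(N: int, q: float, mu_G: float, tau2: float) -> float:
    """Average conditional variance nu^2 from (9)."""
    c = c_factor(N)
    return c / q + (c - 1.0) * (mu_G ** 2 + tau2)


def mean_nu2_hat(N: int, q: float, mu_G: float, tau2: float) -> float:
    """Expectation of the estimator (23), as in (30)."""
    c = c_factor(N)
    return c * c / q + c * (c - 1.0) * (mu_G ** 2 + tau2)


def bias_exact(N: int, q: float, mu_G: float, tau2: float) -> float:
    """Exact bias (28)."""
    c = c_factor(N)
    return c * (c - 1.0) / q + (c - 1.0) ** 2 * (mu_G ** 2 + tau2)


def bias_approx(N: int, q: float, mu_G: float, tau2: float) -> float:
    """Approximate bias (29) for moderate or large N."""
    first = 2.0 * (4 * N - 7) * (N - 1) / (q * (4 * N - 1) ** 2 * (N - 2))
    second = 4.0 * (N - 1) ** 2 / ((4 * N - 1) ** 2 * (N - 2) ** 2)
    return first + second * (mu_G ** 2 + tau2)


if __name__ == "__main__":
    # Hypothetical settings: (N, q) for n = m = 10 and for n = m = 51.
    for N, q in [(18, 5.0), (100, 25.5)]:
        print(N, bias_exact(N, q, 0.5, 0.25), bias_approx(N, q, 0.5, 0.25))
```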

9. D The calculations in § 7, which indicate a slight but uniform superiority of our estimator over the Hedges estimator, were done for simplicity on the basis of n =m ¬n, to overcome i i the analytical complexity. There is no doubt that this is not a realistic assumption if we

Meta-analysis

631

have in mind practical problems of meta-analysis. Nevertheless, we believe that the results offer good support for the new estimator, more generally. For the case of different study sizes n , m for i=1, . . . , k, our estimator is designed to reflect the theoretical structure of i i the expression for the heterogeneity variance through natural estimators. Our explanation for the bad behaviour of the DerSimonian–Laird estimator for large t2 is as follows. In our calculations for t2 and for the pooled mean, we plugged in DS estimated weights w@ =n@ −2, see (23), for the true weights w =n−2 from (9). When we i i i i repeated all relevant calculations using the weights w (h )={s2 (h )}−1, see (3), the biasi i i i ing effect was weaker; for t2=0·09, 0·25 and 1·0 it was reduced to 1 , 1 and 3 , compared 3 2 4 with the method using w@ . When we used the true weights w the biasing effect vanished. i i Thus, compared to the new estimator and the Hedges estimator, it appears that, under the model assumptions of §§ 2–3, the DerSimonian–Laird estimator is very sensitive to the weights and therefore the estimated study-specific variances. From (28)–(31) we require for reliability of t@ 2 that the N be generally large and both t2 and DS i var (h2))m(4) −(m2 +t2)2 be not too large. Under these circumstances, we believe that G G G the DerSimonian–Laird estimator can be recommended because it is approximately unbiased, has a small variance and thus is hard to improve. However, we believe that it should otherwise be used only with the utmost caution.

A This research was done under support of the German Federal Ministry for Education and Science, Research and Technology. The authors are very grateful to the editor and the reviewers for their valuable comments.

R A, K. & S, B. (1998). Approximate Bayesian inference for random effects meta-analysis. Statist. Med. 17, 201–19. B, B. J. & T, R. L. (1997). Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Statist. Med. 16, 753–68. B, D. (1999). Computer-Assisted Analysis of Mixtures and Applications. Boca Raton: Chapman and Hall/CRC. C, W. G. (1983). Adjustments in analysis. In Planning and Analysis of Observational Studies, Ed. L. E. Moses and F. Mosteller, pp. 102–8. New York: Wiley. C, H. & H, L. V. (1994). T he Handbook of Research Synthesis. New York: Russell Sage Foundation. DS, R. & L, N. (1986). Meta-analysis in clinical trials. Contr. Clin. T rials 7, 177–88. H, R. J. & T, S. G. (1996). A likelihood approach to meta-analysis with random effects. Statist. Med. 15, 619–31. H, L. V. (1982). Estimation of effect size from a series of independent experiments. Psychol. Bull. 92, 490–9. H, L. V. & O, I. (1985). Statistical Methods for Meta-Analysis. San Diego, CA: Academic Press. H, L. V. & V, J. L. (1996). Estimating effect size under publication bias. J. Educ. Behav. Statist. 21, 299–333. J, N. L. & K, S. (1994). Continuous Univariate Distributions. New York: Wiley. L, D. T. & D, D. K. (1997). Grouped random effects models for Bayesian meta-analysis. Statist. Med. 16, 1817–29. L, E. L. & C, G. (1998). T heory of Point Estimation. New York: Springer-Verlag. L, B. (1988). Aminophylline treatment in severe, acute asthma: A meta-analysis. J. Am. Med. Assoc. 259, 1678–84.

632

U. M, D. B   H. H

P, D. P. (1994). Meta-Analysis, Decision Analysis and Cost-EVectiveness Analysis. Oxford: Oxford University Press. R, R. & R, D. B. (1982). Further meta-analysis procedures for assessing cognitive gender differences. J. Educ. Behav. Statist. 74, 708–12.

[Received June 1998. Revised December 1999]