Confidence intervals for the overlapping coefficient: the normal equal ...

The Statistician (1999) 48, Part 3, pp. 413–418

Confidence intervals for the overlapping coefficient: the normal equal variance case

Benjamin Reiser and David Faraggi
University of Haifa, Israel

[Received May 1997. Revised March 1999]

Summary. The overlapping coefficient, defined as the common area under two probability density curves, is used as a measure of agreement between two distributions. It has recently been proposed as a measure of bioequivalence under the name proportion of similar responses. Confidence intervals for this measure have been considered for the special case of two normal distributions with equal variances. We review and compare two procedures for this confidence interval based on the non-central t- and F-distributions. Our comparison is based on both theoretical considerations and a simulation study. Data on a marker from a study of recurrence of breast cancer are used to illustrate the methodology.

Keywords: Measure of similarity; Non-central F; Non-central t; Proportion of similar responses

1. Introduction

Recently Rom and Hwang (1996) addressed the problem of testing the equivalence of two populations on the basis of the proportion of similar responses (PSR). They concentrated on the common area of two normal distributions with equal variance and referred to this area as the PSR. They suggested statistical inference procedures on PSR based on the non-central t-distribution. In examining the common area of two distributions as a measure of their similarity, Rom and Hwang (1996) have rediscovered a concept which has a long history (see Bradley (1985) and Inman and Bradley (1989)). In general this area, which is usually called the overlapping coefficient OVL, can be written as

$$\int_{-\infty}^{\infty} \min\{f_1(X), f_2(X)\}\,dX \tag{1}$$

where $f_1(X)$ and $f_2(X)$ are the probability density functions of the two distributions. OVL provides a useful and important measure of agreement between two distributions, e.g. in bioequivalence problems (Rom and Hwang, 1996) or in measuring the effectiveness of cloud seeding (Inman and Bradley, 1994). OVL ranges from 0 to 1, with 1 indicating that the two distributions are identical and 0 indicating that there is no overlap.

Inman and Bradley (1994) assumed normal densities with common variances and provided confidence intervals for OVL based on the non-central F-distribution. They suggested an approximation using the non-central t-distribution, which is the same as the Rom and Hwang (1996) approach, and further simplified it by using the normal distribution as an approximation to the non-central t-distribution. They provided simulation results for confidence intervals on OVL based on both the non-central t and the normal approximation.

Address for correspondence: Benjamin Reiser, Department of Statistics, University of Haifa, Haifa 31999, Israel. E-mail: [email protected]

© 1999 Royal Statistical Society 0039–0526/99/48413
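As a small numerical illustration of definition (1), the sketch below integrates the pointwise minimum of two normal densities with a common standard deviation and compares the result with the closed form $2\Phi(-|\delta|/2)$, $\delta = (\mu_1 - \mu_2)/\sigma$, which the paper states as equation (2) in Section 2. The closed form holds because two equal-variance normal densities cross at the midpoint of their means, so the common area consists of two tails, each of area $\Phi(-|\delta|/2)$. The function names, the integration range of 10 standard deviations and the grid size are illustrative assumptions, not part of the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; .cdf gives Phi


def ovl_numeric(mu1, mu2, sigma, steps=20000):
    """Approximate the integral in equation (1) with the trapezoidal rule."""
    f1, f2 = NormalDist(mu1, sigma), NormalDist(mu2, sigma)
    lo = min(mu1, mu2) - 10.0 * sigma  # both densities are negligible beyond 10 sd
    hi = max(mu1, mu2) + 10.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal end-point weights
        total += weight * min(f1.pdf(x), f2.pdf(x))
    return total * h


def ovl_closed_form(mu1, mu2, sigma):
    """2 * Phi(-|delta| / 2) with delta = (mu1 - mu2) / sigma."""
    delta = (mu1 - mu2) / sigma
    return 2.0 * STD_NORMAL.cdf(-abs(delta) / 2.0)
```

For example, `ovl_numeric(0.0, 1.0, 1.0)` and `ovl_closed_form(0.0, 1.0, 1.0)` should agree to several decimal places, and both decrease as the means move apart.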


Inman and Bradley (1994) further pointed out that OVL is invariant under one-to-one differentiable transformations of the random variable. Consequently non-normal data can often be handled by a suitable normalizing transformation.

In this paper we investigate further the properties of confidence intervals for OVL and compare the non-central t- and F-approaches. We show that neither of these approaches is exact and discuss their properties. We provide both simulation and theoretical results to show that confidence intervals that are based on the non-central F-distribution perform very well even for small sample sizes, except for the special case where the two distributions are exactly the same. We conclude with an illustration of the use of OVL as a measure of the similarity between the distributions of an early stage breast cancer marker for patients with and without recurrence.

2. Confidence intervals for the overlapping coefficient

Let $X_{1i} \sim N(\mu_1, \sigma^2)$ and $X_{2j} \sim N(\mu_2, \sigma^2)$ be independent normal random variables, $i = 1, \ldots, n_1$, $j = 1, \ldots, n_2$. For example, $X_{1i}$ are measurements of a marker on a disease group and $X_{2j}$ are the measurements of the same marker on the control group. It is easy to show that

$$\mathrm{OVL} = 2\,\Phi(-|\delta|/2) \tag{2}$$

where $\delta = (\mu_1 - \mu_2)/\sigma$. Let $t = (\bar{X}_1 - \bar{X}_2)/\{S\sqrt{(1/n_1 + 1/n_2)}\}$ where

$$S^2 = \left\{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2\right\} \Big/ (n_1 + n_2 - 2)$$

so that $t \sim t_{n_1+n_2-2}(\eta)$ where $\eta = \delta/\nu$ is the non-centrality parameter and $\nu = \sqrt{\{(n_1 + n_2)/n_1 n_2\}}$. By solving numerically the equations

$$\mathrm{prob}\{t_{m+n-2}(\underline{\eta}) \le t\} = 1 - \alpha/2, \tag{3}$$

$$\mathrm{prob}\{t_{m+n-2}(\bar{\eta}) \le t\} = \alpha/2 \tag{4}$$

for $\underline{\eta}$ and $\bar{\eta}$ respectively (here and below $m = n_1$ and $n = n_2$), the resulting interval $(\underline{\eta}, \bar{\eta})$ provides an exact $1 - \alpha$ confidence interval for $\eta$ and consequently for $\delta$. However, since OVL is not a monotone function of $\delta$ but rather of $|\delta|$, we approximate the confidence interval for $|\delta|$ and hence for OVL by considering the following three cases:

(a) case I, $0 < \underline{\eta} < \bar{\eta}$, the $1 - \alpha$ confidence interval for $|\delta|$ is $\nu(\underline{\eta}, \bar{\eta})$;
(b) case II, $\underline{\eta} < \bar{\eta} < 0$, the $1 - \alpha$ confidence interval for $|\delta|$ is $\nu(|\bar{\eta}|, |\underline{\eta}|)$;
(c) case III, $\underline{\eta} < 0 < \bar{\eta}$, the $1 - \alpha$ confidence interval for $|\delta|$ is $\nu\{0, \max(|\underline{\eta}|, \bar{\eta})\}$.

This is the approach followed by Rom and Hwang (1996), who did not note that this procedure is an approximation. Inman and Bradley (1994) did point out that this is an approximation and examined its coverage properties by simulation. They further compared it with an approximation of the non-central t-distribution by the normal distribution. Their simulation study indicates that the two-sided confidence intervals based on the non-central t-distribution have a coverage that is close to the nominal $1 - \alpha$ value for $|\delta| = 0$ and $|\delta| \ge 1$, even for sample sizes as small as 25. For moderate values of $|\delta|$ the coverage is conservative for small sample sizes and can be larger than 97% when the nominal coverage probability is 95%. Their simulations further indicate that their normal distribution based approximation performs as well as (and almost identically with) the


non-central t-distribution based method and consequently may be preferred for simplicity of computation.

Alternatively $t^2 \sim F_{1, n_1+n_2-2}(\eta^2)$; by solving numerically the equations

$$\mathrm{prob}\{F_{1, m+n-2}(\underline{\eta}^2) \le t^2\} = 1 - \alpha/2, \tag{5}$$

$$\mathrm{prob}\{F_{1, m+n-2}(\bar{\eta}^2) \le t^2\} = \alpha/2 \tag{6}$$

for $\underline{\eta}^2$ and $\bar{\eta}^2$ respectively, the resulting interval $\nu(\sqrt{\underline{\eta}^2}, \sqrt{\bar{\eta}^2})$ provides a $1 - \alpha$ confidence interval for $|\delta|$. It is important to note that this confidence interval is not an exact confidence interval either, even though Inman and Bradley (1994) referred to it as such. If

$$\mathrm{prob}\{F_{1, m+n-2}(0) \le t^2\} \tag{7}$$

is less than $1 - \alpha/2$ or $\alpha/2$ then there is no solution for equation (5) or (6) respectively and the bound $\underline{\eta}^2$ or $\bar{\eta}^2$ is taken to be 0. As a consequence of this constraint the resulting confidence interval is not exact.

To examine the non-central F-approach further and to compare it with the interval obtained from the non-central t-distribution we carried out an extensive simulation study described in the next section. It should be noted that the inversion of the non-central t- and F-distributions to obtain confidence intervals for the non-centrality parameters has appeared in various contexts in the literature. See for example Reiser and Guttman (1986), Reiser and Faraggi (1994, 1997) and Lam (1987).
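The two procedures of this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. It uses only the standard library: the non-central t distribution function is obtained by Simpson's rule from the representation $t_{df}(\eta) = (Z + \eta)/S$ with $S = \sqrt{(\chi^2_{df}/df)}$, the non-central $F_{1,df}(\eta^2)$ probability is obtained from the identity $F_{1,df}(\eta^2) = t_{df}(\eta)^2$, and the equations are solved by bisection. All function names, the grid size and the number of bisection steps are hypothetical choices.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def nct_cdf(t, df, eta, steps=1000):
    """P{t_df(eta) <= t} for df > 1: average of Phi(t*s - eta) over the density of S."""
    half_width = 10.0 / math.sqrt(2.0 * df)  # roughly 10 sd of S around 1
    lo, hi = max(0.0, 1.0 - half_width), 1.0 + half_width
    log_c = math.log(2.0) + 0.5 * df * math.log(0.5 * df) - math.lgamma(0.5 * df)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        s = lo + k * h
        if s <= 0.0:
            continue  # the density of S vanishes at 0 when df > 1
        dens = math.exp(log_c + (df - 1.0) * math.log(s) - 0.5 * df * s * s)
        weight = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)  # Simpson weights
        total += weight * PHI(t * s - eta) * dens
    return total * h / 3.0


def bisect_decreasing(g, p, lo, hi, iters=50):
    """Solve g(x) = p for a decreasing g, given g(lo) >= p >= g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eta_bound_t(t, df, p):
    """eta solving P{t_df(eta) <= t} = p, as in equations (3) and (4)."""
    def g(eta):
        return nct_cdf(t, df, eta)
    lo = hi = t
    step = 1.0
    while g(lo) < p:  # widen the bracket downwards
        lo -= step
        step *= 2.0
    step = 1.0
    while g(hi) > p:  # widen the bracket upwards
        hi += step
        step *= 2.0
    return bisect_decreasing(g, p, lo, hi)


def eta_bound_f(t, df, p):
    """sqrt(eta^2) solving equation (5) or (6), or 0 when no solution exists."""
    r = abs(t)

    def g(eta):
        return nct_cdf(r, df, eta) - nct_cdf(-r, df, eta)  # P{F_{1,df}(eta^2) <= t^2}
    if g(0.0) < p:
        return 0.0  # expression (7) is below p: the bound is taken to be 0
    hi = 1.0
    while g(hi) > p:
        hi *= 2.0
    return bisect_decreasing(g, p, 0.0, hi)


def ovl_interval(t, n1, n2, alpha=0.05, method="F"):
    """Approximate 1 - alpha interval for OVL from the two-sample t statistic."""
    df = n1 + n2 - 2
    nu = math.sqrt((n1 + n2) / (n1 * n2))
    if method == "F":
        lower = eta_bound_f(t, df, 1.0 - alpha / 2.0)
        upper = eta_bound_f(t, df, alpha / 2.0)
    else:
        e_lo = eta_bound_t(t, df, 1.0 - alpha / 2.0)
        e_hi = eta_bound_t(t, df, alpha / 2.0)
        if e_lo > 0.0:  # case I
            lower, upper = e_lo, e_hi
        elif e_hi < 0.0:  # case II
            lower, upper = abs(e_hi), abs(e_lo)
        else:  # case III
            lower, upper = 0.0, max(abs(e_lo), e_hi)
    # OVL = 2 * Phi(-|delta| / 2) decreases in |delta|, so the bounds swap
    return 2.0 * PHI(-nu * upper / 2.0), 2.0 * PHI(-nu * lower / 2.0)
```

A call such as `ovl_interval(3.0, 25, 25, method="t")` falls under case I, while `ovl_interval(0.5, 25, 25, method="t")` falls under case III and returns an upper OVL bound of 1. In practice a statistical library with non-central t and F routines would normally replace the hand-written integration.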

3. Simulation study

An extensive simulation study was carried out for $\delta$ = 0, 0.0001, 0.125, 0.25, 0.5, 1, $\alpha$ = 0.05, 0.10 and m = n = 10, 25, 50, 75. For every combination of these parameter values, 2000 simulations were carried out for both the non-central t- and the F-methods. The observed proportion of cases in which the confidence intervals contained the true value of OVL was noted and is denoted by CP in Tables 1 and 2. Negative values of $\delta$ were not considered owing to the symmetry of the problem with respect to the sign of $\delta$. The computations were carried out in GAUSS (Aptech Systems, 1994).

Inman and Bradley (1994) reported only on the simulated coverage of the two-sided intervals. Tables 1 and 2 give in addition the proportion of cases falling below and above the lower and upper confidence bounds respectively. We denote these proportions by LT and RT respectively. Further, the average length of the simulated confidence intervals is given in the column headed AL. A plus sign is placed next to each estimated probability whose 95% confidence interval (based on a binomial sample of 2000 simulated data sets) does not include the targeted nominal value. For example in Table 1 a plus sign indicates that a 95% confidence interval did not include the nominal value 0.95 for CP and 0.025 for both LT and RT.

The simulation results for the coverage of the two-sided confidence intervals based on the non-central t-distribution are very similar to those reported by Inman and Bradley (1994). For both $\delta$ close to 0 and $\delta$ large the simulated coverages are close to their nominal values. For moderate $\delta$ they are significantly larger than their nominal values, indicating the conservative nature of this method for these cases. From both Table 1 and Table 2 we see the undesirable lack of symmetry in the proportion of cases falling on each side of the confidence intervals for $\delta$ small and moderate.
In fact, we see that frequently all the cases falling outside the interval are on the left-hand side. These observations


Table 1. 2000 simulated results†

Columns marked (t) are results for the non-central t-distribution; columns marked (F) are results for the non-central F-distribution.

| m | n | δ | CP (t) | LT (t) | RT (t) | AL (t) | CP (F) | LT (F) | RT (F) | AL (F) |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 10 | 0 | 0.962+ | 0.038+ | 0+ | 1.24 | 0.980+ | 0.022 | 0+ | 1.01 |
| 25 | 25 | 0 | 0.950 | 0.050+ | 0+ | 0.769 | 0.972+ | 0.028 | 0+ | 0.880 |
| 50 | 50 | 0 | 0.953 | 0.047+ | 0+ | 0.541 | 0.970+ | 0.030 | 0+ | 0.617 |
| 75 | 75 | 0 | 0.956 | 0.045+ | 0+ | 0.449 | 0.983+ | 0.017 | 0+ | 0.426 |
| 10 | 10 | 0.00001 | 0.955 | 0.045+ | 0+ | 1.23 | 0.953 | 0.022 | 0.025 | 1.17 |
| 25 | 25 | 0.00001 | 0.948 | 0.052+ | 0+ | 0.782 | 0.956 | 0.024 | 0.020 | 0.744 |
| 50 | 50 | 0.00001 | 0.948 | 0.052+ | 0+ | 0.549 | 0.940 | 0.025 | 0.035 | 0.515 |
| 75 | 75 | 0.00001 | 0.961+ | 0.039+ | 0+ | 0.447 | 0.955 | 0.023 | 0.022 | 0.423 |
| 10 | 10 | 0.125 | 0.974+ | 0.026 | 0+ | 1.26 | 0.956 | 0.019 | 0.025 | 1.20 |
| 25 | 25 | 0.125 | 0.981+ | 0.019 | 0+ | 0.795 | 0.958 | 0.017 | 0.025 | 0.757 |
| 50 | 50 | 0.125 | 0.982+ | 0.018 | 0+ | 0.579 | 0.955 | 0.017 | 0.028 | 0.555 |
| 75 | 75 | 0.125 | 0.975+ | 0.025 | 0+ | 0.478 | 0.955 | 0.025 | 0.020 | 0.462 |
| 10 | 10 | 0.25 | 0.985+ | 0.015 | 0+ | 1.30 | 0.962+ | 0.015 | 0.023 | 1.24 |
| 25 | 25 | 0.25 | 0.983+ | 0.017 | 0+ | 0.842 | 0.953 | 0.017 | 0.030 | 0.821 |
| 50 | 50 | 0.25 | 0.978+ | 0.022 | 0+ | 0.642 | 0.951 | 0.022 | 0.027 | 0.637 |
| 75 | 75 | 0.25 | 0.966+ | 0.034 | 0+ | 0.543 | 0.943 | 0.034 | 0.023 | 0.539 |
| 10 | 10 | 0.5 | 0.979+ | 0.021 | 0+ | 1.43 | 0.954 | 0.021 | 0.024 | 1.40 |
| 25 | 25 | 0.5 | 0.980+ | 0.020 | 0+ | 0.999 | 0.956 | 0.020 | 0.024 | 0.997 |
| 50 | 50 | 0.5 | 0.952 | 0.028 | 0.020 | 0.762 | 0.951 | 0.028 | 0.021 | 0.767 |
| 75 | 75 | 0.5 | 0.953 | 0.020 | 0.022 | 0.640 | 0.953 | 0.025 | 0.022 | 0.643 |
| 10 | 10 | 1 | 0.955 | 0.024 | 0.021 | 1.76 | 0.948 | 0.024 | 0.028 | 1.77 |
| 25 | 25 | 1 | 0.946 | 0.029 | 0.025 | 1.18 | 0.946 | 0.029 | 0.025 | 1.18 |
| 50 | 50 | 1 | 0.949 | 0.023 | 0.028 | 0.834 | 0.949 | 0.023 | 0.028 | 0.834 |

†$1 - \alpha = 0.95$.

can be explained theoretically. Denoting by $t^{1-\alpha/2}_{n+m-2}$ the $1 - \alpha/2$ percentile point of the central $t_{n+m-2}$-distribution, it is clear that case III occurs if and only if

$$-t^{1-\alpha/2}_{n+m-2} \le t \le t^{1-\alpha/2}_{n+m-2}. \tag{8}$$

Consequently, if $\delta = 0$, case III occurs with probability $1 - \alpha$. Whenever case III occurs the resulting confidence interval includes 0, the true value of $\delta$. For both case I and case II the lower bound of the confidence interval must be positive and thus the true value ($\delta = 0$) falls on the left of the interval. The combined probability that case I and case II occur is $\alpha$. Thus we see that for $\delta = 0$ the non-central t-distribution based method must give the correct two-sided coverage of $1 - \alpha$ which, however, is entirely one sided.

For $|\delta|$ large, the probability of obtaining condition (8) is very small and approaches 0 with increasing $|\delta|$. We then observe either case I or II depending on the sign of $\delta$. Solving equations (3) and (4) guarantees both the correct coverage and symmetry for intervals on $\delta$ and consequently on $|\delta|$, since for cases I and II the transformation from $\delta$ to $|\delta|$ is monotone. For $|\delta|$ moderate, case III will occur with a probability which is not negligible but less than $1 - \alpha$. Consequently, since for case III the transformation from $\delta$ to $|\delta|$ is not monotone, the coverage of the non-central t-procedure will be conservative for moderate $|\delta|$.

The confidence interval based on the non-central F-distribution gives simulated coverages that are close to their nominal values except for the case $\delta = 0$. For this case the coverage is conservative and is in fact $1 - \alpha/2$ instead of $1 - \alpha$. This interval is asymmetric with all the cases that are outside the interval falling below the lower bound. According to expression (7) the lower


Table 2. 2000 simulated results†

Columns marked (t) are results for the non-central t-distribution; columns marked (F) are results for the non-central F-distribution.

| m | n | δ | CP (t) | LT (t) | RT (t) | AL (t) | CP (F) | LT (F) | RT (F) | AL (F) |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 10 | 0 | 0.911+ | 0.089+ | 0+ | 1.09 | 0.961+ | 0.039+ | 0+ | 1.01 |
| 25 | 25 | 0 | 0.910 | 0.090+ | 0+ | 0.673 | 0.950+ | 0.050 | 0+ | 0.769 |
| 50 | 50 | 0 | 0.914+ | 0.086+ | 0+ | 0.474 | 0.953+ | 0.047 | 0+ | 0.547 |
| 75 | 75 | 0 | 0.906 | 0.094+ | 0+ | 0.394 | 0.955+ | 0.045 | 0+ | 0.365 |
| 10 | 10 | 0.00001 | 0.908 | 0.092+ | 0+ | 1.08 | 0.901 | 0.046 | 0.053 | 0.997 |
| 25 | 25 | 0.00001 | 0.899 | 0.101+ | 0+ | 0.686 | 0.902 | 0.052 | 0.043 | 0.639 |
| 50 | 50 | 0.00001 | 0.896 | 0.104+ | 0+ | 0.481 | 0.886+ | 0.052 | 0.062+ | 0.441 |
| 75 | 75 | 0.00001 | 0.905 | 0.095+ | 0+ | 0.392 | 0.918+ | 0.040 | 0.042 | 0.363 |
| 10 | 10 | 0.125 | 0.946+ | 0.054 | 0+ | 1.11 | 0.911+ | 0.040 | 0.049 | 1.03 |
| 25 | 25 | 0.125 | 0.953+ | 0.047 | 0+ | 0.696 | 0.912+ | 0.040 | 0.048 | 0.651 |
| 50 | 50 | 0.125 | 0.945+ | 0.055 | 0+ | 0.507 | 0.898 | 0.052 | 0.050 | 0.479 |
| 75 | 75 | 0.125 | 0.957+ | 0.043 | 0+ | 0.418 | 0.907 | 0.043 | 0.050 | 0.398 |
| 10 | 10 | 0.25 | 0.959+ | 0.041 | 0+ | 1.14 | 0.917+ | 0.038+ | 0.045 | 1.07 |
| 25 | 25 | 0.25 | 0.955+ | 0.045 | 0+ | 0.737 | 0.895 | 0.045 | 0.060 | 0.714 |
| 50 | 50 | 0.25 | 0.952+ | 0.048 | 0+ | 0.560 | 0.892 | 0.048 | 0.059 | 0.555 |
| 75 | 75 | 0.25 | 0.944+ | 0.056 | 0+ | 0.472 | 0.894 | 0.056 | 0.049 | 0.467 |
| 10 | 10 | 0.5 | 0.964+ | 0.036+ | 0+ | 1.25 | 0.913+ | 0.037+ | 0.050 | 1.21 |
| 25 | 25 | 0.5 | 0.933+ | 0.047 | 0.020+ | 0.864 | 0.907 | 0.047 | 0.046 | 0.862 |
| 50 | 50 | 0.5 | 0.899 | 0.056 | 0.045 | 0.650 | 0.898 | 0.056 | 0.046 | 0.655 |
| 75 | 75 | 0.5 | 0.899 | 0.050 | 0.051 | 0.541 | 0.899 | 0.050 | 0.051 | 0.543 |
| 10 | 10 | 1 | 0.901 | 0.043 | 0.056 | 1.51 | 0.899 | 0.043 | 0.058 | 1.52 |
| 25 | 25 | 1 | 0.889 | 0.060 | 0.051 | 0.992 | 0.889 | 0.060 | 0.051 | 0.995 |
| 50 | 50 | 1 | 0.898 | 0.048 | 0.054 | 0.700 | 0.897 | 0.048 | 0.055 | 0.700 |

†$1 - \alpha = 0.90$.

confidence bound for $\delta$ will be 0 only if $t^2 \le F^{1-\alpha/2}_{1, n+m-2}$ where $F^{1-\alpha/2}_{1, n+m-2}$ denotes the $1 - \alpha/2$ percentile point of the central $F_{1, n+m-2}$-distribution. For $\delta = 0$ this occurs with probability $1 - \alpha/2$, giving an interval of the form $[0, a]$, $a \ge 0$. Thus $\delta = 0$ will now be within the confidence interval. For $t^2 > F^{1-\alpha/2}_{1, n+m-2}$ the lower bound will be positive and hence, with probability $\alpha/2$, $\delta = 0$ falls below the lower bound. For $|\delta|$ large, the situation described by expression (7), in which a bound has to be set to 0, occurs with decreasing probability. Solving equations (5) and (6) thus guarantees intervals with the correct coverage, both overall and in each tail symmetrically. Furthermore, because of the connection between the non-central t- and F-distributions, the intervals from the two methods will tend to be identical as $|\delta|$ increases.

It is interesting that, when $\delta$ is very close but not equal to 0, the non-central F-distribution based intervals have very different properties from those obtained when $\delta = 0$. For $\delta = 0^{+}$ equations (5), (6) and (7) imply that the confidence interval will be of the form $[0, 0]$, $[0, b]$, $b > 0$, and $[c, d]$, $c, d > 0$, with probability $\alpha/2$, $1 - \alpha$ and $\alpha/2$ respectively. The true value of $\delta = 0^{+}$ will always fall outside the first interval on the right, tend to fall inside the second interval and fall outside the third interval to the left. Hence the overall coverage will be about $1 - \alpha$ with equal 'out-of-interval' probabilities for both left and right. This is in contrast with the non-central t-distribution based intervals, which perform very similarly for $\delta = 0$ and $\delta$ small. These comments provide a theoretical explanation of the observed simulation results.

The average length AL of the intervals behaves as expected, decreasing with increasing sample size and showing near equality for the non-central t- and F-distribution based procedures when $\delta$ is large and consequently the confidence intervals are almost identical.
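The coverage argument for the non-central F-method can be checked with a small Monte Carlo sketch. It is not the authors' GAUSS code and it makes several simplifying assumptions. The statistic $t$ is drawn directly from its sampling distribution, $(Z + \eta)/\sqrt{(\chi^2_{df}/df)}$, rather than from simulated raw data. No equations are solved: because the probability in equations (5) to (7) decreases in $\eta^2$, the true value lies below the lower bound exactly when that probability, evaluated at the true $\eta^2$, exceeds $1 - \alpha/2$, and above the upper bound when it is less than $\alpha/2$ and $\delta \ne 0$. LT and RT are counted relative to the interval on $|\delta|$. The function names, the seed and the grid size are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def scale_grid(df, steps=400):
    """Simpson nodes and weights for S = sqrt(chi2_df / df), df > 1.

    Each weight already includes the density of S, so an expectation over S
    becomes a plain weighted sum over the returned (node, weight) pairs.
    """
    half_width = 10.0 / math.sqrt(2.0 * df)
    lo, hi = max(0.0, 1.0 - half_width), 1.0 + half_width
    log_c = math.log(2.0) + 0.5 * df * math.log(0.5 * df) - math.lgamma(0.5 * df)
    h = (hi - lo) / steps
    grid = []
    for k in range(steps + 1):
        s = lo + k * h
        if s <= 0.0:
            continue  # the density of S vanishes at 0 when df > 1
        dens = math.exp(log_c + (df - 1.0) * math.log(s) - 0.5 * df * s * s)
        simpson = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)
        grid.append((s, simpson * dens * h / 3.0))
    return grid


def ncf1_cdf(t_sq, eta, grid):
    """P{F_{1,df}(eta^2) <= t_sq}, using F_{1,df}(eta^2) = t_df(eta)^2."""
    r = math.sqrt(t_sq)
    return sum(w * (PHI(r * s - eta) - PHI(-r * s - eta)) for s, w in grid)


def simulate_f_method(delta, m, n, alpha=0.05, reps=2000, seed=1):
    """Return simulated (CP, LT, RT) for the non-central F interval on |delta|."""
    rng = random.Random(seed)
    df = m + n - 2
    nu = math.sqrt((m + n) / (m * n))
    eta0 = abs(delta) / nu  # true non-centrality parameter (absolute value)
    grid = scale_grid(df)
    below = above = 0
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(0.5 * df, 2.0)  # chi-squared with df degrees of freedom
        t = (z + eta0) / math.sqrt(chi2 / df)
        p = ncf1_cdf(t * t, eta0, grid)
        if p > 1.0 - alpha / 2.0:
            below += 1  # true value lies below the lower bound (LT)
        elif p < alpha / 2.0 and eta0 > 0.0:
            above += 1  # true value lies above the upper bound (RT)
    lt, rt = below / reps, above / reps
    return 1.0 - lt - rt, lt, rt
```

Calling `simulate_f_method(0.0, 10, 10)` should give RT equal to 0 and CP near $1 - \alpha/2$, while a call such as `simulate_f_method(0.5, 25, 25)` should give CP near $1 - \alpha$ with roughly equal LT and RT, in line with the discussion above.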


4. An example

Data on over 700 patients with early stage node negative breast cancer, diagnosed between 1972 and 1986 at two institutions in the USA, were used to evaluate prognostic markers that can predict the recurrence of the tumours (Taube et al., 1996). Only one of the evaluated markers, Ki67, was measured on a continuous scale (per cent staining). We shall use these measurements to examine OVL between the marker distribution for patients in whom there were no recurrences (580 cases) and the marker distribution for patients in whom there were recurrences (131 cases).

Since this marker is measured as a percentage its distribution was not expected to be normal. In fact we found that the data were highly skewed for both the recurrence and the non-recurrence populations. Consequently the standard inverse sine square-root transformation was used to normalize the data. A graphical analysis indicated that the normal assumption is more reasonable for the transformed data in both groups. In the transformed scale the estimated mean and standard deviation are 0.4333 and 0.2027 for the non-recurrence group. For the recurrence group they are 0.4992 and 0.1986. Clearly the equal variance assumption is reasonable and the resulting pooled estimate of the standard deviation is 0.2019.

The OVL estimate is 0.8704. Solving equations (5) and (6) we obtain the 95% confidence interval for OVL to be (0.7962, 0.9458). Since this confidence interval does not include the value 1 the marker distributions for the two groups should not be considered to be identical. However, the large lower bound for OVL indicates substantial similarity between the two distributions.

5. Discussion

We have investigated the performance of confidence intervals for OVL based on both the non-central F- and t-distributions. The intervals that are based on the non-central F-distribution perform well even for small sample sizes and for all $\delta$ except $\delta$ exactly 0 (which does not really occur in practice). Since the non-central t-distribution method can be quite conservative and asymmetric for moderate $\delta$ we recommend the non-central F-method. For large $\delta$ both the non-central t- and F-methods give equivalent results. Furthermore Inman and Bradley (1994) showed that a normal approximation to the non-central t-distribution gave excellent results and performed almost identically with the non-central t-method. Since it is easier to compute the confidence intervals based on the normal approximation, this approximation should be used for large $\delta$.

References

Aptech Systems (1994) Nonlinear Equation Gauss Applications. Maple Valley: Aptech Systems.
Bradley, E. L. (1985) Overlapping coefficient. In Encyclopedia of Statistical Sciences (eds S. Kotz and N. L. Johnson). New York: Wiley.
Inman, H. F. and Bradley, E. L. (1989) The overlapping coefficient as a measure of agreement between probability distributions and point estimation of the overlap of two normal densities. Communs Statist. Theory Meth., 18, 3851–3874.
Inman, H. F. and Bradley, E. L. (1994) Hypothesis tests and confidence interval estimates for the overlap of two normal distributions with equal variances. Environmetrics, 5, 167–189.
Lam, Y. M. (1987) Confidence limits for non-centrality parameters of non-central chi-squared and F distributions. Proc. Statist. Comput. Sect. Am. Statist. Ass., 441–443.
Reiser, B. and Faraggi, D. (1994) Confidence bounds for Pr(a′X > b′Y). Statistics, 25, 107–111.
Reiser, B. and Faraggi, D. (1997) Confidence intervals for the generalized ROC criterion. Biometrics, 53, 644–652.
Reiser, B. and Guttman, I. (1986) Statistical inference for Pr(Y < X): the normal case. Technometrics, 28, 253–257.
Rom, D. M. and Hwang, E. (1996) Testing for individual and population equivalence based on the proportion of similar response. Statist. Med., 15, 1489–1505.
Taube, S., Faraggi, D., Bledsoe, M., Bennington, J., Dooley, W., Kuhajda, F. and Waldman, F. (1996) Which markers predict disease progression in node-negative breast cancer patients – a retrospective study of 700 patients. 21st Meet. International Association for Breast Cancer Research, Paris.
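The point estimate and interval of Section 4 can be approximately reproduced from the rounded summary statistics reported there. The sketch below is illustrative only and is not the authors' code: it evaluates equation (2) at the estimated $\delta$ and solves equations (5) and (6) by bisection, computing the non-central $F_{1,df}(\eta^2)$ probability by Simpson's rule through the identity $F_{1,df}(\eta^2) = t_{df}(\eta)^2$. The helper names, grid size and number of bisection steps are assumptions, and small discrepancies from the published figures are to be expected because the inputs are rounded.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def abs_t_cdf(r, df, eta, steps=1000):
    """P{|t_df(eta)| <= r} = P{F_{1,df}(eta^2) <= r^2}, by Simpson's rule over S."""
    half_width = 10.0 / math.sqrt(2.0 * df)
    lo, hi = max(0.0, 1.0 - half_width), 1.0 + half_width
    log_c = math.log(2.0) + 0.5 * df * math.log(0.5 * df) - math.lgamma(0.5 * df)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        s = lo + k * h
        if s <= 0.0:
            continue  # the density of S = sqrt(chi2_df / df) vanishes at 0 when df > 1
        dens = math.exp(log_c + (df - 1.0) * math.log(s) - 0.5 * df * s * s)
        weight = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)
        total += weight * dens * (PHI(r * s - eta) - PHI(-r * s - eta))
    return total * h / 3.0


def eta_bound(r, df, p, iters=50):
    """Non-negative eta with P{F_{1,df}(eta^2) <= r^2} = p, or 0 if none exists."""
    if abs_t_cdf(r, df, 0.0) < p:
        return 0.0  # expression (7) is below p: the bound is taken to be 0
    lo, hi = 0.0, 1.0
    while abs_t_cdf(r, df, hi) > p:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if abs_t_cdf(r, df, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Summary statistics reported in Section 4 (transformed scale)
n1, mean1 = 580, 0.4333  # non-recurrence group
n2, mean2 = 131, 0.4992  # recurrence group
s_pooled = 0.2019
alpha = 0.05

delta_hat = (mean1 - mean2) / s_pooled
nu = math.sqrt((n1 + n2) / (n1 * n2))
t = delta_hat / nu
df = n1 + n2 - 2

ovl_hat = 2.0 * PHI(-abs(delta_hat) / 2.0)
abs_delta_lo = nu * eta_bound(abs(t), df, 1.0 - alpha / 2.0)
abs_delta_hi = nu * eta_bound(abs(t), df, alpha / 2.0)
ovl_lo = 2.0 * PHI(-abs_delta_hi / 2.0)  # OVL decreases in |delta|
ovl_hi = 2.0 * PHI(-abs_delta_lo / 2.0)

print(f"OVL estimate: {ovl_hat:.4f}")
print(f"95% interval: ({ovl_lo:.4f}, {ovl_hi:.4f})")
```

The printed values should land close to the reported estimate 0.8704 and interval (0.7962, 0.9458).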