Power and Sample Size Calculations for Studies Involving Linear Regression

William D. Dupont, PhD, and Walton D. Plummer, Jr., BS

Department of Preventive Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee

ABSTRACT: This article presents methods for sample size and power calculations for studies involving linear regression. These approaches apply to clinical trials designed to detect a regression slope of a given magnitude and to studies that test whether the slopes or intercepts of two independent regression lines differ by a given amount. The investigator may either specify the values of the independent (x) variable(s) of the regression line(s) or observe them when the study is performed. In the latter case, the investigator must estimate the standard deviation(s) of the independent variable(s). We give examples of this method for both experimental and observational study designs. Cohen's method of power calculations for multiple linear regression models is also discussed and contrasted with the methods of this study. We have posted a computer program to perform these and other sample size calculations on the Internet (see http://www.mc.vanderbilt.edu/prevmed/psintro.htm). This program can determine the sample size needed to detect a specified alternative hypothesis with the required power, the power with which a specific alternative hypothesis can be detected with a given sample size, or the specific alternative hypotheses that can be detected with a given power and sample size. Context-specific help messages available on request make the use of this software largely self-explanatory.

Controlled Clin Trials 1998;19:589–601 © Elsevier Science Inc. 1998

KEY WORDS: Statistics, regression analysis, linear models, power calculations, sample size calculations, linear regression

INTRODUCTION

Clinical investigators sometimes wish to evaluate a continuous response measure in a cohort of patients randomized to one of several groups defined by increasing levels of some treatment. In performing sample size and power calculations for such studies, one reasonable approach models patient response as a linear function of dose and poses power calculations in terms of detecting dose-response slopes of a given magnitude. Alternatively, we may wish to evaluate the dose-response curves of two different treatments and test whether the slopes of these curves differ. This article provides an easily used, accurate method for power and sample size calculations for such studies. We have posted an interactive, self-documented program to perform these calculations on the Internet.

Other investigators have reviewed general methods for sample size and power calculations [1–3]. Hintze [4] provided a method for designing studies to detect correlation coefficients of specified magnitudes that uses a computational algorithm of Guenther [5]. This method provides results that are perhaps less easily understood than those based on regression slope parameters, because many investigators can interpret slopes more readily than correlation coefficients. Kraemer and Thiemann [3] provide tables that permit exact sample size calculations for studies designed to detect correlation coefficients of a given magnitude, together with formulas that allow these tables to be used for designs involving linear regression. Although accurate, these methods are less convenient than those that we have incorporated into an interactive computer program. Cohen [2] provided more complex methods for designs involving multiple linear regression and correlation analysis. Later in this article we describe these methods, which require expressing the alternative hypothesis in terms of its effect on the multiple correlation coefficient [6]. Hintze [4] has written software for performing these calculations, but clinical investigators may find his methods somewhat difficult to use and interpret. Goldstein [7] and Iwane et al. [8] have reviewed other power and sample size software packages.

Address reprint requests to: William D. Dupont, PhD, Department of Preventive Medicine, Vanderbilt University School of Medicine, A-1124 Medical Center North, Nashville, TN 37232-2637. Received 20 June 1996; accepted 2 June 1998.
Controlled Clinical Trials 19:589–601 (1998) © Elsevier Science Inc. 1998, 655 Avenue of the Americas, New York, NY 10010. 0197-2456/98/$19.00, PII S0197-2456(98)00037-3

Simple Linear Regression

We study the effect of one variable on another by estimating the slope of the regression line between these variables. For example, we might compare the effects of a treatment at several dose levels. Suppose that we treat $n$ patients, that the $j$th patient has response $y_j$ after receiving dose level $x_j$, and that the expected value of $y_j$ given $x_j$ is $\gamma + \lambda x_j$.
To test the null hypothesis that $\lambda = 0$ against a two-sided alternative hypothesis with type I error probability $\alpha$, we must be able to answer the following three questions:

1. How many patients must we study to detect a specific alternative hypothesis $\lambda = \lambda_a$ with power $1 - \beta$?
2. With what power can we detect a specific alternative hypothesis $\lambda = \lambda_a$ given observations on $n$ study subjects?
3. What alternative values of $\lambda_a$ can we detect with power $1 - \beta$ if we study $n$ patients?

Either observational or experimental studies may use this design. In the former, both $\{x_j\}$ and $\{y_j\}$ are attributes of the study subjects, and we intend to determine whether these two variables are correlated. In these studies, the investigator must also estimate $\sigma_x$, the predicted standard deviation of $x_j$ in the patients under study. In experiments, the investigator determines the values of $\{x_j\}$. Typically, $x_j$ denotes a drug dose given at one of $K$ distinct values $w_1, \ldots, w_K$, with a proportion $c_k$ of the study subjects assigned to dose level $w_k$.

The dispersion of the response values about the regression line affects power and sample size calculations. A parameter that quantifies this dispersion is $\sigma$, the standard deviation of the regression errors. The regression error for the $j$th observation is the difference between the observed and expected response values for the $j$th subject. In other words, the regression error is the vertical distance between the observed response $y_j$ and the true regression line (see Fig. 1); $\sigma$ is the standard deviation of these vertical distances.

Figure 1 In simple linear regression we obtain $n$ pairs of observations $\{x_j, y_j\}$. We assume that the expected value of the response $y_j$ is given by the linear equation $E(y_j) = \gamma + \lambda x_j$. The $j$th regression error is the vertical distance between the observed response $y_j$ and its expected value $\gamma + \lambda x_j$.

The values of $\sigma$, $\sigma_x$, $\sigma_y$, $\lambda$, and the correlation coefficient $\rho$ are all interrelated. It is well known [6] that:

$$\lambda = \rho\,\sigma_y/\sigma_x \qquad (1)$$

and it is easily shown that:

$$\sigma = \sigma_y\sqrt{1 - \rho^2} = \lambda\sigma_x\sqrt{1/\rho^2 - 1} = \sqrt{\sigma_y^2 - \lambda^2\sigma_x^2} \qquad (2)$$
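Equations (1) and (2) are easy to check numerically. The minimal Python sketch below (standard library only; the function names are ours, not part of the paper or of PS) computes $\rho$ from $\lambda$, $\sigma_x$, and $\sigma_y$ and then evaluates the three expressions for $\sigma$ in equation (2), which should agree. The second expression uses $|\lambda|$, an assumption we add so that $\sigma$ stays positive when the slope is negative.

```python
import math


def rho_from_slope(slope: float, sigma_x: float, sigma_y: float) -> float:
    """Equation (1) solved for rho: rho = lambda * sigma_x / sigma_y."""
    return slope * sigma_x / sigma_y


def sigma_from_rho(sigma_y: float, rho: float) -> float:
    """First form of equation (2): sigma = sigma_y * sqrt(1 - rho^2)."""
    return sigma_y * math.sqrt(1.0 - rho**2)


def sigma_from_slope_and_rho(slope: float, sigma_x: float, rho: float) -> float:
    """Second form: sigma = |lambda| * sigma_x * sqrt(1/rho^2 - 1), for rho != 0."""
    return abs(slope) * sigma_x * math.sqrt(1.0 / rho**2 - 1.0)


def sigma_from_sigma_y(slope: float, sigma_x: float, sigma_y: float) -> float:
    """Third form: sigma = sqrt(sigma_y^2 - lambda^2 * sigma_x^2)."""
    return math.sqrt(sigma_y**2 - slope**2 * sigma_x**2)


if __name__ == "__main__":
    # Illustrative inputs; they match the BMI example in the EXAMPLES section.
    slope, sigma_x, sigma_y = -0.0667, 7.5, 4.0
    rho = rho_from_slope(slope, sigma_x, sigma_y)
    print(f"rho = {rho:.4f}")
    for sigma in (
        sigma_from_rho(sigma_y, rho),
        sigma_from_slope_and_rho(slope, sigma_x, rho),
        sigma_from_sigma_y(slope, sigma_x, sigma_y),
    ):
        print(f"sigma = {sigma:.4f}")
```

All three printed values of $\sigma$ coincide up to rounding; in practice one uses whichever of $\rho$ or $\sigma_y$ can be estimated.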

Thus, when $\rho = 1$, the observations $\{x_j\}$ and $\{y_j\}$ are perfectly correlated and lie on a straight line with slope $\sigma_y/\sigma_x$; the regression errors are all zero (because the observed and expected responses are always equal), and hence $\sigma = 0$. When $\rho = 0$, $x_j$ and $y_j$ are uncorrelated, the expected regression line is flat ($\lambda = 0$), and the standard deviation of the regression errors equals the standard deviation of $y_j$ (i.e., $\sigma = \sigma_y$). Figure 2 illustrates the relationship between these parameters when $0 < \rho < 1$. This figure shows simulated data for patients given treatments A and B under the assumption that the two treatments have identical means and standard deviations of the independent and response variables. They differ in that the correlation coefficient between the response and independent variables is 0.9 for treatment A (black dots) and 0.6 for treatment B (open circles). Consequently, the responses to treatment A are more closely clustered around their (black) regression line than the responses to treatment B are around theirs (gray). Thus, the average regression error is smaller for treatment A than for treatment B and, hence, $\sigma$, the standard deviation of these errors, is smaller for treatment A than for treatment B.

Figure 2 This figure illustrates the relationship in simple linear regression between $\rho$, the correlation coefficient between the independent and response variables; the regression errors (see Fig. 1); and $\sigma$, the standard deviation of the regression errors. Higher values of $\rho$ imply smaller regression errors, which, in turn, imply smaller values of $\sigma$ (see text).

Power or sample size calculations require estimates of $\sigma_x$, $\lambda$, and $\sigma$. It is often difficult to estimate $\sigma$ directly; however, we can obtain indirect estimates of $\sigma$ using equation (2) whenever we are able to estimate $\rho$ or $\sigma_y$. We derive power and sample size formulas for simple linear regression in the Appendix.

Contrasting Two Linear Regression Lines

Suppose that we want to compare the slopes and intercepts of two independent regression lines. For example, we might wish to compare the effects of two different treatments at several dose levels. Suppose that treatments 1 and 2 are given to $n_1$ and $n_2$ patients, respectively, and that the $j$th subject who receives treatment $i$ ($i = 1$ or 2) has response $y_{ij}$ to treatment at dose level $x_{ij}$, where the expected value of $y_{ij}$ is $\gamma_i + \lambda_i x_{ij}$. We want to determine whether the responses to the treatments differ. Specifically, we intend to test the null hypotheses that $\gamma_1 = \gamma_2$ and $\lambda_1 = \lambda_2$. In this case, we must answer the three questions given earlier for alternative hypotheses concerning the magnitude of the differences in the y intercept and slope parameters for these two treatments. We derive power and sample size formulas for two-treatment linear regression problems in the Appendix.

COMPUTER SOFTWARE

We have written a computer program to implement these and other sample size and power calculations [1] and have posted it, together with program documentation, on the Internet. The program runs under either the Windows 95 or Windows NT operating system. To obtain free copies, open the http://www.mc.vanderbilt.edu/prevmed/psintro.htm page on the World Wide Web and follow the instructions. The program, named PS, has a graphical user interface with hypertext help messages that make the use of the program largely self-explanatory. It can answer the three questions given in the Introduction for each study design considered by this software. It can also generate graphs of sample size versus power, sample size versus detectable alternative hypotheses, or power versus detectable alternative hypotheses. It is written in Visual Basic [9] and Fortran 90 [10] and uses the First Impression graphics control [11].

EXAMPLES

Linear Regression in an Observational Study

A dieting program encourages patients to follow a specific diet and to exercise regularly. We want to determine whether the actual average time per day spent exercising is related to body mass index (BMI, in kilograms per square meter) after 6 months on this program. Previous experience suggests that the exercise time of participants has a standard deviation of $\sigma_x = 7.5$ minutes. Kuskowska-Wolk et al. [12] reported that the standard deviation of the BMI for their female study subjects was $\sigma_y = 4.0$ kg/m². We have $n = 100$ women willing to follow this program for 6 months. We want to determine the power with which we can detect a true drop in BMI of $\lambda_a = -0.0667$ kg/m² per minute of exercise. (This would imply that the average BMI of participants who exercised half an hour a day would be 2 kg/m² less than that of those who did not exercise at all.)
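The power that PS reports for this design can be approximated by hand. Below is a minimal Python sketch, assuming a large-sample normal approximation in place of the $t$ distribution used by equation (A1) of the Appendix (so small discrepancies from PS are expected). It obtains $\sigma$ from $\sigma_y$ through equation (2) and uses $\delta = \lambda_a\sigma_x/\sigma$; the function names are illustrative, not those of PS.

```python
from statistics import NormalDist


def sigma_from_sigma_y(slope: float, sigma_x: float, sigma_y: float) -> float:
    """Equation (2): sigma = sqrt(sigma_y^2 - lambda^2 * sigma_x^2)."""
    return (sigma_y**2 - slope**2 * sigma_x**2) ** 0.5


def approx_power(slope_a: float, sigma_x: float, sigma: float, n: int,
                 alpha: float = 0.05) -> float:
    """Normal approximation to (A1): Phi(d*sqrt(n) - z) + Phi(-d*sqrt(n) - z)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    shift = (slope_a * sigma_x / sigma) * n**0.5     # delta * sqrt(n)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    slope_a, sigma_x, sigma_y, n = -0.0667, 7.5, 4.0, 100   # BMI in kg/m^2
    sigma = sigma_from_sigma_y(slope_a, sigma_x, sigma_y)
    print(f"power (kg/m^2): {approx_power(slope_a, sigma_x, sigma, n):.3f}")

    # Rescaling BMI to g/m^2 multiplies sigma_y and lambda_a by 1000.
    sigma_g = sigma_from_sigma_y(1000 * slope_a, sigma_x, 1000 * sigma_y)
    print(f"power (g/m^2):  {approx_power(1000 * slope_a, sigma_x, sigma_g, n):.3f}")
```

The second printed line illustrates the units argument made below: changing the units of the response rescales $\sigma_y$ and $\lambda_a$ together and leaves $\delta$, and therefore the power, unchanged.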
We use the PS program to determine the power with which the alternative hypothesis $\lambda_a = -0.0667$ can be detected with type I error probability $\alpha = 0.05$ as follows: choose linear regression with one treatment; specify that the investigator does not choose the treatment levels; enter $\sigma_x = 7.5$ for the standard deviation of the independent variable; indicate that we want to determine the power of the proposed study and that we will provide an estimate of $\sigma_y$; and enter $\alpha = 0.05$, $\lambda_a = -0.0667$, $\sigma_y = 4.0$, and $n = 100$. The PS program then calculates that 100 women yield a power of only 0.24 for detecting this alternative hypothesis. Thus, the planned study would be insufficient to detect a true slope of this magnitude reliably. The user may experiment with different values of $\lambda_a$, $\sigma_y$, and $n$ to determine the sensitivity of the derived power to changes in these parameter values. The units of measurement of the response variable affect the magnitude of both $\sigma_y$ and $\lambda_a$. Thus, if BMI is measured in grams per square meter, then $\sigma_y$ becomes 4000 and $\lambda_a$ becomes $-66.7$. Substituting these two values into the preceding power calculation, of course, leaves the power unchanged.

Linear Regression in an Experimental Study

Siber et al. [13] studied impaired antibody response to pneumococcal vaccine after treatment for Hodgkin's disease. Seventeen patients treated with subtotal radiation received pneumococcal vaccine from 8 to 51 months later. A linear regression of natural log antibody concentration on the time interval between treatment and vaccination suggested that log antibody concentration increased with increasing time interval between treatment and vaccination. Siber's group estimated the slope parameter for this regression to be $\hat\lambda = 0.01$ ($p = 0.11$) and the correlation coefficient to be $r = 0.40$. Suppose that we want to use these results as pilot data for a new study designed to detect the true alternative hypothesis that $\lambda_a = 0.01$ with power $1 - \beta = 0.90$ and type I error probability $\alpha = 0.05$. We might decide to assign patients at random to receive vaccine at either $w_1 = 10$, $w_2 = 30$, or $w_3 = 50$ months after radiation therapy. That is, we consider a study of $K = 3$ treatment levels (vaccination delay times), with equal proportions of patients vaccinated after each delay interval ($c_1 = c_2 = c_3 = 1/3$). To use the PS program, we choose linear regression with one treatment; specify equal allocation of the treatment levels to the three times 10, 30, and 50 months; indicate that we intend to determine the sample size and that we will provide an estimate of the correlation coefficient $\rho$; and enter $\alpha = 0.05$, $1 - \beta = 0.90$, $\lambda_a = 0.01$, and $\rho \approx r = 0.40$. With these values, the PS program gives a sample size of $n = 57$ patients needed to detect a true value of $\lambda = 0.01$ with 90% power, $\alpha = 0.05$, and patients equally allocated to receive vaccinations at 10, 30, and 50 months after radiation therapy.

Comparing Slopes of Two Linear Regression Lines

Armitage and Berry ([6], Table 9.4) gave the age and pulmonary vital capacity for 28 cadmium industry workers with less than 10 years of cadmium exposure and for 44 workers never exposed to cadmium. The standard deviations of the ages of those unexposed and exposed were $\sigma_{x1} = 12.0$ and $\sigma_{x2} = 9.19$, respectively.
Regressing vital capacity on age in these two groups gives slope estimates of $\hat\lambda_1 = -0.0306$ and $\hat\lambda_2 = -0.0465$ liters per year of life in unexposed and exposed workers, respectively (i.e., a typical exposed worker loses 46.5 mL of vital capacity per year). The standard errors of $\hat\lambda_1$ and $\hat\lambda_2$ are 0.00754 and 0.0113, respectively; the residual mean squares from the unexposed and exposed regressions are 0.352 and 0.293, respectively. From equation (9.17) of Armitage and Berry [6], the pooled estimate of the error variance from both groups is $s^2 = 0.329$, and hence $s = 0.574$. The estimated difference in slopes, $\hat\lambda_2 - \hat\lambda_1 = -0.0159$, is not significantly different from zero ($p = 0.26$; [6], equation (9.19)). Suppose that we want to recruit enough workers to detect a true difference of $\lambda_2 - \lambda_1 = -0.0159$ in these two groups, with 80% power, type I error probability $\alpha = 0.05$, and a ratio of unexposed to exposed workers of $m = 44/28 = 1.57$. Applying the PS program, we choose linear regression with two treatments; specify that the investigator does not choose the treatment levels; enter $\sigma_{x1} = 12.0$ and $\sigma_{x2} = 9.19$ for the standard deviation of the independent variable (age) in the control (unexposed) and experimental (exposed) groups, respectively; indicate that we will provide an estimate of the standard deviation of the regression errors, that we wish to calculate sample size, and that we want to compare slopes; and enter $\alpha = 0.05$, $1 - \beta = 0.80$, $\lambda_2 - \lambda_1 = -0.0159$, $\sigma = 0.574$, and $m = 1.57$. The program responds that the required experimental treatment sample size is 166. Hence, if we recruit 427 workers, 166 with less than 10 years of cadmium exposure and $1.57 \times 166 = 261$ unexposed, we will have 80% power to detect a difference of $-0.0159$ L/yr of life in the rate of loss of vital capacity with age.
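The vaccination and cadmium sample sizes can likewise be sketched in a few lines. This Python sketch assumes the normal approximation $n \approx (z_\beta + z_{\alpha/2})^2/\delta^2$ to equation (A2), rather than the iterative $t$-based solution that PS uses. It computes $\sigma_x$ for the three-level vaccination design, obtains $\sigma$ from $\rho$ via equation (2), and computes $\sigma_R$ for a difference of two slopes as derived in the Appendix. Expect answers within a patient or two of the 57 and 166 reported by PS, not exact agreement; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_n(delta: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Normal approximation to (A2): n = (z_beta + z_{alpha/2})^2 / delta^2."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = STD_NORMAL.inv_cdf(power)
    return (z_beta + z_alpha) ** 2 / delta**2


def design_sigma_x(levels, proportions) -> float:
    """Experimental sigma_x: x_bar = sum c_k w_k and sigma_x^2 = sum c_k (w_k - x_bar)^2."""
    x_bar = sum(c * w for c, w in zip(proportions, levels))
    return math.sqrt(sum(c * (w - x_bar) ** 2 for c, w in zip(proportions, levels)))


# Vaccination delay example: three equally used delays, sigma from rho via equation (2).
slope_a, rho = 0.01, 0.40
sx = design_sigma_x([10, 30, 50], [1 / 3, 1 / 3, 1 / 3])
sigma = slope_a * sx * math.sqrt(1.0 / rho**2 - 1.0)
n_vaccine = math.ceil(approx_n(slope_a * sx / sigma, power=0.90))

# Cadmium example: sigma_R for R = sqrt(n2) * (slope2 - slope1), with m = n1 / n2.
sx1, sx2, sigma_c, m, slope_diff = 12.0, 9.19, 0.574, 1.57, -0.0159
sigma_r = sigma_c * math.sqrt(1.0 / (m * sx1**2) + 1.0 / sx2**2)
n_exposed = math.ceil(approx_n(slope_diff / sigma_r, power=0.80))
n_unexposed = math.ceil(m * n_exposed)

print(f"vaccination: sigma_x = {sx:.2f}, n = {n_vaccine}")
print(f"cadmium: n_exposed = {n_exposed}, n_unexposed = {n_unexposed}")
```

Because $\sigma$ in the vaccination design is derived from $\rho$, $\delta$ reduces to $\rho/\sqrt{1-\rho^2}$ and the dose levels cancel out of the sample size; they matter only when $\sigma$ is estimated directly.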

LINEAR REGRESSION USING THE PASS PROGRAM

One of the most popular commercially available power and sample size programs, PASS 6.0 [4, 8], provides a general approach to power calculations for multiple linear regression using the method of Cohen [2]. Let

$$y_j = \gamma + \lambda_1 x_{1j} + \lambda_2 x_{2j} + \cdots + \lambda_k x_{kj} + \epsilon_j, \quad j = 1, \ldots, J \qquad (3)$$

denote a conventional multiple linear regression model in which the $j$th patient has a response variable $y_j$ and $k$ covariates $\{x_{1j}, x_{2j}, \ldots, x_{kj}\}$. We intend to test the null hypothesis that $\lambda_1 = \lambda_2 = \cdots = \lambda_p = 0$ for some $p < k$. Under this null hypothesis the regression model, equation (3), reduces to

$$y_j = \gamma + \lambda_{p+1} x_{p+1,j} + \lambda_{p+2} x_{p+2,j} + \cdots + \lambda_k x_{kj} + \epsilon_j, \quad j = 1, \ldots, J \qquad (4)$$

Cohen provides an F statistic to test this null hypothesis that is based on the multiple correlation coefficients $R_T$ and $R_0$ from equations (3) and (4), respectively. PASS [4] uses this test to determine the power with which we are likely to reject the null hypothesis given a true alternative hypothesis that is expressed in terms of $\Delta = R_T^2 - R_0^2$. The next sections describe the applicability of PASS to the examples of the present work. Table 1 gives the required input and output for these examples.
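For readers who want to see how a $\Delta$ of this kind turns into power, here is a rough Python sketch. It assumes (this is our reading of Cohen's conventions [2], not something stated in this article) that the effect size is $f^2 = (R_T^2 - R_0^2)/(1 - R_T^2)$ and that the F test has noncentrality $f^2(u + v + 1)$, with $u = p$ numerator and $v = J - k - 1$ denominator degrees of freedom. For $p = 1$ the F statistic is the square of a $t$ statistic, so the sketch approximates its power with a two-sided normal test; it is not the PASS algorithm. The inputs are the cadmium values $R_T^2 = 0.3243$, $R_0^2 = 0.3115$, $J = 427$, and $k = 3$ reported later in this section.

```python
import math
from statistics import NormalDist


def cohen_f2(r2_full: float, r2_reduced: float) -> float:
    """Assumed effect size: f^2 = (R_T^2 - R_0^2) / (1 - R_T^2)."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def approx_power_one_term(r2_full: float, r2_reduced: float, n_total: int,
                          k: int, alpha: float = 0.05) -> float:
    """Approximate power of the F test that p = 1 coefficient is zero.

    k is the number of covariates in the full model (3).  With u = 1 and
    v = n_total - k - 1, the assumed noncentrality is f^2 * (u + v + 1); the
    one-degree-of-freedom F test is treated as a two-sided normal test whose
    mean is the square root of that noncentrality.
    """
    u = 1
    v = n_total - k - 1
    noncentrality = cohen_f2(r2_full, r2_reduced) * (u + v + 1)
    shift = math.sqrt(noncentrality)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    r2_t, r2_0 = 0.3243, 0.3115   # cadmium example: models (5) and (6)
    print(f"Delta = {r2_t - r2_0:.4f}, f^2 = {cohen_f2(r2_t, r2_0):.4f}")
    print(f"approximate power = {approx_power_one_term(r2_t, r2_0, 427, 3):.3f}")
```

Under these assumptions the result lands close to the power of about 0.81 that PASS reports for this design, which makes the point of the Discussion concrete: the input is a change in $R^2$, not a slope difference.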

Using PASS for the BMI and Vaccination Examples

The correlation coefficient module of the PASS program is also applicable to simple linear regression when the data have a bivariate normal distribution [4, 5]. In the body mass index example given earlier, we calculated the power to detect $\lambda_a = -0.0667$, given $\sigma_x = 7.5$ and $\sigma_y = 4.0$. From equation (1) we see that this is equivalent to testing the alternative hypothesis that $|\rho| = |\lambda_a|\sigma_x/\sigma_y = 0.125$ against the null hypothesis that $\rho = 0$. Entering this value into the correlation coefficient module of the PASS program with a two-tailed type I error probability $\alpha = 0.05$, a null hypothesis $\rho_0 = 0$, and a sample size of $n = 100$ gives a power of 0.24 to detect $\rho_a = 0.125$. This is the same power obtained with the PS program. The PASS correlation coefficient module is not applicable to the vaccination delay time example, because the Hintze–Guenther method [4, 5] used by this module assumes that the independent and dependent variables have a bivariate normal distribution, whereas the delay times are not normally distributed. For experimental data, the independent variable is rarely normally distributed. Instead, trials usually assign a fixed, often equal, number of patients to, say, low, medium, and high treatment levels, as in the vaccine example. Such treatment levels are clearly not normally distributed.
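The equivalence used above, $\rho = \lambda\sigma_x/\sigma_y$ from equation (1), can be paired with a quick correlation-based power check. The Python sketch below swaps in Fisher's $z$ transformation, a standard large-sample approximation in which $\operatorname{atanh}(r)$ is roughly normal with standard deviation $1/\sqrt{n-3}$. It is not the Hintze–Guenther algorithm that PASS uses, so it should land near, not exactly on, the reported power of 0.24.

```python
import math
from statistics import NormalDist


def rho_from_slope(slope: float, sigma_x: float, sigma_y: float) -> float:
    """Equation (1) rearranged: rho = lambda * sigma_x / sigma_y."""
    return slope * sigma_x / sigma_y


def fisher_z_power(rho_a: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power for H0: rho = 0 using Fisher's z = atanh(r)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(math.atanh(rho_a)) * math.sqrt(n - 3)   # mean of z divided by its SD
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    rho_a = rho_from_slope(-0.0667, 7.5, 4.0)   # BMI example
    print(f"rho_a = {rho_a:.4f}")
    print(f"approximate power = {fisher_z_power(rho_a, 100):.3f}")
```

The sign of $\rho_a$ does not matter for a two-sided test, which is why the sketch takes the absolute value of the transformed correlation.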

Table 1 Input and Output Needed by PS and PASS Programs for Examples Considered in the Present Study

| Example | Program | Input^a | Output |
|---|---|---|---|
| Body mass index | PS | $\sigma_x = 7.5$, $\alpha = 0.05$, $\lambda_a = -0.0667$, $\sigma_y = 4$, $n = 100$ | Power $1 - \beta = 0.24$ |
| | PASS^b | $\alpha = 0.05$, $\rho_0 = 0$, $\rho_a = 0.125$, $n = 100$ | Power $1 - \beta = 0.24$ |
| Vaccination delay time | PS | $w_1 = 10$, $w_2 = 30$, $w_3 = 50$, $c_1 = c_2 = c_3 = 1/3$, $\alpha = 0.05$, $1 - \beta = 0.9$, $\lambda_a = 0.01$, $\rho = 0.4$ | Sample size $n = 57$ |
| | PASS | Not applicable | Not applicable |
| Cadmium exposure | PS | $\sigma_{x1} = 12.0$, $\sigma_{x2} = 9.19$, $\alpha = 0.05$, $1 - \beta = 0.80$, $\lambda_2 - \lambda_1 = -0.0159$, $\sigma = 0.574$, $m = 1.57$ | Exposed sample size $n = 166$, total sample size $J = 427$ |
| | PASS^c | $p = 1$, $\Delta = 0.0128$, $k - p = 2$, $R_0^2 = 0.3115$, $J = 427$, $\alpha = 0.05$ | Power $1 - \beta = 0.807$ |

^a See text for definitions. ^b PASS correlation coefficient module. ^c PASS multiple regression module.


Using PASS for the Cadmium Exposure Example

To use the PASS program for the cadmium exposure example just discussed, we first combine the data from the unexposed and exposed workers into a single multiple linear regression model. Let

$$x_{2j} = \begin{cases} 1 & \text{if the } j\text{th worker was exposed} \\ 0 & \text{if the } j\text{th worker was not exposed} \end{cases}$$

let $x_{3j}$ be the age at which the $j$th worker's vital capacity $y_j$ is measured, and let $x_{1j} = x_{2j} \times x_{3j}$. The model

$$y_j = \gamma + \lambda_1 x_{1j} + \lambda_2 x_{2j} + \lambda_3 x_{3j} + \epsilon_j \qquad (5)$$

reduces to $y_j = \gamma + \lambda_3 x_{3j} + \epsilon_j$ and $y_j = (\gamma + \lambda_2) + (\lambda_1 + \lambda_3) x_{3j} + \epsilon_j$ for unexposed and exposed workers, respectively. Hence, in this model, $\lambda_1$ represents the difference in the rate of decline in vital capacity between exposed and unexposed workers, and testing whether this rate is the same in both groups is equivalent to testing the null hypothesis that $\lambda_1 = 0$. Analyzing equation (5) with the vital capacity data set gives an estimate of $\lambda_1 = -0.0159$ with $R_T^2 = 0.3243$. (Note that this estimate of $\lambda_1$ equals the difference of the slope estimates from the simple linear regressions given earlier.) Under the null hypothesis, equation (5) reduces to

$$y_j = \gamma + \lambda_2 x_{2j} + \lambda_3 x_{3j} + \epsilon_j \qquad (6)$$

with a single slope parameter $\lambda_3$ for both exposure groups. This model gives $R_0^2 = 0.3115$. Hence the increase in $R^2$ from equation (6) to equation (5) is $\Delta = 0.3243 - 0.3115 = 0.0128$. Suppose we have access to an additional 427 workers, 166 of whom were exposed for less than 10 years, with the remainder unexposed. Entering $p = 1$, $\Delta = 0.0128$, $k - p = 2$, $R_0^2 = 0.3115$, $J = 427$, and $\alpha = 0.05$ in the multiple regression module of the PASS program gives a power estimate of 0.808, which is comparable to the result of the PS program given earlier. (When running the PASS program, enter $p$ and $\Delta$ in the Variables to be tested frame, $k - p$ and $R_0^2$ in the Variables controlled for frame, and zeros in both fields of the C: Variables removed frame. Positive values of these last two fields are used in power calculations for certain complex study designs and hypotheses that Cohen refers to as case 2 analyses [2].) Thus, in this example, the PS and PASS programs produce very similar results; they differ in their requirements for input of parameters. Also, this module of the PASS program does not facilitate direct sample size calculations.

DISCUSSION

The chief advantage of Cohen's method of power calculations for multiple linear regression is its flexibility. It may be used to perform power calculations for a very wide range of linear regression problems and null hypotheses. This method, however, has three disadvantages that restrict its use:

1. The pilot data needed for Cohen's method are often unavailable. Suppose that the literature provides an estimate of the slope of a linear regression of weight loss against hours of exercise per week for normal (control) subjects. Estimates of the standard deviations of these subjects' weight loss and exercise time are also available. You believe that the rate of weight loss per hour of exercise for patients on an experimental treatment will differ from that for control subjects and intend to determine an appropriate sample size for an experiment comparing experimental and control treatments. Our method allows the alternative hypothesis to be specified naturally in terms of a difference in loss-per-hour slopes. The user can enter estimates of the control slope and control standard deviations directly into the PS program to obtain the sample size needed for the desired power. Cohen's method, however, cannot be used for this example. His method requires estimates of the multiple correlation coefficients $R_T$ and $R_0$ under equations (5) and (6) of the present study. These statistics are unlikely to be published in the literature. For this reason, Cohen's method almost always requires complete pilot data on both experimental and control subjects to calculate $R_T$ and $R_0$. This was the case in the cadmium example presented in this work. Frequently, however, we want to perform power calculations on the basis of data from the literature or on pilot data that consist of the control data only. In such situations, our method works well but Cohen's is unusable.

2. It is difficult to interpret the results of Cohen's method. In Cohen's method the alternative hypothesis is stated in terms of $\Delta = R_T^2 - R_0^2$. This statistic has little intuitive meaning to either clinicians or grant reviewers. In contrast, specifying the alternative hypothesis in terms of a slope difference (say, 0.5 kg per hour of exercise) is easier to comprehend.

3. Some investigators may find Cohen's method difficult to use. To use Cohen's method, the investigator must first know how to set up equations (3) and (4) and then how to run the linear regressions needed to derive $R_T^2$ and $R_0^2$.
In contrast, to use our method the investigator need only understand the basic concepts of statistical power and significance and the simple linear models described in the Introduction.

SOFTWARE ACCURACY

We have written Excel spreadsheets that evaluate Appendix equations (A1) and (A2) for the different cases considered in this study. These spreadsheets provide independent confirmation that the PS program has correctly implemented our formulas. The fact that the PS and PASS programs give very similar answers to the cadmium and body mass index examples, using very different methods, is evidence that both programs have been coded correctly.

This work was supported by NIH RO1 Grants CA50468, HL19153, and LM06226 and NCI Center Grant CA68485. We thank Drs. W.A. Ray, O.B. Crofford, G.W. Reed, M.D. Decker, G.R. Bernard, M.R. Griffin, and R.I. Shorr for their helpful suggestions.

REFERENCES

1. Dupont WD, Plummer WD. Power and sample size calculations: a review and computer program. Controlled Clin Trials 1990;11:116–128.
2. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum; 1988.
3. Kraemer HC, Thiemann S. How Many Subjects? Statistical Power Analysis in Research. Newbury Park, CA; 1987.
4. Hintze JL. PASS 6.0 User's Guide. Kaysville, UT: NCSS Dr. Jerry L. Hintze; 1996.
5. Guenther W. Desk calculation of probabilities for the distribution of the sample correlation coefficient. Am Statistician 1977;31:45–48.
6. Armitage P, Berry G. Statistical Methods in Medical Research. 3rd ed. Oxford, UK: Blackwell Scientific; 1994.
7. Goldstein R. Power and sample size via MS/PC-DOS computers. Am Statistician 1989;43:253–260.
8. Iwane M, Palensky J, Plante K. A user's review of commercial sample size software for design of biomedical studies using survival data. Controlled Clin Trials 1997;18:65–83.
9. Microsoft Corporation. Microsoft Visual Basic Programmer's Guide. Redmond, WA: Microsoft Corporation; 1995.
10. Microsoft Corporation. Microsoft Fortran PowerStation Programmer's Guide. Redmond, WA: Microsoft Corporation; 1995.
11. Visual Components Sybase Inc. First Impression Active X User's Guide: High Performance Software for Charting Data for Microsoft Visual Basic, Visual C++, and Other Languages (Version 5.0). Overland Park, KS: Visual Components Sybase, Inc.; 1997.
12. Kuskowska-Wolk A, Bergstrom R, Bostrom G. Relationship between questionnaire data and medical records of height, weight and body mass index. Int J Obes 1992;16:1–9.
13. Siber GR, Weitzman SA, Aisenberg AC, Weinstein HJ, Schiffman G. Impaired antibody response to pneumococcal vaccine after treatment for Hodgkin's disease. N Engl J Med 1978;299:442–448.

APPENDIX

Generic Power and Sample Size Formulas

Suppose that for $n$ patients (or groups of patients) we observe responses that depend on some parameter $\theta$. Let $R$, a statistic derived from the $n$ responses, have a normal distribution with mean $\sqrt{n}\,\theta$ and standard deviation $\sigma_R$. Let $S_R$ be another statistic, independent of $R$, such that $\nu S_R^2/\sigma_R^2$ has a $\chi^2$ distribution with $\nu$ degrees of freedom. Let $T_\nu[t]$ be the cumulative probability distribution for a random variable having a $t$ distribution with $\nu$ degrees of freedom; let $t_{\nu,\alpha} = T_\nu^{-1}[1 - \alpha]$ denote the critical value that is exceeded by such a $t$ statistic with probability $\alpha$; let $\theta_0$ and $\theta_a$ denote the values of $\theta$ under the null and a specific alternative hypothesis, respectively; let $\delta = (\theta_a - \theta_0)/\sigma_R$; and let $\alpha$ and $\beta$ denote the type I and II error probabilities associated with a two-sided test of the null hypothesis and the alternative hypothesis $\theta_a$, respectively. Then $(R - \sqrt{n}\,\theta)/S_R$ has a $t$ distribution with $\nu$ degrees of freedom [6] that can be used to test the null hypothesis that $\theta = \theta_0$. The same argument used to derive equations (2) and (3) of Dupont and Plummer [1] proves that the power to detect the alternative hypothesis $\theta = \theta_a$ is

$$1 - \beta = T_\nu\left[\delta\sqrt{n} - t_{\nu,\alpha/2}\right] + T_\nu\left[-\delta\sqrt{n} - t_{\nu,\alpha/2}\right] \qquad (A1)$$

and that, for the relevant values of $\alpha$ and $\beta$,

$$n = \frac{\left(t_{\nu,\beta} + t_{\nu,\alpha/2}\right)^2}{\delta^2} \qquad (A2)$$
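Because $\nu$ (and, in general, $\delta$) may depend on $n$, equation (A2) is naturally solved by iteration. The Python sketch below uses only the standard library: it evaluates $T_\nu$ through the regularized incomplete beta function (a standard continued-fraction evaluation, swapped in here for a statistics package), finds $t_{\nu,p}$ by bisection, and then implements (A1) and a fixed-point iteration for (A2). The helper names, the starting value $n = 30$, and the convergence tolerance are our own choices, not those of the PS program; the starting value must keep $\nu$ positive.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-15) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t: float, nu: float) -> float:
    """T_nu[t]: cumulative t distribution with nu degrees of freedom."""
    upper_tail = 0.5 * _betai(nu / 2.0, 0.5, nu / (nu + t * t))
    return 1.0 - upper_tail if t >= 0 else upper_tail


def t_crit(p: float, nu: float) -> float:
    """t_{nu,p}: the value exceeded with probability p, found by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - t_cdf(mid, nu) > p:
            lo = mid   # too much probability above mid: the answer lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_a1(delta: float, n: float, nu: float, alpha: float = 0.05) -> float:
    """Equation (A1)."""
    tc = t_crit(alpha / 2.0, nu)
    shift = delta * math.sqrt(n)
    return t_cdf(shift - tc, nu) + t_cdf(-shift - tc, nu)


def sample_size_a2(delta_of_n, nu_of_n, alpha=0.05, power=0.90, n=30.0, tol=1e-6):
    """Equation (A2), iterated until n stops changing; returns the next integer up."""
    beta = 1.0 - power
    for _ in range(100):
        nu, delta = nu_of_n(n), delta_of_n(n)
        new_n = (t_crit(beta, nu) + t_crit(alpha / 2.0, nu)) ** 2 / delta**2
        if abs(new_n - n) < tol:
            break
        n = new_n
    return math.ceil(new_n)


if __name__ == "__main__":
    # BMI example: nu = n - 2, delta = |lambda_a| sigma_x / sigma, sigma from equation (2).
    sigma = math.sqrt(4.0**2 - 0.0667**2 * 7.5**2)
    print(f"BMI power: {power_a1(0.0667 * 7.5 / sigma, 100, 98):.3f}")
    # Vaccination example: with sigma taken from rho, delta = rho / sqrt(1 - rho^2).
    rho = 0.40
    n = sample_size_a2(lambda n: rho / math.sqrt(1 - rho**2), lambda n: n - 2, power=0.90)
    print(f"vaccination n: {n}")
```

For the BMI example this gives a power close to the 0.24 reported by PS. For the vaccination design the iteration settles a little above 57 before being rounded up, close to the 57 reported in the text; the rounding rule PS applies is not described here, so an off-by-one difference is possible.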

Equation (A2) must be solved iteratively because both $\nu$ and $\delta$ are themselves functions of $n$.

Studies Using Simple Linear Regression

Suppose that the error terms $y_j - (\gamma + \lambda x_j)$ are independently and normally distributed with mean 0 and standard deviation $\sigma$. Let $\bar{x}$ and $\bar{y}$ denote the means of $\{x_j\}$ and $\{y_j\}$, respectively. Then it is well known [6] that $\hat\lambda = \sum(x_j - \bar{x})(y_j - \bar{y}) / \sum(x_j - \bar{x})^2$ is an unbiased estimate of $\lambda$; $\hat\gamma = \bar{y} - \hat\lambda\bar{x}$ is an unbiased estimate of $\gamma$; and $s^2 = \sum[y_j - (\hat\gamma + \hat\lambda x_j)]^2/(n - 2)$ is an unbiased estimate of $\sigma^2$ independent of $\hat\lambda$. Also, $(n - 2)s^2/\sigma^2$ has a $\chi^2$ distribution with $n - 2$ degrees of freedom, and $\hat\lambda$ has variance $\sigma_{\hat\lambda}^2 = \sigma^2/\sum(x_j - \bar{x})^2$. Let $R = \sqrt{n}\,\hat\lambda$, $\sigma_x^2 = \sum(x_j - \bar{x})^2/n$, and $S_R^2 = s^2/\sigma_x^2$. Then $R$ has variance $\sigma_R^2 = n\sigma_{\hat\lambda}^2 = \sigma^2/\sigma_x^2$, and $(n - 2)S_R^2/\sigma_R^2 = (n - 2)s^2/\sigma^2 \sim \chi^2_{n-2}$. Hence, substituting $\nu = n - 2$ and $\delta = (\lambda_a - 0)/\sigma_R = \lambda_a\sigma_x/\sigma$ into equations (A1) and (A2) gives power and sample size formulas for simple linear regression. In observational studies the investigator estimates $\sigma_x^2$; in experiments, $\bar{x} = \sum_k c_k w_k$ and $\sigma_x^2 = \sum_k c_k (w_k - \bar{x})^2$ in the definition of $\delta$ given above. We can estimate $\sigma$ indirectly using equation (2) if a direct estimate is unavailable. Thus, if either the sample correlation coefficient $r$ or the sample standard deviation $s_y$ of $y_j$ is available, then $\sigma$ may be estimated by $\lambda_a\sigma_x\sqrt{1/r^2 - 1}$ or $\sqrt{s_y^2 - \lambda_a^2\sigma_x^2}$, respectively.

Studies With Two Linear Regression Lines

Suppose that the errors $y_{ij} - (\gamma_i + \lambda_i x_{ij})$, $i = 1, 2$, are independently and normally distributed with mean 0 and standard deviation $\sigma$. Let $\bar{x}_i$, $\bar{y}_i$, $\hat\lambda_i$, and $\hat\gamma_i$ be the corresponding mean values and regression parameter estimates. Let $s^2 = \sum_{ij}[y_{ij} - (\hat\gamma_i + \hat\lambda_i x_{ij})]^2/(n_1 + n_2 - 4)$. Then $s^2$ is an unbiased estimate of $\sigma^2$ and $(n_1 + n_2 - 4)s^2/\sigma^2 \sim \chi^2_{n_1+n_2-4}$. To test the null hypothesis that $\lambda_2 - \lambda_1 = 0$, we use equation (9.18) of Armitage and Berry [6], which may be rewritten $\operatorname{var}(\hat\lambda_2 - \hat\lambda_1) = \sigma_{\hat\lambda_2-\hat\lambda_1}^2 = \sigma^2[1/(\sigma_{x1}^2 n_1) + 1/(\sigma_{x2}^2 n_2)]$, where $\sigma_{xi}^2 = \sum_j (x_{ij} - \bar{x}_i)^2/n_i$ for $i = 1, 2$. Let $n = n_2$ and let $m = n_1/n_2$ be the ratio of the two group sizes. Let $R = \sqrt{n}\,(\hat\lambda_2 - \hat\lambda_1)$ and let $\sigma_R^2$ be the variance of $R$. Then $\sigma_R^2 = n\sigma_{\hat\lambda_2-\hat\lambda_1}^2 = \sigma^2[1/(m\sigma_{x1}^2) + 1/\sigma_{x2}^2]$. Let $S_R^2 = s^2[1/(m\sigma_{x1}^2) + 1/\sigma_{x2}^2]$. Then $[n(1 + m) - 4]S_R^2/\sigma_R^2 = (n_2 + n_1 - 4)s^2/\sigma^2 \sim \chi^2_{n_2+n_1-4}$. Therefore, substituting $\nu = n(1 + m) - 4$ and $\delta = (\lambda_2 - \lambda_1)/\sigma_R$ into equations (A1) and (A2) gives power and sample size formulas for testing the equality of the dose-response slopes of two treatments.

To test the equality of the y intercepts of the two treatments, we use equation (5.16) of Armitage and Berry [6], which gives $\operatorname{var}(\hat\gamma_i) = \sigma_{\hat\gamma_i}^2 = \sigma^2[1 + \bar{x}_i^2/\sigma_{xi}^2]/n_i$ for $i = 1, 2$. Therefore, $R = \sqrt{n}\,(\hat\gamma_2 - \hat\gamma_1)$ has variance

$$\sigma_R^2 = \sigma^2\left[\frac{1}{m}\left(1 + \frac{\bar{x}_1^2}{\sigma_{x1}^2}\right) + 1 + \frac{\bar{x}_2^2}{\sigma_{x2}^2}\right]$$

Let

$$S_R^2 = s^2\left[\frac{1}{m}\left(1 + \frac{\bar{x}_1^2}{\sigma_{x1}^2}\right) + 1 + \frac{\bar{x}_2^2}{\sigma_{x2}^2}\right]$$

Then $[n(1 + m) - 4]S_R^2/\sigma_R^2 = (n_2 + n_1 - 4)s^2/\sigma^2 \sim \chi^2_{n_2+n_1-4}$. Substituting $\nu = n(1 + m) - 4$ and $\delta = (\gamma_2 - \gamma_1)/\sigma_R$ into equations (A1) and (A2) gives the desired sample size and power formulas. As in the case of a single regression line, the values of $x_{ij}$ may be either observed attributes of patients or controlled treatment values; $\sigma$ may be estimated from the correlation coefficient or standard deviation of the response variable among control subjects if a direct estimate is unavailable. These terms may be handled in equations (A1) and (A2) in the same way as in the previous section.
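The variance expressions above can be checked by simulation. The Python sketch below is a Monte Carlo check under a hypothetical design (fixed ages for two groups, plus parameter values and a random seed chosen by us, not taken from the article). It repeatedly simulates two regression lines, fits each by least squares, and compares $n_2$ times the empirical variance of $\hat\lambda_2 - \hat\lambda_1$ and of $\hat\gamma_2 - \hat\gamma_1$ with the formulas for $\sigma_R^2$; ratios near 1 indicate agreement. The authors' own accuracy check used Excel spreadsheets, as described under SOFTWARE ACCURACY; this is a separate illustration.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single regression line."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sigma_r2_slopes(sigma, sx1, sx2, m):
    """sigma_R^2 for R = sqrt(n2) * (slope2_hat - slope1_hat), with m = n1 / n2."""
    return sigma**2 * (1.0 / (m * sx1**2) + 1.0 / sx2**2)


def sigma_r2_intercepts(sigma, xbar1, sx1, xbar2, sx2, m):
    """sigma_R^2 for R = sqrt(n2) * (intercept2_hat - intercept1_hat)."""
    return sigma**2 * ((1.0 + xbar1**2 / sx1**2) / m + 1.0 + xbar2**2 / sx2**2)


def simulate(reps=3000, seed=2024):
    """Return observed/predicted variance ratios for slope and intercept differences."""
    rng = random.Random(seed)
    # Hypothetical fixed designs and true parameters (illustration only).
    n1, n2 = 40, 20
    xs1 = [20 + 40 * i / (n1 - 1) for i in range(n1)]
    xs2 = [30 + 30 * i / (n2 - 1) for i in range(n2)]
    g1, l1, g2, l2, sigma = 5.0, -0.03, 5.2, -0.045, 0.57
    slope_diffs, icpt_diffs = [], []
    for _ in range(reps):
        ys1 = [g1 + l1 * x + rng.gauss(0.0, sigma) for x in xs1]
        ys2 = [g2 + l2 * x + rng.gauss(0.0, sigma) for x in xs2]
        a1, b1 = fit_line(xs1, ys1)
        a2, b2 = fit_line(xs2, ys2)
        slope_diffs.append(b2 - b1)
        icpt_diffs.append(a2 - a1)
    m = n1 / n2
    # sigma_xi uses divisor n_i, as in the Appendix.
    sx1, sx2 = statistics.pstdev(xs1), statistics.pstdev(xs2)
    xbar1, xbar2 = statistics.fmean(xs1), statistics.fmean(xs2)
    slope_ratio = n2 * statistics.variance(slope_diffs) / sigma_r2_slopes(sigma, sx1, sx2, m)
    icpt_ratio = n2 * statistics.variance(icpt_diffs) / sigma_r2_intercepts(
        sigma, xbar1, sx1, xbar2, sx2, m)
    return slope_ratio, icpt_ratio


if __name__ == "__main__":
    print("observed/predicted ratios:", simulate())
    # Cadmium inputs from the text: sigma_R for the slope difference.
    print("cadmium sigma_R:", math.sqrt(sigma_r2_slopes(0.574, 12.0, 9.19, 1.57)))
```

With a few thousand replications both ratios should fall within a few percent of 1; the last line reproduces the $\sigma_R$ that enters $\delta$ in the cadmium example.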