


Journal of Consulting and Clinical Psychology April 1997 Vol. 65, No. 2, 252 -261

© 1997 by the American Psychological Association For personal use only--not for distribution.

Now You See It, Now You Don't: A Comparison of Traditional Versus Random-Effects Regression Models in the Analysis of Longitudinal Follow-Up Data From a Clinical Trial

Charla Nich
Department of Psychiatry, School of Medicine, Yale University

Kathleen Carroll
Department of Psychiatry, School of Medicine, Yale University

ABSTRACT

To illustrate the limitations of commonly used methods of handling missing data when using traditional analysis of variance (ANOVA) models and highlight the relative advantages of random-effects regression models, multiple analytic strategies were applied to follow-up data from a clinical trial. Traditional ANOVA and random-effects models produced similar results when underlying assumptions were met and data were complete. However, analyses based on subsamples, to which investigators would have been limited with traditional models, would have led to different conclusions about treatment effects over time than analyses based on intention-to-treat samples using random-effects regression models. These findings underscore the advantages of models that use all data collected and the importance of complete data collection to minimize sample bias.

Support was provided by Cooperative Agreement U10 AA08430 from the National Institute on Alcohol Abuse and Alcoholism and by Grants RO1-DA04299, R18-DA06963, and KO2-DA00248 from the National Institute on Drug Abuse. We gratefully acknowledge the helpful comments of Helena Kraemer and Phil Alcabes and are particularly grateful to Don Hedeker for consultation on the statistical models and the application of MIXREG software.

Correspondence may be addressed to Charla Nich, Department of Psychiatry, School of Medicine, Yale University, 34 Park Street, SAC S212, New Haven, Connecticut, 06519.

Received: March 14, 1996
Revised: July 17, 1996
Accepted: August 7, 1996

The importance of follow-up evaluation of clinical trials, which can address the durability and delayed emergence of effects of study treatments, has long been recognized (Cohen, 1979; Cross, Sheehan, & Khan, 1982; Fiske et al., 1970; Kazdin & Wilson, 1978). Follow-ups of clinical trials are particularly crucial in disorders such as depression and substance abuse, which tend to have varying courses characterized by episodes of remission and relapse (Frank et al., 1991; Nathan & Lansky, 1978; Shea et al., 1992; Simpson, Savage, & Lloyd, 1979). However, follow-up studies, which ideally require participants to be interviewed several times over the course of an extended period (Willet & Sayer, 1994), are challenging and time consuming because of the difficulties in



locating patients. High rates of successful follow-up can be achieved by devoting additional resources to follow-up and using multiple methods to track patients (see Twitchell, Hertzog, Klein, & Schuckit, 1992). However, even with intensive efforts, some missing data are often unavoidable, and complete data sets are rare in follow-up studies of clinical trials (Hoke, Lavori, & Perry, 1992; Howard, Krause, & Orlinsky, 1986; Lavori, 1990). There is, however, no consensus on optimal analytic strategies for managing missing data (Johnson, George, Shahane, & Fuchs, 1992; Lavori, 1990; Taylor & Amir, 1994). For example, traditional analytic strategies for longitudinal studies, such as repeated measures analysis of variance (ANOVA) models, typically require all data to be available on all patients at each measurement point (see Footnote 1). As this condition is rarely met, investigators are typically left with two less-than-optimal strategies, both of which can lead to substantial bias and undermine the validity of results. First, cases with missing data can be deleted and analyses conducted only on those patients without missing data. However, this strategy assumes that values are missing at random, which is usually an invalid assumption. For example, more severely impaired patients or those with poor response to treatment may be more difficult to follow. Moreover, investigators generally wish to make inferences about the entire target population (Taylor & Amir, 1994), and deleting patients may restrict generalizability. A second strategy is the use of imputations for missing values (e.g., carrying values forward or backward to "fill in" missing values). However, simple imputation of this kind, particularly carrying endpoint values forward, has come under increasingly acute criticism in recent years (Gibbons et al., 1993; Lavori, 1992) because it reduces error variance, artificially increases degrees of freedom, and thus increases the risk of Type I error.
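The two imputation conventions mentioned above can be made concrete with a small sketch. The function names and the patient scores are hypothetical and chosen only for illustration; the point is that every filled-in value is a copy of an observed one, so the series looks longer and less variable than the data actually collected.

```python
def carry_forward(values):
    """Fill each missing value (None) with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def carry_backward(values):
    """Fill each missing value (None) with the next observed value."""
    return carry_forward(values[::-1])[::-1]


# Hypothetical scores for one patient at endpoint and at 1, 3, 6, and 12 months.
observed = [0.30, None, None, 0.10, None]

forward = carry_forward(observed)    # [0.3, 0.3, 0.3, 0.1, 0.1]
backward = carry_backward(observed)  # [0.3, 0.1, 0.1, 0.1, None]

# Only two values were actually observed, yet the forward-filled series
# presents five "observations" to the analysis.
n_observed = sum(v is not None for v in observed)
n_after_fill = sum(v is not None for v in forward)
print(forward, backward, n_observed, n_after_fill)
```

Note that carrying backward leaves the final point empty when the last interview was missed, which is why that convention can only balance the design for patients who were seen at the final follow-up.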
The persistence of these practices may be due, in part, to the lack of feasible alternatives. Recently available statistical models, particularly random-effects regression models (Laird & Ware, 1982), offer a powerful approach for evaluating change over time (Bryk & Raudenbush, 1987; Gibbons et al., 1993; Jennrich & Schluchter, 1986; Johnson et al., 1992; Jones & Boadi-Boateng, 1991; Rutter & Elashoff, 1994). Random-effects regression models use an iterative method that estimates, for example, an intercept and slope for each patient based on all available data for that patient, augmented by the data from the whole sample. These random effects are included in a regression equation, along with the fixed effects of the independent variables, to predict values for the dependent variable of the patient across time. In other words, a regression equation is essentially run on each patient with all available data (independent and dependent variables) to assign "Level 1" summary statistics (the intercept and slope, for example). At "Level 2," between-group differences are measured with ANOVA-type models on the intraindividual values estimated in Level 1. Using the patient-specific intercept and slope from the model, missing observations become estimable by plotting the regression line through the model parameter estimates. Thus, a major advantage of this model is that it enables investigators to use all collected data from all patients. Another important feature of random-effects regression models is that they allow real time as opposed to scheduled time to be used for each patient and, thus, more closely fit the typical clinical trials data set, where not all patients provide follow-up interviews precisely at scheduled intervals. This is unlike ANOVA models, where follow-up data provided several months after the target date would be treated as if the data were provided at the scheduled time.
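The Level 1 and Level 2 description above can be illustrated with a deliberately simplified two-stage sketch. It is not the iterative estimation used by random-effects software: each patient's line is fit by ordinary least squares from that patient's data alone, with no borrowing from the whole sample, so a patient needs at least two observations here. The data, group labels, and function names are hypothetical.

```python
from statistics import mean


def fit_line(times, scores):
    """Ordinary least-squares intercept and slope for one patient."""
    t_bar, y_bar = mean(times), mean(scores)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


# Hypothetical data: (group, actual assessment times in months, scores).
# Patients need not share the same times or the same number of observations.
patients = [
    ("active", [0, 1, 3, 6, 12], [0.30, 0.26, 0.20, 0.12, 0.00]),
    ("active", [0, 2, 7], [0.25, 0.21, 0.11]),
    ("control", [0, 1, 4, 12], [0.28, 0.27, 0.26, 0.22]),
    ("control", [0, 6], [0.20, 0.20]),
]

# Level 1: one intercept and one slope per patient, from that patient's data.
level1 = [(group, *fit_line(t, y)) for group, t, y in patients]


# Level 2: compare the patient-level slopes between treatment groups.
def mean_slope(group):
    return mean(slope for g, _, slope in level1 if g == group)


print("mean slope, active: ", round(mean_slope("active"), 4))
print("mean slope, control:", round(mean_slope("control"), 4))
```

Because each line is fit against the actual assessment times, an interview held at month 7 instead of month 6 is used as a month-7 observation rather than being relabeled to the scheduled time.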
Although random-effects regression models (also called mixed regression, multilevel modeling, hierarchical linear modeling, and Bayesian estimation for linear models) are relatively new to clinical treatment researchers, these models have been used for some time and have been increasingly adopted in naturalistic studies (Bryk & Raudenbush, 1987; Bryk & Thum, 1989; Muñoz, Weiss, Tager, Rosner, & Speizer, 1987; Raudenbush & Chan, 1993; Tate & Hokanson, 1993; Williamson,



Appelbaum, & Epanchin, 1991). Random-effects regression models have also been applied to data evaluating individual change across time, illustrating the advantage of including patient-specific covariates in a multilevel regression model (Gibbons, Hedeker, Waternaux, & Davis, 1988; Hedeker, Gibbons, Waternaux, & Davis, 1989). Furthermore, these models can also greatly enhance analysis of simple between-group effects, that is, analyses of differential treatment effects over time. Because random-effects regression models allow use of all available data, evaluate change across time, accommodate time-varying or invariant covariates, and do not require constant intervals of measurement across participants (Carbonari, Wirtz, Muenz, & Stout, 1994; Hedeker, Gibbons, & Davis, 1991), they show promise for longitudinal data from clinical trials. However, random-effects regression and related models have not been widely applied to clinical trials data, and few studies have compared the findings that emerge from these analyses with those from more traditional approaches. Using data from a 1-year follow-up of a randomized clinical trial of psychotherapy and pharmacotherapy for cocaine abusers, we present a comparison of results from traditional ANOVA and random-effects regression models. We highlight differences in findings that would emerge from (a) different statistical models applied to the same data as well as (b) analyses based on different samples and subsamples to which investigators might be limited, depending on their choice of statistical model.

Method

Study Design and Sample

Data for these comparisons come from a 1-year follow-up of a study that evaluated both psychotherapy (cognitive-behavioral relapse prevention or clinical management), hereafter called active psychotherapy and control psychotherapy (P+, P−, respectively), and pharmacotherapy (desipramine or placebo), referred to as active and control medication (M+, M−, respectively), for ambulatory cocaine abusers (Carroll, Rounsaville, Gordon, et al., 1994; Carroll, Rounsaville, Nich, et al., 1994). In the main phase of this study, 121 patients were randomly assigned to one of four treatments (P+M+, P−M+, P+M−, P−M−) in a 2 × 2 factorial design. Treatments were manual guided and offered in individual sessions delivered by doctoral-level therapists over a 12-week course of treatment. Outcome evaluation included urinalysis and assessment of multidimensional aspects of outcome by raters who had not been informed of patients' psychotherapy and medication conditions (Carroll, Rounsaville, & Nich, 1994).

Of the 121 patients who gave informed consent and were randomized to treatment, 34 (28%) were women, 58 (48%) were White, 88 (73%) were single or divorced, and 58 (48%) were working full or part time. Forty-one (34%) had some college education, 50 (41%) were high school graduates, and 30 (25%) did not complete high school. The mean age of the sample was 28.8 years (SD = 5.9). Patients reported using an average of 5.1 g of cocaine per week (SD = 4.9) for an average of 4.1 years. Seventy-five (62%) reported predominantly freebase use of cocaine, 36 (30%) were intranasal users, and 10 (8%) were intravenous (IV) users. Thirty-seven (31%) had some previous exposure to treatment.

Follow-Up Design and Sample

Follow-up was naturalistic, in that patients' exposure to other treatment after completing study treatment was monitored but not controlled or restricted. Patients were scheduled to be followed 1, 3,



6, and 12 months after termination of study treatment. Eighty percent (n = 97) of the 121 patients were interviewed face to face at least once, but not all patients were seen at all follow-ups. As shown in Table 1, 31 patients attended only one follow-up interview, 16 attended two interviews, 25 attended three interviews, and 25 attended all four interviews, for a total of 238 follow-up interviews. ANOVA and log-linear regression analyses were used to evaluate possible differences in demographic and baseline characteristics, as well as treatment group, by timing and number of follow-ups. Across treatment conditions, patients who completed treatment were more likely to be reached for follow-up than those who dropped out (96% vs. 72%), χ²(1, N = 121) = 10.75, p < .001. Similarly, there was a strong correlation between level of exposure to study treatment and number of follow-ups, with patients with better retention in study treatment overrepresented in the follow-up data. For example, the 24 patients who were never followed averaged 4 weeks of treatment, whereas the 25 who were followed at all four follow-ups averaged 10 weeks of treatment, t(47) = 8.21, p < .001.

Assessments

For simplicity of comparison across subsamples and models, we chose a single outcome variable for these analyses, the cocaine composite score from the Addiction Severity Index (ASI; McLellan, Kushner, et al., 1992; McLellan, Luborsky, Woody, & O'Brien, 1980). The ASI is a structured interview that assesses severity and need for treatment for alcohol and drug use, as well as five psychosocial areas often affected by substance abuse. The ASI composite scores are a weighted summary of objective and subjective information in each of the seven areas, for which 0 = absence of problem and 1 = greater problem severity and need for treatment.
Analytic Strategy

We compared differences in results across several subsamples and statistical models (commonly used univariate ANOVA and multivariate ANOVA [MANOVA] models, and random-effects regression models), using a single outcome variable. In all models, coding for the independent and dependent variables was identical. Time was represented in logarithmic form, log(month + 1), to account for nonsynchronous intervals between follow-up evaluations (1, 3, 6, and 12 months) and to allow for accelerated rates of change expected early in the follow-up year, as the bulk of relapse usually occurs in the first 3 months after abstinence initiation (see Marlatt & Gordon, 1985). SPSS/PC+ was used to analyze ANOVA models for linear effects and MANOVA. MIXREG was used to analyze random-effects regression models for linear effects (Hedeker, 1993). For easier comparability of model statistics, F values are presented as their square roots and labeled as t values. For statistics with more than 1 degree of freedom, the square root of F is presented.

Sample Descriptions

Analyses were conducted on five samples from the dataset.

1. The cross-sectional subsamples included the patients seen at each follow-up interview; that is, the four subsamples of patients who were interviewed at each follow-up point (Month 1, n = 56; Month 3, n = 65; Month 6, n = 57; Month 12, n = 60).

2. The complete data subsample included the 25 patients who completed all four interviews (125 observations total: four interviews for 25 patients plus 25 treatment endpoint interviews).



3. The final interview subsample included the 60 patients who were interviewed at the final (12-month) follow-up interview but may have missed some or all of the other interviews (300 observations total). This subsample was selected because a balanced design could be approximated through imputations; that is, by carrying final values backward for patients who missed interviews at the 1-, 3-, or 6-month points. We considered this strategy (which made the assumption that a patient's final value was a reasonable estimate of how they were doing at earlier points during follow-up) to be more conservative than carrying values forward for those who were not seen again.

4. The ever-followed subsample included the 97 patients who had at least one follow-up interview after treatment endpoint (335 observations total), as described in Table 1.

5. The intention-to-treat sample included the 121 patients who were randomized to treatment, regardless of whether they had a follow-up (359 observations = 335 interviews + 24 treatment endpoint values for patients not interviewed during the follow-up period).

Statistical Models

Four analytic strategies were applied to the samples as follows.

Multiple cross-sectional analyses. Four two-factor ANOVAs (Psychotherapy × Medication Condition) were used to evaluate the cross-sectional samples for treatment differences at each follow-up point. Because the cross-sectional samples were composed of four different but overlapping subsets of patients, these analyses were restricted to evaluation of treatment differences at each of the follow-up time points and could evaluate neither change across time nor Treatment × Time effects.

Repeated measures univariate ANOVA. Two-factor repeated measures ANOVAs were used to evaluate time effects, treatment main effects, and differences in treatment effects across time. Univariate ANOVA models have the assumptions of normality, independence of observations, and equality of variance.
Because the test statistic is based on the total number of observations (rather than simply the number of patients) and the standard error is constant, repeated measures ANOVA assumes that the variances of the differences between all pairs of time points are equal (assumption of sphericity). In other words, when some time point measures are more closely related than others (e.g., measures of a variable at Month 0 and Month 1 would be expected to be more highly correlated than measures at Month 0 and Month 12), the repeated measures omnibus ANOVA is biased. When the ANOVA is a confirmatory test of linearity (i.e., 1 degree of freedom is used for the time effect), sphericity is not an assumption (Stevens, 1992). As noted above, ANOVA models in small samples usually require a balanced design; that is, each patient must have the same number of observations at the same time points (see Footnote 1). Thus, this model was used on the complete data subsample (n = 25), as it had a balanced design. For the repeated measures ANOVA to be used with the final interview subsample (n = 60), use of imputed values to create a balanced design was necessary.

Repeated measures MANOVA. Two-factor repeated measures MANOVAs were also used to evaluate time, treatment, and Treatment



× Time effects. This model also requires balanced design samples and, thus, was used with the complete data (n = 25) and final interview (n = 60) subsamples. In repeated measures MANOVA, the statistics and design matrix are based on the number of subjects rather than the number of observations, with a vector representing the dependent variable by patient. MANOVA models do not have the assumption of sphericity, often making the results more conservative. Other assumptions of the ANOVA model (as stated above) are also assumed in the MANOVA model. Although MANOVA would seem to be the preferable model in clinical trials where data collection points are often unequally spaced, the model is more conservative than the univariate model when the univariate assumptions hold, and it requires a larger sample size to be reliable (Vassey & Thayer, 1987).

Random-effects regression models. In its simplest form, the random-effects regression model is similar to a mixed-model ANOVA or one-way ANOVA. The model is based on the number of observations (like the ANOVA) rather than Subject × Time vectors (like the MANOVA). Unlike either ANOVA-based model, the random-effects regression model does not require a balanced design but allows each subject a unique set of data points. The frequency (whether all subjects provide observations at all time points) and specificity (whether all subjects provide observations at each precise time point) can be more "relaxed" because the model is essentially a two-stage process in which a trajectory is modeled for each subject at Level 1. Because the variance of the subject is accounted for at Level 1 (the individual observations in this analysis) and incorporated into Level 2 (the group or subject level in this analysis) of the analysis, time is often referred to as a "random" effect. The model also allows for constraining the slope to be constant across patients, in which case only the intercept is considered random (random intercept model).
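Written out, a linear model of this kind for patient i at occasion j might look as follows. The notation is an assumption made for illustration and is not taken from the article, and the two treatment factors are collapsed into a single group indicator x_i for brevity.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = b_{0i} + b_{1i}\, t_{ij} + e_{ij}, \\
\text{Level 2:}\quad & b_{0i} = \beta_0 + \beta_2 x_i + u_{0i}, \qquad
                       b_{1i} = \beta_1 + \beta_3 x_i + u_{1i}.
\end{aligned}
```

Here t_ij is the (log-transformed) time of the observation, u_0i and u_1i are the patient-specific random intercept and slope, e_ij is the residual, and beta_3 plays the role of a Treatment × Time effect. Under this notation, the random intercept model described above corresponds to setting u_1i = 0 for every patient, so that all patients in a group share one slope.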
Random-effects regression models were used with several subsamples from this data set. First, to compare results directly to the ANOVA and MANOVA models, we analyzed the balanced design dataset (complete data subsample, n = 25) with two-factor (psychotherapy and pharmacotherapy) random-effects regression models. Second, to evaluate the effect of imputing values in the final interview subsample (where 70 values were imputed by carrying final values backward where patients missed earlier interviews), the final interview subsample (n = 60) was analyzed using random-effects regression both with and without the imputed values; that is, data from that subsample were analyzed first with the imputed values (230 observed values plus the 70 imputed values) and then in a second random-effects regression analysis that used only the observed values (230 observations). Finally, random-effects regression models were used to evaluate two samples to which the ANOVA models could not be applied because of the requirement of a balanced design. Thus, random-effects regression models were used to analyze data from the ever-followed subsample, who provided follow-up data at least once (n = 97), as well as the intention-to-treat sample (n = 121). A summary of analyses and results by subsample and statistical model is provided in Table 2.
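The time coding and the observation counts quoted in the Method section can be reproduced with a few lines of arithmetic. This is only a bookkeeping sketch: the variable names are hypothetical, and the natural logarithm is assumed because the text does not state the base.

```python
import math


def log_time(month):
    """Time coding from the text, log(month + 1); natural log is assumed."""
    return math.log(month + 1)


schedule = (0, 1, 3, 6, 12)  # treatment endpoint plus scheduled follow-ups
coded = [round(log_time(m), 2) for m in schedule]
print(coded)  # [0.0, 0.69, 1.39, 1.95, 2.56]

# Patients by number of follow-up interviews attended, as reported in the text.
patients_by_interviews = {1: 31, 2: 16, 3: 25, 4: 25}
randomized = 121

ever_followed = sum(patients_by_interviews.values())
followup_interviews = sum(k * n for k, n in patients_by_interviews.items())
never_followed = randomized - ever_followed

# Every patient also contributes one treatment endpoint observation.
complete_data_obs = 25 * (4 + 1)
final_interview_obs = 230 + 70  # observed values plus imputed values
ever_followed_obs = followup_interviews + ever_followed
intention_to_treat_obs = ever_followed_obs + never_followed

print(ever_followed, followup_interviews, never_followed)
print(complete_data_obs, final_interview_obs,
      ever_followed_obs, intention_to_treat_obs)
```

On the coded scale the first month after treatment spans about as much "time" as the final six months, which is how the transformation gives more weight to early change.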

Results

Multiple Cross-Sectional Analyses

Of the cross-sectional two-factor ANOVAs conducted at each time point, one suggested a significant psychotherapy effect. As shown in Figure 1, the 12-month analysis indicated that subjects in active psychotherapy had significantly better outcome than those in the control psychotherapy at that point, t(56) = 2.19, p < .05. Again, however, each analysis was specific to the small and variable subsample interviewed at each time point and hypotheses could only be tested regarding



outcomes at specific time points, which included different subsamples. Thus, inferences regarding change across time could not be made from this analysis.

Complete Data Subsample

As shown in Figure 2, the univariate repeated measures ANOVA for the complete data subsample of 25 subjects indicated that subjects treated with active psychotherapy had a significantly greater rate of change than those treated with control psychotherapy (i.e., less cocaine use across time), t(1, 21) = 2.57, p < .05. However, the repeated measures MANOVA indicated no significant psychotherapy, pharmacotherapy, or Treatment × Time effects. The MANOVA analysis also differed from the ANOVA by suggesting a trend for greater change over time for subjects on control medication compared with those on active medication, t(4, 18) = 1.64, p < .10. The random-effects regression model did not converge with both the intercept and slope specified as random, so a simpler random intercept model was run. As shown in Figure 2, the random-effects regression model analysis with a fixed slope and random intercept indicated that subjects in active psychotherapy showed greater rates of improvement than subjects in control psychotherapy (Psychotherapy × Time estimate = −.07, SE = .03, z = 2.54, p < .05).

Final Interview Subsample

As shown in Table 2, the repeated measures univariate ANOVA for the subsample of 60 subjects who provided a 12-month interview (where 70 values were imputed to create a balanced design) indicated a Psychotherapy × Time interaction, t(1, 56) = 2.89, p < .05, with subjects who received active psychotherapy reporting greater reductions in cocaine use across time compared with subjects assigned to the control psychotherapy. A three-way interaction was also found, t(1, 56) = 2.29, p < .05. However, these data violated the assumption of sphericity (with data points being more closely related to data points closer in time than to those further apart in time).
Thus, the univariate results were biased in the direction of Type I errors. In contrast, these effects were no longer significant when the multivariate model was used. The MANOVA indicated a Psychotherapy × Time trend (square root of F[4, 53] = 1.51, p < .10), favoring active psychotherapy, and a three-way trend (square root of F[4, 53] = 1.55, p < .10), in the same direction as the ANOVA described above. No significant Medication × Time effects were found. However, the random-effects regression model indicated a significant Psychotherapy × Time effect (estimate = −.071, SE = .02, z = 2.94, p < .05), as well as the significant three-way interaction seen in the ANOVA (estimate = .05, SE = .02, z = 2.23, p < .05). As shown in Figure 3, there was no significant difference in treatment response by medication assignment.

Next, to evaluate the impact of imputing values in this subsample to create a balanced design for the ANOVA models, another random-effects regression analysis was conducted, this time using only the observed values, without imputations. Thus, this analysis was based on 230 rather than 300 observations for the 60 subjects in this subsample. Results of this analysis are presented in Figure 4. As with the random-effects regression analysis that used the 70 imputed values, this analysis indicated a significant Psychotherapy × Time effect (estimate = −.07, SE = .02, z = 2.80, p < .05), but the three-way interaction was not significant (estimate = .05, SE = .02, z = 1.83, p < .10).

Ever-Followed Subsample

A random-effects regression model was used to analyze the data from the 97 subjects who were



reached for follow-up at least once after treatment termination. For this analysis, both time and the intercept were treated as random and the correlational structure as first-order autoregressive. As shown in Table 2 and Figure 5, this model indicated a significant Psychotherapy × Time effect (estimate = −.06, SE = .02, z = 2.25, p < .05), but neither a three-way interaction nor a Medication × Time effect was found.

Intention-to-Treat Sample

A random-effects regression model was used to analyze the 121 subjects in the intention-to-treat sample, which included all subjects who had been followed at least once (n = 97, as above) plus the 24 subjects who were never followed and whose end-of-treatment value provided the basis for estimates through the model. As with the previous analysis, both time and the intercept were treated as random and the correlational structure of the errors as first-order autoregressive. As shown in Table 2 and Figure 6, this model also indicated a significant Psychotherapy × Time effect (estimate = −.08, SE = .02, z = 2.99, p < .05). Similar to the analysis of the data from the 97 subjects who provided a follow-up observation, neither a three-way effect nor a pharmacotherapy effect was found. However, unlike the previous analyses, a main effect for psychotherapy was found (estimate = .09, SE = .04, z = 2.11, p < .05), suggesting that subjects assigned to active psychotherapy had significantly higher (more severe) treatment endpoint values.
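For readers who want to relate the reported z statistics to the significance thresholds, the conversion can be sketched as follows. It assumes a two-sided test against the standard normal distribution, which the text does not state explicitly, and the function name is hypothetical. The z values are those reported above for the ever-followed and intention-to-treat analyses.

```python
import math


def two_sided_p(z):
    """Two-sided p value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


reported = [
    ("ever-followed, Psychotherapy x Time", 2.25),
    ("intention-to-treat, Psychotherapy x Time", 2.99),
    ("intention-to-treat, psychotherapy main effect", 2.11),
]

for label, z in reported:
    print(f"{label}: z = {z:.2f}, p = {two_sided_p(z):.4f}")
```

Note that the estimates and standard errors are reported to two decimals, so dividing a rounded estimate by its rounded standard error will not reproduce the reported z values exactly.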

Discussion

Using follow-up data from a randomized clinical trial of psychotherapy and pharmacotherapy for cocaine abusers, we evaluated posttreatment longitudinal outcomes using several statistical models and sample subsets to compare the models and illustrate differences in conclusions that would follow from the use of models that differ in their treatment of time and missing values. Overall, there was consistency across samples and models in pointing to a significant Psychotherapy × Time interaction, suggesting reduced cocaine use across time for the subjects who received active psychotherapy compared with control psychotherapy. Two main points emerge from this series of comparisons.

First, within subsamples, there was consistency in findings based on the traditional ANOVA and random-effects regression models. For example, for both the complete data and final interview subsamples, the univariate ANOVA and random-effects regression models produced very similar results, including magnitude of test statistics and significance levels. This suggests that, when underlying assumptions are met and data are complete, random-effects regression models replicate results from ANOVA. However, in several instances where the ANOVA and random-effects regression analyses indicated significant Psychotherapy × Time or three-way interactions, the MANOVA model results were not statistically significant, which illustrates the comparatively limited power of the more commonly used MANOVA models for repeated measures designs with limited sample size.

Second, across subsamples, several differences in findings emerged, as some effects were seen for some subsamples and not for others. This could lead to very different conclusions about the effectiveness of the treatments evaluated and highlights the problems associated with missing data in longitudinal studies.
For example, analyses based on the final interview subsample (n = 60) indicated a three-way interaction that was found neither for the other subsamples nor for the intention-to-treat sample. This three-way interaction appears to be either an artifact of sample bias (i.e., it was restricted to the particular subsample of 60 subjects who completed a 12-month follow-up) or, perhaps, a byproduct of the imputation process, which exaggerated the influence of the last data point by carrying



it backward. Furthermore, the random-effects regression analyses on the final interview subsample with the imputed values indicated a significant three-way interaction, whereas the same analyses conducted on the same subsample but without the imputed values indicated a similar but nonsignificant effect (Figures 3 and 4). This suggests that, in our sample, the practice of backward imputation of missing values was reasonably close to, but different from, data-driven estimation through the regression analysis. A different convention for imputing values (e.g., carrying values forward rather than backward) may have led to a more divergent pattern of findings.

Although the results based on the complete data subsample (n = 25) were similar to those based on the samples that used all available data (n = 97 or n = 121), these analyses would be subject to criticisms of sample bias and limited generalizability because they included only 20% of the subjects originally randomized to treatment. Moreover, comparison of Figures 2 and 6 illustrates the contrast between findings based on a select subset versus the full sample.

There were also differences between random-effects regression analyses based on the patients who contributed to the follow-up dataset (n = 97) and analyses based on the full intention-to-treat sample (n = 121). Random-effects regression analyses based on the intention-to-treat sample, which included the 24 patients who were never followed, resulted in a main effect for psychotherapy condition, which was not found in any of the other analyses (see Footnote 2). This highlights that there may have been important differences between the subjects who were reached for follow-up and those who were not. For the most part, these 24 subjects, who contributed only a single end-of-treatment observation to the dataset, were exposed to far less study treatment than those who contributed to the follow-up dataset.
Most important, as illustrated by the series of figures, there are clear advantages to models such as random-effects regression, which allow use of all available data from all subjects. Even the most conscientiously collected data set from any clinical trial requiring repeated observations will have some missing values. In such cases, traditional analytic approaches require analyses based on subsamples, which can lead to sample and selection bias, or the imputation of missing values through some usually arbitrary convention, which further increases the risk of bias. Random-effects regression models allow use of all available data from all patients, thus reducing sampling bias and increasing power.

Limitations

The comparisons presented here were intended to illustrate some of the problems associated with common strategies for dealing with missing values in longitudinal data, as well as the relative advantages of random-effects regression models. However, it is important to note that no mathematical model or approach is a panacea for inadequate data or data collection. Even the most powerful and sophisticated models can only yield valid results when they are applied to unbiased data, and no model can rectify invalid or biased data. The burden is still on researchers to exert every effort to build data sets that are as complete and valid as possible. Before proceeding with the analyses presented here, we evaluated our follow-up data set to determine whether all treatment groups were equally represented (e.g., rates of follow-up were not different across demographic groups, all groups had comparable numbers of follow-up evaluations as well as early vs. late follow-ups). However, the contrast in findings between the 97 patients who were reached for follow-up and the 121 subjects in the intention-to-treat sample underscores that other, undetected sources of bias may exist.
Furthermore, although the random-effects regression model can include subjects in an analysis with as little as one data point (e.g., only a baseline value, as for 24 subjects in our intention-to-treat sample), it is important to note that as the number of missing data points per subject increases, the reliability of the model decreases. Thus, what might be gained in reduced sample bias by including the 24 subjects who were never followed must be weighed against the loss of efficiency in a model based on fewer actual observations per subject. In addition, although the assumptions made by random-effects regression models concerning missing values are sophisticated and flexible, the estimates are limited by the availability of true data on which they are based. Thus, because the modal number of follow-up observations in this data set was two, we were limited to a linear, rather than a quadratic, model. Finally, each of the models used here has specific assumptions, in addition to normality, to which its results are not robust; these should be noted. Univariate ANOVA models assume sphericity, which is frequently violated with repeated measures data. MANOVA models do not assume sphericity but otherwise have the same assumptions as ANOVA models; they additionally assume that the vector of responses is distributed as multivariate normal. Random-effects regression models are comparatively new, and the consequences of violating their assumptions are not well understood (Bryk & Raudenbush, 1991). In addition, unlike ANOVA models, which summarize data with means and variances, the random-effects regression model may simply not converge with some data. Particularly with large data sets in which individual data points may be difficult to plot and interpret, the absence of descriptive data from random-effects regression may be a disadvantage for some researchers. Also, these models assume an individual growth trajectory that has a fixed course, the shape of which can be set somewhat arbitrarily (e.g., linear or curvilinear).
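Because the shape of the trajectory is fixed in advance, a fitted line can cross the bounds of the outcome scale. The following minimal sketch assumes an outcome bounded between 0 and 1, as for the ASI cocaine composite score. The intercept and slope are hypothetical values chosen for illustration, not estimates from these data.

```python
from typing import List


def linear_trajectory(intercept: float, slope: float, month: float) -> float:
    """Predicted score at a given month under a straight-line trajectory."""
    return intercept + slope * month


def out_of_range_months(
    intercept: float,
    slope: float,
    months: List[float],
    low: float = 0.0,
    high: float = 1.0,
) -> List[float]:
    """Return the months at which the linear prediction leaves [low, high]."""
    return [
        m for m in months
        if not low <= linear_trajectory(intercept, slope, m) <= high
    ]


# Hypothetical subject: starts at 0.25 and declines 0.03 per month.
months = [0, 1, 3, 6, 12]
flagged = out_of_range_months(0.25, -0.03, months)
```

For this hypothetical subject the predicted score stays within the scale through 6 months but falls below 0 by 12 months, an impossible value for a score bounded at 0.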
Thus, these models can provide estimates that extend well beyond the range of the variable. Also, random-effects regression models require comparatively large samples. For example, when applied to the "complete data" subsample (n = 25), the model did not converge with time specified as random; the model did converge with time specified as fixed, but this effectively removed subject-specific growth rates and thus a major advantage of this model.

Summary

The availability of random-effects regression models offers a practical and powerful way to address problems usually encountered in analyzing longitudinal data. The major advantages when applied to clinical trials data where subjects are observed over time include: (a) the freedom to use continuous outcome variables rather than dichotomized outcomes (abstinent vs. relapsed), which lose information about the dynamic and episodic nature of change; (b) the ability to make use of all available data on each subject and thus not base conclusions only on the small and highly unrepresentative sample of subjects who contribute complete data; and (c) the increased flexibility afforded by treating time as a continuous independent effect, which allows use of data from individuals observed at times other than the fixed data points. With the availability of random-effects regression models, the methods available for analysis of longitudinal data are now on par with the time and expense incurred in collecting these data (Gibbons et al., 1988). These models make more realistic assumptions about the nature and availability of data and, therefore, about the nature of how people change.

References

Bryk, A. S. & Raudenbush, S. W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147–158.
Bryk, A. S. & Raudenbush, S. W. (1991). Hierarchical linear models for social and behavioral research: Applications and data analysis methods. Newbury Park, CA: Sage.
Bryk, A. S. & Thum, Y. M. (1989). The effects of high school organization on dropping out: An exploratory investigation. American Educational Research Journal, 26, 353–383.
Carbonari, J. P., Wirtz, P. W., Muenz, L. R. & Stout, R. L. (1994). Alternative analytical methods for detecting matching effects in treatment outcomes. Journal of Studies on Alcohol, 12 (Suppl.), 83–80.
Carroll, K. M., Rounsaville, B. J., Gordon, L. T., Nich, C., Jatlow, P. M., Bisighini, R. M. & Gawin, F. H. (1994). Psychotherapy and pharmacotherapy for ambulatory cocaine abusers. Archives of General Psychiatry, 51, 177–187.
Carroll, K. M., Rounsaville, B. J. & Nich, C. (1994). Blind man's bluff? The effectiveness and significance of psychotherapy and pharmacotherapy blinding procedures in a randomized clinical trial. Journal of Consulting and Clinical Psychology, 62, 276–280.
Carroll, K. M., Rounsaville, B. J., Nich, C., Gordon, L. T., Wirtz, P. W. & Gawin, F. H. (1994). One-year follow-up of psychotherapy and pharmacotherapy for cocaine dependence: Delayed emergence of psychotherapy effects. Archives of General Psychiatry, 51, 989–997.
Cohen, L. H. (1979). Clinical psychologists' judgments of the scientific merit and clinical relevance of psychotherapy outcome research. Journal of Consulting and Clinical Psychology, 47, 421–423.
Cross, D. G., Sheehan, P. W. & Khan, J. A. (1982). Short- and long-term follow-up of clients receiving insight-oriented therapy and behavior therapy. Journal of Consulting and Clinical Psychology, 50, 102–112.
Fiske, D. W., Hunt, H. F., Luborsky, L., Orne, M. T., Parloff, M. B., Reiser, M. F. & Tuma, A. H. (1970). Planning of research on the effectiveness of psychotherapy. Archives of General Psychiatry, 22, 22–32.
Frank, E., Prien, R. F., Jarrett, R. B., Keller, M. B., Kupfer, D. J., Lavori, P. W., Rush, A. J. & Weissman, M. M. (1991). Conceptualization and rationale for consensus definitions of terms in major depressive disorder: Remission, recovery, relapse, and recurrence. Archives of General Psychiatry, 48, 851–855.
Gibbons, R. D., Hedeker, D., Elkin, I., Waternaux, C., Kraemer, H. C., Greenhouse, J. B., Shea, M. T., Imber, S. D., Sotsky, S. M. & Watkins, J. T. (1993). Some conceptual and statistical issues in analysis of longitudinal psychiatric data. Archives of General Psychiatry, 50, 739–750.
Gibbons, R. D., Hedeker, D., Waternaux, C. & Davis, J. M. (1988). Random regression models: A comprehensive approach to the analysis of longitudinal psychiatric data. Psychopharmacology Bulletin, 24, 438–443.
Hedeker, D. H. (1993). MIXREG: A Fortran program for mixed-effects linear regression models [Computer program]. Rockville, MD: National Institute of Mental Health Division of Services Research.
Hedeker, D., Gibbons, R. D. & Davis, J. M. (1991). Random regression models for multicenter clinical trials data. Psychopharmacology Bulletin, 27, 73–77.
Hedeker, D., Gibbons, R. D., Waternaux, C. & Davis, J. M. (1989). Investigating drug plasma levels and clinical response using random regression models. Psychopharmacology Bulletin, 25, 227–231.
Hoke, L. A., Lavori, P. W. & Perry, A. C. (1992). Mood and global functioning in borderline personality disorder: Individual regression models for longitudinal measurements. Journal of Psychiatric Research, 26, 1–16.
Howard, K. I., Krause, M. S. & Orlinsky, D. E. (1986). The attrition dilemma: Toward a new strategy for psychotherapy research. Journal of Consulting and Clinical Psychology, 54, 106–110.
Jennrich, R. I. & Schluchter, M. D. (1986). Unbalanced repeated-measures models with structured covariance matrices. Biometrics, 42, 805–820.
Johnson, W. D., George, V. T., Shahane, A. & Fuchs, G. J. (1992). Fitting growth curve models to longitudinal data with missing observations. Human Biology, 64, 243–253.


Jones, R. H. & Boadi-Boateng, F. (1991). Unequally spaced longitudinal data with AR(1) serial correlation. Biometrics, 47, 161–175.
Kazdin, A. E. & Wilson, G. T. (1978). Evaluation of behavior therapy: Issues, evidence, and research strategies. Cambridge, MA: Ballinger.
Laird, N. M. & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963–974.
Lavori, P. W. (1990). ANOVA, MANOVA, my black hen: Comments on repeated measures. Archives of General Psychiatry, 47, 775–778.
Lavori, P. W. (1992). Clinical trials in psychiatry: Should protocol deviation censor patient data? Neuropsychopharmacology, 6, 39–48.
Marlatt, G. A. & Gordon, J. R. (Eds.) (1985). Relapse prevention: Maintenance strategies in the treatment of addictive behaviors. New York: Guilford Press.
McLellan, A. T., Kushner, H., Metzger, D., Peters, R., Smith, I., Grissom, G., Pettinati, H. & Argerious, M. (1992). The fifth edition of the Addiction Severity Index. Journal of Substance Abuse Treatment, 9, 199–213.
McLellan, A. T., Luborsky, L., Woody, G. E. & O'Brien, C. P. (1980). An improved diagnostic evaluation instrument for substance abuse patients: The Addiction Severity Index. Journal of Nervous and Mental Disease, 168, 26–33.
Muñoz, A., Weiss, S. T., Tager, I. B., Rosner, B. & Speizer, E. (1987). Statistical methods for the analysis of the association between bronchial responsiveness and pulmonary function changes. Bulletin Europeen Physiopathologic Respiratoire, 23, 377–381.
Nathan, P. E. & Lansky, D. (1978). Common methodological problems in research on the addictions. Journal of Consulting and Clinical Psychology, 46, 713–726.
Park, T. (1993). A comparison of the generalized estimating equation approach with the maximum likelihood approach for repeated measurements. Statistics in Medicine, 12, 1723–1732.
Raudenbush, S. W. & Chan, W. (1993). Application of a hierarchical linear model to the study of adolescent deviance in an overlapping cohort design. Journal of Consulting and Clinical Psychology, 61, 941–951.
Rutter, C. M. & Elashoff, R. M. (1994). Analysis of longitudinal data: Random coefficient regression modelling. Statistics in Medicine, 13, 1211–1231.
Shea, M. T., Elkin, I., Imber, S. D., Sotsky, S. M., Watkins, J. T., Collins, J. F., Pilkonis, P. A., Beckham, E., Glass, D. R., Dolan, R. T. & Parloff, M. B. (1992). Course of depressive symptoms over follow-up: Findings from the National Institute of Mental Health Treatment of Depression Collaborative Research Program. Archives of General Psychiatry, 49, 782–787.
Simpson, D. D., Savage, L. J. & Lloyd, M. R. (1979). Follow-up evaluation of treatment of drug abuse during 1969 to 1972. Archives of General Psychiatry, 36, 772–779.
Stevens, J. (1992). Applied multivariate statistics for the social sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Tate, R. L. & Hokanson, J. E. (1993). Analyzing individual status and change with hierarchical linear models: Illustration with depression in college students. Journal of Personality, 61, 181–206.
Taylor, M. A. & Amir, N. (1994). The problem of missing clinical data for research in psychopathology. Journal of Nervous and Mental Disease, 182, 222–229.
Twitchell, G. R., Hertzog, C. A., Klein, J. L. & Schuckit, M. A. (1992). The anatomy of a follow-up. British Journal of Addiction, 87, 1327–1333.
Vassey, M. W. & Thayer, J. F. (1987). The continuing problem of false positives in repeated measures ANOVAs in psychophysiology: A multivariate solution. Psychophysiology, 24, 479–486.
Willet, J. B. & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363–381.
Williamson, G. L., Appelbaum, M. & Epanchin, A. (1991). Longitudinal analyses of academic achievement. Journal of Educational Measurement, 28, 61–76.

1 Although the repeated measures ANOVA can be conducted on unbalanced designs, the results are based on least squares estimation (LSE), which has been shown to be biased in small samples (Park, 1993).
2 It should be noted that in this model the main effect of psychotherapy is at Time 0 for patients in the pharmacotherapy cell coded 0 (placebo).

Figure 1. Cocaine composite scores across time by treatment group: Cross-sectional analyses (n = 121, observations = 335). Data represent subgroups of patients interviewed at each point: baseline n = 121; 1-month n = 56; 3-month n = 65; 6-month n = 57; 12-month n = 60. Addiction Severity Index (ASI) cocaine composite scores range from 0 to 1, with lower scores indicating less cocaine use. Minus signs indicate psychotherapy control condition (Psych) or medicine (Med); plus signs indicate active psychotherapy or medication (e.g., desipramine).

Figure 2. Cocaine composite scores across time by treatment group: Complete data subsample ( n = 25, observations = 125). Minus signs indicate psychotherapy control condition (Psych) or placebo (Med); plus signs indicate active medication or psychotherapy. ANOVA = analysis of variance; ASI = Addiction Severity Index.

Figure 3. Cocaine composite scores across time by treatment group: Final interview subsample ( n = 60, observations = 300). Minus signs indicate psychotherapy control condition (Psych) or placebo (Med); plus signs indicate active medication or psychotherapy. ANOVA = analysis of variance; ASI = Addiction Severity Index.


Figure 4. Cocaine composite scores across time by treatment group: Final interview subsample without imputations ( n = 60; observations = 230). Minus signs indicate psychotherapy control condition (Psych) or placebo (Med); plus signs indicate active medication or psychotherapy. ASI = Addiction Severity Index.

Figure 5. Cocaine composite scores across time by treatment group: Ever-followed subsample ( n = 97, observations = 335). Minus signs indicate psychotherapy control condition (Psych) or placebo (Med); plus signs indicate active medication or psychotherapy. ANOVA = analysis of variance; ASI = Addiction Severity Index.

Figure 6. Cocaine composite scores across time by treatment group: Intention-to-treat sample ( n = 121, observations = 335). Minus signs indicate psychotherapy control condition (Psych) or placebo (Med); plus signs indicate active medication or psychotherapy. ANOVA = analysis of variance; ASI = Addiction Severity Index.


Table 1.

Table 2.