Severity of Bulimia Nervosa

1 downloads 0 Views 257KB Size Report
Nov 20, 2008 - Key Words. Bulimia nervosa, severity Transformation of variables ..... sification with maximum sensitivity, the 'Area Under the. Curve' (AUC, a ...
Original Paper Psychopathology 2009;42:22–31 DOI: 10.1159/000173700

Received: July 12, 2007 Accepted after revision: February 8, 2008 Published online: November 20, 2008

Severity of Bulimia Nervosa Measurement and Classification into Health or Pathology

A. Hartmann a A. Zeeck a A.J. van der Kooij b a Department b Data

of Psychosomatic Medicine and Psychotherapy, University of Freiburg, Freiburg, Germany; Theory Group, University of Leiden, Leiden, The Netherlands

Key Words Bulimia nervosa, severity ⴢ Transformation of variables ⴢ Predictor selection

Abstract Aims: In order to identify the most important components of the severity of bulimia nervosa (as well as identifying clinical cases), we explored the relation between dimensional and categorical assessment. This was achieved by studying the performance of variables from standard instruments (measuring specific and general psychopathology) in predicting an expert rating of overall syndrome severity. Method: In total, 213 cases were selected (across the whole range of severity). We applied regression with optimal scaling to model nonlinear relations in the data, and the lasso method with bootstrapping for predictor selection. The best model contained 2 scales of the Eating Disorders Inventory (‘bulimia’ and ‘drive for thinness’) and the frequency of the binges. The sensitivity and specificity of case classification using the obtained model was determined. Results: The model can predict the probability of being a clinical case at a rate of 88%. The presented statistical methods are innovative and promising approaches that can help researchers and clinicians to better define sets of variables for treatment evaluation and outcome studies. Conclusion: The results indicate that severity and outcome in bulimia nervosa should be determined by measuring both cognitive and behavioral aspects of the symptoms. Copyright © 2008 S. Karger AG, Basel

© 2008 S. Karger AG, Basel 0254–4962/09/0421–0022$26.00/0 Fax +41 61 306 12 34 E-Mail [email protected] www.karger.com

Accessible online at: www.karger.com/psp

Introduction

Reliable and valid operationalizations of the severity of a disorder, as well as thresholds separating pathological states from health, are necessary for the evaluation of treatment outcome. This is a generic problem that also pertains to the evaluation of eating disorders. Clausen [1, 2] found that recovery rates of outcome studies vary from 24 to 74%, and that a great amount of this variance is accounted for by the method of assessment [1] and its time point [2]. In a similar study, Olmsted et al. [3] showed that varying definitions of relapse and remission in bulimia nervosa produce relapse rates ranging from 21 to 55% in the same sample. Obviously, defining the symptom severity or the outcome of bulimia nervosa is not a trivial matter. For bulimia nervosa, DSM-IV [4] lists some criteria for eating behavior and weight control, as well as the abnormal relation between self-esteem and weight or shape, but no criteria for the severity or the amount of impairment. The criterion defining the borderline between pathology and normality (an average of 2 binges per week over 3 months) is not based on empirical investigations, but rather on expert consensus. Many patients suffering from bulimia nervosa suffer from more than abnormal eating behavior and weight concern. Very often we find comorbidity with depression, anxiety disorders [5–8] and personality disorders [9], which contribute to the severity of the overall impairPD Dr. Almut Zeeck Department of Psychosomatic Medicine and Psychotherapy, University of Freiburg Hauptstrasse 8, DE–79104 Freiburg (Germany) Tel. +49 761 270 6923, Fax +49 761 270 6885 E-Mail [email protected]

ment. Furthermore, the course of bulimia nervosa often fluctuates with periods of symptom abstinence followed by relapses. The severity and status of illness or health can be determined categorically or dimensionally [10]. To determine diagnoses is an ostensibly simple categorical solution, but the absence of the intake diagnosis at the termination of treatment is not necessarily a sign of good outcome. Patients who suffered from bulimia nervosa may no longer fulfill the DSM (or ICD) criteria at treatment termination, yet still be without a good outcome. For example, there are patients who have ceased bingeing completely, but they still vomit after regular meals, keeping their weight low and suffering from cognitive fixations on food or shape. Of course, these persons could be diagnosed as EDNOS (eating disorder not otherwise specified), preventing a misclassification as healthy. Also, there is a problem with clinically relevant improvements. For a patient having started with many binge/purges a day, a reduction down to 2 purges a week would be an excellent treatment outcome, while according to the DSM criteria the patient still has to be diagnosed as bulimic. The clinical evaluation of ‘good outcome’ (with respect to number of binge-purges) is contradicted by the diagnostic label ‘still ill’. In psychotherapy research, the concept of clinical significance [11–13] has been developed to operationalize outcome. Methods of clinical significance provide 2 measures: the cutoff value C, drawing a line between healthy and ill on a continuous variable, and the Reliable Change Index (RCI), defining the difference between 2 values over time where the probability of random change is p ! 0.05. The calculation of C is based on distribution characteristics of samples of ill and healthy persons, determining a cutoff where the minimum of misclassifications is to be expected (based on the assumption of normal distributions of the outcome variable in both samples). The calculation of the RCI is based on the retest reliability of a measure (sometimes on Cronbach’s ␣), and it estimates the minimum nonrandom difference. The use of the cutoff value C suffers from the same flaw as diagnostic categories: relevant improvement within the pathological range is not covered. Some authors combine C and RCI, forming a set of 6 outcome classes: [healthy/ill] ! [reliably improved/unchanged/reliably deteriorated]. This is clearly a better solution, but the RCI is a very lenient index for improvement. For example, the subscale ‘bulimia’ from the Eating Disorders Inventory (EDI) scores on a range from 0 to 21 [14, 15]. For this scale, the C = 4.5 and RCI = 2.4, meaning that a change of 2.4 or more points

on this scale is regarded as nonrandom relevant change. However, a change of 3 points may mean something very different, depending on the starting point: improving from 6 to 3 means to cross the line between clinical and normal (C = 4.5); improving from 18 to 15 means reliable improvement, but it is clinically irrelevant; improving from 18 to 6 would be reliable and clinically relevant. The question is how to link clinically relevant categories to the scores of questionnaires or interviews, and how to interpret differences in the light of clinical judgment. Another issue is multimeasure, multisource outcome measurement. Many authors claim that it is necessary to measure many facets of a disorder and to use multiple sources of information [16]. This leads to the unfortunate situation where the investigations produce a lot of data, and where the relevance of the variables, their independence and the significance levels of comparisons are open to question. In order to avoid the problems of multiple testing, researchers then have to reduce the complexity of their data. Some choose the solution to define 1 variable as final endpoint – the ‘rest’ of the multimeasure outcome data is entered for exploratory analyses. Others factorize their outcome data set, either using factor scores to define outcome or selecting the highest loading variables. All these considerations show the complexity of determining symptom severity (for bulimia nervosa) using categorical, dimensional, multivariate and multisource approaches. We would argue that it is most relevant to know which variables contribute to the overall severity of a disorder, and how these variables are related to severity. This seems to be a necessary condition to determine whether therapy is successful. Empirical investigations into this issue would be valuable for researchers, guiding their decisions on primary outcome selection, as well as for reviewers assisting their selection of specific outcome variables from the sets of variables reported in studies. Our research questions are: firstly, how can we determine which facets of the eating disorder and of general psychopathology determine expert ratings of overall syndrome severity? In other words, how do we select an optimal subset of variables from a standard battery of outcome questionnaires to predict severity of bulimia nervosa as rated by experts? Secondly, can we empirically determine a cutoff score between health and pathological states? Finally, can we determine the score region of the predictor variables where discrimination is best? In answering these questions, we will take into account the noninterval scale of the variables and possible nonlinear relations between them.

Severity of Bulimia Nervosa

Psychopathology 2009;42:22–31

23

Table 1. Severity ratings by time of measurement

Severity rating

Time of measurement, n

0 No eating disorder 1 Slight eating disorder 2 Marked eating disorder (outpatient treatment sufficient) 3 Severe eating disorder (inpatient treatment or intense outpatient treatment necessary) 4 Very severe eating disorder (inpatient treatment necessary) Total

Total n

intake

after treatment

follow-up

3 4 35

1 6 4

24 21 17

28 31 56

58 34

0 0

6 0

64 34

134

11

68

213

Severity rating from SIAB item No. 84 (there has been a change of SIAB version, mainly adding items for binge eating disorder; the item number of the severity rating was changed, from 53 to 84, but not the content).

Method Sample and Instruments We have drawn our sample from all consecutive admissions of patients suffering from bulimia nervosa (ICD-10) who were seen in our outpatient clinic at the Department for Psychosomatic Medicine and Psychotherapy of the University Hospital in Freiburg, Germany. Our first selection criteria were being diagnosed with the Structured Interview for Anorexia and Bulimia (SIAB-EX), a standardized and comprehensive interview for eating disorders (www.epi.med.uni-muenchen.de/siab2_e.htm) [17, 18], and having filled out a battery of psychometric tests within 3 weeks before or after the interviews. Between interview and questionnaires, diagnostic procedures took place, but no treatment nor serious intervention. Between January 1990 and December 2005, a total of 862 patients with eating disorders were seen in our department. Depending on therapist’s capacity and available funding for research projects, 290 patients were diagnosed with the SIAB-EX, some of them more than once, resulting in 535 available interviews. From these, we excluded cases with insufficient self-report data. If a patient was interviewed more than once, we selected only the interview at the time point where the patient was diagnosed as being in one of the extreme severity categories (‘no eating disorder’ or ‘very severe eating disorder’). We made this choice because the frequency of the extreme categories is considerably lower than that of the diagnoses in-between, and we wanted our sample to contain enough cases of all kinds of severity. This procedure resulted in a final dataset containing 213 independent cases. The interviews were obtained mainly from a treatment sample, so there were relatively few people who were considered to show no pathology. The majority of these cases were found after treatment and at follow-up assessments (3 months to 3 years after termination of therapy; table 1). In our investigation, we used the therapists global severity rating of the bulimic eating disorder section of the SIAB-EX [17–19]. The interviewers were all trained, and checked for their reliability. The interviewers gave the global severity rating at the end of the interview, which took 1 h on average. The instructions say that

24

Psychopathology 2009;42:22–31

the global severity rating should summarize all the symptoms found in the interview. The rating represents an overall clinical impression of the interviewer, and may explicitly contain symptoms not related to the eating disorder, like impairment of social functioning or social integration (except for isolated obsessions or anxieties). The therapists ranked the severity into 5 categories (table 1). Our set of variables (table 2) can be considered as a typical multi-source multi-measure data set, containing both variables measuring cognitive, emotional and behavioral aspects of the disorder (EDI-2, SIAB-EX) [14, 17, 18] as well as general psychopathology Symptom Checklist 90 revised (SCL-90-R) [20, 21]. About 65% of the interviews and self-reports we used were obtained in the context of intake diagnostic procedures; about 30% were obtained during follow-up, and the rest at the end of treatment (table 1). As was expected, no or mild symptoms are mostly found at follow-up, and most of the higher ratings are found at intake. This distribution mirrors daily practice and the time point of administration of the interview. Self-Ratings and Expert Ratings We were also interested in the relation between self-ratings and expert ratings of severity. The SIAB-S is a questionnaire, strictly parallelized with the interview. Since 1999, we have used it as a self-report instrument for eating disorders. For the admissions after 1999 (n = 68), we computed the correlation between the experts’ ratings of severity and the self-report rating of ‘impairment caused by the eating disorder’, using Pearson’s r and Spearman’s ␳ for rank data, and found that the expert and selfratings are highly correlated (r = 0.78, ␳ = 0.77), supporting our decision to use the larger sample of experts’ ratings (SIAB-EX). Statistical Analysis We chose to use CATREG, which is a method for regression with categorical variables (unordered or ordered) using optimal scaling [22, 23]. The method is implemented in the program CATREG in the Categories package of SPSS [24, 25]. Using CATREG, we neither need to assume that the variables are interval scaled, nor that they are linearly related nor normality of residuals. The CATREG model is the classical linear regression

Hartmann /Zeeck /van der Kooij

Table 2. Variables

Variable Patient characteristics Age at intake, years Age at onset (self report), years BMI at intake Gender Male Female EDI factors, score F1 drive for thinness F2 bulimia F3 body dissatisfaction F4 ineffectiveness F5 perfectionism F6 interpersonal distrust F7 interoceptive awareness F8 maturity fears SCL-90-R, score GSI T-score GSI raw score GSI T-score classes T ≤60 T 61 to ≤70 T 71 to ≤80 T 81 to ≤90 T >90

Rangea

Mean 8 SD or n

16–54 3–34 17.51–35.70

26.0286.56 16.9684.09 21.8883.22 6 (2.8) 207 (97.2)

0–21 0–21 0–27 0–28 0–18 0–17 0–30 0–23 34.4–144.4 0.01–2.84

Behavioral measure Frequency of binge eatingb Not markedly Rarely Occasionally (average at least twice weekly) Frequently (up to 1 per day) Very frequently (more than once per day)

9.9086.15 7.4785.57 15.1588.42 9.3086.91 5.9884.47 6.1083.81 9.3087.70 4.7283.30 73.8824.6 1.0580.66 73 (35.1) 28 (13.5) 36 (17.3) 21 (10.1) 50 (24.0)

58 (27.2) 18 (8.5) 40 (18.8) 46 (21.6) 51 (23.9)

Figures in parentheses are percentages. SCL-90-R = Symptom Checklist 90 revised; GSI = Global Severity Index. a The ranges of most of the EDI variables and the SCL-90-R cover the full span of values, which is due to including both patients in the sample with no pathology and the most severe pathology. b Rated by interviewer, based on overall evidence.

The dependent variable, severity of syndrome, and the predictor variables frequency of binges (freq-binges) and T-GSI (Global Severity Index T score) are clearly rank ordered. Although the EDI variables could be considered to be of interval level, they were also treated as rank-ordered variables to allow possible nonlinear relations to be revealed. To obtain transformations that respect the rank order of the observed values, we specified monotonic transformations: (1) a step function (appropriate for variables with a limited number of categories) for the severity rating and the predictors freq-binges and T-GSI, and (2) a spline transformation for the EDI predictors (appropriate for variables with a high number of categories), based on second-degree polynomials, with 2 interior knots; the degree and the number of knots control the smoothness of the transformation. For selecting a subset of predictors, we used the Lasso (least absolute shrinkage and selection operator) [26], which was recently incorporated in the CATREG method [27, 28]. The Lasso applies a penalty to the regression model to reduce the estimation variance due to multicollinearity. The effect of the penalty is that the coefficients are shrunken. With a penalty value of zero, the ordinary- least-squares model is obtained. With increasing values of the penalty, the coefficients shrink toward zero at different rates; predictors with the most stable estimate of the coefficient will shrink to zero more slowly (requiring higher penalty values). For sake of simplicity, we did not want to select the optimal shrunken model, but only used the Lasso in an explorative way to select a subset of predictors. From the CATREG-Lasso result we obtained subsets of p predictors, with p ranging from 1 to the total number of predictors. Then we analyzed each of these subsets with CATREG and selected the optimal subset with the .632 bootstrap [29] (a particular form of the bootstrap often used in regression, that outperforms cross-validation), applying the 1-standard-error rule: select the most parsimonious model within 1 standard error (SE) of the model with minimum prediction error. We used the predicted values resulting from the selected model as test variable scores to evaluate the discriminative power of the model for classification into health and illness. For this purpose, we divided the severity ratings into: 0–1 (‘no’ and ‘slight’) and 2–4 (‘marked’ through ‘very severe’). We used ROC (receiver operating characteristic) analysis to evaluate classifications obtained with different cutoff values for clinical caseness. ROC curves graphically show how the sensitivity (proportion of true positives) and specificity (proportion of true negatives) of classifications vary as the cutoff value is varied.

Results model applied to transformed variables. The variables are transformed by replacing categories with optimal values, called category quantifications, using the optimal scaling methodology [22]. The transformed variables have all the numeric properties of interval variables. The regression coefficients and the quantifications are estimated simultaneously in an iterative process. The quantifications are optimal for regression, which means that the squared multiple regression coefficient (proportion of variance explained) is as large as possible.

Model Selection In figure 1, the CATREG-Lasso paths are displayed. The most parsimonious unshrunken model with prediction error that is within 1 SE of the best model (fig. 2) is the model with 3 predictors: EDI-F1 ‘drive for thinness’, EDI-F2 ‘bulimia’ and freq-binges. Because we preferred an unshrunken model, we used CATREG without shrinkage and applied the .632 boot-

Severity of Bulimia Nervosa

Psychopathology 2009;42:22–31

25

0.4

freq-binges

0.3

0.2

EDI-F1 drive for thinness

0.1

T-GSI EDI-F7 internal awareness EDI-F8 maturity fears EDI-F6 interpersonal distrust EDI-F3 body dissatisfaction

0

EDI-F5 perfectionism –0.1

EDI-F4 ineffectiveness 0 high penalty

0.2

CATREG-Lasso Analysis This resulted in selecting the model with 3 predictors: freq-binges and the EDI factors F2 ‘bulimia’ and F1 ‘drive for thinness’ (expected error = 0.389, apparent error = 0.330); the apparent proportion of explained variance (1 – apparent error) is 0.67 (F = 45.3; d.f. = 9; p ! 0.0001). Table 4 displays the regression coefficients of the predictors and their relative importance [30]. The regression with optimal scaling procedure shows that there are nonlinear relations between the predictors and severity rating of eating disorder, indicated by the nonlinear curves in the transformation plots (fig. 3), especially for EDI-F1 and EDI-F2. As in standard linear regression, the contribution of a predictor variable to the predicted value for a person is the person’s value on the predictor variable multiplied by the regression coefficient; with CATREG the value on the predictor variable is not the score on the original variable, but the quantification of that score. The regression coefficients (table 3) are all positive, so the categories of the predictor variables with negative quantified values (below the ‘zero line’ in Psychopathology 2009;42:22–31

0.6

0.8

1.0 zero penalty

Standardized sum of absolute coefficients

strap and the 1-SE rule to 10 models: from the model including all predictors to the model including only 1 predictor, including predictors in these models according to the results of the CATREG analysis.

26

0.4

Expected prediction error

Fig. 1. CATREG-Lasso coefficient paths. The sum of the absolute coefficients is standardized by dividing with the sum of the absolute coefficient values for the ordinary least-squares model, which is the model with zero penalty. The regression coefficients are on the y-axis (CATREG standardizes the transformed variables, so the coefficients are ␤) and the x-axis represents the penalty. Thus, on the right hand side of the plot (sum = 1, no penalty) we see the regression coefficients for the model including all predictors unshrunken, and on the left (sum = 0, high penalty) all predictors are shrunken out.

␤ coefficient

EDI-F2 bulimia

0.50

0.45

0.40

0.35 1

2

3

4 5 6 7 Number of predictors

8

9

10

Fig. 2. Expected prediction error estimated with the .632 bootstrap for unshrunken models with predictor subsets as suggested by the CATREG-Lasso. The error bars around the error points represent the SE. The horizontal line represents the minimum error estimate (for the model with 4 predictors) plus 1 SE.

the transformation plot) contribute negatively to the prediction, while the categories with positive quantifications contribute positively. Thus, predictor categories with negative quantified values are related to the categories ‘no’ and ‘slight’ of the severity rating variable (that received negative quantifications), and predictor categories Hartmann /Zeeck /van der Kooij

Table 3. Regression coefficients for the selected model

Standardized coefficients

Freq-binges EDI-F1 drive for thinness EDI-F2 bulimia



SE

0.355 0.225 0.370

0.052 0.058 0.065

Importancea

d.f.

F

Significance

0.365 0.219 0.416

3 3 3

47.437 14.982 32.155

0.000 0.000 0.000

a

In contrast to the regression coefficients, the importance measure defines the importance of the predictors additively (importance equals the product of the regression coefficient and the zero-order correlation for the predictor; these products add to R-square, so dividing the products by R-square, they add to one).

Table 4. Classification results using the predicted values from CATREG for the selected model as test scores

Observed

Predicted pathological improved/ healthy

Max. sensitivity Pathological Improved/healthy

144 18

8 41

Max. specificity Pathological Improved/healthy

107 3

45 50

Max. sensitivity and specificity Pathological 126 Improved/healthy 10

26 49

Sensitivitya

Specificityb

Efficiencyc

PVPd

PVNe

␬f

0.95

0.70

0.88

0.89

0.84

0.68

0.70

0.95

0.77

0.97

0.55

0.54

0.83

0.83

0.83

0.93

0.65

0.61

a Proportion true positives of all observed positives. b Proportion true negatives of all observed negatives. Proportion correct of all cases. d Predictive value positive test (proportion true positives of all predicted positives). e Predictive value negative test (proportion true negatives of all predicted negatives). f Measure of agreement (minimum 0, maximum 1). c

with positive quantified values are related to the categories ‘marked’, ‘severe’ and ‘very severe’ (that received positive quantifications). For the expert rating of freq-binges, the categories ‘none’ (0) and ‘rarely’ (1) have negative quantified values that are markedly different, meaning that these 2 categories of the freq-binges variable can distinguish persons with no eating disorder from persons with a slight eating disorder very well. Also, the quantified values for categories 0 and 1 are markedly different from the quantifications for categories 2–4 (occasional to very frequent bingeing). Thus, the freq-binges variable can also make a very good distinction between persons with no and slight eating disorder and persons with a marked to very severe eating disorder. However, the quantifications for catego-

ries 2–4 (occasional to very frequent bingeing) do not differ much from each other, indicating that the freq-binges variable cannot clearly separate patients into the eating disorder categories 2, 3 and 4 (marked to very severe eating disorder). So, the binge rating variable distinguishes 3 groups: persons without binge eating, with slight binge eating, and with marked to very severe binge eating. In the transformation plots for EDI-F1 ‘drive for thinness’ and EDI-F2 ‘bulimia’ we see the same trend: for EDI-F1 ‘drive for thinness’, the transformation curve starts to level off at category 6, and at category 5 for EDI-F2 ‘bulimia’; thus, the lower categories (related to health) are well separated from each other and from the higher categories (related to pathology), but within the pathology range there is little discrimination.

Severity of Bulimia Nervosa

Psychopathology 2009;42:22–31

27

1.0 1.0 0.5

Quantification freq-binges

Quantification severity

0.5 0

–0.5

–1.0

–1.5

0

–0.5

–1.0

–2.0

–1.5

–2.5

–2.0 None (0)

Slight (1)

Marked (2)

Severe (3)

None (0)

Very severe (4)

Categories

≥2/week (2)

Up to 1/day (3)

>1/day (4)

Categories

1

1 Quantification EDI-F2 – bulimia

Quantification EDI-F1 – drive for thinness

Rarely (1)

0

–1

0

–1

–2 –2 –3 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Categories

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Categories

Fig. 3. Quantifications of categories.

The findings with respect to distinguishing health from pathological states are confirmed if we apply a clinical significance measure: the cutoff value between the normal and the pathological population C; for EDI-F1 ‘drive for thinness’, C = 6, and, for EDI-F2 ‘bulimia’, C = 4.5. 28

Psychopathology 2009;42:22–31

Discrimination between Health and Pathology We used the predicted values saved by CATREG for the selected model as the test variable in ROC curve analysis to find cutoff values to classify the cases as either healthy (no and slight eating disorder) or pathological (marked to Hartmann /Zeeck /van der Kooij

very severe eating disorder). In diagnostic decision making, there is always a trade-off between sensitivity and specificity. Therefore, we determined 3 cutoff values: 1 for a classification with maximum sensitivity, 1 for a classification with maximum specificity and 1 for a classification with the best compromise between sensitivity and specificity. All 3 classifications give good results. For the classification with maximum sensitivity, the ‘Area Under the Curve’ (AUC, a discrimination measure, equivalent to Mann-Whitney U) is 0.93 (SE = 0.019, 95% CI = 0.892– 0.968); table 4 displays the results for all three classifications.

Discussion

To operationalize the syndrome severity of bulimia nervosa in a satisfying way is a challenge to researchers. We used a sample of bulimic patients that covered the whole range of severity, and measured severity and symptoms with widely used instruments like the EDI-2, SCL90-R and the SIAB. We aimed to identify the important components of the disorder by optimally predicting overall syndrome severity and impairment as rated by experts from a small set of self-report measures. The obtained models can be used to guide the selection of instruments and variables for clinical screenings of patients with selfreport instruments, as well as for operationalizations of outcome in research.

ger be diagnosed as bulimic, but as EDNOS. If the preoccupation with food and eating remains at the initial level, these patients are still impaired, although having changed diagnostic category. Preoccupation with eating has also been found to be associated with relapses after recovery [32]. Although we expected general psychopathology to be an important component of the severity, no variable covering this aspect (e.g. SCL-90-R/GSI, EDI-F4 ‘ineffectiveness’) remained in the final model. We cannot rule out that evaluating the level of general psychopathology with the SCL-90-R or the general factors of the EDI is insufficient. It may be necessary to supplement it with specific measures for anxiety, depression, personality disorder or social adjustment. To summarize the results, a severity index can be derived from the presented model that is highly related to expert ratings of severity. This is a very parsimonious model, as only 2 (of 8) scales and a ranking of freq-binges are needed – given that a full SIAB-EX-interview takes a full hour or more. We recommend that operationalizations of outcome in treatment studies and global measurement of severity of bulimia nervosa should include both the behavioral and the cognitive/emotional aspects of the disorder. Each of these 3 components is relevant (importance ranging from 0.22 to 0.42) to statistically predict severity. Furthermore, these 3 components are important diagnostic criteria for the diagnosis of the disorder, both in DSM-IV and ICD-10.

Severity of Disorder From our results, we concluded that the best operationalization of the syndrome severity of bulimia nervosa combines a behavioral measure of freq-binges (SIABEX) and measures of cognitive-emotional aspects of the disorder (EDI scales ‘bulimia’ and ‘drive for thinness’). ‘Drive for thinness’ is an important aspect of bulimia, as patients aiming to reduce weight remain in the circulus vitiosus of restricted eating and craving for food. ‘Drive for thinness’ and restricted eating are considered important factors maintaining the disorder [31, 32]. To include a measure of cognitive occupation with thoughts of eating and the urge to binge also clinically makes sense, as patients are impaired by their fixation on thoughts of eating as well as by the eating behavior itself. Some patients even report that continuous thoughts about eating are more relevant for their social withdrawal, problems at work and well-being than the pathological behavior itself. Furthermore, patients that succeed in reducing their frequency of binges to less than 2 per week would no lon-

Discrimination between Health and Pathology Using the results of regression with optimal scaling for the selected model, it is possible to separate remitted and subclinical cases from marked and severe cases with a high efficiency. With this classification quality, the model might be used in determining outcome in treatment studies, and as a support tool for clinical decision making. The labels of the overall severity rating include an indication for psychotherapy; thus, the expert rating also has implications on the determination of being a clinical case, with consequences for further clinical action. Clinical action is very much dependent on the national/local health care system, e.g. very severe bulimia can of course only lead to inpatient treatment in health care systems providing this treatment option. Nevertheless, clinicians from many cultures might well consent to our classification into those 2 classes. Depending on the aim of clinical decision making, either high sensitivity or high specificity may be important.

Severity of Bulimia Nervosa

Psychopathology 2009;42:22–31

29

For example, a high specificity would allow for an economic selection of pathological cases (only a few false positives might get unnecessary treatment). On the other hand, a high sensitivity is necessary to correctly identify the cases needing treatment, and thus not withhold treatment when necessary. We showed that our results allow for either high sensitivity with acceptable specificity or high specificity with acceptable sensitivity. Discrimination within the Range of Pathology An accurate prediction of severity rating within the range of pathology would be useful for decision making in stepped care, although clinical action and decisions on the service level (self-help, outpatient, day clinic, inpatient) depend very much on the national/local health care system. For example, severe bulimia can only be treated in an inpatient setting in health care systems providing this treatment option. The predictors included in our model show lower accuracy in distinguishing the pathological severity ratings (2–4) from each other. This lack of discrimination within the pathology range was also found when including other and/or more predictors in the model. So, for the purpose of discrimination within the pathological range, we would recommend adding another set of variables, adding further aspects of pathology like presence or severity of personality disorder, chronicity, social adjustment, parasuicidal behavior, anxiety or depression. Regression with Optimal Scaling The results of the regression with optimal scaling clearly show nonlinear (monotone) relationships between the predictors and the severity rating of bulimia nervosa. With standard linear regression, we obtained roughly the same results for the binary classification into health and illness. However, with linear regression, the lack of discriminative power in the pathological range of EDI-F1 and EDI-F2 especially, implying that persons with less severe pathological ratings of eating disorder cannot be well distinguished from persons with more severe ratings, is not revealed. Also, the good discrimination between no and slight bulimia nervosa is obscured by the linear requirement. Using regression with optimal scaling, the discriminative characteristics of the predictors become immediately clear by inspecting the transformations plots. Furthermore, by loosening the restriction of linear relations to monotonic relations, the apparent explained variance with our model increased from 58.2 to 67.0%. So, CATREG is more suited for our data than linear regression, and we prefer CATREG above standard 30

Psychopathology 2009;42:22–31

nonlinear regression because of the freedom in modeling nonlinear relations (nonlinear regression requires a predefined mathematical function). The Lasso The Lasso provides a solution for the problem of selecting a sparse set of predictors with good prediction accuracy from a collection of intercorrelated predictors. Stepwise regression methods, like forward and backward selection, do not provide models with stable estimates of the regression coefficients (among other problems [33]), and are therefore not preferred. Implementation of the Lasso and the .632 bootstrap in CATREG is scheduled for SPSS Categories version 17. For linear regression, the Lasso is also available in the LARS software for R (http:// www.r-project.org) and S-Plus (http://www-stat.stanford. edu/⬃tibs/lasso.html). Answers to the Research Questions Using regression with optimal scaling to reveal nonlinear transformations, and the Lasso and the 632 bootstrap to select a sparse model with stable predictors, we identified a subset of 3 variables that predicts the expert severity rating of bulimia nervosa very well (namely: the freq-binges and the EDI-factors 1 ‘bulimia’ and 2 ‘drive for thinness’). Determining cutoff values for classification into pathological and healthy states, we showed that the predicted values resulting from the presented model can be used for case or outcome classification with high sensitivity and specificity. We also showed that discrimination in the lower scores region of the predictors is better than in the higher scores region. Especially the higher scores of the EDI factors do not discriminate patients with marked, severe and very severe bulimia nervosa well. It is open to further investigation to replicate or challenge the results with other instruments or other samples.

Acknowledgments The authors express their appreciation to the patients, who invested much time in completing questionnaires, and to the therapists of the ward and the day clinic for their continuous support of this research. We are grateful as well to Elvira Bozkaya for help in monitoring the data, and to Thomas Herzog, who helped to establish some important foundations for this work. Leiden University holds the copyright of the procedures in the SPSS package Categories, and the Department of Data Theory receives the royalties.

Hartmann /Zeeck /van der Kooij

References 1 Clausen L: Review of studies evaluating psychotherapy in bulimia nervosa: the influence of research methods. Scand J Psychol 2004;45:247–252. 2 Clausen L: Time course of symptom remission in eating disorders. Int J Eat Disorder 2004;36:296–306. 3 Olmsted MP, Kaplan AS, Rockert W: Defining remission and relapse in bulimia nervosa. Int J Eat Disorder 2005;38:1–6. 4 American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders, ed 4. Washington, American Psychiatric Association, 1994. 5 Casper RC: Depression and eating disorders. Depress Anxiety 1998;8(suppl 1):96–104. 6 Tobin DL, Griffing AS: Coping and depression in bulimia nervosa. Int J Eat Disorder 1995;18:359–363. 7 Levy AB, Dixon KN, Stern SL: How are depression and bulimia related? Am J Psychiatry 1989;146:162–169. 8 Cooper PJ, Fairburn CG: The depressive symptoms of bulimia nervosa. Brit J Psychiatry 1986;148:268–274. 9 Cassin SE, von Ranson KM: Personality and eating disorders: a decade in review. Clin Psychol Rev 2005;25:895–916. 10 Rounsaville BJ, Alarcón RD, Andrews G, Jackson JS, Kendell RE, Kendler K: Basic nomenclature issues for DSM-V; in Kupfer DJ, First MB, Regier DA (eds): A Research Agenda for DSM-V. Washington, American Psychiatric Association, 2002. 11 Jacobson NS, Follette WC, Revenstorf D: Psychotherapy outcome research: methods for reporting variability and evaluating clinical significance. Behav Ther 1984; 15: 336– 352. 12 Jacobson NS, Truax P: Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol 1991; 59:12–19.

Severity of Bulimia Nervosa

13 Jacobson NS, Roberts LJ, Berns SB, McGlinchey JB: Methods for defining and determining the clinical significance of treatment effects: description, application, and alternatives. J Consult Clin Psychol 1999; 67: 300–307. 14 Garner DM: Eating Disorder Inventory-2: Professional Manual. Odessa, Psychological Assessment Resources, 1991. 15 Meermann R, Napierski CH, Schulenkorf EM: EDI-Münster – Selbstbeurteilungsfragebogen für Essstörungen; in Meermann R, Vandereycken W (eds): Therapie der Magersucht und Bulimia nervosa: Ein klinischer Leitfaden für den Praktiker. Berlin, Walter de Gruyter, 1987. 16 Fichter M, Quadflieg N: Six-year course of bulimia nervosa. Int J Eat Disorder 1997; 22: 361–384. 17 Fichter M: Structured Interview for Anorexia and Bulimia Nervosa (SIAB): development of a new instrument for the assessment of eating disorders. Int J Eat Disorder 1991; 10: 571–592. 18 Fichter M, Quadflieg N: The structured interview for anorexic and bulimic disorders for DSM-IV and ICD-10 (SIAB-EX): reliability and validity. Eur Psychiat 2001; 16: 38– 48. 19 Fichter MM, Herpertz S, Quadflieg N, Herpertz-Dahlmann B: Structured Interview for Anorexic and Bulimic disorders for DSM-IV and ICD-10: updated (third) revision. Int J Eat Disorder 1998;24:227–249. 20 Derogatis LR: SCL-90-R, administration, scoring & procedures manual-I for the R(evised) version. Baltimore, Johns Hopkins University School of Medicine, 1977. 21 Franke G: Die Symptom-Checkliste von Derogatis (SCL-90-R) – Deutsche Version – Manual, ed 2 . Göttingen, Hogrefe, 2002. 22 Gifi A: Nonlinear multivariate analysis, ed 2. Chichester, John Wiley, 1990.

23 Van der Kooij A, Meulman JJ: Regression with optimal scaling; in Meulman JJ, Heiser WJ, SPSS (eds): SPPS Categories 13.0, ed 10. Chicago, SPSS, 2004, pp 107–157. 24 Meulman JJ, Heiser WJ, SPSS Inc: SPSS Categories 10.0, ed 10. Chicago, SPSS Inc, 1999. 25 Meulman JJ, Heiser WJ, SPSS Inc: SPSS Categories 13.0, ed 10. Chicago, SPSS Inc, 2004. 26 Tibshirani RJ: Regression shrinkage and selection via the lasso. J R Stat Soc B 1996; 58: 267–288. 27 Van der Kooij A, Meulman JJ: Accuracy of regression with optimal scaling transformations: the .632 bootstrap with CATREG. Submitted for publication, 2008. 28 Van der Kooij A, Meulman JJ: Regularization with ridge penalties, the lasso, and the elastic net for regression with optimal scaling transformations. Submitted for publication, 2008. 29 Efron BE: Estimating the error rate of a prediction rule: improvements on cross-validation. J Am Stat Assoc 1983;78:313–316. 30 Pratt JW: Dividing the indivisible: using simple symmetry to partition variance explained; in: Pukkila T, Puntanen S (eds): Proceedings of the Second International Conference in Statistics. Tampere, University of Tampere, 1987, pp 245–260. 31 Fairburn CG, Stice E, Cooper Z, Doll HA, Norman PA, O’Connor ME: Understanding persistence in bulimia nervosa: a 5 year naturalistic study. J Consult Clin Psychol 2003; 71:103–109. 32 Halmi KA, Agras WS, Mitchell J, Wilson GT, Crow S, Bryson SW, et al: Relapse predictors of patients with bulimia nervosa who achieved abstinence through cognitive behavioral therapy. Arch Gen Psychiatry 2002; 5:1105–1109. 33 Judd C, McClelland G: Data Analysis: A Model-Comparison Approach. San Diego, Harcourt Brace Jovanich, 1989.

Psychopathology 2009;42:22–31

31