
Journal of Psychopathology and Behavioral Assessment, Vol. 15, No. 4, 1993

Goal Attainment Scaling: An Idiosyncratic Method to Assess Treatment Effectiveness in Agoraphobia

Edwin de Beurs,1,4 Alfred Lange,1 Roland W. B. Blonk,1 Peter Koele,3 Anton J. L. M. van Balkom,2 and Richard Van Dyck2

Accepted: October 17, 1993

Goal attainment scaling (GAS) is an individually tailored way to measure treatment gains, using a highly standardized procedure. An advantage of the method is that it takes into account individual characteristics of the patients, and at the same time the data are suitable for quantitative analysis and comparable across patients. Despite the wide acceptance and use of the method in the evaluation of psychotherapy, data on its psychometric properties are rather scarce. In the current study, GAS was used as one of several outcome measures in a research project on the effectiveness of various treatments for panic disorder with agoraphobia. Guidelines for GAS are presented as well as data on the reliability and validity of the procedure. Results indicate that the procedure is reliable, valid, and sensitive to the improvement of patients during treatment. Comparison of GAS with standardized measures revealed considerable concordance, although the clinical end status of patients diverged somewhat dependent on the measure considered.

KEY WORDS: goal attainment scaling; treatment changes; validity; panic disorder; agoraphobia.

1 Department of Clinical Psychology, University of Amsterdam, Amsterdam, The Netherlands.
2 Department of Psychiatry, Free University, Amsterdam, The Netherlands.
3 Department of Methodology, University of Amsterdam, Amsterdam, The Netherlands.
4 To whom correspondence should be addressed at Department of Clinical Psychology, University of Amsterdam, Roetersstraat 15, 1018 WB, Amsterdam, The Netherlands.

0882-2689/93/1200-0357$07.00/0 © 1993 Plenum Publishing Corporation


de Beurs et al.

INTRODUCTION

One of the biggest challenges in the evaluation of psychotherapy is the reliable and valid assessment of change. Various outcome measures are used to assess changes brought about by psychotherapy. One aspect in which these outcome measures differ is their place on the nomothetic-idiographic dimension, that is, whether the measurement instrument is tailored to the individual or made suitable for a variety of patients. The rating of a therapist after the treatment is completed is a preeminent example of an idiographic report: Such a report is aimed at individuals with their unique problems. A self-report measure such as the SCL-90-R (Derogatis, 1977), on the other hand, is a good example of a nomothetical measure: By keeping the instructions and questions identical for all respondents, one gathers information in a highly standardized fashion, thus ensuring comparability across patients. Subjects with higher scores suffer from more severe complaints than subjects with lower scores on the instrument.

These two examples illustrate the pros and cons of both methods. While the nomothetical method can yield psychometrically sound data, the researcher evaluates some patients on variables that are irrelevant to their specific complaints and perhaps neglects some complaints not addressed in the assessment instrument. In contrast, a method of assessment more tailored to the individual takes into account the diversity of patients yet may also yield scores that are not comparable across patients. Further, it is usually impossible to investigate the reliability of data obtained in an idiographic way. Therefore, individualized measures are usually of limited value in controlled research evaluating psychotherapy.

Kiresuk and Sherman (1968) have developed goal attainment scaling (GAS) as a solution to the dilemma of choosing the proper measurement instrument.
More specifically, GAS is a method for developing individualized, multi-variable scaled descriptions of goals with respect to treatment outcome. In short, GAS boils down to selecting target complaints before therapy and assessing the status of the patient after treatment with respect to these complaints. Preceding treatment, goals to be attained are formulated. For each goal, a 5-point scale of likely treatment outcomes is constructed, ranging from least to most favorable. The midpoint of the scale is defined as the most likely outcome based on the severity of the complaints of the patient and the treatment under evaluation. The various goals are described in concrete and specific terms using indicators for the target behavior. The procedure results in a set of outcome scales, called a "GAS follow-up guide." After treatment the progress of the patient toward the goals specified is reviewed. The position of the patient on the scales is determined and a
composite goal attainment score can be computed. GAS is an idiographic measure, as the goals selected are specific to the individual. Additionally, the method by which a follow-up guide is completed is highly formalized and standardized, thus countering the drawbacks of the more diffuse and unstructured observations typical of idiographic measures. Therefore, under the condition that GAS has been reliably set up, the GAS scores of different patients can be meaningfully compared, which makes GAS potentially suitable for controlled studies comparing the efficacy of various treatments.

Since its introduction, GAS has been used in a broad variety of therapeutic settings and GAS techniques are widespread. Extensive reviews of the literature regarding GAS are given by Cytrynbaum, Ginath, Birdwell, and Brandt (1979), Choate, Smith, Cardillo, and Thomson (1981), Mintz and Kiesler (1982), and Emmerson and Neely (1988). The method was enthusiastically introduced in inpatient and outpatient clinics for a variety of problems or complaints. GAS has been applied outside mental health settings as well, such as in educational programs (Maher, 1983). As an evaluation device for psychotherapy, the method is especially appealing to clinicians. This is likely attributable to its idiographic nature, which suggests that it addresses the idiosyncrasies of individual patients better than standardized measures do. Beyond use of the method as an evaluation tool, GAS has been put forth as an adjunct to treatment. Stating specific and precisely circumscribed goals may focus both the therapist and the patient on a target and facilitate progress toward it.

Use of GAS in controlled outcome research is less widespread. Data on the reliability and validity of the method are scarce (Lewis, Spencer, Haas, & DiVittis, 1987; Mintz & Kiesler, 1982). In particular, there is a need to compare outcome measured with GAS to outcome assessed by standardized measures.
In a previous study with a similar patient sample, experience was gained with the application of GAS as an outcome measure (de Beurs, Lange, Blonk et al., 1991). Furthermore, the reliability and validity of GAS were examined. The preliminary data suggested that the method was a suitable tool for the assessment of treatment outcome and a valuable adjunct to standardized outcome measures. In the present study, GAS was used in a comparative outcome study of various treatments for panic disorder with agoraphobia. To investigate the psychometric properties of the GAS, the data of the various treatment groups have been combined. The scales were inspected and the outcome according to GAS is compared with the results from other outcome measures.


METHOD

Subjects

Forty patients, all meeting DSM-III-R criteria for panic disorder with moderate or severe agoraphobia (300.21; American Psychiatric Association, 1987), participated in the study at the Outpatient Clinic for Anxiety Disorders at the Free University in Amsterdam, The Netherlands. Patients had an intake session with a psychiatrist. The diagnoses were confirmed in a second session with a resident in psychiatry or a clinical psychologist who made use of the Anxiety Disorders Interview Schedule-Revised (Di Nardo, O'Brien, Barlow, Waddell, & Blanchard, 1983; de Ruiter, Garssen, Rijken, & Kraaimaat, 1987). The mean age of the sample was 40.4 years (SD = 9.4); the mean duration of agoraphobic complaints was 13.2 years (SD = 11.9), with a mean severity of 31.5 (SD = 6.2) on the agoraphobia subscale of the Fear Questionnaire (range, 0-40), indicating that most patients could be considered severely agoraphobic. Demographic data such as housing, level of education, and social class revealed no differences between the sample and the general population. The gender distribution was skewed: 75% of the patients were female, in accordance with the prevalence of the disorder in the general population (Boyd et al., 1990).

Treatment

Patients were randomly assigned to the following four treatment conditions: (1 and 2) a double-blind placebo-controlled fluvoxamine condition combined with exposure in vivo, (3) psychological panic management (repeated hyperventilation provocations and breathing retraining) combined with exposure in vivo, and (4) exposure in vivo only. All treatments consisted of 12 weekly sessions delivered by experienced therapists (psychotherapists and psychiatrists), who had been extensively trained in behavior therapy. On average they had 7 years of experience in clinical practice, the greater part of which was attained in our anxiety disorders clinic. The therapists were assisted by a detailed manual to ensure a standardized administration of the treatments. Further details on the design of the study and the procedures that are of no relevance to GAS are presented elsewhere (de Beurs, van Balkom, Lange, Van Dyck, & Koele, 1993).



GAS Procedure

Kiresuk and Sherman (1968) describe three basic requirements of the procedure. First, the goals are to be set independently of the therapeutic process, implying that the construction of the scales should not be done by the therapist, but rather by an independent assessor. Second, patients should be randomly assigned to treatment conditions following the goal-setting. Third, the assessment of the level of goal attainment after treatment should again be done by independent raters and not by the therapists.

These requirements were met in the following way. After the last intake session but prior to random assignment to one of the treatment conditions, the GAS was carried out by independent observers (goal setters). Construction of the GAS involved carrying out 19 steps of a comprehensive protocol (Blonk & Hondema, 1990), which can be summarized as follows:

(1) Selection of goal areas. Selection of goal areas was based on information from the initial interview of the psychiatrist and on information from self-report of the patient. To be selected as a goal area, a problem had to meet two requirements: The nature of the problem had to be such that (a) it was possible to do something about the problem and (b) the treatment would attend to it. Usually, three to five goal areas were selected, which met these requirements.
(2) Interviewing the patient. During a 45-min interview with the goal setter, information was obtained from the patient in order to identify goal areas and formulate specific indicators of various levels of attainment within a goal area.
(3) Formulation of the "expected outcome." For each goal area the most probable result was stated, taking into account the duration of the treatment and the severity of complaints. Outcome had to be stated in terms sufficiently unambiguous so that two observers could agree on whether it had been attained.
(4) Completion of the other scale levels.
First, the two extremes of the scale, the least and the most favorable outcome, were specified and, next, the levels in between. A completed scale had to encompass a set of five mutually exclusive scale points, representing realistic and probable outcomes for a particular goal area. Figure 1 presents a completed GAS follow-up guide.

Upon reviewing the literature on GAS it becomes evident that numerous alterations and adaptations have been made to the original procedure from Kiresuk. Often the goals are set in consultation with the patient (Calsyn, Tornatsky, & Dittmar, 1977), resulting in goal setting in itself having a possible therapeutic effect. Although this may be a favorable side effect in therapeutic practice (cf. France, 1985; LaFerriere & Calsyn, 1978), an effect of the measurement instrument on the treatment is undesirable from a researcher's perspective. Therefore, we discarded consultation with the patient in setting goals, and instead goals were set based on the information gathered in the intake interview. There was, however, one pretreatment interview with the patient in which suitable indicators of goals were explored. In this interview, the exact content of the scales, as well as the precise goals, was withheld from the patients.

Evaluation form GAS. Nr: 920312. Patient: Miss G. Date: 12499-92. Nr. form: 1. Therapist: B. Made by: Assessor.

| Level of goal attainment | Scale 1. Goal: To be on the street on her own | Scale 2. Goal: Shopping in a supermarket | Scale 3. Goal: Being able to travel alone by car | Scale 4. Goal: |
|---|---|---|---|---|
| Most unfavorable outcome thought likely | Being confined to her house | Not able to enter the supermarket | Totally unable to drive her car | |
| Less than expected success with treatment | Able to leave her house, but not to leave her street | Able to enter but not able to do her shopping | Able to drive on two lane roads accompanied | |
| Expected level of treatment success | Walks in her own neighbourhood without a companion | Shops for 15 minutes during quiet shopping time | Drives on two lane roads alone | |
| More than expected success with treatment | Able to walk around in the entire city | Able to shop indefinitely during quiet shopping time | Travels on the highway accompanied | |
| Best anticipated success with treatment | Able to travel without restriction | Able to shop indefinitely during rush hour | Travels on the highway alone | |

Fig. 1. Example of a GAS follow-up guide.



It is possible with the GAS procedure to assign weights to the various problem areas. This feature was added to GAS to accommodate general usage in clinical practice, where change in one area is usually considered of greater importance than change in another area. Nevertheless, we did not assign differential weights to the scales of the GAS, for the following reasons. First, it would add considerable subjectivity to the measure, which hampers a standardized formulation of the GAS. Second, some statistical problems are attached to assigning weights, partly because the scales cannot be considered independent (Fiester & Fort, 1978). Third, it introduces superficial precision to the method. Instead of assigning weights, one can better discard goal areas that are considered less important. It was our experience that with agoraphobic patients it is not difficult to find three or four goal areas of almost equal importance.

The goals were set and the scales were constructed by independent judges, advanced students of clinical psychology, who went through an intensive training given by the researchers and the therapists in formulating realistic goals for the treatment under investigation. First, the judges were trained in setting goals using medical records of agoraphobic patients. The scales were discussed at length and the expected level of posttreatment functioning was compared with actual data of these patients. Next, the goal setters practiced by setting up GAS follow-up guides for patients who were treated at the clinic but did not take part in the comparative outcome study. In addition, they were closely supervised by the researchers during the execution of the study. During the treatment the therapists were blind to the selected goals for their patients. Finally, it is important to mention that randomization over the treatment conditions took place after the construction of the GAS follow-up guide.
Thus, the goal setters were blind to the treatment to which the patient was assigned. This was done since knowledge about the specific treatment could influence their expectations about the outcome, thus obscuring differential treatment effects (cf. Calsyn et al., 1977).

A total of 159 goals was set for 40 patients. Most goals pertained to walking on the street (30 patients), shopping (22 patients), traveling by public transportation (21 patients) or driving a car (14 patients), getting panic under control (15 patients), and staying alone at home (12 patients). These six goal areas were most frequently selected, and together they constituted 72% of the goals. For every patient at least one of these goal areas was used. The remaining 41 goals pertained to a smaller number of patients. Goals were set, such as visiting friends or
relatives (six patients), going to (movie) theaters or churches (four patients), visiting restaurants (three patients), and taking a trip far away from home (three patients). All goals concerned problems usually encountered by patients with panic disorder with agoraphobia. Other conceivable complaints, such as marital distress or depression, were not used as goals, since these problems were not addressed in the protocol treatments. Following the formulation of the GAS follow-up guide, the patients were randomly assigned to one of the four treatment conditions. After treatment had concluded the patients were invited for the posttest assessment. In a concluding interview the position of each patient on the various scales was assessed. Scores on the scales were combined using the equation proposed by Kiresuk and Sherman (1968):

\[
T = 50 + \frac{10 \sum_i w_i x_i}{\sqrt{(1 - \rho) \sum_i w_i^2 + \rho \left( \sum_i w_i \right)^2}}
\]

where
$w_i$ = the weight of the ith goal attainment scale,
$x_i$ = the score on the ith scale, and
$\rho$ = the weighted average correlation among the scales.

Since we refrained from assigning weights to the different scales, as outlined above, the equation is considerably simplified. With $\rho$ set to .30, as suggested by Kiresuk and Sherman (1968), the equation becomes $T = 50 + 10 c_k \sum_i x_i$, where $c_k$ = .456 in the case of three scales, $c_k$ = .363 for four scales, and $c_k$ = .302 for five scales.

Additional Outcome Measures

Self-Report Questionnaires. A comprehensive assessment battery of self-report measures included the Fear Questionnaire (FQ; Marks & Mathews, 1979), the Mobility Inventory (MI; Chambless, Caputo, Jasin, Gracely, & Williams, 1985), the Hopkins Symptom Check List (SCL-90; Arrindell & Ettema, 1986; Derogatis, 1977), the Agoraphobic Cognitions Questionnaire and Bodily Sensations Questionnaire (ACQ and BSQ; Chambless, Caputo, Bright, & Gallagher, 1984), the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961; Luteijn & Bouman, 1988), and the Depression Adjective Check List (DACL; Lubin, 1965). The psychometric properties of the measures, investigated with the data of a larger group of agoraphobics (N = 99), appeared to be outstanding. Most (sub)scales proved to be stable over a 6-week test-retest period (mean r = .81) and the internal consistency of all subscales was very high (mean Cronbach's α = .87). To limit the number of outcome variables and further increase their reliability, subscales were combined into three composites by summing the scores after correction for scale length. A composite for agoraphobia consisted of both the agoraphobia subscales of the FQ and SCL-90 and the Mobility Inventory avoidance alone scale. A composite for depression consisted of the BDI, the DACL, and the SCL-90 depression subscale. A composite for somatic anxiety consisted of the BSQ fear and frequency scales and the ACQ physical consequences scale.

Behavioral Avoidance. A multitask Behavioral Avoidance Test (M-BAT) was designed as one of the outcome measures in the investigation. It consisted of three tasks: a 3-mile walk from the clinic to the city center of Amsterdam, shopping in a nearby supermarket, and a short ride on public transportation. The performance on the tasks and the anxiety experienced during the tasks were combined to yield one score, with higher scores reflecting better performance and less anxiety. In an earlier study, the reliability and validity of the M-BAT were investigated (de Beurs, Lange, Van Dyck, Blonk, & Koele, 1991).
The test proved to be psychometrically sound and appeared to be a valid measure of the condition of the patients. Results demonstrated sufficient concordance with other measures of agoraphobia and the test appeared sensitive to clinical change. The M-BAT was administered by the goal setter of the GAS. First, the GAS follow-up guide was formulated, then the M-BAT was administered. This order was chosen to ensure that the setting up of the GAS follow-up guide was not affected by the outcome of the M-BAT.

Therapist Rating. The therapists were asked to rate the level of functioning of the patient after the last treatment session on a 9-point scale of global outcome with the following anchor points: 1-4 = deterioration, 5 = no change, 7 = small to average improvement, and 8-9 = considerable improvement or recovery (cf. Newman, 1983).


RESULTS

GAS Scores

A first prerequisite for a valid GAS procedure is a proper choice of expected level of attainment (Heavlin, Lee-Merrow, & Lewis, 1982). Systematic underestimating or overestimating of the expected level would result in misjudging the outcome of treatment in a positive or negative direction. Moreover, it would hamper the administration of the scales at the posttest, because of ceiling or threshold effects. Whether the appropriate level was chosen was investigated in several ways.

First, on a subset of the patients (n = 28), therapists and intakers were asked to evaluate the relevance of the goal area for a patient on a 5-point scale, ranging from "not relevant at all" to "highly relevant." Therapists evaluated the GAS scales of patients they were not treating themselves, so that knowledge of the goals could not influence the treatment at hand. The mean judgment of the therapists was 4.68 (SD = .51) and of the intakers was 4.66 (SD = .52), indicating that the goal areas were suitably chosen.

Next, the scales were presented to the therapists and the intakers and they were asked to indicate what level would be attained within a certain goal area (cf. Woodward, Santa-Barbara, Levin, & Epstein, 1978). Both the therapists and the intakers had experience with the treatments and their outcome. Their estimates were compared with the expected level assigned by the goal setters. One hundred eleven scales were compared. In 43% there was perfect agreement between the goal setters and the therapists on the level to be attained at posttest. In another 43% of the goal level estimates, they differed by 1 scale point and in 14% of the cases the therapist and the goal setter differed by 2 scale points. There was neither a systematic overestimation nor a systematic underestimation on the part of the goal setters. Comparison of the expected level according to the goal setters and according to the intakers was done for 47 scales.
It evidenced even more concordance: In 57% there was perfect agreement, in 41% the goal setter and the intaker differed by 1 scale point, and in 2% there was a difference of 2 points. From these results the conclusion can be drawn that there was reasonable concurrence between the goal setters and the therapists and between the goal setters and the intakers on the level to be attained by the patients after treatment.

It is, however, possible that the therapists and the intakers, although experienced with the treatments, tend to overestimate or underestimate the outcome to the same extent as the goal setters. Therefore, we investigated the correctness of the expected level in a different way as well. By definition the GAS score is 50 if the patient functions after treatment as was predicted. A score lower than 50 means that the outcome fell short of the expectation; a score higher than 50 means that the treatment was more successful than expected. When GAS is appropriately set up, the GAS scores of all patients should have a normal distribution, with a mean of 50 and a standard deviation of 10. As Gillespie and Seaberg (1977) state, especially a bias toward stating goals that are too easily attainable would pervade the GAS procedure. The mean GAS score amounted to 51.2 (SD = 15.9), very close to the theoretical mean of 50 (SD = 10). Moreover, analyses revealed no skewness (sk = .24, p > .20) or kurtosis (kr = .70, p > .20) of the distribution. The Kolmogorov-Smirnov statistic indicated that the distribution did not differ significantly from the theoretical distribution (Z = 1.20, p = .11).
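For readers who wish to reproduce the composite score discussed here, the following is a minimal sketch of the Kiresuk and Sherman (1968) equation as given in the GAS Procedure section. It assumes that the five scale levels are coded from -2 (most unfavorable outcome) to +2 (best anticipated outcome), with 0 for the expected level; this coding is the conventional one but is not stated explicitly in the text. The function names `gas_t_score` and `equal_weight_constant` are illustrative and do not come from the original study.

```python
import math


def gas_t_score(scores, weights=None, rho=0.30):
    """Composite GAS T score for one patient.

    scores  -- attainment levels per scale, assumed coded -2..+2 (0 = expected)
    weights -- optional scale weights; equal weights of 1 if omitted
    rho     -- assumed average intercorrelation among the scales (.30 in the text)
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, scores))
    denominator = math.sqrt(
        (1 - rho) * sum(w * w for w in weights) + rho * sum(weights) ** 2
    )
    return 50 + 10 * weighted_sum / denominator


def equal_weight_constant(k, rho=0.30):
    """Constant c_k in T = 50 + 10 * c_k * sum(x_i) for k equally weighted scales."""
    return 1 / math.sqrt((1 - rho) * k + rho * k * k)


if __name__ == "__main__":
    for k in (3, 4, 5):
        print(k, round(equal_weight_constant(k), 3))
    # Hypothetical patient: one scale above, one at, one below, one well above expectation
    print(round(gas_t_score([1, 0, -1, 2]), 1))
```

With rho = .30 the constants for three, four, and five scales round to .456, .363, and .302, matching the values reported in the text, and a patient who reaches the expected level on every scale obtains exactly 50.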

Concordance with Other Outcome Measures

Evidence of validity of GAS can be obtained by investigating concordance of GAS and other outcome measures. Pearson product-moment correlations were calculated between the GAS score and the therapist rating. To investigate the correlation between the GAS score and the pre-post measures, residual gain scores were applied. Based on the multitrait multimethod approach (Campbell & Fiske, 1959), a reasonably high correlation was predicted between GAS and residual gain scores on the composite of agoraphobia (similar "trait"). Between the GAS score, on the one hand, and the composite for somatic anxiety and the composite for depression, on the other, lower correlations were predicted. Finally, a high concordance was expected between GAS and the rating of treatment outcome by the therapist (similar "method" or assessment mode; cf. Cytrynbaum et al., 1979).

The correlation coefficients that resulted from the analysis are presented in Table I. The concordance between GAS and most other outcome measures is profound. Moreover, the correlations are generally in accordance with the prediction based on the multitrait multimethod approach. The GAS score correlates highly with the composite of agoraphobia questionnaires, with the M-BAT, and with the outcome rating by the therapists (similar trait). It correlates only moderately with the composite for depression (different trait) and is unrelated to the composite for somatic anxiety. The self-report measures in themselves are again concordant (similar method). All in all, the results support the concurrent validity of GAS.

Table I. Correlation (pmcc) Between the Various Outcome Measures

| Measure | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| 1. GAS score | .57b | .63b | .32a | .17 | .43b |
| 2. M-BAT | -- | .54b | .30 | .43 | .35a |
| 3. Agoraphobia | | -- | .52b | .50b | .48b |
| 4. Depression | | | -- | .47b | .48b |
| 5. Somatic anxiety | | | | -- | .08 |
| 6. Therapist rating | | | | | |

a p < .05. b p < .01.

The high correlations raise the question whether GAS is redundant with the other measures or has a unique contribution in measuring outcome. To investigate this issue the multiple correlation between the GAS score, on the one hand, and all other outcome measures, on the other, was assessed, using regression analysis. The multiple correlation of the five variables with the GAS score amounted to R = .73 [F(6,39) = 6.43; p = .0001]; therefore, the other measures accounted for 54% of the variance in GAS scores and the conclusion seems warranted that GAS is not redundant with the other measures.

Clinical Significance

A different approach to the inquiry into the validity of GAS consisted of an inspection of the data of those patients with an extreme GAS score (cf. Woodward et al., 1978). Two groups of patients were distinguished: 11 patients for whom treatment gains were much lower than expected (GAS score < 40) and 11 patients who had improved more than expected (GAS score > 60). Additionally, the patients were assigned to one of three clinically meaningful categories: unimproved, reliably improved, and clinically significantly improved (recovered), based on their score on the agoraphobia composite. To do so, the method proposed by Jacobson was used (Jacobson & Truax, 1991). Thus, the clinical end status of the 22 patients with a GAS score to the lower or higher extreme was assessed.
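The three-category classification can be sketched along the lines of Jacobson and Truax (1991): change counts as reliable when the reliable change index (the pre-post difference divided by the standard error of the difference) exceeds 1.96 in absolute value, and a patient counts as recovered when, in addition, the posttest score crosses a cutoff separating the dysfunctional from the functional range. The sketch below is illustrative only. The standard deviation (6.2) and reliability (.81) are borrowed from figures reported in this article for the FQ agoraphobia subscale and the test-retest stability of the battery, not from the agoraphobia composite itself, and the cutoff of 15 is purely hypothetical, since the text does not report the parameters actually used.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """Pre-post difference divided by the standard error of the difference."""
    standard_error = sd_pre * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2 * standard_error ** 2)
    return (post - pre) / s_diff


def classify(pre, post, sd_pre, reliability, cutoff):
    """Classify outcome on a measure where lower scores mean fewer complaints."""
    rci = reliable_change_index(pre, post, sd_pre, reliability)
    reliably_improved = rci < -1.96
    if reliably_improved and post < cutoff:
        return "recovered"
    if reliably_improved:
        return "reliably improved"
    return "unimproved"


if __name__ == "__main__":
    # Illustrative parameters only; see the assumptions stated above
    params = dict(sd_pre=6.2, reliability=0.81, cutoff=15)
    for pre, post in [(32, 30), (32, 22), (32, 10)]:
        print(pre, post, classify(pre, post, **params))
```

Under these assumed parameters a drop of about 7.5 points or more is needed before change is considered reliable, which is why a 2-point drop is classified as unimproved.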

Table II. Outcome of Treatment According to the Agoraphobia Composite of Patients with High and with Low GAS Scores (n = 22)

| | Not improved | Reliably improved | Recovered |
|---|---|---|---|
| GAS < 40 | 9 | 1 | 1 |
| GAS > 60 | 1 | 1 | 9 |

Column categories are according to the agoraphobia composite.

As Table II shows, most patients with a high GAS score could be considered recovered, while most of the patients with a low score were not improved. There was, however, some divergence in outcome according to both measures: One patient with a disappointing treatment outcome according to the GAS score was considered recovered according to self-report measures. In retrospect, the goals for this patient had been set too high. There was one conflicting result in the opposite direction: One patient with a high GAS score was considered unchanged according to the self-report measures for agoraphobia. This patient had reported considerably fewer complaints at pretest compared to the other patients, thus leaving less room for improvement on the self-report scales.

DISCUSSION

At first glance, it seems a peculiarity in the GAS protocol to start with the expected level of goal attainment as the midpoint of the scale. This introduces a potential source of error in the measure: incorrect estimation of the level to be attained. It seems more straightforward to take the present level of functioning of the patient as the midpoint of the scale (Beidel, Turner, Bellack, Hersen, & Luber, 1983). However, this would bring forth several serious problems. First, it would be impossible for patients who are extremely dysfunctional to formulate scale levels representing deterioration. Second, it would seriously hamper a meaningful comparison of the end level of different patients: The improvement of one patient, expressed in the number of scale points progressed, is not necessarily equal to the improvement of another patient, who progressed an equal number of scale points. On the other hand, setting up a scale for the expected outcome, with the pretreatment level of functioning taken into consideration, ensures a meaningful comparison of the end score of different patients. However, one should take into account that the GAS score does
not represent the actual level of functioning of the patient after treatment. A score of 50 implies that the treatment was as effective as expected. However, a score well above 50 does not necessarily imply better functioning of the patient compared to someone with a lower GAS score; it merely shows that the treatment was more effective than expected.

Compared to previous investigations into the validity of GAS, we found high correlations between GAS and other outcome measures. Substantial concordance between the therapist ratings and GAS is usually found, but high correlations (r > .50) between GAS and standardized measures such as self-report questionnaires are remarkable (cf. Cytrynbaum et al., 1979). The first explanation for this finding is the homogeneity of the patient group, all suffering from agoraphobia, which enables an exact estimate of the treatment outcome to be attained. Also, in other inquiries into concurrent validity of GAS, typically global measures of patients' functioning are used, such as the MMPI. In the present investigation we employed well-established self-report questionnaires and a comprehensive behavioral avoidance test, both of which can be considered more specific outcome measures.

The self-report measures, the M-BAT, and the therapist rating taken together explain more than 50% of the variance in the GAS scores. This leaves open the question whether GAS adds something to the already reliable and valid assessment battery in a homogeneous group of patients such as agoraphobics. One might wonder why one would go through the procedures of GAS, since seemingly the instrument does not enhance the determination of treatment outcome. However, our results show that the various outcome measures are not concordant to the extent that they should be considered redundant. Even diverging conclusions emerge, dependent on the outcome measure considered.
Most likely, this finding is explained by the conceptual difference between the standardized measures and the individualized GAS. Two examples of patients may clarify this issue. The first patient is seriously restricted in her mobility because she does not dare leave the vicinity of a hospital nearby. She fears she will die from a heart attack and wants to have the opportunity to be rushed to the intensive care unit in time. The more common agoraphobic situations, such as crowded streets and public transportation, are no problem as long as she is near a hospital. The second patient avoids driving on the highway, because he fears losing control over the car. He usually travels on secondary roads, despite the loss of time. Given the nature of his complaints, pursuing his job as a salesman has become virtually impossible. Both patients are severely impaired in their daily functioning, but both will score low on the Fear Questionnaire agoraphobia subscale or on the Mobility Inventory scales. The seriousness of their complaint is fully addressed only on an individualized measure and so are changes brought about by therapy.
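The interpretation of the GAS score given above (50 means the outcome was as expected) can be made concrete with a short sketch. It assumes the T-score formula of Kiresuk and Sherman (1968), with attainment levels coded from -2 (much less than expected) to +2 (much more than expected) and an assumed inter-goal correlation of 0.3. The function name, the equal default weights, and the example attainment levels are illustrative only and are not taken from the study data.

```python
import math


def gas_t_score(levels, weights=None, rho=0.3):
    """Summary GAS T-score (sketch of the Kiresuk & Sherman, 1968, formula).

    levels  : attained level per goal, assumed coded -2..+2 (0 = expected outcome)
    weights : relative importance per goal (assumption: equal weights if omitted)
    rho     : assumed average correlation between goal scales (0.3 by convention)
    """
    if not levels:
        raise ValueError("at least one goal is required")
    if weights is None:
        weights = [1.0] * len(levels)
    if len(weights) != len(levels):
        raise ValueError("levels and weights must have the same length")

    weighted_sum = sum(w * x for w, x in zip(weights, levels))
    sum_w = sum(weights)
    sum_w_sq = sum(w * w for w in weights)

    # Standard deviation of the weighted sum under the assumed correlation rho
    denom = math.sqrt((1.0 - rho) * sum_w_sq + rho * sum_w ** 2)
    return 50.0 + 10.0 * weighted_sum / denom


# Hypothetical patient who reaches exactly the expected level on three goals
print(gas_t_score([0, 0, 0]))
# Hypothetical patient who does somewhat better than expected on all three goals
print(round(gas_t_score([1, 1, 1]), 2))
```

Under these assumptions a patient who reaches the expected level on every goal scores exactly 50, whatever that expected level was. This is why two patients with the same score need not function equally well after treatment.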


Some limitations to the use of GAS should be mentioned. First, setting up a GAS follow-up guide can be a time-consuming endeavor. It took an experienced goal setter about 0.75 hr for each patient to go through the entire protocol. However, the posttreatment measurement is less time-consuming; usually the level of attainment is assessed in 10 min. A second limitation is that the reliability of the measure depends largely on a realistic estimation of the level of the goals to be attained. Unfortunately, this aspect of its reliability can be assessed only after a considerable number of patients have been measured at posttest. Therefore, some experience with the patients and knowledge of the gains to be expected from the treatment under investigation is a prerequisite. Finally, applied to assess a homogeneous group of patients, GAS becomes conceptually similar to other individualized measures, such as the I-BAT (Craske et al., 1989) or the individualized version of the Phobic Anxiety and Avoidance Scale (PAA; Watson & Marks, 1971). After some experience with this patient group, a reliable estimation of the level of complaints at posttest is quite conceivable. It could well be that estimating the expected level after treatment in a more heterogeneous patient group, such as schizophrenics, is more complicated.

It has often been stressed that for a proper evaluation of treatment, multiple sources to assess the functioning of patients should be incorporated (Lambert, Shapiro, & Bergin, 1986). Essentially, GAS is a highly formalized procedure for the rating of treatment effect by independent assessors. Thereby, it pertains to a different domain of assessment than self-report inventories or behavioral avoidance tests do, which asserts its value.
Finally, an advantage of GAS, compared to standardized measurement instruments, is that it takes into account individual characteristics of patients, thereby possibly supplementing crucial information. Because GAS is especially suitable for the measurement of patients with an atypical complaints profile, it can be a meaningful addition to a comprehensive assessment battery in the evaluation of treatments.

ACKNOWLEDGMENTS

The authors thank Brigit van Widenfelt for her detailed editorial comments. We also thank the anonymous reviewers for their helpful comments on an early draft of the manuscript.

REFERENCES

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: APA.
Arrindell, W. A., & Ettema, J. H. M. (1986). SCL-90: Handleiding bij een multidimensionele psychopathologie-indicator. (SCL-90: Manual for a multidimensional indicator of psychopathology). Lisse, The Netherlands: Swets & Zeitlinger.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J. E., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.
Beidel, D. C., Turner, S. M., Bellack, A. S., Hersen, M., & Luber, R. F. (1983). Using the Goal Attainment Scale to measure outcome in schizophrenia. International Journal of Partial Hospitalization, 2, 33-41.
Blonk, R. W. B., & Hondema, M. (1990). Goal attainment scaling en therapie-effect (Goal attainment scaling and treatment outcome). Unpublished master's thesis. Amsterdam: Department of Psychology, University of Amsterdam.
Boyd, J. H., Rae, D. S., Thompson, J. W., Burns, B. J., Bourdon, K., Locke, B. Z., & Regier, D. A. (1990). Phobia: Prevalence and risk factors. Social Psychiatry and Psychiatric Epidemiology, 25, 314-323.
Calsyn, R. J., Tornatsky, L., & Dittmar, S. (1977). Incomplete adaptation of an innovation: The case of goal attainment scaling. Evaluation Quarterly, 4, 127-130.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation of the multimethod matrix. Psychological Bulletin, 56, 81-105.
Chambless, D. L., Caputo, G. C., Bright, P., & Gallager, R. (1984). Assessment of fear of fear in agoraphobics: The Body Sensations Questionnaire and the Agoraphobic Cognitions Questionnaire. Journal of Consulting and Clinical Psychology, 52, 1090-1097.
Chambless, D. L., Caputo, G. C., Jasin, S. E., Gracely, E. J., & Williams, C. (1985). The Mobility Inventory for agoraphobia. Behavior Research and Therapy, 23, 35-44.
Choate, R., Smith, A., Cardillo, J. E., & Thomson, L. (1981). Training in use of goal attainment scaling. Community Mental Health Journal, 17, 171-181.
Craske, M. G., Street, L., & Barlow, D. H. (1989). Instructions to focus upon or distract from internal cues during exposure treatment of agoraphobic avoidance. Behavior Research and Therapy, 27, 663-672.
Cytrynbaum, S., Ginath, Y., Birdwell, J., & Brandt, L. (1979). Goal attainment scaling: A critical review. Evaluation Quarterly, 3, 5-40.
de Beurs, E., Lange, A., Blonk, R., Van Dyck, R., van Balkom, A. J. L. M., & van Daal, M. (1991). De waarde van Goal Attainment Scaling voor het vaststellen van therapie-effect bij agorafobici (The value of goal attainment scaling for the assessment of treatment effect with agoraphobic patients). Gedragstherapie, 24, 235-252.
de Beurs, E., Lange, A., Van Dyck, R., Blonk, R. W. B., & Koele, P. (1991). Behavioral assessment of avoidance in agoraphobia. Journal of Psychopathology and Behavioral Assessment, 13, 285-300.
de Beurs, E., van Balkom, A. J. L. M., Lange, A., Van Dyck, R., & Koele, P. (1993). Treatment of panic disorder with agoraphobia. I. A comparison of fluvoxamine, placebo, psychological panic management, and exposure in vivo (submitted for publication).
de Ruiter, C., Garssen, B., Rijken, H., & Kraaimaat, F. (1987). Anxiety Disorders Interview Schedule-Revised: Dutch translation. Utrecht: Department of Psychiatry, University of Utrecht, The Netherlands.
Derogatis, L. R. (1977). SCL-90. Administration, scoring & procedures manual-I for the R(evised) version and other instruments of the psychopathology rating scales series. Baltimore: Clinical Psychometrics Research Unit, Johns Hopkins University School of Medicine.
Di Nardo, P. A., O'Brien, G. T., Barlow, D. H., Waddell, M. T., & Blanchard, E. B. (1983). Reliability of DSM-III anxiety disorder categories using a new structured interview. Archives of General Psychiatry, 40, 1070-1075.


Emmerson, G. J., & Neely, M. A. (1988). Two adaptable, valid, and reliable data-collection measures: Goal attainment scaling and the semantic differential. Counseling Psychologist, 16, 261-271.
Fiester, A. R., & Fort, D. J. (1978). A method of evaluating the impact of services at a comprehensive community mental health center. American Journal of Community Psychology, 6, 291-302.
France, K. (1985). Goal attainment scaling as a therapeutic aid in crisis intervention. Crisis-Intervention, 14, 11-20.
Gillespie, G. F., & Seaberg, J. R. (1977). Individual problem rating: A proposed scale. Administration in Mental Health, 5, 21-29.
Heavlin, W. D., Lee-Merrow, S. W., & Lewis, V. M. (1982). The psychometric foundations of goal attainment scaling. Community Mental Health Journal, 18, 230-241.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Kiresuk, T. J., & Sherman, S. E. (1968). Goal attainment scaling: A general method for evaluating comprehensive community mental health programs. Community Mental Health Journal, 4, 443-453.
LaFerriere, L., & Calsyn, R. (1978). Goal attainment scaling: An effective treatment technique in short-term therapy. American Journal of Community Psychology, 6, 271-282.
Lambert, M. J., Shapiro, B. A., & Bergin, A. E. (1986). The effectiveness of psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change. New York: Wiley.
Lewis, A. B., Spencer, J. H., Haas, G. L., & DiVittis, A. (1987). Goal attainment scaling: Relevance and replicability in follow-up of patients. Journal of Nervous and Mental Disease, 175, 408-418.
Lubin, B. (1965). Adjective checklist for the measurement of depression. Archives of General Psychiatry, 12, 57-62.
Luteijn, F., & Bouman, T. K. (1988). De validiteit van de Beck's Depression Inventory (The validity of Beck's Depression Inventory). Nederlands Tijdschrift voor de Psychologie, 43, 340-343.
Maher, C. A. (1983). Goal attainment scaling: A method for evaluating special education services. Exceptional Children, 49, 529-536.
Marks, I. M., & Mathews, A. M. (1979). Brief standard self-rating for phobic patients. Behavior Research and Therapy, 17, 263-267.
Mintz, J., & Kiesler, D. J. (1982). Individualized measures of psychotherapy outcome. In P. C. Kendall & J. M. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 491-533). New York: Wiley.
Newman, F. L. (1983). Therapist's evaluation of psychotherapy. In M. J. Lambert, E. R. Christensen, & S. S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 498-536). New York: Wiley.
Watson, J. P., & Marks, I. M. (1971). Relevant and irrelevant fear in flooding: A crossover study of phobic patients. Behavior Therapy, 2, 275-293.
Woodward, C. A., Santa-Barbara, J., Levin, S., & Epstein, N. B. (1978). The role of goal attainment scaling in evaluating family therapy outcome. American Journal of Orthopsychiatry, 48, 464-476.