J Child Fam Stud DOI 10.1007/s10826-006-9117-y ORIGINAL PAPER

Cross-Validation of the Behavioral and Emotional Rating Scale-2 Youth Version: An Exploration of Strength-Based Latent Traits

Michael J. Furlong · Jill D. Sharkey · Peter Boman · Roslyn Caldwell

© Springer Science+Business Media, LLC 2007

Abstract High-quality measurement is a necessary requirement to develop and evaluate the effectiveness of programs that use strength-based principles and strategies. Using independent cross-validation samples, we report two studies that explored the construct validity of the BERS-2 Youth Report, a popular measure designed to assess youth strengths, whose conceptual structure has not yet been examined. In Study 1, an exploratory factor analysis found a four-factor solution with conceptual support, which included both internal assets associated with (a) the management of emotions and positive social interaction skills and (b) engagement in the important social contexts of family and school. In Study 2, confirmatory factor analyses found reasonable model fit for the BERS-2 five-factor structure and superior model fit for the more parsimonious four-factor solution found in Study 1. In future studies, parallel reporting of the four-factor model may provide additional insight into the nature and structure of the BERS-2 Youth Version’s clinical validity and utility when compared with the five-factor model, thus potentially contributing to the broader objective of developing a better understanding of important strength-based latent traits.

Keywords Strengths · Assessment · Rating scales · Test validity · Youth self-report

M. J. Furlong
Gevirtz Graduate School of Education, Department of Counseling, Clinical, and School Psychology, University of California, Santa Barbara, CA 93106-9490, USA
e-mail: [email protected]

J. D. Sharkey
Center for School-Based Youth Development, University of California, Santa Barbara, Santa Barbara, CA, USA

P. Boman
School of Education, James Cook University, Cairns, Queensland, Australia

R. Caldwell
Department of Psychology, John Jay College of Criminal Justice, City University of New York, New York, NY, USA

There has been a surge of advocacy for positive psychology as a legitimate topic of research in its own right and to offer an alternative to the predominance of deficit-based models that have defined research regarding the mental health needs of youth (Seligman, Steen, Park, & Peterson, 2005). Reflecting this growing interest, recent journal special issues have examined the values that accrue from integrating information about internal and external assets into prevention and treatment program planning—School Psychology Quarterly (Huebner & Gilman, 2003), Psychology in the Schools (Chafouleas & Bray, 2004), California School Psychologist (Jimerson, 2004), and the Journal of School Health (Blum & Libbey, 2004). In addition to discussing the theoretical rationales and clinical approaches of positive psychology, as in any research field, there is a need to develop, enhance, and refine the conceptual and psychometric foundation of measures that assess relevant positive psychology constructs. This is what Epstein (1999) and others have called “strength based” assessment, which is based on the hypothesis that prevention and intervention services are enhanced when they incorporate a youth’s personal strengths and his or her external social resources (Jimerson, Sharkey, Nyborg, & Furlong, 2004; Rhee, Furlong, Turner, & Harari, 2001). Absent reliable and valid procedures to assess youth strengths, it is not possible to develop and evaluate the effectiveness of programs that purport to use strength-based principles and strategies (Libbey, 2004). Without high-quality assessments, positive youth development and its focus on resiliency and strengths becomes primarily a philosophical or values orientation. To operate from a science base, high-quality measurement is a necessary requirement. 
Assessments have begun to be developed, but have not had time to fully mature, and theoretical models to guide scale development have not been fully explored (Libbey, 2004; O’Farrell & Morrison, 2003). Within this context, Epstein and colleagues offer one instrument designed to measure youth strengths: the Behavioral and Emotional Rating Scale (BERS; Epstein, 2004; Epstein & Sharma, 1998). The BERS was originally developed as part of a Children’s Mental Health Services-funded system of care project (Center for Mental Health Services, 2001; ORC Macro, 2003) and, as such, was partially based on developing wraparound services for youth with emotional-behavioral disorders (EBD) that considered the youth’s and their family’s resources and strengths (Lourie, Stroul, & Friedman, 1998). Although the System of Care initiative extolled the values of treatment plans based on strengths, Epstein and colleagues recognized that there were no psychometrically proven instruments to assess positive changes in the youths with EBD involved in cross-agency, wraparound programs. In response to this need, Epstein and Sharma (1998) conducted a literature search of strength-based assessment, developmental psychopathology, resilience, and protective factors. After developing an item pool of 1,200 items, consulting with content experts, and preliminary analyses, items that produced the greatest differentiation between EBD and non-EBD youth were retained for scale development. Exploratory factor analysis identified a final scale consisting of 52 items with five factors that were called Interpersonal Strength, Family Involvement, Intrapersonal Strength, School Functioning, and Affective Strength; hence the scale had a mixture of positive internal assets and external resources. The original BERS norm group comprised a mixed group of adult raters (teachers, caregivers, and case managers) (Epstein & Sharma, 1998).
Epstein (2004) expanded the reach of the BERS (now called the BERS-2) by re-norming it with separate groups of teachers and parents, and extended it by developing a youth self-report format. The BERS was developed using a mixture of professional judgement and empirical procedures; however, it had no prespecified theoretical foundation or set of psychological constructs. Following from this observation, reviewers of the original BERS raised questions about the robustness of the original BERS factors. Doll (2001), for example, commented that, “The factor analysis underlying the scale needs to be verified through replication on
independent samples, and a conceptual model is needed to describe the meaning of the scale and its subscales” (p. 144). Similarly, Olmi’s (2001) judgement was that, “The presentation on construct validity that was offered in the BERS manual was less than adequate” (p. 145). Since the original exploratory factor analysis reported in the BERS manual, no other researcher has independently examined its construct validity. These early questions about the generalizability of the factor structure of the original BERS when completed by adult raters point to the need to carefully examine the cross-informant concordance of the factor structures, particularly for the BERS-2 Youth Version. Epstein, Mooney, Ryser, and Pierce (2004) presented three analyses of the BERS-2 Youth Version for convergent validity with the Social Skills Rating Scale and the Achenbach (1991) Youth Self Report. They also examined the stability of scores. Although the samples in these three analyses were small and had restricted range, they provided evidence that the BERS-2 Youth Version score had short-term stability and had positive correlations when corrected for restricted range, with the correlations of the BERS-2 Total Strength Index being .71 with the SSRS Social Skills Composite and −.50 with the YSR Total Problems score. The one-week test-retest coefficient was .80. Additional support for the use of the BERS-2 Youth Version is provided by Uhing, Mooney, and Ryser (2005) who compared the responses of 386 adolescents without an emotional disturbance (non-ED) with those of 71 youth with ED. Predictive validity was shown in that the Total Strength Index for the ED youth (92.3, with a mean of 100) was significantly lower than that for the non-ED youth (101.3), with significant and moderate effect size differences on each of the five subscales. 
Farmer and colleagues (2005) examined the relationship between the BERS Parent and youth self-reported behavior using the Interpersonal Competence Scale–Self (Cairns, Leung, Gest, & Cairns, 1995) in a sample of African American middle school students. They found that students placed into low, middle, and high groups based on the Total Strength Index derived from parent ratings differed on aggression (high group reported lowest aggression), affiliation (high group reported most affiliation, for girls only), and popularity (high group reported most popularity). Results for the other subscales, including academics and internalizing problems, were not significant. This study provided evidence that the parent BERS-2 ratings had some concurrent validity; however, the analysis did not use the BERS-2 subscales and did not add information about its construct validity across parent and youth BERS-2 ratings. The only study to report the construct validity of the BERS-2 can be found in the BERS-2 Manual (Epstein, 2004), which presents CFA results using the 52 items to assess their fit with the original BERS five-factor model as measured variables assessing the latent trait of a Total Strength Index. The resulting fit indices were generally adequate (CFI = .995; TLI = .986; NFI = .995). However, the RMSEA of .12 was above the generally accepted range of .05–.08 for even a reasonable fit (Browne & Cudeck, 1993; Browne & Mels, 1990; Steiger, 1989; Thompson, 2004). Hence, there is a need for further exploration and verification of the conceptual structure of the BERS-2 Youth Self Version using an independent cross-validation sample. 
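The RMSEA criterion cited above can be made concrete with a short calculation. The sketch below uses one common formula for the RMSEA point estimate, sqrt(max(χ² − df, 0) / (df(N − 1))). The chi-square, degrees of freedom, and sample size in the example are hypothetical values chosen for illustration and are not taken from the BERS-2 manual. The "reasonable fit" band mirrors the .05–.08 range mentioned in the text, and the "close fit" label for values at or below .05 follows common usage rather than anything stated here.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model chi-square.

    Uses sqrt(max(chi2 - df, 0) / (df * (n - 1))). Some programs divide by
    n instead of n - 1, so treat this as one common convention.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def describe_rmsea(value: float) -> str:
    """Label a value against the .05-.08 range cited in the text."""
    if value <= 0.05:
        return "close fit"
    if value <= 0.08:
        return "reasonable fit"
    return "outside the .05-.08 range"


# Hypothetical inputs for illustration only (not BERS-2 results).
example = rmsea(chi_square=100.0, df=50, n=201)
print(round(example, 3), describe_rmsea(example))

# The RMSEA of .12 reported in the manual falls outside the cited range.
print(describe_rmsea(0.12))
```

Because RMSEA penalizes misfit per degree of freedom, a model can show high CFI, TLI, and NFI values and still exceed the .08 bound, which is the pattern described for the manual's results.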
Given the emerging interest in strength-based assessment and the BERS-2’s wide use in research (its use is mandated as part of the national evaluation of all CMHS System of Care Grants; Center for Mental Health Services, 2001), it is important to further examine its psychometric properties to refine and advance its use as a research and clinical instrument. Furthermore, aligning the BERS-2 with theoretical models based in fields of risk and resilience (e.g., Cicchetti & Lynch, 1993; Cicchetti & Toth, 1997) and school engagement (e.g., Jimerson et al., 2004; National Research Council and the Institute of Medicine,
2003) will allow for a better understanding of how the BERS-2 subscales relate to youth functioning. Hence, this study has two major purposes. First, we examine the factor structure of the BERS-2 Youth Version with an independent sample of adolescents. Exploratory factor analysis (EFA) will be used to examine item characteristics of the 52 items to determine whether they fit best within the proposed five-factor structure or an alternative factor structure and whether all items have adequate loadings (the BERS-2 manual does not report any EFA analyses for the youth version). Second, we seek to confirm the optimal factor structure of the BERS-2 Youth Version with a second independent sample of adolescents. Confirmatory factor analysis (CFA) will be used to examine the comparative fit of the original BERS-2 Youth Version model versus alternative structures identified in the EFA. In order to align the BERS-2 with theoretical models related to strength-based assessment, this analysis seeks to identify organizations of the items and subscales that have clear links to constructs used in the related literatures identified by Epstein (2004; i.e., behavioral and emotional skills, strength-based assessment, developmental psychopathology, risk and resilience, and protective factors).

Study 1: Exploratory factor analysis

Method

Participants

The 752 adolescents who participated in Study 1 included two distinct samples of youths. The first group consisted of 386 students attending a comprehensive high school located in the central coast region of California with 194 males and 192 females, of whom most were European American (87%). The majority of students were in Grade 9 (93%) with a few in Grades 10 (4%), 11 (2%), and 12 (1%). Participants in the second group were 366 youths referred to a California county juvenile probation department for a first-offense intake interview and risk assessment.
This group included more males (66%) than females (34%) and reported Latino American (61%), European American (33%), African American (3%), and other (3%) ethnicities. Youths were ages 10 to 18, with 3% under 12 years old, 28% between 12 and 13 years old, and 69% above 14 years old. Participants in the normative sample of the BERS-2, to whom our study participants are being compared, included 54% males, 80% “Whites,” 12% “Blacks,” and 8% “Other,” with 8% of the total sample responding “Yes” to being “Hispanic.” The “West” was represented by 22% of the normative sample. With the exception of geographical location, the comprehensive high school sample was demographically similar to the BERS-2 normative sample.

Measure

Behavioral and Emotional Rating Scale-2 (BERS-2). The BERS-2 youth version (Epstein, 2004) is a 52-item, standardized, norm-referenced scale originally designed for parents, teachers, and/or other caregivers to report about five aspects of the behavioral and emotional strengths of children and adolescents aged 11 years, 0 months to 18 years, 11 months. Raters respond to each question using a 4-point Likert scale, which for the youth version ranges from 0 (not like me) to 3 (very much like me). Epstein (2004) describes five subscales. The Interpersonal Strength subscale measures a youth’s ability to control his or her emotions or behaviors in
social situations (e.g., “I can express my anger in the right way”). For the Youth Version, the BERS-2 manual (Epstein, 2004) reports an internal consistency alpha of .82 across all ages and a test-retest reliability of .89 over two weeks for a sample (N = 42) of 11- and 12-year-olds. The Family Involvement subscale measures a child’s participation in and involvement with his or her family (e.g., “My family makes me feel wanted”). The BERS-2 manual reported an internal consistency alpha of .80 and a test-retest reliability of .85. The School Functioning subscale measures competence in school and classroom tasks (e.g., “I complete tasks when asked”). The BERS-2 manual reported an internal consistency alpha of .88 and a test-retest reliability of .89. The Intrapersonal Strength subscale measures a youth’s outlook on his or her competence and accomplishments (e.g., “I believe in myself”). The BERS-2 manual reported an internal consistency alpha of .82 and a test-retest reliability of .91. Finally, the Affective Strength subscale measures the ability of a child to accept affection from others and express feelings towards others (e.g., “It’s okay when people hug me”). The BERS-2 manual reported an internal consistency alpha of .80 and a test-retest reliability of .84. The BERS-2 also provides an overall Strength Quotient made up of all items with a reported internal consistency alpha of .95 and a test-retest reliability of .91. Overall, inter-rater and retest reliability indicate moderate to high correlation across all subscales.

Procedures

Measure. At the time that this study was conducted, the youth version of the BERS-2 (Epstein, 2004) was unavailable. Given the timing of our study, the youth survey used in this study is not exactly the same as the published BERS-2 measure due to subtle wording differences, though the content and essential meaning of each question examined is the same.
To use the BERS with youth participants, we merely changed the four-point Likert response format of the BERS parent version to the first person (e.g., 2 = Like my child changed to 2 = Like me). Subsequently, the BERS-2 was published with the same 52 items used in this study and the original BERS (Epstein & Sharma, 1998) with minor modifications. Information in the BERS-2 manual describing how the youth version was adapted only indicates that as a result of focus group feedback, “. . . individuals were satisfied with the content, format, and uses of the instrument” (Epstein, 2004, p. 66). BERS-2 authors identified the need for a brief career strengths subscale and five items to address this were added to the youth version. Thus, the career strengths subscale is not evaluated in this study. The manual does not mention any process for piloting or seeking input from youth on item content. For most items, authors of the BERS-2 made wording changes so items reflected the first person (e.g., “studies for test” became “I study for tests” in the youth version). Other items were changed apparently to contain vocabulary that would be more easily understood by the younger adolescents (e.g., “Trusts a significant person with own life” became “I trust at least one person very much”). Items in our study were presented in the same order as in the BERS-2. For this study, surveys were developed into a machine-readable format using the Teleform® software package for efficient data entry.

Sample 1. For the comprehensive high school sample, classroom teachers under the direction of the school counselor (substance abuse prevention coordinator) administered surveys in Winter 2002. All students in a freshman Health class who were in attendance at school on the day of data collection anonymously completed the BERS-2 as part of a tobacco prevention school unit (non-ninth graders were taking the class to satisfy a graduation requirement).
Data were sent to researchers for data entry with neither names nor identifying numbers.

Sample 2. All participants in the juvenile probation department sample were drawn from the evaluation of the early intervention component of a state-funded delinquency prevention program. The youths had no probation interventions prior to this first referral. The BERS-2 Youth Version was administered between July 1997 and June 2001 as part of the regular intake assessment process by probation officers trained in assessment procedures and sent to evaluators for data entry and analysis with codes substituted for names.

Data preparation. Prior to analysis, all surveys were examined for marking errors and ambiguities (i.e., bubbles that were not completely filled in were darkened or markings outside of the bubble were corrected). If an item had two marked responses, it was considered missing data. After carefully examining the surveys, research assistants scanned and verified the data accuracy using the Teleform software package, which automatically uploaded the data into an SPSS file. During further review of the data, extreme responders were excluded; for example, participants who marked all 0’s or 3’s. Per manual procedures, for surveys that had fewer than two items of missing data per subscale, missing data points were substituted with the overall subscale mean (Epstein & Sharma, 1998; Switzer & Roth, 2002).
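The data preparation rules described above can be sketched in a few lines. This is an illustrative sketch only, not the procedure actually run in Teleform or SPSS: the function names are hypothetical, and "overall subscale mean" is interpreted here as the respondent's own mean on the remaining items of that subscale, which is an assumption (the sample-wide subscale mean is another possible reading).

```python
from statistics import mean
from typing import List, Optional


def clean_marks(marks: List[List[int]]) -> List[Optional[int]]:
    """Keep an item only if exactly one bubble was marked.

    Items with two or more marks (or none) are treated as missing (None).
    """
    return [m[0] if len(m) == 1 else None for m in marks]


def is_extreme_responder(responses: List[Optional[int]]) -> bool:
    """Flag respondents who marked all 0's or all 3's on answered items."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return False
    return all(r == 0 for r in answered) or all(r == 3 for r in answered)


def impute_subscale(responses: List[Optional[int]],
                    max_missing: int = 1) -> Optional[List[float]]:
    """Fill missing items with the mean of the answered items.

    ASSUMPTION: the fill value is the respondent's own subscale mean.
    Returns None when two or more items are missing, because the text
    allows substitution only for fewer than two missing items.
    """
    answered = [r for r in responses if r is not None]
    missing = len(responses) - len(answered)
    if missing > max_missing or not answered:
        return None
    fill = mean(answered)
    return [fill if r is None else r for r in responses]


# Hypothetical scanned marks for a four-item subscale.
scanned = [[2], [1, 3], [3], [2]]   # second item has two marks
responses = clean_marks(scanned)     # second item becomes missing
print(responses)
print(is_extreme_responder(responses))
print(impute_subscale(responses))
```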

Analysis

Exploratory Factor Analysis (EFA). EFA was performed using SPSS 11.0. EFA is a method of investigating how scale items may optimally group together into different subsets to best measure an overall construct. General characteristics of EFA are that (a) the number of factors identified can range from a single factor to a number equal to the number of items in a scale, (b) all items are able to correlate with all of the factors, which may make identifying distinct factors difficult, and (c) rotation to change correlations between items and factors is used to identify the clearest pattern of results (Kline, 1998; Thompson, 2004). Though empirical techniques, such as creating an eigenvalue cut-off or minimum factor loading, can be used to identify the ideal number of factors, it is most accurate to consider empirical data within a theoretical framework that explains item groupings.
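The two empirical techniques mentioned above, an eigenvalue cut-off and a minimum factor loading, amount to simple filtering rules. The sketch below applies them to made-up numbers; the eigenvalues, loadings, item names, and the cut-off of 1.0 are all hypothetical and are not values from this study (the .40 minimum loading matches the suppression threshold used in Table 2).

```python
from typing import Dict, List


def factors_above_cutoff(eigenvalues: List[float], cutoff: float = 1.0) -> int:
    """Count factors whose eigenvalue exceeds the chosen cut-off."""
    return sum(1 for ev in eigenvalues if ev > cutoff)


def salient_items(loadings: Dict[str, List[float]],
                  minimum: float = 0.40) -> Dict[int, List[str]]:
    """Assign each item to the factor on which it loads most strongly.

    Items whose largest absolute loading falls below `minimum` are dropped,
    which is how an item can fail to load on any factor.
    """
    assigned: Dict[int, List[str]] = {}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        if abs(row[best]) >= minimum:
            assigned.setdefault(best, []).append(item)
    return assigned


# Hypothetical eigenvalues and loadings for illustration only.
eigenvalues = [3.2, 1.4, 0.9, 0.5]
loadings = {
    "item_a": [0.62, 0.10],
    "item_b": [0.15, 0.55],
    "item_c": [0.30, 0.25],   # below the minimum on both factors
}

print(factors_above_cutoff(eigenvalues))
print(salient_items(loadings))
```

Rules like these give a starting point only; as the text notes, the resulting groupings still need to make sense within a theoretical framework.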

Results

Description of the youth responses

Table 1 displays the mean scores for the two groups of participants included in Study 1. The first group of participants, who attend a comprehensive high school, were not assessed for their gender and thus, direct comparisons to the normative sample cannot be made. When compared to the male normative sample descriptively, these comprehensive high school participants had a similar mean on the Interpersonal Strength and School Functioning subscales, a lower average score on the Family Involvement and Intrapersonal Strength subscales, and a higher score on the Affective Strength subscale. Regarding the second group of participants, who were first-time offenders on probation in California, males had a similar mean on the Interpersonal Strength subscale as the normative sample and both males and females had similar scores to the normative sample on the Affective Strength subscale. All other scores were lower than the norm, which is not surprising given that a lack of assets is associated with delinquent behavior.

Table 1 Descriptive statistics: Average raw scores for three groups of participants compared to range of raw scores at a scaled score of 10 in the normative sample

BERS-2 scale              Normative sample     Comprehensive     Probation CA       Detention NV
                                               high schoolᵃ
                          Male      Female     Total             Male     Female    Male     Female
Interpersonal strength    31–32     35–36      31.2              31.4     32.3      25.9     27.8
Family involvement        22        22–23      19.7              20.4     19.0      16.3     15.1
Intrapersonal strength    27–28     29         25.0              24.9     25.3      21.2     23.7
School functioning        19–20     22         18.6              17.9     18.5      13.5     13.9
Affective strength        14        16         15.5              14.8     15.8      12.2     14.4

ᵃ Gender was not available for the comprehensive high school sample. Thus, means are shown for the total sample.

Exploratory factor analysis

An exploratory factor analysis was conducted on the 52 BERS-2 Youth Version items. The principal components factor method was used. A scree plot of the eigenvalues was examined to determine the number of factors to be retained. Four pre-rotational factors were found with eigenvalues ranging from 3.60 to 28.99. Because of the moderate correlations between subscales, a direct oblimin rotation was then performed to find the simplest structure (Fabrigar, Wegener, MacCallum, & Strahan, 1999; Thompson, 2004; Widaman, 1993). Table 2 contains all within-scale loadings for the BERS-2, which were between .26 and .71. The first factor contained a mixture of 15 items from three of the five original BERS-2 subscales (Interpersonal Strength, Intrapersonal Strength, and Affective Strength; e.g., expresses affection for others, requests support from peers and friends). We interpret this pattern to reflect youths’ global assessment of their social skills and their management and expression of emotions, particularly as they relate to their relationships with others. Given this content, this factor could be labeled General Social Skills. The second factor contained seven items from the original BERS-2’s School Functioning subscale (e.g., completes homework regularly, pays attention in class). This differs from the original scale in that two items (14 and 41) did not load. The retained items focus on active participation in school-related tasks and not other types of school involvement such as sense of membership and engagement; thus, this factor could be called School Participation. The third factor contained six items from the original BERS-2 Interpersonal Strength subscale (e.g., accepts responsibility for own actions, considers consequences of own behavior). These six items appear to represent a youth’s sense of his or her self-control and coping with emotions.
It makes conceptual sense that all Interpersonal Strength items would load together in an adult rating format because adults observe self-control by observing a youth’s social interactions. However, these results suggest that youths may distinguish between external social skills and internal emotional control (a youth may experience the need to calm themselves and not react in a social situation and this may not be obvious from his or her external behavior); thus, this factor could be called Emotional Control. The fourth factor contained nine items primarily from the original BERS-2 Family Involvement subscale, but with two items crossing over from the Intrapersonal Strength subscale

Table 2 Exploratory factor analysis: BERS-2 items with loadings above .40 (N = 752)

Original BERS-2
item number        BERS-2 subscaleᵃ       I        II       III      IV
34.                Affective strength     .68
13.                Affective strength     .64
25.                Affective strength     .63
23.                Affective strength     .60
6.                 Affective strength     .57
21.                INTRApersonal          .56
3.                 Affective strength     .53
12.                INTERpersonal          .53
33.                INTERpersonal          .51
44.                INTERpersonal          .46
49.                INTERpersonal          .43
26.                INTRApersonal          .41
46.                INTRApersonal          .40
38.                INTRApersonal          .40
32.                INTRApersonal          .40
31.                School functioning              .71
24.                School functioning              .71
39.                School functioning              .68
51.                School functioning              .67
47.                School functioning              .64
52.                School functioning              .61
40.                School functioning              .49
37.                INTERpersonal                            .64
35.                INTERpersonal                            .58
28.                INTERpersonal                            .55
30.                INTERpersonal                            .55
17.                INTERpersonal                            .50
16.                INTERpersonal                            .44
15.                Family involvement                                .80
7.                 Family involvement                                .76
29.                Family involvement                                .63
5.                 INTRApersonal                                     .63
1.                 Family involvement                                .60
11.                Family involvement                                .55
36.                Family involvement                                .54
42.                INTRApersonal                                     .46
45.                Family involvement                                .41
Eigenvalue                                28.99    4.90     3.90     3.60
% of variance                             56%      9%       8%       7%
No. of items                              16       7        6        9
Score range                               0–48     0–21     0–18     0–27
Alpha                                     .88      .81      .76      .87

Note. Item numbers are those in the BERS-2. Four factor Oblimin resolution. Loadings lower than 0.40 suppressed for purposes of clarity of presentation.
ᵃ BERS-2 subscale on which the item is reported loading for the youth version.

(e.g., participates in family activities, demonstrates a sense of belonging to family). These two Intrapersonal Strength items (i.e., is self-confident, is enthusiastic about life) appear to differ from other items in the original BERS-2 Youth Version scale in that they may be more related to family values and functioning than to peer and social functioning. Consistent with the original BERS subscale, we label this factor Family Involvement.

Discussion

EFA yielded a somewhat different factor structure than the original five-factor model provided by the BERS-2 developer (Epstein, 2004). Our four-factor model is more parsimonious, in that it relies on fewer variables (37 instead of 52) and still has good internal consistency characteristics. However, an over-reliance on the statistics of a particular EFA can overfit the model to a particular sample; thus, confirmatory factor analysis with a second sample is desirable before drawing conclusions.
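The summary rows of Table 2 follow from simple formulas, and a short sketch can make them easier to check. It assumes that each percentage of variance is the eigenvalue divided by the 52 items analyzed (the usual convention for a principal components extraction from a correlation matrix), that the score range is the number of items times the maximum item score of 3, and that alpha is the standard Cronbach's alpha. The function names and the item responses passed to `cronbach_alpha` are hypothetical; only the eigenvalues and item counts come from Table 2.

```python
from statistics import variance
from typing import List, Tuple

N_ITEMS_ANALYZED = 52   # items entered into the EFA (from the text)
MAX_ITEM_SCORE = 3      # youth response scale runs from 0 to 3


def percent_variance(eigenvalue: float,
                     n_items: int = N_ITEMS_ANALYZED) -> float:
    """Percentage of total variance, assuming eigenvalue / number of items."""
    return 100.0 * eigenvalue / n_items


def score_range(n_items: int,
                max_item_score: int = MAX_ITEM_SCORE) -> Tuple[int, int]:
    """Lowest and highest possible summed score for a factor."""
    return (0, n_items * max_item_score)


def cronbach_alpha(item_scores: List[List[float]]) -> float:
    """Cronbach's alpha; item_scores holds one list of scores per item."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(person) for person in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Eigenvalues and item counts as reported in Table 2.
for eigenvalue, n_items in [(28.99, 16), (4.90, 7), (3.90, 6), (3.60, 9)]:
    print(eigenvalue, round(percent_variance(eigenvalue), 1),
          score_range(n_items))

# Hypothetical responses (three items, five respondents) for illustration.
items = [[0, 1, 2, 3, 2], [1, 1, 2, 3, 3], [0, 2, 2, 3, 2]]
print(round(cronbach_alpha(items), 2))
```

Under these assumptions the four eigenvalues correspond to roughly 55.8%, 9.4%, 7.5%, and 6.9% of the variance, in line with the rounded percentages in the table.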

Study 2: Confirmatory factor analysis

Method

Participants

Participants were 358 youths referred to Juvenile Justice Services of Clark County, Nevada for a first-time intake interview and risk assessment. This sample included more males (58%) than females (42%) and reported African American (34%), European American (29%), Latino American (28%), Native American (7%), and Asian (2%) ethnicities. Youths were the following ages: 13 (5%), 14 (14%), 15 (21%), 16 (27%), and 17 (33%).

Measures

The same modified BERS-2 Youth Version was used for Study 2.

Procedures

The modified BERS-2 Youth Version was administered to adjudicated youth by trained detention staff during 2002 and 2003 as part of the regular juvenile justice intake assessment process. Data were sent to the researchers for data entry and analysis, with codes substituted for names. The same data preparation strategies used in Study 1 were used in Study 2.

Analyses

AMOS 4.0 was used to conduct a Confirmatory Factor Analysis (CFA) to compare the original five-factor BERS-2 model to the four-factor model suggested by Study 1. CFA helps address questions of validity and “is at the heart of the measurement of psychological constructs” (Nunnally, 1978, p. 113). CFA allows researchers to test predetermined models by plotting a proposed factor structure with measured variables loading onto proposed, or “latent” variables (Kline, 1998). If CFA finds reasonable results, (a) items have high correlations with latent variables and (b) correlations between factors are not excessively high (