Leadership and Policy in Schools, 8:233–263, 2009 Copyright © Taylor & Francis Group, LLC ISSN: 1570-0763 print/1744-5043 online DOI: 10.1080/15700760802416099

Assessing the Promise of Standards-Based Performance Evaluation for Principals: Results from a Randomized Trial


STEVEN MILLER KIMBALL, ANTHONY MILANOWSKI, and SARAH A. MCKINNEY University of Wisconsin–Madison, USA

Principals (N = 76) in a large western U.S. school district were randomly assigned either to be evaluated using a new standards-based system or to continue with the old system. It was hypothesized that principals evaluated with the new system would report clearer performance expectations, better feedback, greater fairness and system satisfaction, and more effort spent on priorities emphasized in the new system. Surveys and interviews were used to assess these perceptions. The hypotheses about feedback and satisfaction were supported. The study also revealed several issues with implementing standards-based evaluation, including competition with many other messages that define performance expectations for principals.

State and federal accountability systems have increased pressure for schools and districts in the United States to greatly improve student achievement. Schools are required to teach all students to high academic standards and eliminate achievement gaps between those from different ethnic and socioeconomic backgrounds. Principals are directly involved in this movement toward increased accountability for student achievement. Because they are an important link between district programs to improve achievement and

A previous version of this article was presented at the annual conference of the American Educational Research Association, April 10, 2007, Chicago, Illinois. The research reported in this article was supported by the U.S. Department of Education, Institute for Education Sciences, Education Finance, Leadership, and Management Research Program (Grant R305E05135). The opinions expressed are those of the authors and do not necessarily reflect the view of the Institute for Educational Sciences, the U.S. Department of Education, the institutional partners of the Consortium for Policy Research in Education, the Wisconsin Center for Education Research or the University of Wisconsin–Madison.

Address correspondence to Anthony Milanowski, University of Wisconsin–Madison, WCER, 1025 W. Johnson St., Madison, WI 53706, USA. E-mail: [email protected]


teacher efforts in the classroom (Hallinger & Heck, 1996, 1998; Leithwood, Seashore Louis, Anderson, & Wahlstrom, 2004; Waters, Marzano, & McNulty, 2003), school districts are seeking ways to develop, motivate, and hold principals accountable for their ability to influence school outcomes. Principal performance evaluation represents one way that districts monitor, support, and intervene in principal performance. Yet there is little if any research on the impact of principal evaluation on principal performance. For example, within a book dedicated to developing a new research agenda for education leadership (Firestone & Riehl, 2005), several comprehensive reviews of the literature on school leadership are presented, but there is no substantial discussion of principal performance evaluation and its impact on principal behaviors and school outcomes, and little attention to how district leaders can influence principal practice. As has been argued with respect to teacher evaluation (e.g., Peterson, 2000), it may be that principal evaluation is largely pro forma and unaligned with other district efforts to improve or measure performance. However, just as with teacher evaluation, newer standards-based evaluation approaches may hold promise for influencing principal performance. Based on the promulgation of standards for leaders, such as the Interstate School Leaders Licensure Consortium (ISLLC) Standards for School Leaders (Council of Chief State School Officers, 1996), new evaluation approaches have been developed that parallel standards-based evaluation for teachers. Examples include the work of Hessel and Holloway (2002) and Reeves (2004). Unfortunately, there is also little research on whether the use of these leadership standards for any purpose affects principal practice or school performance. As Smylie and colleagues observe “. . . 
there is remarkably little empirical evidence on school leader knowledge, skills, and dispositions and their relationship to leadership practice” (Smylie, Bennett, Konkol, & Fendt, 2005, p. 142). We know little about how these leadership standards are used by districts, including how they work for purposes such as principal evaluation. This lack of research leaves educational organizations with little guidance beyond expert opinion in their efforts to identify those characteristics or proficiencies that are important to leader success, and which in turn can be evaluated and supported. This article reports on research designed to provide evidence on the potential effectiveness of a standards-based approach to principal evaluation by assessing how a standards-based evaluation system implemented in one U.S. school district was used to communicate performance expectations, direct principal effort, and help principals improve their practice. We compared the experiences of principals who were randomly assigned to a new standards-based evaluation system being phased in by the district with those assigned to continue being evaluated by the existing system, which was not standards based. In addition, because our prior work with standards-based teacher evaluation found that implementation was critical to both


acceptance of the system and positive effects on performance (Heneman, Milanowski, Kimball, & Odden, 2006), we also studied how the evaluation process was carried out, asking principals and supervisors to explain evaluation processes, discussions, evidence applied, and decisions.

STANDARDS-BASED PRINCIPAL PERFORMANCE EVALUATION

Standards-based models for principal evaluation incorporate features taken from standards-based teacher evaluation. Both the Reeves (2004) and Hessel and Holloway (2002) models are based on standards and rubrics similar in concept to innovative approaches developed for teacher evaluation (Danielson, 1996; Danielson & McGreal, 2000). Research has shown that standards-based teacher evaluation (e.g., Danielson, 1996) can gain acceptance by teachers and administrators (Kimball, 2002), can help guide teaching practice, and that evaluation scores can predict value-added student achievement (Milanowski, Kimball, & Odden, 2005). Applied to school leaders, standards-based evaluation:

• is grounded in research on leadership qualities or processes that can help those who are most directly involved with student learning (teachers) improve student achievement;
• includes specific standards for behavior and detailed, behaviorally specific rating scales (rubrics) that define multiple levels of performance, typically ranging from unsatisfactory through basic and proficient to exemplary;
• specifies the rubrics in enough detail to clarify the behaviors or competencies required of a good performer; and
• includes the use of multiple sources of evidence and training of evaluators to recognize evidence and consistently apply the rating scale or rubric.

Though critics of leadership standards have pointed to potential limitations in their design and use (e.g., Leithwood & Steinbach, 2003), the standards do represent a concerted effort to define principal competencies, and their use as a basis for evaluating principals seems a step forward from rating on generic traits or on some global notion of satisfactory or unsatisfactory performance.

Figure 1 represents the conceptual framework underlying our research on the potential impacts of standards-based principal evaluation.
According to this framework, standards-based principal evaluation systems, along with school context and leader background, influence principal behaviors, including the development of needed knowledge and skills. Principal behaviors influence school organizational features that in turn can impact teacher behaviors and ultimately student achievement (Hallinger & Heck, 1996, 1998; Leithwood et al., 2004). These features include the school’s instructional program, physical resources, staff (and especially teacher) quality, mission, and culture.

[Figure 1 diagram: the Standards-Based Principal Evaluation System (model of quality leadership, incentives to improve performance, support systems), together with principal background and school context, shapes principal knowledge, skills, and behaviors; these influence school features (instructional program, resource acquisition/allocation, staff quality, goals/mission, culture), which affect teachers’ behaviors and, ultimately, student achievement.]

FIGURE 1 Conceptual framework for understanding the influence of principal behavior on student achievement.

In this article, we are concerned with the first link shown: that between the evaluation system and principal behavior. In conjunction with incentives to improve performance (which could range from recognition of good performance by higher-level administrators to financial rewards) and support systems to help principals improve (e.g., feedback on current performance, coaching, professional development), standards-based principal evaluation provides a well-specified model of the leadership behaviors expected by the district that should influence principal behavior. By clearly defining the domain of performance, these evaluation systems provide more guidance to principals as to district expectations and priorities for performance. The detailed behavioral rubrics allow principals’ supervisors to provide more specific feedback, which should help principals improve performance. The standards and rubrics also should help evaluators make more accurate and credible evaluations, so that incentives flowing from evaluation ratings are more tightly tied to the desired performance represented by the model. As in any evaluation system, incentives such as recognition, job retention, and perhaps additional compensation may be attached to evaluation ratings, providing motivation to perform according to the model. Standards-based systems provide an additional incentive by including a clearly and publicly defined exemplary or mastery level of performance to which principals can aspire. Due to these features, we would expect standards-based evaluation to have a stronger influence on principal behavior, all else equal, than an evaluation system without them.

For evaluation systems to have such positive effects, they must also be accepted by those being evaluated.
Acceptance can be influenced by many factors, but among the most important are perceptions of the fairness of the process and its outcomes (Gilliland & Langdon, 1998; Roberts, 1994). Standards-based systems, with explicit standards and rubrics that communicate clearer performance expectations and more guidance to evaluators,


would likely be perceived as more fair by those evaluated. And if such a system does provide better feedback, is more useful in improving performance, and is perceived as fair, those evaluated might be expected to be more satisfied with the system as a whole. We tested these expectations by studying the implementation of a standards-based principal evaluation system in a medium-sized western school district (described below). We were able to use the phased implementation of this new system to compare the experiences of two groups of principals: one group evaluated using the old system, which was not standards-based, and another group evaluated using the new standards-based system. Based on the potential advantages of standards-based evaluation discussed above, we hypothesized that those evaluated under the new system would:

1. have clearer performance expectations;
2. perceive receiving higher quality performance feedback;
3. perceive that their evaluation was more useful in improving performance;
4. perceive the system as more fair;
5. be more satisfied with the evaluation system overall; and
6. report spending more time and effort on job facets emphasized in the new system.

Hypotheses 1–5 focus on the key perceptions that principals would need to have in order for any evaluation system to motivate them to change behavior toward the goals underlying the system. Because of the features of standards-based systems discussed above, we hypothesized that these systems would do a better job of fostering the perceptions necessary for an evaluation system to impact behavior. These hypotheses address conditions necessary for the initial link in our conceptual framework to be made. Hypothesis 6 addresses one way the link is made: whether principals respond to the evaluation system by increasing their effort toward the aspects of job performance emphasized by the evaluation system.

METHODS

The Study District

The study was done in a geographically diverse school district in the western United States. The district includes two primary population centers and four small, outlying communities. There are 61 elementary schools, 15 middle schools, and 12 comprehensive high schools in the district. In addition, there is a special education school for children with multiple and complex disabilities, and three alternative secondary schools. Nineteen of the elementary schools operate on a year-round schedule. There are over 62,000 students, with an ethnic breakdown of about 60% white and 40% nonwhite.


About 30% of the nonwhite students are Hispanic. There are close to 4,000 certified staff (including teachers) and over 300 administrators. The district is led by a superintendent and three assistant superintendents focused on elementary education, secondary education, and district operations. The two assistant superintendents who oversee elementary and secondary schools supervise some of the principals at these levels and work with six senior directors who supervise school and principal performance for the majority of schools. To ensure anonymity and provide clarity, we will refer to the senior directors and assistant superintendents as supervisors.

Standards-Based Principal Evaluation in the District

The implementation of standards-based principal performance evaluation represents a continuation of district efforts to improve and align evaluation systems, coming after the introduction in 2000 of a standards-based teacher evaluation approach based on Danielson’s (1996) Framework for Teaching. In the spring of 2004, after a number of informal meetings and discussions among district leaders and principal association representatives, the district began the redesign effort. There were three primary reasons the district decided to design and implement a new principal performance evaluation process. First, there was dissatisfaction among principal supervisors and, to some extent, principals, with the evaluation process that was in place. Second, the district had positive experience with its standards-based teacher evaluation process. Third, growing accountability pressures from state standards-based reform and then the federal No Child Left Behind Act (NCLB) focused more district attention on principal development and accountability. A design committee was established to develop the new standards-based evaluation system, and after a year of work the approach was approved for pilot testing by the district’s Board in January 2005. (Samples from the new and old evaluation systems are included in the Appendix.) From January to June of 2005, the district conducted a pilot study in which each supervisor applied the approach with one to three principals. The purpose of this pilot test was to ease principal supervisors into using the new standards-based approach, to establish a level of comfort, and to identify any glaring problems before the system was applied to a larger group. Fourteen principals were included in the pilot study.
For 2005–2006 the district agreed to a phased implementation of the new system in which one-half of the remaining principals would be evaluated under the new system and one-half under the old. Supervisors were trained in the new system both by exposure during the pilot and in two formal sessions. In each session, the evaluation system design leader facilitated collegial interactions about the new evaluation system. The first session in January 2005 exposed the supervisors to the new


performance dimensions and allowed them to talk through the performance expectations. Another training session occurred in the early fall of 2005, focusing on supervisors’ experiences with the pilot. During this session, supervisors agreed that they would center their evaluation interactions with principals on a few standards within the dimensions of focus, rather than on all dimensions and standards. These selected standards would vary according to the principal’s goals. In addition, the discussion was intended to further understanding of the standards and to begin considering how the supervisors could consistently apply the standards in making performance decisions, including identifying appropriate sources of evidence that could inform evaluation decisions. However, as we later found, the supervisors had not come to a consensus about common evidence that would be collected from all principals. Instead, supervisors used data collection methods they had typically applied in the past. In some cases, this included collection of school climate surveys from teachers and parents, informal interviews with teachers, discussions with principals, and school-based observations. Supervisors were not required to collect the same types or amounts of evidence.

Principal and Supervisor Samples

For the 2005–2006 school year, 76 principals were randomly assigned to be evaluated under the new or old systems. In this district, eight central office administrators supervise principals. At the elementary level, five supervisors evaluate 10–12 principals each (the Elementary Assistant Superintendent supervises four principals). At the secondary level, the middle school supervisor evaluates 14 principals, and the high school supervisor evaluates 11 principals. Because of the likely importance of supervisor characteristics to evaluation outcomes, principals were randomly assigned within supervisor; that is, about one-half of each supervisor’s principals would be evaluated with the new system, and the other half with the old. The 14 principals who had already been exposed to the new system in the pilot were assigned to continue with that system, and were not part of the study. The supervisors were asked to maintain the approach they had always used for those in the “control” group, while using the new standards and procedures for the other half of their principals, assigned to the “treatment” group. Supervisors evaluated between four and six principals using the old system and four to six using the new.
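The within-supervisor (stratified) random assignment described above can be sketched as follows. This is an illustrative reconstruction, not the district's actual procedure: the supervisor rosters, principal labels, and seed are invented for the example.

```python
import random

def assign_within_supervisor(rosters, seed=2005):
    """Randomly split each supervisor's principals roughly in half:
    'new' = standards-based system, 'old' = existing system."""
    rng = random.Random(seed)
    assignment = {}
    for supervisor, principals in rosters.items():
        shuffled = list(principals)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for p in shuffled[:half]:
            assignment[p] = "new"
        for p in shuffled[half:]:
            assignment[p] = "old"  # an odd roster gives 'old' one extra
    return assignment

# Hypothetical rosters: one elementary supervisor with 12 principals
# and the middle school supervisor with 14.
rosters = {
    "elem_1": [f"elem_1_p{i}" for i in range(12)],
    "middle": [f"middle_p{i}" for i in range(14)],
}
groups = assign_within_supervisor(rosters)
```

Randomizing within supervisor balances supervisor-specific effects across the treatment and control groups, which is also why the survey analyses include supervisor indicators as controls.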

Measures, Data Collection, and Analysis

Principals’ experiences and perceptions of the evaluation processes and their effects were collected throughout the 2005–2006 school year, using both surveys and interviews. Data collection procedures and the analyses applied to both the quantitative and qualitative measures are described below.

SURVEY MEASURES, DATA COLLECTION, AND ANALYSIS

All of the principals were asked to complete a survey in April of 2006, after the evaluations were to have been completed. The survey included items about system implementation and about the principals’ perceptions of evaluation process effects. To test Hypotheses 1–5, scales were adapted from our earlier research on teacher evaluation (Heneman & Milanowski, 2003). These scales include the following:

• Clarity of district performance expectations (4 items). Example: “I was clear about the district’s expectations for my performance this year”;
• Feedback quality (4 items). Example: “I received concrete, specific feedback on my performance from my supervisor”;
• Evaluation process utility (3 items). Example: “The evaluation helped me learn how I can improve my performance”;
• Distributive and procedural fairness (3 items each). Examples: “I got the evaluation scores I deserved this year” and “My evaluation was conducted in a fair way”; and
• System satisfaction (3 items). Example: “Overall, I am satisfied with the performance evaluation system.”

To test the sixth hypothesis, principals were asked to indicate from among eight choices the areas on which they spent the most time and effort during the evaluation year. These areas included four explicitly emphasized by the new, but not the old, evaluation system (developing a school mission statement, analyzing student achievement results, understanding student academic standards, improving use of technology), and four common to both. A scale was created based on how many of these four items each principal checked (i.e., 1 for one item checked to 4 for all four). Overall survey response was 87% (N = 65), though nonresponse to specific items lowered the sample size for some of the analyses. Table 1 reports on the available

TABLE 1 Characteristics of Obtained Survey Sample.

Characteristic                             New system         Old system         Difference
                                           respondents        respondents        statistically
                                           (N = 31)           (N = 34)           significant?
School level                                                                     No
  Elementary                               65%                74%
  Middle                                   12%                23%
  High                                     12%                10%
  Multiple                                 2%                 3%
Average years of experience as principal   6.5                5.7                No
Percent female                             55%                47%                No


demographic characteristics of the obtained sample by the evaluation system assigned. Hypotheses 1–5 were tested by regressing the scale averages for the clarity of expectations, feedback quality, process utility, procedural and distributive fairness, and system satisfaction scales on an indicator of the evaluation system to which the principal was assigned and indicators for the supervisor (to control for supervisor-specific effects). The analyses were done using the cluster and robust standard error options in the Stata statistical package (StataCorp., 2001) in order to adjust the standard errors for the clustering of respondents within supervisors. The hypotheses would be supported if the coefficient for the evaluation system indicator were positive and statistically significant for each scale. The sixth hypothesis was tested by using ordered logistic regression to estimate the odds of principals in the new system group reporting more time and effort spent on the job facets emphasized in the new evaluation system, controlling for supervisor. Again, analyses were done using the cluster and robust standard error options in Stata. The hypothesis would be supported if the odds ratio for the evaluation system indicator were substantially above 1.0 and statistically significant. These analyses were done and reported both for the full group of principals participating in the randomized trial (the intent-to-treat approach) and for the group with the six principals whose assignments were switched removed.
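As a rough sketch of the estimator behind the tests of Hypotheses 1–5, the code below regresses a perception scale on a treatment indicator and computes cluster-robust (CR0, Liang–Zeger) standard errors with supervisors as clusters, approximately what Stata's cluster and robust options do. All data here are synthetic, the variable names are ours, and for brevity the supervisor indicator controls used in the actual analyses are omitted.

```python
import numpy as np

def ols_cluster_se(X, y, clusters):
    """OLS coefficients plus cluster-robust (CR0) standard errors."""
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    # "Meat": sum of within-cluster score outer products.
    k = X.shape[1]
    meat = np.zeros((k, k))
    for c in np.unique(clusters):
        score = X[clusters == c].T @ resid[clusters == c]
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n, n_sup = 65, 7
supervisor = rng.integers(0, n_sup, size=n)   # cluster ids
new_system = rng.integers(0, 2, size=n)       # 1 = standards-based system
# Synthetic satisfaction scale: true treatment effect of 0.5 plus a
# supervisor-level shock (the source of within-cluster correlation).
sup_shock = rng.normal(0, 0.2, size=n_sup)
y = 3.0 + 0.5 * new_system + sup_shock[supervisor] + rng.normal(0, 0.4, size=n)
X = np.column_stack([np.ones(n), new_system])
beta, se = ols_cluster_se(X, y, supervisor)
```

Here `beta[1]` estimates the new-system effect on the scale; dividing it by its clustered standard error `se[1]` gives the test statistic, and the corresponding hypothesis is supported when the coefficient is positive and statistically significant.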

Qualitative Data Collection and Analysis

In addition to the survey, semistructured interviews were conducted with a purposive sample of 14 principals, chosen so that one principal under each evaluation approach for each of the district’s seven primary supervisors would be included. Descriptive information for the sample is included in Table 2. As reflected in Table 2, we chose principals to represent a range of experience at the elementary, middle, and high school levels. In-person interviews were conducted in November and April, and a phone interview was conducted in January. All seven principal supervisors were interviewed in the spring. Interview questions were designed to explore the evaluation system content and processes, perceptions of evaluation utility (e.g., quality of evaluation feedback, relevance to performance), school and district contexts, and implementation fidelity. With the exception of the phone interviews, which were recorded with detailed written notes, all interviews were voice-recorded and transcribed. Transcripts of the interviews were analyzed using NVivo7 qualitative software. In developing the qualitative coding scheme, we engaged in a series of coding exercises, in which two researchers coded the same transcript then shared their decisions on the coding. Subsequent transcripts were coded individually,

TABLE 2 Descriptive Information on 14 Principals in Qualitative Sample.

Principal   Gender   Level        System evaluated (new/old)
1           Male     High         Old
2           Female   Elementary   Old
3           Male     Elementary   Old
4           Female   Elementary   Old
5           Female   Middle       Old
6           Female   Elementary   Old
7           Female   Elementary   Old
8           Male     Elementary   New
9           Female   Elementary   New
10          Male     High         New
11          Female   Elementary   New
12          Male     Elementary   New
13          Male     Middle       New
14          Female   Elementary   New

School* names: Blue Sky, Park Place, Silver Valley, Seneca, Sage, Fountain Lake, Mountain View, Roller Hill, Chinook, Wager Basin

*Fictitious names used for anonymity.

11 1 3 1