Non-cognitive assessment in personnel selection: Development of new tests for the admission procedure in teacher education

Dissertation submitted for the doctoral degree of natural sciences (Dr. rer. nat.) at the Faculty of Natural Sciences of the University of Graz

Mag.a rer. nat. Corinna Koschmieder

Supervisor & 1st Assessor Univ.-Prof. Dr. phil. Aljoscha C. Neubauer Department of Psychology Karl-Franzens-Universität Graz

2nd Assessor Univ.-Prof. Dr. phil. Andreas Schwerdtfeger Department of Psychology Karl-Franzens-Universität Graz Graz, 2018

Acknowledgements
“A truly great mentor is hard to find and impossible to forget.” I, too, would like to take this opportunity to thank my teachers, my mentors and everyone who has supported me along the way. First and foremost, Aljoscha Neubauer, who helped me time and again to decide which fork in the road to take, and who gave me the freedom to explore new territory and to develop knowledge. Sabine Bergner, Mathias Benedek and Georg Krammer, for their help in crossing rocky gorges and climbing skewed "distribution peaks". Emanuel Jauk, Hanna Vollmann, Agnes Diebschlag, Barbara Weissenbacher, Jürgen Pretsch, Beate Dunst, Sonja Walcher and Gabriela Hofer, for the inspiring exchanges that completed many a paper. The large safety net of project partners, such as Hubert Schaupp, Barbara Pflanzl, Florian Müller, Hannes Mayr, Renate Strassegger-Einfalt, Hannelore Knauder, Elke Jantscher and Christine Kapper, through whom new areas and perspectives kept opening up to me. The people who looked after the resting place and provided many a distraction on dark nights: Magda, Dominika, Lisa, Steffi, Sim, Georg, Sigrid, Kurt and several other friends who are up for anything. And last but not least, my parents, who always stand behind me with all their strength, no matter what happens!

Preface
This cumulative dissertation is based on four studies that emerged from an HRSM project on the development of the admission procedure for teacher education. The admission procedure is a multistage process consisting of a non-selective self-assessment, a psychological computer test and a standardized interview, which is offered to all participating institutions as an additional option. During the development process, I was part of the scientific research team, which aimed to develop a common model for the admission of teacher education students and to establish a fair, efficient, reliable and valid admission procedure. This procedure was intended to create comparable standards for all participating institutions and to be subject to scientific review and evaluation. As I was responsible for the composition, coordination and analysis of the computer test battery, I focused in particular on two new tests, one for emotion regulation and one for creativity evaluation, as well as on the utility of personality questionnaires. Further, I initiated a longitudinal follow-up study to support the validation of the assessment and the investigation of teacher students' success and dropout. Within four years, a common basis for teacher selection was built throughout Austria through the intensive and very constructive cooperation of experts from universities and university colleges of teacher education. The studies presented in this thesis were supported by the HRSM Fund of the Austrian Federal Ministry of Science, Research and Economy.

Original Articles:
Neubauer, A., Koschmieder, C., Krammer, G., Mayr, J., Müller, F., Pflanzl, B., … Schaupp, H. (2017). TESAT - Ein neues Verfahren zur Eignungsfeststellung und Bewerberauswahl für das Lehramtsstudium: Kontext, Konzept und erste Befunde. Zeitschrift für Bildungsforschung, 7(1), 5-21.
Koschmieder, C., & Neubauer, A. (submitted). Test construction of situational judgment tests in the spotlight: Realization through the development of a test for emotion regulation in teacher education. European Journal of Psychological Assessment.
Koschmieder, C., Dumfart, B., Pretsch, J., & Neubauer, A. (accepted). The impact of personality in the selection of teacher students: Is there more to it than the Big Five? Europe's Journal of Psychology.
Benedek, M., Nordtvedt, N., Jauk, E., Koschmieder, C., Pretsch, J., Krammer, G., & Neubauer, A. C. (2016). Assessment of creativity evaluation skills: A psychometric investigation in prospective teachers. Thinking Skills and Creativity, 21, 75-84.

Contents
Abstract
Zusammenfassung (German Abstract)
1. Theoretical Background
1.1. A brief taxonomy of student selection
1.2. A theoretical framework for teacher selection
1.3. Relevant constructs for teacher education and profession
1.4. Selection of appropriate methods for non-cognitive testing
1.5. Composition of the admission procedure
1.6. Empirical review and validation
1.7. The present dissertation
2. Abstracts of the publications and further longitudinal research
2.1. Selection of appropriate constructs for the admission procedure of teacher education and first results of the longitudinal validation (Study 1)
2.2. Development of a novel test with useful test formats for emotion regulation (Study 2)
2.3. Development of a novel test with useful test formats for creativity evaluation (Study 3)
2.4. Analysis of the usefulness of different broad and narrow personality measures (Study 4)
2.5. Focus on future as a base for longitudinal research
3. General discussion
3.1. Assumptions on selection criteria for teacher education
3.2. Assumptions on measurement
3.3. Multidimensional prediction of academic achievement and later teaching performance
3.4. Restrictions
4. Conclusion
5. Publications (full texts & dissemination)
5.1. Study 1: TESAT - Ein neues Verfahren zur Eignungsfeststellung und Bewerberauswahl für das Lehramtsstudium: Kontext, Konzept und erste Befunde
5.2. Study 2: Test construction of situational judgment tests in the spotlight: Realization through the development of a test for emotion regulation in teacher education
5.3. Study 3: Assessment of creativity evaluation skills: A psychometric investigation in prospective teachers
5.4. Study 4: The impact of personality in the selection of teacher students: Is there more to it than the Big Five?
6. References

Theoretical Background

Abstract
Based on a project for the development of a multistage admission procedure for teacher education (Teacher Student Assessment in Austria - TESAT), four studies on the development of this admission procedure emerged as part of this cumulative dissertation. In particular, a theoretically based model for the admission examination for teacher education was developed (Study 1), consisting of a self-assessment, a psychological test battery and a structured interview. Studies 2 and 3 focused on the development of two novel tests: a situational judgment test measuring emotion regulation in pedagogical situations (ERIPS) and a performance test of creativity evaluation (CET). In the last study, two different self-report personality questionnaires were compared with regard to their usefulness in the assessment. In this framework, the influence of item contextualization and the overlap of the measured constructs were analyzed. First validity results show predictive validity for academic achievement, academic satisfaction and the intention to drop out. A number of methodological issues are addressed regarding the development of situational judgment tests, the utility of indices from signal detection theory for non-cognitive assessments and the implementation of a combination of broad and narrow constructs in assessments. These are discussed with regard to their potential value for the selection of prospective teachers. In summary, the implemented admission examination meets high psychometric standards and provides a basis for longitudinal validation and further research on teacher education.


Zusammenfassung (German Abstract)
Based on a project to develop a multistage admission procedure for teacher education (Teacher Student Assessment in Austria - TESAT), four studies on the development of an admission procedure were conducted as part of this cumulative dissertation. The theoretically founded procedure, presented in Study 1, consists of a self-assessment, a psychological test battery and a structured interview. The second and third studies focused on the development of two new tests: a situational judgment test measuring emotion regulation in pedagogical situations (ERIPS) and a performance test of creativity recognition (CET). The last study compared the usefulness of two different personality questionnaires for use in the admission procedure, analyzing the influence of item presentations with different degrees of situational context and the overlap of the measured constructs. First results show predictive validity for academic achievement, study satisfaction and the intention to drop out. A number of methodological questions are addressed concerning the development of situational judgment tests, the usefulness of indices from signal detection theory for non-cognitive assessments and the implementation of a combination of broad and narrow constructs in assessments. These are discussed with regard to their potential value for selecting prospective teacher education students. In summary, the admission procedure meets high psychometric standards and provides a basis for longitudinal validation and further research in teacher education.


1. Theoretical Background
The common use of standardized admission procedures for university studies began at the end of the 19th century, and such procedures are now becoming increasingly widespread in Europe as well. Their original purpose was to achieve uniform, transparent, fair and practical admission regulations (Buckley, Letukas & Wildavsky, 2018). Today, they additionally aim to reduce the number of dropouts and to recruit students who will successfully complete their studies. In line with this, many studies investigate the predictive validity of admission measures for academic achievement (Kuncel, Hezlett & Ones, 2004; Lievens & Sackett, 2017; Richardson, Abraham & Bond, 2012; Stajkovic, Bandura, Locke, Lee & Sergent, 2018).

1.1. A brief taxonomy of student selection
Most current psychological selection assessments combine different methodological approaches: the construct approach, the simulation approach and the biographical approach. The construct approach measures stable traits, which are usually operationalized by psychological tests. The simulation approach aims to assess situational behavior, for example via situational judgment tests, role plays or work samples. The third diagnostic principle, the biographical approach, collects biographical information through interview questions or questionnaires (Schuler & Kanning, 2014). Once it is known which constructs are important for selection, multistage procedures involving self-assessments, various types of tests, standardized interviews and other assessment tasks can be combined into an assessment battery. Self-assessments, for example the Career Counseling for Teachers (CCT; Nieskens, Mayr & Meyerdierks, 2011), are intended to inform prospective students about the requirements of their chosen studies and to encourage them to reflect on their options through personal feedback on their self-reported prerequisites. As a result, prospective students get a better and more realistic overview of the desired degree program. For external selection, tests can be used to assess various cognitive and non-cognitive constructs as well as specific knowledge. Interviews and assessment tasks can be conducted to test specific abilities such as verbal competencies, presentation skills and teamwork in face-to-face or group situations. Each desired selection attribute can be operationalized using different methodological approaches, each with its own principles of validity (Schuler & Kanning, 2014), and assigned to the appropriate stage of the admission procedure. The following steps are essential for the development of an economical, objective, reliable and valid selection procedure:
1. Theoretically founded combination of the relevant attributes and the corresponding validation criteria into a selection model
2. Selection of appropriate, well-designed methods with high test quality to assess the attributes
3. Composition of an admission procedure consisting of selection methods and, if necessary, self-reflection methods
4. Empirical review of the theoretical selection model with regard to test quality, validity and fairness
In the following sections, research on selection in teacher education is reviewed with regard to each of these four steps, with special focus on the characteristics explored in the dissertation studies.


1.2. A theoretical framework for teacher selection
On the basis of more than 52,000 studies from 815 meta-analyses, Hattie (2009) extracted 138 factors influencing learning success. This represents the largest synthesis of research on teaching to date. The effects on learning success ranged from d = -.34 for the factor "change of school" to d = 1.44 for the factor "self-assessment of one's own performance level". On average, 32% of the differences in performance can be traced back to the behavior of the teacher. These "contributions from the teacher" included teacher training (d = .11), microteaching (d = .88), teacher subject matter knowledge (d = .09), quality of teaching (d = .44), teacher-student relationships (d = .72), professional development (d = .62), teachers' expectations (d = .43), not labeling students (d = .61) and teacher clarity (d = .75). Further studies even report that up to 60% of the variance in student performance can be explained by differences among teachers and classes (Alton-Lee, 2003). These results led to the conclusion that "the teacher matters", which underlines the importance of appropriate education and selection of future teachers. However, the question "What does it take to be a good teacher?" can be investigated through different approaches. In the history of educational research, three major research paradigms have dominated: the personality approach, the process-product paradigm and the expert paradigm (Köller, 2008). The personality approach deals with stable characteristics of teachers. In the 1970s, it was followed by the process-product paradigm (Bromme & Haag, 2004), which focuses on the interaction of teachers and students. A frequently used model of this paradigm is the Offer-and-Use Model of Helmke (2004). The third paradigm, the expert paradigm, emphasizes the trainable competences of teachers and describes the development of a teacher from novice to expert. The novice and advanced-beginner stages are completed during teacher education (König, 2010). Thus, over the course of this research, the focus has shifted from the characteristics of a successful teacher to learnable professional knowledge. However, for selection at the beginning of teacher education, combining these paradigms implies selecting on the basis of characteristics and competencies that are prerequisites for developing professional knowledge; all further requirements of the teaching profession are addressed in professional education. Figure 1 shows a model that integrates the three paradigms into one approach to teacher education and adds the step of initial selection. In contrast to many other degree programs, teacher education has a clear professional perspective and is directly linked to a later occupation. Admission procedures therefore have the responsibility to focus on criteria that are prerequisites for academic success and also for future professional success. Selecting on these characteristics is intended to increase the probability that prospective students will successfully complete their studies and pursue their careers competently and in the long term (see Mayr, 2012).
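Hattie's effects are expressed as Cohen's d, a standardized mean difference. As a minimal sketch, assuming two independent groups and the common pooled-standard-deviation variant (meta-analyses differ in the variant and corrections they use, so this is not Hattie's own procedure), d can be computed as follows; the scores are invented for illustration only.

```python
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Sample variances (denominator n - 1), combined into a pooled variance.
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (mean_a - mean_b) / pooled_sd


# Hypothetical test scores of a class taught with a given factor and a comparison class.
treated = [72, 75, 78, 80, 85]
control = [70, 71, 74, 76, 79]
print(round(cohens_d(treated, control), 2))  # → 0.92
```

With these hypothetical data, the first group scores about 0.92 pooled standard deviations higher; a negative d, as for "change of school", simply means the first group scores lower.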


Figure 1. Integration approach of the research paradigms (personality approach, process-product paradigm and expert paradigm; cf. Helmke, 2004; Köller, 2008; Mayr, 2012) and the additional perspective of teacher education admission.

1.3. Relevant constructs for teacher education and profession
In general, most admission examinations focus on prior school achievement, subject-specific knowledge and cognitive abilities (Kuncel, Hezlett & Ones, 2001). These constructs have proved to be stable and valid predictors of later academic achievement. Nevertheless, they are often criticized when used as the sole selection criteria. The choice of other selection criteria becomes especially important with a broader concept of success (for example, one that includes professional success) and requires research on more differentiated criteria such as overload, dropout and study satisfaction. In line with this, Hell, Trapmann and Schuler (2008) call for further research on predicting dropout and study satisfaction. As Rindermann and Oubaid (1999) argue, cognitive tests are useful for predicting academic achievement, while personality measures predict study satisfaction. General psychological correlates of academic performance, extracted in a systematic review and meta-analysis by Richardson et al. (2012), are presented in Figure 2. In research on the academic success of teacher education students, the relevant constructs do not differ from those of other degree programs, but a special feature is the possibility of including job-relevant criteria. Especially considering that 30% of teachers retire because of mental health problems (Jehle, 1997), personality factors that prevent burnout and are valid for academic and job success are of high importance for teacher education selection. Within the teacher context, a review by Mayr (2012) showed meaningful associations of cognitive predictors (such as intelligence and linguistic competence), intrinsic and extrinsic occupational motives, interests and the Big Five with academic achievement. Motives, interests and personality are also relevant predictors of job success.
Research on the relevance of creativity, emotional constructs, health and recovery behavior, as well as other specific constructs such as humor, is still rare. Regarding the influence of intelligence on later teaching performance, results are also still scarce and inconsistent. Podgursky, Monroe and Watson (2004) even observed that more intelligent people tend to leave teaching more often. For a detailed overview, please refer to the article on the concept of the admission procedure (Neubauer et al., 2017) in section 5.1. In the following, a short overview of emotion-related, creativity-related and personality constructs is given, as these are the focus of the corresponding articles.

Emotional competence
The role of emotional competence in career outcomes has become a topic of increasing interest. Emotional competence predicts a variety of academic (Grehan, Flanagan & Malgady, 2011; Mendoza & Hontiveros, 2017; Parker, Summerfeldt, Hogan & Majeski, 2004) and work-related outcomes (Brackett, Rivers & Salovey, 2011; Joseph & Newman, 2010). Furthermore, emotional competence is related to well-being (Brackett & Mayer, 2003) and teacher self-efficacy (Chan, 2004; Di Fabio & Palazzeschi, 2008; Vesely, Saklofske & Leschied, 2013). Especially in teaching, a profession with high emotional demands, the relation between emotional competencies, burnout and stress is of great importance (Brennan, 2006; Kyriacou, 1987; Vesely et al., 2013). Handling emotions plays a central role in teacher burnout models (Chang & Davis, 2009; Montgomery & Rupp, 2005). This is supported by the results of Chan (2006), which show that emotional coping and regulation strategies are linked to lower rates of teacher burnout. The current state of research suggests that emotional competence is essential for success in teacher education and in the teaching profession, as well as for the prevention of occupational stress and burnout.


Figure 2. Meta-analytic correlates of 42 non-cognitive constructs with university GPA (Richardson et al., 2012).

Creativity-related constructs
Although the importance of creativity in assessments is theoretically plausible and emphasized in the literature (Kaufman, 2010; Sternberg, 2010), there seems to be a gap between its theoretical importance and its actual use in selection procedures (Chamorro-Premuzic, 2006). However, first results of longitudinal research indicate that creativity is related to overall grades across four years and shows incremental validity over and above conscientiousness for the final dissertation mark (Chamorro-Premuzic, 2006). In addition to these relations with academic achievement, the influence of creativity-related constructs on teachers' future pupils is of particular importance in teacher education. As Sternberg (2010, p. 92) put it: "Yet few teachers' actions kill creativity more effectively than discouraging creative ideas when they are proposed." Further, teachers are more likely than college undergraduates to judge the behavior of creative children as disruptive (Scott, 1999). These results are supported by the findings of Westby and Dawson (1995). In classrooms, the misconception of creativity as undesirable deviance leads to the suppression of students' creative potential (Beghetto & Plucker, 2016). These arguments underline the importance of attitudes toward creativity for teacher selection, in particular the evaluation of creativity (Benedek et al., 2016) and openness to creativity (Jauk et al., under review).

Personality
Many studies have assessed the role of personality in academic performance. In general, conscientiousness is found to be a stable predictor that shows incremental validity over and above intelligence. Furthermore, academic achievement is related to agreeableness and openness (Poropat, 2009).
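Incremental validity, as invoked above for creativity over conscientiousness and for conscientiousness over intelligence, is commonly quantified as the gain in explained variance (ΔR²) in a hierarchical regression. The following is a minimal sketch assuming ordinary least squares; the data are simulated with made-up weights, and the variable names are illustrative and do not refer to any dataset in this dissertation.

```python
import numpy as np


def r_squared(X, y):
    """Proportion of variance in y explained by an OLS regression on the columns of X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(seed=1)
n = 500
intelligence = rng.normal(size=n)        # simulated, standardized predictor
conscientiousness = rng.normal(size=n)   # simulated, standardized predictor
# Simulated criterion: made-up weights plus random noise.
achievement = 0.4 * intelligence + 0.3 * conscientiousness + rng.normal(size=n)

# Step 1: intelligence only; step 2: conscientiousness added.
r2_step1 = r_squared(intelligence.reshape(-1, 1), achievement)
r2_step2 = r_squared(np.column_stack([intelligence, conscientiousness]), achievement)
print(f"R2 step 1 = {r2_step1:.3f}, R2 step 2 = {r2_step2:.3f}, delta R2 = {r2_step2 - r2_step1:.3f}")
```

Because the models are nested, R² cannot decrease from step 1 to step 2; in real data the predictors are usually correlated, which is why the order of entry matters for how much variance is attributed to the added construct.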
Another meta-analysis, by Alarcon, Eschleman and Bowling (2009), found relations between the Big Five and the three dimensions of burnout: emotional exhaustion (openness: r = .00, conscientiousness: r = -.16, extraversion: r = -.21, agreeableness: r = -.12, emotional stability: r = -.42), depersonalization (openness: r = -.05, conscientiousness: r = -.20, extraversion: r = -.20, agreeableness: r = -.27, emotional stability: r = -.32) and reduced personal accomplishment (openness: r = .16, conscientiousness: r = .18, extraversion: r = .29, agreeableness: r = .19, emotional stability: r = .24). Similar results were found in a sample of 447 primary school teachers (Kokkinos, 2007). In a study by Müller (2006) with teacher education students, the Big Five proved to be predictors of study interest, which in turn indirectly affects academic achievement.

1.4. Selection of appropriate methods for non-cognitive testing
Regardless of the construct measured, all selection methods must be objective, reliable and valid (Schuler, 2014). Additionally, applicant acceptance, fairness and low susceptibility to faking must be ensured. Within psychological testing, this is usually the case for performance tests. Non-cognitive constructs show predictive validity for various performance criteria and are commonly used in selection (Ryan & Ployhart, 2014). Especially in the past 10 years, research on personality testing has increased dramatically. However, the use of self-report measures is discussed critically in the literature (Morgeson et al., 2007a; Ryan & Ployhart, 2014). One of the primary reasons is the susceptibility of self-report measures to faking and the associated measurement error (Rothstein & Goffin, 2006). On the other hand, selection tests are the second-best predictor of job performance after GPA (Sackett & Kuncel, 2018), and personality measures provide useful incremental validity for predicting performance (Kuncel & Hezlett, 2010). Several meta-analyses found validities for various work-related criteria (for an overview, see Ones, Dilchert, Viswesvaran & Judge, 2007). Observed multiple correlations of the Big Five with work-related criteria are R = .27 for overall job performance, R = .37 for individual teamwork and R = .45 for leadership.
In conclusion, Ones et al. (2007) argue that personality variables show substantial validities in hundreds of peer-reviewed research studies.
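A multiple correlation such as R = .27 corresponds to R² ≈ .07, i.e., about 7% of the criterion variance. When only correlations are available, R can be obtained from the standard identity R² = r_xyᵀ R_xx⁻¹ r_xy, where R_xx holds the predictor intercorrelations and r_xy the predictor-criterion correlations. The sketch below assumes exactly these inputs; the correlation values are hypothetical and are not the Big Five estimates reported by Ones et al. (2007).

```python
import numpy as np


def multiple_correlation(r_xx, r_xy):
    """Multiple correlation R of a criterion with several predictors, computed from correlations only."""
    r_xx = np.asarray(r_xx, dtype=float)  # predictor intercorrelation matrix
    r_xy = np.asarray(r_xy, dtype=float)  # predictor-criterion correlations
    beta = np.linalg.solve(r_xx, r_xy)    # standardized regression weights
    return float(np.sqrt(r_xy @ beta))


# Two hypothetical predictors, each correlating .30 with the criterion.
print(round(multiple_correlation([[1.0, 0.0], [0.0, 1.0]], [0.3, 0.3]), 3))  # uncorrelated: 0.424
print(round(multiple_correlation([[1.0, 0.5], [0.5, 1.0]], [0.3, 0.3]), 3))  # r = .50 overlap: 0.346
```

The second call illustrates why redundancy between predictors matters for composing a battery: the same validities yield a smaller R when the predictors overlap.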

To address these problems of non-cognitive testing, alternative test formats for non-cognitive constructs need to be developed, or the influence of different item formats needs to be investigated. Researchers recommend focusing future research on finding alternatives to self-report measures (Morgeson et al., 2007b) and on investigating the contextualization of questionnaires and items (Ferguson & Lievens, 2017; Shaffer & Postlethwaite, 2012). Reducing the problems of faking and socially desirable responding would greatly increase the value of non-cognitive constructs for personnel selection, even if the bias caused by real-life faking seems less critical than the effects of instructed faking (Krammer, Sommer & Arendasy, 2017). Approaches to these problems include, in particular, situational judgment tests (SJTs) and forced-choice questionnaires. Situational judgment tests measure various constructs within situational settings. An item consists of an item stem, which describes a challenging situation, and several response alternatives, which describe possible reactions to that situation. Over the last 30 years, SJTs have become increasingly popular as predictors of performance. SJTs show predictive validity over and above cognitive abilities (McDaniel, Hartman, Whetzel, & Grubb, 2007; O'Connell, Hartmann, McDaniel, Grubb & Lawrence, 2007; Weekley & Ployhart, 2005) and smaller racial group differences than cognitive ability tests (Clevenger et al., 2001). They have high face validity and are well accepted by applicants in selection procedures (Lievens & Sackett, 2006). Forced-choice formats show good, and in some cases better, validity than conventional self-report rating scales (Bartram, 2007; Goffin, Jang & Skinner, 2011; Heggestad, Morrison, Reeve & McCloy, 2006) and are less susceptible to faking (Bowen, Martin & Hunt, 2002). However, these formats are far from being as well established as SJTs.
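To make the item structure concrete, the following sketch represents an SJT item as a stem with several response alternatives and scores a respondent's effectiveness ratings by their distance from an expert key. Distance scoring against expert ratings is only one of several scoring approaches used for SJTs; the class and function names, the item text, the 1-5 rating scale and the expert key are all hypothetical and are not taken from the ERIPS.

```python
from dataclasses import dataclass


@dataclass
class SJTItem:
    stem: str                     # description of the challenging situation
    options: list[str]            # possible reactions to the situation
    expert_ratings: list[float]   # hypothetical expert key: effectiveness of each option (1-5)


def distance_score(item: SJTItem, ratings: list[float],
                   scale_min: float = 1, scale_max: float = 5) -> float:
    """Score in [0, 1]: 1 means identical to the expert key, 0 the largest possible mean deviation."""
    if len(ratings) != len(item.options):
        raise ValueError("one rating per response option is required")
    mean_abs_dev = sum(abs(r - k) for r, k in zip(ratings, item.expert_ratings)) / len(ratings)
    return 1 - mean_abs_dev / (scale_max - scale_min)


item = SJTItem(
    stem="During a lesson, a pupil keeps making disruptive remarks and you notice you are getting angry.",
    options=[
        "Take a deep breath and calmly talk to the pupil after class.",
        "Raise your voice and reprimand the pupil in front of the class.",
        "Suppress your anger and continue as if nothing happened.",
    ],
    expert_ratings=[5, 1, 3],
)
print(round(distance_score(item, [4, 2, 3]), 3))
```

Scoring each response alternative separately uses more information per item than a single best-answer key, at the cost of requiring expert ratings for every option.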
The main reason is that they generate ipsative data (Brown & Maydeu-Olivares, 2012), which allow only intraindividual comparisons, not the comparisons between persons that personnel selection requires. Nevertheless, new analysis methods (see Brown & Maydeu-Olivares, 2013) appear to be promising solutions to this issue. Self-report questionnaires show good psychometric properties and acceptable validities, but they are susceptible to faking, which is why their use is discussed critically. Nevertheless, a stronger situational context (e.g., items presented as sentences rather than as adjectives) may reduce faking behavior (Ferguson & Lievens, 2017). Other studies addressed under which conditions faking might matter most (Komar, Brown, Komar & Robie, 2008), how faking affects psychometric properties (Krammer & Pflanzl, 2015), and whether its effect can be reduced by warnings during testing (Fan et al., 2012). For a comprehensive overview of the faking literature, see Ziegler, MacCann and Roberts (2012).

Other measures
Many non-cognitive constructs cannot be operationalized by performance tests. However, constructs that involve recognition or evaluation competencies can be assessed and scored like performance measures. Examples are ability emotional intelligence (Mayer-Salovey-Caruso Emotional Intelligence Test, MSCEIT; Mayer, Salovey, Caruso & Sitarenios, 2003), the evaluation of creativity (Benedek et al., 2016) and objective personality tests (Ortner et al., 2007). The usefulness of these methods depends strongly on the measured construct and has to be decided on a case-by-case basis. The selection of methods should always be based on test quality and practicability. In addition, self-report procedures should generally be avoided where possible. In this context, SJTs and forced-choice measures are promising approaches for the use of non-cognitive constructs in selection procedures.
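Evaluation tasks of this kind can be scored with indices from signal detection theory, which separate the ability to discriminate (sensitivity d') from the general tendency to answer "yes" (response criterion c). A minimal sketch, assuming a yes/no format in which each stimulus has a criterion-based classification (e.g., an idea judged creative or not by an expert panel): the log-linear correction and the counts below are assumptions for illustration, and the indices actually used for the CET are reported in Study 3.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' and response criterion c from a yes/no evaluation task."""
    # Log-linear correction: avoids infinite z-scores when a rate would be exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal cumulative distribution
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative values indicate a liberal "yes" bias
    return d_prime, criterion


# Hypothetical respondent: 40 creative and 40 non-creative ideas to classify.
d_prime, c = sdt_indices(hits=30, misses=10, false_alarms=8, correct_rejections=32)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")
```

Scoring sensitivity separately from response bias is one way to keep a respondent's general leniency from being mistaken for evaluation skill.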


1.5. Composition of the admission procedure
The composition and design of a selection procedure can both increase and decrease criterion validity (Sackett, Dahlke, Shewach & Kuncel, 2017); a careful composition can also reduce redundancies and thereby increase efficiency. Factors independent of validity, such as public image, costs and applicant acceptance, should also be taken into account (Hattrup, 2012). Performance, including academic and later job performance, is a multidimensional construct. It includes both hard criteria (e.g., grades, dropout, students' performance, salary) and soft criteria such as satisfaction or overload. Very general criteria, for example the ECTS-weighted average of all examinations, or very specific criteria, such as special pedagogical expertise or dealing with challenging situations in the classroom, can also be used to assess performance. The shift toward a broader definition of the performance criterion influences how predictor compositions are chosen (Hattrup, 2012). The composition should also follow the principle of Brunswik symmetry (Wittmann, 2002) and consider research findings on the bandwidth-fidelity dilemma (Cronbach & Gleser, 1965). However, the bandwidth-fidelity dilemma is controversial in the literature. On the one hand, researchers argue that narrow criteria should be predicted by narrow traits (Barrick & Mount, 2003; Hogan & Roberts, 1996; Jenkins & Griffith, 2004; Stewart, 1999). On the other hand, findings suggest that the specific variance associated with a narrowly defined construct is not related to the variance in job performance that is valid across situations (Ones & Viswesvaran, 1996). An overview of the debate is given by Rothstein and Goffin (2006). Furthermore, some studies examined the effects of multi-stage and single-stage selection strategies (Finch, Edwards & Wallace, 2009; De Corte, Sackett & Lievens, 2011).
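The ECTS-weighted average mentioned above is a credit-weighted mean of examination grades: the sum of grade times ECTS credits divided by the total credits. A minimal sketch, assuming each examination is given as a (grade, ECTS credits) pair and grades on the Austrian scale from 1 (best) to 5; the function name and the examination list are hypothetical.

```python
def ects_weighted_average(grades_and_credits):
    """Credit-weighted mean grade: sum(grade * ECTS) / sum(ECTS)."""
    total_credits = sum(credits for _, credits in grades_and_credits)
    if total_credits == 0:
        raise ValueError("at least one examination with ECTS credits > 0 is required")
    return sum(grade * credits for grade, credits in grades_and_credits) / total_credits


# Hypothetical transcript: (grade, ECTS credits) per examination.
exams = [(1, 6), (3, 4), (2, 5)]
print(round(ects_weighted_average(exams), 2))  # → 1.87
```

Weighting by credits lets larger courses count more, which makes this a very general criterion in the sense described above.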


As part of the cooperation project TESAT, a three-stage selection procedure was developed. Its contents and composition are described in study 1 (section 5.1).

1.6. Empirical review and validation

The fourth step in the development of the selection procedure (see section 1.1, A brief taxonomy of student selection) is the empirical review of the theoretical model with regard to test quality, validity and fairness. Although selection procedures for teacher education have been common in Austria since the 1960s, only a few longitudinal studies have investigated their validity and effectiveness. In a three-year longitudinal study, Krammer, Sommer & Arendasy (2016) found that realistic job expectations, measured in the admission interview for teacher education, predicted declarative and procedural knowledge as well as the grade of the bachelor thesis. In addition, intelligence was related to the thesis grade and to declarative knowledge, whereas coping with stress was related to procedural knowledge, operationalized as the grades of courses in which students teach. Rieder (2011) followed two cohorts of teacher education students over two years. In that dissertation, numerical and verbal intelligence, creativity, self-control, persistence, fatalistic externality, sociability, cognitive ability and some elements of an assessment center predicted hard and soft criteria of academic success. Beyond the effort that longitudinal studies require, a major issue in this context is the identification of moderators and mediators in predicting success. Various models of performance include moderators such as self-efficacy, goals, motives, intentions or other situational factors.
Examples of such models are Mayr's offer-use model (2012), the model of social integration as a predictor of academic success (Tinto, 1975) and other models of academic performance (Corker, Oswald & Donnellan, 2012; Petersen, Louw, Dumont & Malope, 2010).

Within the TESAT project, a longitudinal study was started alongside the development of the admission procedure. It aims to investigate the validity of the admission examination from the first semester, through the students' first internship, until their first year of employment. The framework conditions of this study are described in section 5.1.


1.7. The present dissertation

Admission requirements have a long history in teacher education in Austria. Since the late 1960s, selection procedures have been implemented at all university colleges of teacher education. Following a legislative revision in 2014, universities must also carry out admission examinations for teacher training. So far, the procedures used have differed between institutions and have rarely been empirically validated. For this reason, a cooperation project including university colleges of teacher education and universities throughout Austria was launched in 2013. The aim of this project was to develop a common model for admission to teacher education and to establish a fair, efficient, reliable and valid admission procedure. This procedure should create comparable standards for all participating institutions and be subject to scientific review and evaluation. The present dissertation was written as part of this project and pursues the following aims:
1) Selection of appropriate constructs for the admission procedure for teacher education
2) Development of novel tests with useful test formats
3) Analysis of the usefulness of different broad and narrow personality measures
4) First results of the longitudinal validation of the selection procedure
The dissertation addresses these four aims for the admission procedure for teacher education, drawing on a heterogeneous sample from all over Austria with only low variance restriction due to selection. The following chapters outline the four studies and their main conclusions regarding these aims.



2. Abstracts of the publications and further longitudinal research

2.1. Selection of appropriate constructs for the admission procedure of teacher education and first results of the longitudinal validation (Study 1)

The first study focuses on two aims: (1) the selection of appropriate constructs for the admission procedure for teacher education and (4) first results of the longitudinal validation of the selection procedure. It summarizes the basic concept of the admission procedure for teacher education and discusses relevant literature and meaningful constructs for teacher student selection, with a focus on academic and job performance. Additionally, first longitudinal findings on the assessment's predictive validity are presented.

Teacher Student Assessment Austria (TESAT) - Concept, characteristics, and first findings
Neubauer, A.C., Koschmieder, C., Krammer, G., Mayr, J., Müller, F., Pflanzl, B., … & Schaupp, H.

Successful school education depends not only on students' aptitude but also on their teachers' behavior and therefore on the teachers' cognitive and non-cognitive predispositions, which makes the selection of appropriate candidates for this job critical. Apart from attraction strategies and a good system of teacher education, eligible candidates can be recruited by means of selection and self-selection. This paper presents a new, scientifically developed selection tool for teacher education, consisting of three modules. This new selection tool is used for student selection at 20 university-level institutions in Austria. It comprises (1) a self-selection tool, which is used only for this purpose and not for assessment; it deals with motivational and personality-based traits relevant for the job of a teacher and informs candidates about the requirements of the career; (2) a standardized computer-based test battery, assessing cognitive, linguistic, emotional, creativity- and personality-related traits; and (3) a face-to-face assessment, which is a strictly standardized interview lasting approximately 10 minutes. In this interview, further qualities relevant for becoming a teacher are assessed: verbal and non-verbal communication skills, motivational variables and the ability to self-reflect. Further studies and evaluations encompass a longitudinal view of teacher personality and validations based on real-life criteria from teachers' working lives. First results on prognostic validity for academic success are reported.

2.2. Development of a novel test with useful test formats for emotion regulation (Study 2)

This study deals with the development of a situational judgment test for the assessment of emotion regulation in pedagogical situations (ERIPS). Its development is characterized by a high degree of standardization, a theoretical framework for the items and statistical analysis using Item Response Theory (IRT) to improve construct clarity.

Test construction of Situational Judgement tests in focus: Realization within the development of a test for emotion regulation in teacher education
Koschmieder, C., & Neubauer, A.C.

Competencies to manage one's own emotions and the emotions of others are highly relevant in teacher education. This article describes the development of a novel test implemented in the admission exam for teacher studies in Austria. The study aimed to (1) develop, analyze and validate a situational judgment test for emotion regulation in pedagogical situations, and (2) use a mixed approach of inductive and deductive item construction to improve clarity regarding the measured construct. This approach allows item homogeneity and measurement fairness to be analyzed in order to improve test quality. The final test version comprises 22 items with four response alternatives, each expressing one of four emotion regulation strategies. In two studies, the psychometric quality, fairness and validity of the test and its relations with cognitive ability and personality are examined. Results of the first study support the 1PL Rasch model and show measurement fairness for men and women. Additionally, correlations with other tests for emotion regulation and with the dark triad were observed. In study 2, openness and agreeableness were related to higher emotion regulation competencies. Interpersonal emotion regulation predicted higher altruistic professional motives, whereas intrapersonal emotion regulation predicted higher teacher self-efficacy. Both are linked to the intention to quit teacher education.
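Under the 1PL Rasch model referred to above, the probability of a keyed response depends only on the difference between a person's ability θ and an item's difficulty b: P(X = 1 | θ, b) = exp(θ - b) / (1 + exp(θ - b)). The Python sketch below shows this item response function and a maximum-likelihood ability estimate in which item difficulties are treated as known. It assumes dichotomous (0/1) item scoring and made-up difficulty values. It does not reproduce the ERIPS scoring key or the calibration software used in the study.

```python
import math


def rasch_probability(theta, b):
    """P(X = 1 | theta, b) = 1 / (1 + exp(-(theta - b))) under the 1PL Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, max_iter=50, tol=1e-10):
    """Maximum-likelihood ability estimate via Newton-Raphson.

    responses: list of 0/1 item scores; difficulties: known (hypothetical) item difficulties.
    """
    if sum(responses) in (0, len(responses)):
        # Extreme response patterns have no finite ML estimate.
        raise ValueError("all-correct or all-incorrect pattern")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))  # first derivative of the log-likelihood
        information = sum(p * (1.0 - p) for p in probs)          # test information at theta
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical item difficulties
responses = [1, 1, 1, 0, 0]
theta_hat = estimate_theta(responses, difficulties)
print(round(theta_hat, 3))
```

At the ML estimate, the expected raw score (the sum of the model probabilities) equals the observed raw score. This reflects the Rasch property that the raw score is a sufficient statistic for θ.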

2.3. Development of a novel test with useful test formats for creativity evaluation (Study 3)

As in the second study, the third study developed a test for a construct new to the admission procedure: the judgment of creativity. A methodological feature of its construction was the use of a signal detection approach to investigate causes and effects of judgment biases in the evaluation of creativity.

Assessment of creativity evaluation skills: A psychometric investigation in prospective teachers.
Benedek, M., Nordtvedt, N., Jauk, E., Koschmieder, C., Pretsch, J., Krammer, G., & Neubauer, A.C.

An accurate judgement of the creativity of ideas is seen as an important component underlying creative performance, and also seems relevant to effectively support the creativity of others. In this article we describe the development of a novel test for the assessment of creativity evaluation skills, which was designed to be part of an admission test for teacher education. The final test presents 72 ideas that have to be judged as being common, inappropriate, or creative. Two studies examined the psychometric quality of the test and explored relationships of creativity evaluation skills with cognitive ability and personality. In the first study, we observed that creativity evaluation skills are positively correlated with divergent thinking creativity and creative achievement, which suggests that evaluation skills are relevant for creative ideation as well as creative accomplishment. Across both studies, people tended to underestimate the creativity of ideas. Openness, intelligence and language competence predicted higher creativity evaluation skills, and this effect was partly mediated by a less negative evaluation bias. These findings contribute to our understanding of why people sometimes fail to recognize the creativity in others.
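The signal detection approach mentioned above treats each judgment as a classification of an idea against a scoring key. The Python sketch below computes sensitivity, specificity and informedness (sensitivity + specificity - 1) for the "creative" category. It assumes that the three response options are collapsed into "creative" versus "not creative". The keys and judgments are invented, and the published CET scoring and bias indices may differ.

```python
def sdt_indices(keys, judgments, target="creative"):
    """Signal detection indices for one judgment category against a scoring key.

    keys: keyed category per idea; judgments: the respondent's category per idea.
    Returns sensitivity (hit rate), specificity (correct-rejection rate) and
    informedness = sensitivity + specificity - 1 (0 = chance level, 1 = perfect).
    """
    if len(keys) != len(judgments):
        raise ValueError("keys and judgments must have equal length")
    hits = misses = false_alarms = correct_rejections = 0
    for key, judgment in zip(keys, judgments):
        if key == target:
            if judgment == target:
                hits += 1
            else:
                misses += 1
        elif judgment == target:
            false_alarms += 1
        else:
            correct_rejections += 1
    sensitivity = hits / (hits + misses)
    specificity = correct_rejections / (correct_rejections + false_alarms)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "informedness": sensitivity + specificity - 1,
    }


keys = ["creative"] * 4 + ["common"] * 2 + ["inappropriate"] * 2
judgments = ["creative", "common", "common", "creative",
             "common", "common", "inappropriate", "creative"]
print(sdt_indices(keys, judgments))
# {'sensitivity': 0.5, 'specificity': 0.75, 'informedness': 0.25}
```

In this toy example, two of the four creative ideas were judged as common, which is the kind of underestimation the study describes. A respondent who calls every idea creative would reach a sensitivity of 1 but a specificity of 0, and therefore an informedness of 0.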

2.4. Analysis of the usefulness of different broad and narrow personality measures (Study 4)

The fourth study compares two self-report personality tests in terms of their benefit for the admission procedure. One test measures the broad factors of the Big Five (Big Five Inventory (BFI-42); John & Srivastava, 1999; German version by Lang, Lüdtke & Asendorpf, 2001). The other uses situationally contextualized items and measures three specific personality factors with 15 narrow domains (Inventory for Personality Assessment in Situations (IPS); Schaarschmidt & Fischer, 2007).

The impact of personality in the selection of teacher students: Is there more to it than the Big Five?
Koschmieder, C., Weissenbacher, B., Pretsch, J., & Neubauer, A. C.

The bandwidth-fidelity dilemma is a controversially discussed problem in personality measurement. In this study we contrast the utility of broad versus narrow personality traits in an admission exam for teacher students. We compared the Big Five and narrow personality constructs (social-communicative behavior, achievement behavior, health and recreation behavior) regarding overlap and predictive validity. Both were part of an assessment battery for teacher student selection (N = 1120). As criterion variables, academic satisfaction (N = 184) and GPA (N = 680) were assessed later. Given the overlap of the two personality inventories, it is questionable whether both should be included in one assessment. Results show that health and recreation behavior cannot be covered by the Big Five in a selection procedure. Empirically, both broad and narrow traits show predictive validity for academic success and satisfaction.
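Whether narrow traits add predictive value beyond the Big Five is usually quantified as incremental validity. This is the gain in explained criterion variance (ΔR²) when narrow scales are added to a regression that already contains the broad factors. The dependency-free Python sketch below computes ΔR² by ordinary least squares on simulated data. The variable names and effect sizes are hypothetical and are not the study's results, and the study itself may have used a different analysis.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """R² of an OLS regression of y on the predictor columns (intercept included)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Simulated (hypothetical) data: one broad trait, one correlated narrow trait, one criterion.
rng = random.Random(2018)
broad = [rng.gauss(0, 1) for _ in range(300)]
narrow = [0.4 * b + rng.gauss(0, 1) for b in broad]
criterion = [0.3 * b + 0.4 * s + rng.gauss(0, 1) for b, s in zip(broad, narrow)]

r2_broad = r_squared([broad], criterion)
r2_full = r_squared([broad, narrow], criterion)
print(f"R² broad only: {r2_broad:.3f}, R² broad + narrow: {r2_full:.3f}, ΔR²: {r2_full - r2_broad:.3f}")
```

Because the broad-only model is nested in the full model, ΔR² cannot be negative. In practice, its size relative to sampling error is the quantity of interest, usually assessed with an F test for the R² change.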

2.5. Focus on future as a base for longitudinal research

After the implementation of the admission examination in 2014 and its first validation (for results see Neubauer et al., 2017, section 5.1), two longitudinal follow-ups were started for the 2015 and 2016 cohorts. Throughout Austria, 11 institutions (universities and university colleges of teacher education) took part in this follow-up survey in the first semester. Both cohort samples (N2015 = 939, N2016 = 1048) could be linked by codes to the anonymized data of the admission examination. This unique situation allows high generalizability of the results, owing to low variance restriction in the selection process and a nationwide sample. In a second and third follow-up, these cohorts are to be tested again shortly before graduation and at the end of their first year as teachers. In the first cohort, moderators such as self-efficacy, motives, goals and learning opportunities were assessed, as well as various hard and soft criteria such as grades, study satisfaction, the intention to quit, dropout and work overload. Results could also be controlled for pedagogical knowledge and for pedagogical experience gained before entering teacher education. In the second follow-up, a special focus will be placed on criteria related to internship experience and job success. In summary, this follow-up research will provide new information on the validity of the admission procedure for teacher education and on the later professional success of teachers.



3. General discussion

The aim of this dissertation and its four studies was the development of an admission procedure for teacher education, including (1) a theoretical framework, (2) the development of new test formats for emotion regulation and creativity evaluation, (3) the comparison of a broad and a narrow personality test and (4) a first longitudinal validation (see section 1.7). The studies reported a broad range of predictors with good psychometric properties. Thus, this dissertation forms a basis for further longitudinal research.

3.1. Assumptions on selection criteria for teacher education

In study 1, a theoretical framework with a multistage structure (self-assessment, computer assessment and structured interview) was introduced. Well-established predictors such as conscientiousness or intelligence and new constructs such as creativity evaluation were implemented in the assessment. Constructs were combined in a trimodal approach (construct approach, simulation approach and biographical approach; Päßler, Hell & Schuler, 2011), and particular attention was paid to high psychometric standards of the assessment methods. The results of study 1 suggest that the constructs relevant to academic success do not differ between teacher education and other fields of study. Intelligence, language competence, emotion regulation and conscientiousness are meaningful predictors of academic achievement in teacher education (study 1) as well as in other fields of study (Kuncel & Hezlett, 2010; Poropat, 2009; Song et al., 2010). With regard to a broader concept of success that includes job-relevant criteria for teachers, a few predictors are particularly relevant. Teacher abilities that foster classroom management, learning opportunities for children and pedagogical competencies are unique requirements of teachers.
In this context, competencies related to creativity, emotion management and leadership lead to better student performance (Koh, Steers & Terborg, 1995; Rinkevich, 2011; Sutton & Wheatley, 2003). The importance of emotion regulation in the admission for teacher education is supported by the results of study 1 and study 2. Emotion regulation proves to be one of the most promising constructs of the admission examination, with several opportunities for further research. In line with a model by Jennings and Greenberg (2009), emotional competencies are expected to influence various student and classroom outcomes and should help prevent burnout. With regard to creativity, empirical research on the abilities of teachers who promote creativity in classrooms is still rare. In study 3, a new test for creativity evaluation was developed. Next, its predictive validity, especially its effects on students' creativity in classrooms, should be investigated. Similar to the ability model of emotional intelligence (Mayer & Salovey, 1997), which ranges from the perception to the management of emotions, a process model for creativity judgment could be considered. This model could consist of (1) the perception of creativity, (2) the understanding of and attitude towards creativity and (3) the management of creativity. If teachers do not recognize students' creativity, they cannot react to it or value it (step 1). Further, the attitude towards and openness to creativity in classrooms should be taken into account. Teachers tend to dislike creative behavior of students and to perceive creativity as disruptive (Kettler, Lamb, Willerson & Mullet, 2018; Westby & Dawson, 1995). To support creativity in classrooms, teachers' attitudes and perceptions regarding creativity must be recognized (step 2; Fryer & Collings, 1991). Accordingly, tests for openness to creativity (Jauk et al., under review) are relevant for the admission to teacher education. In the last step (step 3), strategies and behavior that support students' creative achievements and foster creativity in classrooms should be investigated in a situational context. However, this last step should be part of the educational program rather than a requirement in the admission procedure, because such pedagogical competence is part of the professional knowledge to be learned during studies. The CET would relate to the model's first step. Regarding the personality factors investigated in study 1 and study 4, the Big Five are meaningful predictors in the selection procedure. Furthermore, other characteristics such as career commitment and health behavior should not be ignored. Health and recovery behavior in particular is an important construct for the prevention of physical and psychological symptoms (Schaarschmidt, 2005). In the future, scientific findings and social changes may lead to a reduction or expansion of the admission's theoretical framework. With increasing diversity and individual support in classrooms, further constructs, e.g. openness to diversity (Butrus & Witenberg, 2013; Wang, Castro & Cunningham, 2014; Witenberg, 2007), will become relevant for the admission procedure. Further, realistic job expectations are assessed in the self-assessment and in the interview (the third stage), which is only conducted with applicants for primary school teacher education. Most prospective students apply for university directly after school graduation, holding beliefs about the teaching profession formed from a pupil's perspective. This leads to incomplete and partly wrong expectations of teachers' typical responsibilities. Realistic job expectations reduce dropout, affect academic achievement and increase job satisfaction (Baur, Buckley, Bagdasarov & Dharmasiri, 2014; Earnest & Dwyer, 2010; Krammer et al., 2016). For this reason, it would be important to assess realistic job expectations for all applicants in the admission procedure. In summary, the combination of predictors is a composition of (1) stable constructs that will always be relevant and (2) flexible attributes that must be adapted to the changing demands of students and the teaching profession.


3.2. Assumptions regarding measurement

Even the best theoretical framework is useless if it is insufficiently operationalized. Measurements have to meet psychometric standards in order to enable an appropriate selection. In line with recommendations from the literature (Ferguson & Lievens, 2017; Morgeson et al., 2007b; Shaffer & Postlethwaite, 2012), special attention was paid in studies 2, 3 and 4 to the development of new test formats for non-cognitive constructs and to the investigation of the situational contextualization of items. In study 3, a completely new test format for creativity evaluation was developed and implemented in the CET. Because of its low item difficulty, the CET measures with high accuracy for people with low creativity evaluation competence. Consequently, a test that discriminates well at low competence levels is better suited for "low" stakes than for high-stakes selection. In a next step, more difficult items should be developed to improve discrimination in a high-stakes selection context. For constructs that involve the evaluation of judgments, indices from signal detection theory (informedness, sensitivity and specificity) can provide information on biases and their relation to other personality constructs. In study 2, a situational judgment test for emotion regulation in pedagogical situations (ERIPS) was developed based on the theoretical framework in study 1. The test shows good psychometric properties with a two-factor structure (interpersonal and intrapersonal emotion regulation) and satisfactory convergent and discriminant validities. In line with previous research, the results support the value of situational judgment tests for selection procedures. A novelty in the construction of the ERIPS was the combination of inductive and deductive item development. This mixed approach was used to increase item standardization and to clarify the construct.
This approach provides a better understanding of "what" an SJT actually measures. Lack of knowledge about the measured construct is a major issue in the SJT literature (Gessner & Klimoski, 2006). Many articles and experiments investigate variations in the test format to understand the complexity of SJTs. In line with this, Campion, Ployhart and MacKenzie (2014) summarized future directions for SJT research. This thorough investigation of test format effects can serve as an example for other approaches to test construction, e.g. forced-choice measures, situational personality tests or objective personality tests. Study 4 showed a large overlap between two personality tests: one presented the items as sentences, the other in the context of situation descriptions. Contextualized items showed higher acceptance among applicants (Bing, Whanger, Davison & VanHook, 2004). However, higher contextualization could increase construct heterogeneity. To address this issue, IRT analyses such as those conducted in study 2 could improve item homogeneity. Whether a higher contextualization of items reduces the susceptibility to faking needs to be examined in further studies. In faking research, it is common to compare faking-good and faking-bad conditions with an honest condition (Viswesvaran & Ones, 1999). However, only a few studies have examined the effects of real-life faking. These first studies indicate that effects in faking-good conditions are considerably larger than effects under real-life selection conditions (Krammer & Pflanzl, 2015). Many models describe the faking process (McFarland & Ryan, 2000; Mueller-Hanson, Heggestad & Thornton, 2006; Roulin, Krings & Binggeli, 2016; Snell, Sydell & Lueke, 1999), but they provide little empirical information about what actually happens during faking (Krammer, Koschmieder, Müller & Pflanzl, 2018). More information about this process would facilitate the development of less fakeable personality tests (for an overview of the faking literature, see Ziegler et al., 2012). Another issue with regard to selection is test fairness. In most cases, the question of whether a test psychometrically measures the same construct within subgroups is ignored. This can be addressed with IRT (e.g. as conducted in study 2) or with measurement invariance analyses in structural equation models. Such analyses should include not only psychological tests but also other assessment methods such as interviews (Krammer et al., submitted). In summary, the developed tests show high psychometric standards. Further studies on fairness and faking susceptibility could improve the measurement accuracy of personality tests in the admission procedure for teacher education. Additionally, forced-choice tests could be implemented for personality constructs to reduce faking, or personality profiles could be included for more accurate scoring and better validities.

3.3. Multidimensional prediction of academic achievement and later teaching performance

Compared with the validities reported in Schuler (2014), first analyses show promising predictive validities for hard criteria (overall grades of the first semester) and soft criteria (study satisfaction and the intention to quit studying). However, the generalizability of these initial results of study 1 should be examined in further studies, because they were calculated on a first-semester sample from only one university. For this reason, larger follow-up samples were tested in 2015 and 2016. Furthermore, professional success criteria should also be assessed. Indicators of teachers' professional success differ from those of other professions. Income is usually used as a criterion for job performance, but in the teaching profession income is related only to years of employment. In addition, neither the number of employees nor leadership positions are meaningful criteria. Teachers' job performance is much more closely related to their students. Indicators such as student evaluations, students' knowledge gains, behavior in critical situations and pedagogical knowledge could be used as criteria for teacher job performance.


Regarding academic retention, a mediation model that tested indirect effects of emotion regulation via self-efficacy and altruistic motives on the intention to quit studying accounted for 9% of the variance in the intention to quit in study 2. In contrast to the findings of study 1, no correlations with the intention to quit were found. To our knowledge, no other studies have investigated the influence of emotion regulation on dropout. Studies support correlations of emotion regulation with academic achievement and with other predictors of dropout such as self-efficacy (Pfitzner-Eden, 2016), motives (Jungert, Alm & Thornberg, 2014) or satisfaction (Douglas, Douglas & Barnes, 2006). Notably, study 2 used a larger and more generalizable sample. A replication of the analyses with data from the 2016 cohort will provide more information in this context. Further validations are also recommended for the other studies. The CET has not yet been validated against any performance criteria. Creativity evaluation is relevant in classroom situations, so a validation within a sample of teachers would be important. In study 4, first predictive validities of broad and narrow personality constructs were found. Even though the explained variance was relatively small, low but stable validities are expected throughout the applicants' later professional lives. To structure the information on validities as a whole, multitrait-multimethod matrices (Campbell & Fiske, 1959) could be used. The first calculated validities show the expected pattern in line with the literature. However, in some cases the correlations are lower than assumed. This could be attributed to the heterogeneity of the group of teacher education students. The admission procedure is carried out for applicants of all teaching subjects.
Among them, some applicants want to teach in elementary schools and others in secondary schools (high school, new secondary school, vocational school). The attributes assessed in the admission procedure should be relevant for all future students of teacher education. Validities will probably vary between subsamples.
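The mediation logic from study 2 (emotion regulation affecting the intention to quit through self-efficacy and altruistic motives) can be sketched with the product-of-coefficients approach. For brevity, the Python code below assumes a single mediator and uses simulated data. The study's model had two mediators, and inference would additionally require bootstrap confidence intervals or a similar test, which the sketch omits. The code shows that, in OLS regression, the total effect c equals the direct effect c′ plus the indirect effect a·b.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Sample covariance (n - 1 denominator)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def simple_mediation(x, m, y):
    """Product-of-coefficients mediation with one mediator (OLS).

    a: slope of m on x; b and c_prime: slopes of y on m and x when both are entered;
    total: slope of y on x alone.
    """
    vx, vm = cov(x, x), cov(m, m)
    cxm, cxy, cmy = cov(x, m), cov(x, y), cov(m, y)
    det = vx * vm - cxm ** 2  # determinant of the 2x2 covariance matrix of x and m
    a = cxm / vx
    b = (cmy * vx - cxm * cxy) / det
    c_prime = (cxy * vm - cxm * cmy) / det
    c = cxy / vx
    return {"a": a, "b": b, "indirect": a * b, "direct": c_prime, "total": c}


# Hypothetical simulation: emotion regulation -> self-efficacy -> intention to quit.
rng = random.Random(7)
er = [rng.gauss(0, 1) for _ in range(500)]
efficacy = [0.5 * e + rng.gauss(0, 1) for e in er]
quit_intention = [-0.4 * s + rng.gauss(0, 1) for s in efficacy]

effects = simple_mediation(er, efficacy, quit_intention)
print({k: round(v, 3) for k, v in effects.items()})
```

In this simulation there is no direct path from emotion regulation to the intention to quit, so the estimated direct effect should be close to zero while the indirect effect is clearly negative. This illustrates how an effect can be purely indirect.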


So far, only linear predictor-criterion relations have been investigated. Le et al. (2011) found curvilinear relations of conscientiousness and emotional stability with job performance. First studies also indicate a non-linear relation between intelligence and teacher turnover: public school teachers with high ability tend to quit their profession more often than teachers with lower cognitive ability (Podgursky et al., 2004). However, no curvilinear relation between intelligence and the achievement criteria was found in study 1. Further research should investigate non-linear relations for other assessment criteria. In summary, initial analyses show promising results. Nevertheless, research on the admission procedure in Austria is still at its beginning, and many opportunities for further research exist. On the basis of our findings and additional literature, Table 1 gives an overview of the expected predictions of the assessment procedure within the planned longitudinal study. The assessment includes three steps: (1) module A, an online self-assessment, (2) module B, a standardized computer test, and (3) module C, a structured interview (for a detailed overview see Neubauer et al., 2017). For a clearer overview, the intended performance criteria are partially summarized in this table. For every predictor, two representative studies are listed.

3.4. Restrictions

Within the last four years, the common admission exam was implemented in the organizational processes of universities and university colleges of teacher education. So far, validations of the entire assessment battery are based only on a small sample from the University of Graz (study 1). In addition, many predictors will only become relevant at a later stage of teacher education. As a result, it is not yet possible to make a definite statement about their relevance on the basis of this dissertation.
Another issue is fluctuation within the admission process. Even though only a small proportion of applicants is rejected in the assessment, there seems to be high fluctuation during the admission process. This self-selection leads to variance restriction in the later validation. At the moment, it is not clear when and why applicants drop out of the admission process. So far, the validation of the assessment has remained at a general level. Given the heterogeneous nature of teacher education students, more complex and differentiated hypotheses should be investigated in further validation analyses. These should include differences between the education of primary and secondary school teachers and different combinations of teaching subjects (e.g. natural sciences and social sciences). Further, some constructs in the assessment battery (e.g. emotion recognition, creativity evaluation and health and recovery behavior) were not developed to be related to typical performance criteria. For these constructs, specific criteria need to be developed; such criteria were not assessed in the present dissertation.
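A simulation can illustrate how variance restriction weakens validity coefficients. It can be combined with the classic Thorndike Case II correction, r_c = r·u / sqrt(1 + r²(u² - 1)), where u = SD_unrestricted / SD_restricted of the predictor. This is a sketch under simplifying assumptions. Case II presumes direct selection on the predictor, whereas the self-selection described above is more likely indirect, so the formula would be only an approximation there. The data, cut point and effect size below are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sd(values):
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correct_case_ii(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction for direct range restriction on the predictor."""
    u = sd_unrestricted / sd_restricted
    return r_restricted * u / math.sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))


# Hypothetical applicant pool in which predictor and criterion correlate at about 0.4.
rng = random.Random(1)
pool = []
for _ in range(5000):
    predictor = rng.gauss(0, 1)
    criterion = 0.4 * predictor + math.sqrt(1 - 0.4 ** 2) * rng.gauss(0, 1)
    pool.append((predictor, criterion))

# Only applicants at or above the median predictor score remain in the validation sample.
cut = sorted(p for p, _ in pool)[len(pool) // 2]
kept = [(p, c) for p, c in pool if p >= cut]

r_full = pearson_r(*zip(*pool))
r_restricted = pearson_r(*zip(*kept))
r_corrected = correct_case_ii(r_restricted, sd([p for p, _ in pool]), sd([p for p, _ in kept]))
print(f"full: {r_full:.3f}, restricted: {r_restricted:.3f}, corrected: {r_corrected:.3f}")
```

In practice, the unrestricted standard deviation would have to be estimated from the full applicant data, for example the admission examination records. The correction is only as good as that estimate and the direct-selection assumption.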


Table 1: Expected predictors and moderators for academic and professional performance

Criteria (follow-up time points in parentheses)
Academic performance: knowledge grades (T1, T2); study overload (T1, T2); intention to quit & dropout (T1, T2); academic satisfaction (T1, T2)
Professional performance: satisfaction with career choice (T1, T2, T3); pedagogical knowledge (T1, T2, T3); methodical competence (T2, T3); classroom management (T3); openness to pupils’ achievements (T2, T3); internship grades (T2); pupil rating (T2, T3)

Predictors by admission exam module (Nr. in parentheses)
Module A: (1) pedagogical interests; (2) teacher personality; (3) pedagogical experience
Module B: (4) intelligence; (5) linguistic competence; (6) emotion regulation; (7) emotion recognition; (8) creativity recognition; (9) health and recovery behavior; (10) openness; (11) conscientiousness; (12) extraversion; (13) agreeableness; (14) neuroticism
Module C: (15) realistic job expectations; (16) pedagogical experience; (17) task reflection skills; (18) teacher personality; (19) verbal competence

[Cell marks (x) indicating the expected predictor-criterion relations are not recoverable from the source layout.]

References (rows 1-9): Schiefele, Krapp & Winteler (1992)¹; Kaub, Karbach, Spinath & Brünken (2016)¹,²,³; ref. to rows 10-14; Wernimont & Campbell (1968)²; Depaepe & König (2018)¹,³; Kuncel & Hezlett (2010)¹; Podgursky et al. (2004)¹,³; Neubauer et al. (2017)¹,³; Aloe & Becker (2009)²,³; Kuncel et al. (2001)¹; Neubauer et al. (2017)¹,³; Chan (2006)²,³; Joseph & Newman (2010)²; Koschmieder & Neubauer (submitted)¹,³; Neubauer et al. (2017)¹,³; Grehan et al. (2011)¹; Jauk et al. (under review)²,³; Davidovitch & Milgram (2006)²,³; Krammer et al. (2016)¹,²,³; Schaarschmidt (2005)²,³

References (rows 10-19): Trapmann, Hell, Hirn & Schuler (2007)¹; Müller (2006)¹,³; Mayr & Mayrhofer (1994)¹,³; Mayr & Neuweg (2006)²,³; Baur et al. (2014)³; Earnest & Dwyer (2010)¹; Krammer et al. (2016)¹,²,³; ref. to row 2; Taylor & Small (2002)²; Beauchamp (2015)²,³; ref. to rows 10-14; Schaarschmidt & Fischer (2007)²,³; Aloe & Becker (2009)²,³; Coleman et al. (1966)²,³

Notes: T1 = follow-up in the first semester; T2 = follow-up after the first internship; T3 = follow-up in the first year as a teacher. ¹academic achievement, ²professional performance, ³teacher context. Hypothesized moderators: self-efficacy (Pfitzner-Eden, 2016), motives (Jungert, Alm & Thornberg, 2014), use of learning opportunities (Müller, 2006); further differences regarding gender, school type and the combination of subjects (e.g., natural sciences and social sciences students) are expected.


Conclusion

4. Conclusion

Admission procedures for higher education and interventions to improve study conditions are two important and widely discussed topics in science and politics. The extension of admission procedures for academic institutions is highly relevant in Austria and Germany. In the winter semester 2015/2016, 370,000 students were enrolled at Austrian universities; by the winter semester 2035/2036, this number is expected to rise to 423,000. In 2016, the government spent on average 12,275 € per semester for every new student (University Report: Bundesministerium für Wissenschaft, Forschung und Wirtschaft, 2017). However, one third of all new students drop out of their studies. At present, there is “little reliable information available on which factors are most important in connection with dropping out of studies and to what extent they can be addressed by interventions” (Unger et al., 2009, p. 7). Some countries, e.g., Finland, are trying to raise the social standing of the teaching profession in order to recruit more applicants for teacher education. A large pool of applicants makes it possible to select the most qualified candidates through higher admission requirements. Through the development and empirical validation of the common nationwide admission procedure, Austria plays an important role in research on academic admission and teacher education. This dissertation addresses the development of a common, fair, reliable and valid admission procedure for teacher education with good psychometric properties. This included (1) the selection of appropriate constructs, (2) the development of two novel tests with useful test formats, (3) the analysis of two different personality measures, one broad and one narrow, and (4) first results of a longitudinal validation. The admission procedure is theoretically and empirically founded and highly generalizable owing to the cooperation with institutions all over Austria.
Results support the importance of emotion regulation in the admission examination and show first promising results for creativity recognition. In test development, new approaches to test construction provided further information and increased test quality. Inductive and deductive item development and the use of IRT led to higher standardization and better construct clarity of the ERIPS. Signal detection indices supplied relevant information about judgment biases. Comparing two personality tests with similar constructs in a real selection setting was important for adapting the procedure specifically to the context. In line with the results, the broad Big Five, extended by the factor health and recovery behavior, is recommended for the admission procedure for teacher education. Nevertheless, it is important to develop tests that are less susceptible to faking (e.g., forced-choice measures; Rothstein & Goffin, 2006) and to increase research on the faking process in real-life selection. Taken together, these studies build a strong theoretical and empirical framework for further longitudinal research. Hattie (2009) notes a shift in attention from the teacher’s personality to the teacher’s professional behavior (“what teachers do matters”). Teacher education is of great importance for developing this professional knowledge. The basis for acquiring this knowledge and behavior is an admission procedure that focuses on the applicants’ personality as a prerequisite for their later professional knowledge and behavior as teachers. With regard to admission, the results support the statements “Teachers make the difference!” and “What teachers do matters!” (Hattie, 2009).
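The signal detection indices mentioned above for the creativity evaluation task (study 3) can be illustrated with a short Python sketch. It assumes that each judgment is classified against an expert criterion as one of four outcomes:

- a hit (a creative idea judged creative);
- a miss;
- a false alarm (an uncreative idea judged creative);
- a correct rejection.

From these counts it computes the standard sensitivity index d′ and the response criterion c. The log-linear correction of extreme rates is an assumption of this sketch, not necessarily the procedure used in study 3. The function name and counts are hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def sdt_indices(hits: int, misses: int, false_alarms: int,
                correct_rejections: int) -> Tuple[float, float]:
    """Return (d_prime, c) for one rater in a yes/no judgment task.

    A log-linear correction (add 0.5 to each count, 1 to each total) keeps
    the rates away from 0 and 1 so that the z-transform stays finite.
    """
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    if n_signal == 0 or n_noise == 0:
        raise ValueError("need at least one signal and one noise trial")
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    z = NormalDist().inv_cdf
    # Sensitivity: how well creative and uncreative ideas are told apart.
    d_prime = z(hit_rate) - z(fa_rate)
    # Bias: negative values mean a liberal tendency to call ideas "creative".
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


if __name__ == "__main__":
    # Hypothetical counts: 20 expert-rated creative and 20 uncreative ideas.
    print(sdt_indices(hits=15, misses=5, false_alarms=5, correct_rejections=15))
    print(sdt_indices(hits=18, misses=2, false_alarms=12, correct_rejections=8))
```

In the first call, hits and false alarms are symmetric, so c is about zero. In the second, the rater calls many uncreative ideas creative, which shows up as a negative c: a liberal judgment bias that is separate from sensitivity.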


Full Texts & Dissemination

5. Publications (full texts & dissemination)

These four publications are the basis of the present dissertation:

Study 1: Neubauer, A., Koschmieder, C., Krammer, G., Mayr, J., Müller, F., Pflanzl, B., … Schaupp, H. (2017). TESAT - Ein neues Verfahren zur Eignungsfeststellung und Bewerberauswahl für das Lehramtsstudium: Kontext, Konzept und erste Befunde. Zeitschrift für Bildungsforschung, 7(1), 5-21.

Study 2: Koschmieder, C., & Neubauer, A. C. (submitted). Test construction of Situational Judgment tests in the spotlight: Realization through the development of a test for emotion regulation in teacher education. European Journal of Psychological Assessment.

Study 3: Benedek, M., Nordtvedt, N., Jauk, E., Koschmieder, C., Pretsch, J., Krammer, G., & Neubauer, A. C. (2016). Assessment of creativity evaluation skills: A psychometric investigation in prospective teachers. Thinking Skills and Creativity, 21, 75-84.

Study 4: Koschmieder, C., Dumfart, B., Pretsch, J., & Neubauer, A. (accepted). The impact of personality in the selection of teacher students: Is there more to it than the Big Five? Europe’s Journal of Psychology.

Furthermore, these publications served as a basis for several national and international conference presentations:

Koschmieder, C., Pretsch, J., Vollmann, H., & Neubauer, A. C. (2017). Development of a Selection Tool for Prospective Teacher Students. Symposium at the Conference of the International Society for the Study of Individual Differences (ISSID) in Warsaw, POL, July 2017.

Pretsch, J., Koschmieder, C., & Neubauer, A. C. (2016). Teacher Student Selection in Austria – First results on predictive validity. Invited symposium at the 18th European Conference on Personality in Timişoara, ROU, July 2016.

Koschmieder, C., Pretsch, J., & Neubauer, A. C. (2015). Erfassung von Persönlichkeitsmerkmalen und sozialen Kompetenzen im Aufnahmeverfahren von Lehramtsstudierenden. Paper presented at the congress of the Österreichische Gesellschaft für Forschung und Entwicklung im Bildungswesen (ÖFEB) in Klagenfurt, AUT, September 2015.

Koschmieder, C., Pretsch, J., & Neubauer, A. C. (2015). Emotional Intelligence, Personality and General Mental Ability in Teacher Student Selection: An Examination of Predictive Validity and Overlap. Presentation at the Conference of the International Society for the Study of Individual Differences (ISSID) in London (Ontario), CAN, July 2015.

Koschmieder, C., Pretsch, J., & Neubauer, A. C. (2014). Kooperationsprojekt „Entwicklung und Durchführung eines einheitlichen Aufnahme- und Auswahlverfahrens für Lehramtsstudierende in Österreich“. Poster presented at the conference „Beratung und Förderung von Lehramtsstudierenden“ in Kassel, DEU, April 2014.


5.1. Study 1: TESAT - Ein neues Verfahren zur Eignungsfeststellung und Bewerberauswahl für das Lehramtsstudium: Kontext, Konzept und erste Befunde

Neubauer, A. C., Koschmieder, C., Krammer, G., Mayr, J., Müller, F., Pflanzl, B., Pretsch, J., & Schaupp, H.


5.2. Study 2: Test construction of situational judgment tests in the spotlight: Realization through the development of a test for emotion regulation in teacher education

Koschmieder, C. & Neubauer, A.


Test construction of situational judgment tests in the spotlight: Realization through the development of a test for emotion regulation in teacher education

Corinna Koschmiederᵃ, Aljoscha C. Neubauerᵃ

ᵃ Dept. of Psychology, University of Graz, Austria

Universitätsplatz 2, 8010 Graz, Austria

Corresponding author: Corinna Koschmieder, Dept. of Psychology, University of Graz, Universitätsplatz 2, 8010 Graz, Austria. Phone: ++43 (0) 316 / 380 - 8505


Abstract

Competencies to manage one’s own emotions and the emotions of others are highly relevant in teacher education. This article describes the development of a novel test implemented in the admission exam for teacher education in Austria. The study aimed to (1) develop, analyze and validate a situational judgment test for emotion regulation in pedagogical situations, and (2) use a mixed approach of inductive and deductive item construction to improve clarity regarding the measured construct. This approach allows item homogeneity and measurement fairness to be analyzed in order to improve test quality. The final test version comprises 22 items with four response alternatives, each expressing one of four emotion regulation strategies. In two studies, the psychometric quality, fairness and validity of the test and its relations with cognitive ability and personality are examined. Results of the first study support the 1PL Rasch model and show measurement fairness for men and women. Additionally, correlations with other tests for emotion regulation and with the dark triad were observed. In study 2, openness and agreeableness were related to higher emotion regulation competencies. Interpersonal emotion regulation predicted stronger altruistic professional motives, whereas intrapersonal emotion regulation predicted higher teacher self-efficacy. Both are linked to the intention to quit teacher education.

Keywords: situational judgment test; Teacher Student Assessment Austria (TESAT); emotion regulation; item response theory; emotion regulation in pedagogical situations (ERIPS)
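The abstract reports that the data support the 1PL Rasch model. For readers less familiar with this model, the following Python sketch shows its item response function, P(X = 1 | θ, β) = exp(θ − β) / (1 + exp(θ − β)). It also computes a maximum-likelihood ability estimate with item difficulties treated as known. This is an illustration only, not the analysis pipeline of this study, which used the R package eRm. The function names and the example difficulties are hypothetical.

```python
import math
from typing import Sequence


def rasch_prob(theta: float, beta: float) -> float:
    """Probability of a keyed (correct) response under the 1PL Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def estimate_theta(responses: Sequence[int], betas: Sequence[float],
                   max_iter: int = 50, tol: float = 1e-8) -> float:
    """ML estimate of a person's ability, treating item difficulties as known.

    Uses Newton-Raphson on the log-likelihood. Zero and perfect scores have
    no finite ML estimate, so they are rejected.
    """
    if len(responses) != len(betas):
        raise ValueError("one response per item is required")
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("ML estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in betas]
        gradient = score - sum(probs)                    # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)  # -d2 logL / d theta2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    betas = [-1.0, -0.5, 0.5, 1.0]  # hypothetical item difficulties
    # Different patterns with the same sum score yield the same ability estimate.
    print(estimate_theta([1, 1, 0, 0], betas))
    print(estimate_theta([0, 0, 1, 1], betas))
```

The gradient depends on the responses only through the sum score. Response patterns with equal sum scores therefore receive the same estimate. This sufficiency of the sum score is the property on which conditional maximum-likelihood estimation in the Rasch model rests.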


Introduction

Situational judgment tests (SJTs) are a well-established, effective and widely used test format in personnel selection (Christian, Edwards, & Bradley, 2010; Ryan & Ployhart, 2014). This may be due to their good predictive validity, their lower susceptibility to faking, their good acceptance among applicants and their similarity to job simulations (Lievens, 2006; Lievens, Peeters, & Schollaert, 2008; McDaniel & Nguyen, 2001; Schuler & Marcus, 2006). In meta-analyses, the predictive validities of SJTs for job performance vary between .21 and .41, and SJTs show incremental validity over and above cognitive ability and personality (McDaniel, Morgeson, Finnegan, Campion, & Braverman, 2001). There is also evidence of good criterion validity with regard to academic performance (Lievens & Sackett, 2012). Despite these advantages, SJTs face issues in test development. “In fact, the communality between these tests that share a format has not been clearly defined” (Gessner & Klimoski, 2006, p. 29). McDaniel et al. (2001) describe SJTs as a measurement method that can be used to assess a variety of constructs. As for all psychological tests, a clear definition of the measured construct is important, and it can be examined through structural analysis and construct validation. In the past, however, developers of SJTs may have paid less attention than developers of other psychological instruments to the construct validity of their measures (Bergman, Drasgow, Donovan, Henning, & Juraska, 2006; Christian et al., 2010), as well as to the definition and structure of the constructs they measure. Researchers additionally argue that the constructs best measured with SJTs are heterogeneous in nature (Lievens & Coetsier, 2002).
This could be due to the specific situational context and to the fact that the critical incident technique (CIT; Flanagan, 1954), which is typically used to develop SJTs, generates items inductively. This can increase the heterogeneity of the construct. Lack of clarity about the construct measured by this test format is still an issue in the SJT literature, even though numerous attempts have been made in recent years to overcome it (De Meijer, Born, van Zielst, & van der Molen, 2010; Jackson, LoPilato, Hughes, Guenole, & Shalfrooshan, 2017; Westring et al., 2009). In the complex construction process, it is important to focus on this topic, because even small changes can alter validities (Muck, 2013). For example, validities vary depending on whether situational knowledge or behavioral tendency is measured (Freudenthaler & Neubauer, 2007; McDaniel, Hartman, Whetzel, & Grubb, 2007). The construction process of an SJT entails many decisions. Test developers have to ask themselves the following questions, among others:

- What exactly is the construct I want to measure?
- How do I choose the response alternatives?
- Which type of scoring do I want to apply?
- What kind of instructions do I want to use?
- Who has the appropriate expertise to rate the response alternatives?
- Do I use a specific situational context?

With this in mind, it is important to investigate how different decisions and levels of standardization influence the construct (Campion, Ployhart, & MacKenzie, 2014). In the last ten years, research on the effects of different response formats and scoring methods has increased (Bergman et al., 2006; De Leng et al., 2017; Weng, Yang, Lievens, & McDaniel, 2018). Other topics, such as new ways to construct situations and items, remain neglected. Research indicates that 81.8% of SJTs are developed using critical incidents (Campion et al., 2014). Weekley, Ployhart and Holtz (2006) recommend manipulating various features across cells (e.g., item stem development) and examining the effects on the psychometric equivalence of SJTs.
In this study, we address this gap in research on SJT development by constructing an SJT to measure emotion regulation in pedagogical situations (ERIPS) as part of the admission procedure for teacher education. Within this process, we pay particular attention to a high standardization of the construct in item stems and response alternatives. We also use item response theory (IRT) to test psychometric equivalence (with regard to gender, performance, item difficulty and different item stem contexts) in order to improve construct clarity.

Emotion Regulation Measurement Paradigms in an SJT Setting

In the literature, some SJTs for emotional constructs already exist. Examples are the Situational Test of Emotional Understanding (STEU) and the Situational Test of Emotion Management (STEM) developed by MacCann and Roberts (2008), as well as the test by Sharma, Gangopadhyay, Austin, and Mandal (2013). The number of publications containing the term “emotion regulation” is steadily increasing (Gross, 2007). Over recent years, many different concepts have emerged to describe how people deal with emotions (for a review, see Neubauer & Freudenthaler, 2005). In their widely accepted process model, Mayer and Salovey (1997) differentiate between four factors of emotional intelligence (EI): (1) perception of emotion; (2) emotional facilitation of thought; (3) understanding emotions; and (4) managing emotions. The ERIPS is an SJT that focuses on the model’s fourth step and on the definitions of emotional competence by Petermann and Wiedebusch (2008) and Gardner (1983). Both definitions include the regulation of one’s own emotions as well as the ability to regulate the emotions of others, and therefore differentiate between intra- and interpersonal emotion regulation. Moreover, there is evidence that EI predicts performance in diverse contexts (Van Rooy & Viswesvaran, 2004), such as academic performance (Neubauer et al., 2017; Parker, Summerfeldt, Hogan, & Majeski, 2004), job performance (Côté & Morgan, 2002) and life satisfaction (Sharma et al., 2013). Of all EI facets, emotion regulation shows the highest correlation with job performance (Joseph & Newman, 2010). This relation is moderated by the amount of emotional labor in the job. For jobs with high emotional labor, emotion regulation is a positive predictor (β = .17). For jobs with low emotional labor, it is a negative predictor (β = −.11; Joseph & Newman, 2010). Teaching is a profession with high demands of emotional labor, and for it EI seems to be a meaningful predictor, although the relevant literature is still scarce. Research indicates that high EI has a positive effect on pedagogical competencies and a negative effect on perceived job pressure (Mayr, 2012). In addition, emotionally competent teachers describe themselves as being better at classroom management and student engagement (Di Fabio & Palazzeschi, 2008).

The Present Research

The current study addresses the development and psychometric examination of the test for emotion regulation in pedagogical situations (ERIPS). This test was designed in the course of the project “Teacher Student Assessment Austria” (TESAT; Neubauer et al., 2017) for inclusion in the admission exam for teacher education. Emotion regulation competencies are regarded as an important prerequisite for burnout prevention, and they enable teachers to support pupils in their everyday school life. In the process, we paid particular attention to the theoretical framework for developing the situations (as recommended by Campion et al., 2014) and the response alternatives. The response alternatives were standardized based on an underlying model of dealing with emotions: all response alternatives were constructed according to four emotion regulation strategies (Aldao, Nolen-Hoeksema, & Schweizer, 2010). This led to a mixed approach to item construction. The inductive part used critical incidents (CIT; Flanagan, 1954), and the deductive part used a theoretical framework standardized down to the response alternatives.
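The deductive part of the construction fixes the structure of every item. Each of the four response alternatives expresses exactly one of the four strategies named in the Methods: rumination, suppression, reappraisal and acceptance. A hypothetical Python data model can make this constraint explicit and checkable. The class and field names and the placeholder texts are assumptions for illustration; the actual ERIPS items are not disclosed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Strategy(Enum):
    RUMINATION = "rumination"
    SUPPRESSION = "suppression"
    REAPPRAISAL = "reappraisal"
    ACCEPTANCE = "acceptance"


@dataclass(frozen=True)
class Alternative:
    text: str
    strategy: Strategy


@dataclass(frozen=True)
class Item:
    stem: str
    focus: str  # hypothetical field: "intrapersonal" or "interpersonal"
    alternatives: Tuple[Alternative, ...]

    def validate(self) -> None:
        """Raise ValueError unless the item follows the standardized structure."""
        if self.focus not in ("intrapersonal", "interpersonal"):
            raise ValueError(f"unknown focus: {self.focus!r}")
        if len(self.alternatives) != len(Strategy):
            raise ValueError("an item needs exactly four response alternatives")
        used = {alt.strategy for alt in self.alternatives}
        # With four alternatives, covering all four strategies means each appears once.
        if used != set(Strategy):
            raise ValueError("each strategy must be expressed by exactly one alternative")


if __name__ == "__main__":
    item = Item(
        stem="<placeholder situation>",
        focus="interpersonal",
        alternatives=tuple(Alternative(f"<option {s.value}>", s) for s in Strategy),
    )
    item.validate()  # passes: one alternative per strategy
```

Encoding the strategy on each alternative, rather than only in the scoring key, keeps the theoretical model attached to the item itself. This is the sense in which the construct is standardized down to the response alternatives.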
The second aim was to test the psychometric fairness of the test with respect to gender. Measurement equivalence for men and women in SJTs is unexplored, and in most SJTs women perform better than men (Whetzel, McDaniel, & Nguyen, 2008). It is therefore important to verify that differences in test performance cannot be traced back to measurement bias. Finally, we followed the recommendation of Weekley et al. (2006) and manipulated the pedagogical situational context: items set in a classroom or in other pedagogical contexts were constructed to test the psychometric equivalence of the different item stems. We created an initial test version consisting of 40 items. In study 1, a thorough test analysis was performed using IRT to test for psychometric equivalence, to verify a 1PL Rasch model and to provide a first validation. The examined test version was used in the teacher admission test in 2015, with a longitudinal follow-up in the first semester (study 2). In the second study, the structure was inspected with confirmatory factor analysis (CFA) in a larger sample, and the test was further validated. This resulted in the final test version consisting of 22 items.

Study 1: Test Development

Methods

Participants

A total of 188 high school students close to graduation (117 women (62.2%) and 71 men (37.8%)) participated in study 1. This sample was chosen because most applicants to the admission procedure apply to university during their last year of school. The youngest participants were 16 years old, and two participants were older than 20 years (M = 17.42, SD = 0.96). At the time of the study, 31.9% of the participants were considering studying teacher education.

Tests and Measures

Situational judgment test for emotion regulation in pedagogical situations (ERIPS).

Test construction. As is usual in SJT development, the situations were derived directly using CIT (Flanagan, 1954) via structured expert interviews. Teachers and experts from other pedagogical contexts (scout groups, fire brigades, theater groups, etc.) were interviewed. Particular attention was paid to creating situations involving the emotions of Izard’s (1994) emotion construct (fear, joy, anger, sadness, disgust, surprise, interest, contempt, shame, guilt). Based on statistical analyses, the following emotions remained in the final test version: anger, shame, fear, grief, joy, disgust and guilt. In addition, the information in the item stems was standardized. All situations described have a similar length and are linguistically designed to minimize associations with verbal intelligence. The following guiding questions can be answered for each situation: (1) Where is the teacher? (2) What is happening in the situation? (3) Who experiences the emotion and why? (4) What does the class/group do? Possible response alternatives were generated by experts in interviews and in an online survey completed by teachers and pedagogical experts. Subsequently, response alternatives were selected so that every alternative corresponded to one of the four emotion regulation strategies (rumination, suppression, reappraisal, acceptance). Interpersonal emotion regulation items focus on providing guidance or support to others based on these strategies. For intrapersonal emotion regulation items, care was taken to ensure that the response alternatives were as independent as possible of other people. The reactions described in the response alternatives were intended to be as immediate to the situation as possible. With this procedure, the construct was standardized down to the level of the response alternatives. In addition to the inductive construction described above, a theory-based deductive construction was thus applied.

Coding. SJTs can be scored in various ways. Bergman et al. (2006) refer to six different scoring methods, which can be broken down further into subcategories. The final items of the ERIPS were scored using the expert ratings of five clinical psychologists experienced in the emotional processing of children and adolescents. The five raters rated the quality of the emotion regulation strategy in every alternative on a four-point Likert scale (ICC = .79), and a dichotomous scoring key was derived from these ratings. Only items in which the best-rated alternative was unambiguous were selected. The pilot version of the ERIPS included 40 items that met these criteria: 12 items measuring intrapersonal and 28 items measuring interpersonal emotion regulation. In the instructions, participants were asked to choose the most appropriate alternative. The full item list cannot be disclosed here because it is part of the admission test.

Situational Test of Emotional Understanding (STEU) and Situational Test of Emotion Management (STEM). To assess convergent validity, the German versions of two well-established SJTs for EI were used (STEM & STEU; MacCann & Roberts, 2008; German versions: Hilger, Hellwig, & Schulze, 2012). The tests show acceptable convergent and discriminant validities (Austin, 2010; Libbrecht & Lievens, 2012) as well as acceptable psychometric properties (Allen, Weissman, Hellwig, MacCann, & Roberts, 2014). The STEU was scored dichotomously and the STEM with the original expert scoring weights.
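The scoring rules described above can be sketched in Python. The sketch makes three simplifying assumptions:

- The keyed ERIPS alternative is the one with the highest mean expert rating.
- A key counts as ambiguous when the best and second-best means differ by less than a threshold. This threshold (`min_margin`) is a hypothetical parameter, since the text only states that the best-rated alternative had to be unambiguous.
- STEM-style weighting is reduced to looking up a per-option weight supplied by the user, on the assumption that such weights reflect expert consensus.

All names and ratings are illustrative.

```python
from statistics import mean
from typing import Optional, Sequence


def derive_key(ratings: Sequence[Sequence[int]], min_margin: float = 0.5) -> Optional[int]:
    """Return the index of the keyed alternative, or None if the key is ambiguous.

    ratings[r][a] is rater r's Likert rating (1-4) of alternative a.
    min_margin is a hypothetical ambiguity threshold on the mean ratings.
    """
    n_alternatives = len(ratings[0])
    means = [mean(rater[a] for rater in ratings) for a in range(n_alternatives)]
    ranked = sorted(range(n_alternatives), key=lambda a: means[a], reverse=True)
    best, second = ranked[0], ranked[1]
    if means[best] - means[second] < min_margin:
        return None  # the item would be discarded
    return best


def score_dichotomous(choice: int, key: int) -> int:
    """1 if the keyed alternative was chosen, else 0 (ERIPS- and STEU-style)."""
    return 1 if choice == key else 0


def score_weighted(choice: int, weights: Sequence[float]) -> float:
    """Credit equal to the weight of the chosen option (STEM-style expert weights)."""
    return weights[choice]


if __name__ == "__main__":
    # Hypothetical ratings: five raters x four alternatives.
    ratings = [[4, 2, 1, 2], [4, 3, 1, 2], [3, 2, 1, 1], [4, 2, 2, 2], [4, 3, 1, 2]]
    key = derive_key(ratings)
    print(key)  # 0
    print(score_dichotomous(0, key), score_dichotomous(2, key))
```

Dichotomous scoring gives credit only for the single best alternative. Weighted scoring gives partial credit for options that some experts also endorse, which is the main design difference between the two approaches.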

Dark Triad Dirty Dozen. The dark triad was assessed with the German translation of the Dark Triad Dirty Dozen (DTDD; Jonason & Webster, 2010), which measures narcissism, machiavellianism and psychopathy. Each factor consists of four items, each rated on a seven-point Likert scale. The DTDD shows good discrimination and difficulty parameters (Webster & Jonason, 2013) as well as adequate validity. Given the growing body of literature on the relationship between the dark triad and EI (Jauk, Freudenthaler, & Neubauer, 2016), we included the dark triad for validation purposes as well. In our sample, reliabilities were acceptable (α = .79 for narcissism, .82 for machiavellianism and .71 for psychopathy).

Procedure

Participants were tested in groups in computer rooms during school lessons. All tests were administered with the online survey software LimeSurvey (version 2.05; www.limesurvey.org). The total test session took up to one and a half hours. For all Rasch model analyses, R (version 3.4.3) was used with the Extended Rasch Modeling package (eRm; Mair & Hatzinger, 2007) and the RaschSampler package (Verhelst, Hatzinger & Mair, 2007).

Results

Test Analysis

From the initial pool of 40 items, we aimed to construct a shortened test meeting high psychometric standards. In a first step, we removed five items owing to low item difficulty parameters (pi