THE PREDICTIVE VALIDITY OF PERSON MATCHING METHODS IN INTEREST MEASUREMENT

A dissertation submitted to the Kent State University College of Education, Health, and Human Services in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by Stephanie T. Burns May 2012

© Copyright, 2012 by Stephanie T. Burns All Rights Reserved


A dissertation written by
Stephanie T. Burns
B.F.A., University of Akron, 1994
M.Ed., Kent State University, 2007
Ph.D., Kent State University, 2012

Approved by

_________________________, Director, Doctoral Dissertation Committee Mark L. Savickas

_________________________, Member, Doctoral Dissertation Committee Jason M. McGlothlin

_________________________, Outside Member, Doctoral Dissertation Committee Cynthia W. Symons

Accepted by

_________________________, Director, School of Lifespan Development and Educational Sciences Mary Dellmann-Jenkins

_________________________, Dean, College and Graduate School of Education, Health, and Human Services Daniel F. Mahony


BURNS, STEPHANIE T., Ph.D., May 2012

Counseling and Human Development Services

THE PREDICTIVE VALIDITY OF PERSON MATCHING METHODS IN INTEREST MEASUREMENT (300 pp.)

Director of Dissertation: Mark L. Savickas, Ph.D.

This study determined and recommended empirically based criteria for performing person matching as a psychometric scoring methodology to predict specialty selection within a profession. It compared the predictive hit rates of standard scoring and three calculations of person matching using ex post facto data collected on 5,143 medical students who had taken an interest inventory and had entered their medical residency. The results suggested the following conclusions for the five hypotheses. First, including all 150 raw item scores when person matching produced more accurate hit rates than using the 18 scale scores; person matching at the item level appears to be more accurate than person matching at the factor or scale level. Second, using the top 20 person matches produced the highest hit rates compared with the top match and the top 5 and top 10 singular matches. Third, standard scoring outperformed person matching for the top match. However, person matching on the 150 items and on the 30 items outperformed standard scoring when looking beyond the top match to offer medical students several medical specialties to research for further consideration. Fourth, gender differences were less pronounced for person matching than for standard scoring. Fifth, the predictive hit rates were slightly higher when combining the standard scoring and person matching psychometric scoring methodologies to person match based upon only 30 items on the inventory. In conclusion, person matching is worthy of considerable research attention in interest inventories, as the benefits to women and to a quickly changing, global workforce could be immense.
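The singular and dominant hit rates compared in this abstract can be illustrated with a minimal sketch. It assumes, following the wording of the table titles in this document, that a singular hit means the specialty entered appears at least once among the top k matches and that a dominant hit means the specialty entered is the most frequent specialty among the top k matches. The function names, the toy data, and the rule that a tie for most frequent does not count as dominant are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def singular_hit(matched_specialties, entered, k):
    """True if the entered specialty appears at least once in the top k matches."""
    return entered in matched_specialties[:k]


def dominant_hit(matched_specialties, entered, k):
    """True if the entered specialty is the single most frequent one in the top k.

    Assumption for illustration: a tie for most frequent is not a dominant hit.
    """
    counts = Counter(matched_specialties[:k]).most_common(2)
    if not counts or counts[0][0] != entered:
        return False
    return len(counts) == 1 or counts[0][1] > counts[1][1]


def hit_rate(cases, k, hit_fn):
    """Share of cases, given as (ranked specialties, entered specialty), that are hits."""
    hits = sum(hit_fn(ranked, entered, k) for ranked, entered in cases)
    return hits / len(cases)


# Hypothetical toy data: specialties of the closest matches (best first)
# paired with the specialty the student actually entered.
cases = [
    (["Pediatrics", "Surgery", "Pediatrics", "Radiology", "Pediatrics"], "Pediatrics"),
    (["Surgery", "Radiology", "Surgery", "Pediatrics", "Radiology"], "Radiology"),
]

for k in (1, 5):
    print(k, hit_rate(cases, k, singular_hit), hit_rate(cases, k, dominant_hit))
```

In this toy data the second student is a singular hit at k = 5 but not a dominant hit, because Radiology is tied with Surgery rather than being the most frequent specialty.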

ACKNOWLEDGEMENTS

The completion of a dissertation is achieved only through the contributions of time and knowledge made by many individuals. I am grateful for the mentorship provided to me by my advisor, Dr. Mark Savickas. His enduring encouragement was the catalyst for pursuing a Ph.D. and becoming a counselor educator. Thank you for your constant support and guidance to persevere through both occupational and personal challenges, which has allowed me to find the field that I love. I would like to thank Dr. Jason McGlothlin, Dr. Cynthia Symons, and Dr. Tracy Lara for their time, encouragement, and constructive feedback during this dissertation process.

I am indebted to the Association of American Medical Colleges (AAMC) for allowing me access to the data required for completion of this research. I would like to specifically thank Dr. George Richard, Director of AAMC’s Careers in Medicine program, for his helpfulness and support during this entire process. Dr. Erik Porfeli graciously gave of his time and talents to help me understand how to perform the kappa coefficient top match calculation and confirmed standard scoring protocols.

Many individuals supported me in navigating the intellectual and emotional landscape necessary to complete this dissertation. First and foremost, I would like to thank Dr. Daniel Cruikshanks for all of his time and energy spent providing me with empathic listening, support, and encouragement to persist in this process. In addition, I would like to thank my parents, family, and friends, who were all very supportive and very understanding of the incredible demands required of me to finish this degree. Thank you for your contributions and patience during this time.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES

CHAPTER

I. INTRODUCTION AND LITERATURE REVIEW
  History and Scoring Methodologies of Interest Inventories
    Importance of Interest Inventories
    The Concept of Interest
    Value of Interest Inventory Usage
    Sex Differences in Interest Inventories
    Three Major Psychometric Scoring Systems in Interest Inventories
      Empirical scoring methodology
      Theoretical scoring methodology
        Interest and personality
        Prediction
        Conclusion
      Rational and person matching scoring methodologies
      Summary
  Specialty Selection Within a Profession
    Factors Influencing Medical Specialty Choice
  Psychometric Scoring Methodologies of Medical Specialty Inventories
    Strong and Tucker
    Holland
    Zimny
    Gough
    Comparison of the MSPS and the MSPI
  The MSPI’s Current Role in Medical Specialty Selection
  Literature Review of Kuder’s Person Matching Model for Prediction
  Research Problems
  Research Questions
  Hypotheses
  Conclusion
II. METHODOLOGY
  Participants
    Random Sample
  Measurers
    Early Versions of the MSPI
    MSPI-R
  Data Collection

  Data Analysis Procedures
    The 150 Items
      Recording singular hit rates for the top match
      Recording singular hit rates for the top 5 matches
      Recording singular hit rates for the top 10 matches
      Recording singular hit rates for the top 20 matches
      Recording dominant hit rates for the top 5 matches
      Recording dominant hit rates for the top 10 matches
      Recording dominant hit rates for the top 20 matches
      Recording cases where the medical specialty entered is dominant for a member of the stratified random sample across the top 1, 5, 10, and 20 matches
      Recording cases where the medical specialty entered for a member of the stratified random sample is different from the medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches
      Recording cases where there is a different medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches
    The 18 Scales
      Recording singular hit rates for the top match
      Recording singular hit rates for the top 5 matches
      Recording singular hit rates for the top 10 matches
      Recording singular hit rates for the top 20 matches
      Recording dominant hit rates for the top 5 matches
      Recording dominant hit rates for the top 10 matches
      Recording dominant hit rates for the top 20 matches
      Recording cases where the medical specialty entered is dominant for a member of the stratified random sample across the top 1, 5, 10, and 20 matches
      Recording cases where the medical specialty entered for a member of the stratified random sample is different from the medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches
      Recording cases where there is a different medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches
    The 30 Items
      Recording singular hit rates for the top match
      Recording singular hit rates for the top 5 matches
      Recording singular hit rates for the top 10 matches
      Recording singular hit rates for the top 20 matches
      Recording dominant hit rates for the top 5 matches
      Recording dominant hit rates for the top 10 matches
      Recording dominant hit rates for the top 20 matches
      Recording cases where the medical specialty entered is dominant for a member of the stratified random sample across the top 1, 5, 10, and 20 matches

      Recording cases where the medical specialty entered for a member of the stratified random sample is different from the medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches
      Recording cases where there is a different medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches
    Standard Scoring Hit Rates
    Standard Scoring Kappa Coefficients
    Kappa Coefficients for the Top Match
    Chance Expectancy Hit Rates
  Limitations and Delimitations
  Expected Results
  Chapter Summary
III. RESULTS
  Descriptive Analyses
    Comparison of Person Matching Singular Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender
    Comparison of Person Matching Singular Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender for the 12 Groups With Over 100 in the Sample
      Comparing table 1 with table 2
    Comparison of Person Matching Singular Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender for the 10 Groups With Under 100 in the Sample
      Comparing table 1 with table 3
    Comparison of Person Matching Dominant Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender
    Comparison of Person Matching Dominant Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender for the 12 Groups With Over 100 in the Sample
      Comparing table 4 with table 5
    Comparison of Person Matching Dominant Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender for the 10 Groups With Under 100 in the Sample
      Comparing table 4 with table 6
    Comparison of Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is the Only One Dominant Across the Top 1, 5, 10, and 20
    Comparison of Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Student Had Only One Specialty Dominant Across the Top 1, 5, 10, and 20, but Did Not Enter the Medical Specialty


    Comparison of Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Student Had a Different Specialty Dominant Across the Top 1, 5, 10, and 20
    Person Matching Means, Standard Deviations, and Top Match Scores for the 150 Items; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20
      Overview
      Females
      Males
        Comparison of females and males
      Top match predicted specialty entered
      Top match did not predict specialty entered
      Actual specialty was predicted as dominant across the top 1, 5, 10, and 20
      Specialty predicted across the top 1, 5, 10, and 20 was not entered
      Different specialty predicted across the top 1, 5, 10, and 20
        Comparison
    Person Matching Means, Standard Deviations, and Top Match Scores for the 18 Scales; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20
      Overview
      Females
      Males
        Comparison of females and males
      Top match predicted specialty entered
      Top match did not predict specialty entered
      Actual specialty was predicted as dominant across the top 1, 5, 10, and 20
      Specialty predicted across the top 1, 5, 10, and 20 was not entered
      Different specialty predicted across the top 1, 5, 10, and 20
        Comparison
    Person Matching Means, Standard Deviations, and Top Match Scores for the 30 Items; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20
      Overview
      Females
      Males
        Comparison of females and males
      Top match predicted specialty entered
      Top match did not predict specialty entered
      Actual specialty was predicted as dominant across the top 1, 5, 10, and 20
      Specialty predicted across the top 1, 5, 10, and 20 was not entered
      Different specialty predicted across the top 1, 5, 10, and 20

        Comparison
  Inferential Analyses
    Standard Scoring Hit Rates Including Kappa Coefficients; Overall and by Gender
    Standard Scoring Hit Rates Including Kappa Coefficients; Overall and by Gender for the 12 Groups With Over 100 in the Sample
    Standard Scoring Hit Rates Including Kappa Coefficients; Overall and by Gender for the 10 Groups With Under 100 in the Sample
    Kappa Coefficients for the Top Match
    Chance Expectancy Hit Rates for the Top Match by Specialty Group for Standard Scoring and Person Matching Via the 150 Items, the 18 Scales, and the 30 Items
  Descriptive and Inferential Statistics for the 22 Medical Specialties
  Summary
IV. DISCUSSION
  Conclusions
    Hypotheses
      First hypothesis
      Second hypothesis
      Third hypothesis
        The 150 items
        The 18 scales
        The 30 items
        Summary
      Fourth hypothesis
      Fifth hypothesis
        Singular hit rates
        Dominant hit rates
        Standard scoring
        Summary
    Additional Conclusions
      Dominance in person matching
      Significance in means, standard deviations, and top match scores
      Observed agreement and chance expectancy rates
    Individual Medical Specialties
  Interpretations
    General Comparisons
    Singular Hit Rates
    Dominant Hit Rates
    Means, Standard Deviations, and Top Match Scores
    Individual Medical Specialties
  Implications
    Theory
    Research
    Practice
  Limitations
  Recommendations for Future Research
  Summary

APPENDICES
  APPENDIX A. IRB APPROVAL FOR PROTOCOL #10-382
  APPENDIX B. HIT RATES, MEANS, STANDARD DEVIATIONS, TOP MATCH SCORES, AND KAPPA COEFFICIENTS FOR THE 22 MEDICAL SPECIALTIES

REFERENCES


LIST OF TABLES

1. Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender
2. Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender for the 12 Groups With Over 100 in the Sample
3. Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender for the 10 Groups With Under 100 in the Sample
4. Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10, or 20; Overall and by Gender
5. Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10, or 20; Overall and by Gender for the 12 Groups With Over 100 in the Sample
6. Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10, or 20; Overall and by Gender for the 10 Groups With Under 100 in the Sample
7. Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is the Only One Dominant Across the Top 1, 5, 10, and 20; Overall and by Gender
8. Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items With the Same Medical Specialty Dominant Across the Top 1, 5, 10, and 20, but the Individual Did Not Go Into That Medical Specialty; Overall and by Gender
9. Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where There Is a Different Medical Specialty Dominant Across the Top 1, 5, 10, and 20; Overall and by Gender
10. Person Matching Means, Standard Deviations, and Top Match Scores for the 150 Items; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20

11. Person Matching Means, Standard Deviations, and Top Match Scores for the 18 Scales; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20
12. Person Matching Means, Standard Deviations, and Top Match Scores for the 30 Items; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20
13. Standard Scoring Hit Rates for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender
14. Standard Scoring Hit Rates for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender for the 12 Groups With Over 100 in the Sample
15. Standard Scoring Hit Rates for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender for the 10 Groups With Under 100 in the Sample
16. Kappa Coefficients for the Top Match
17. Chance Expectancy Hit Rates for Top Match by Specialty Group for Standard Scoring and Person Matching Via the 150 Items, the 18 Scales, and the 30 Items


CHAPTER I INTRODUCTION AND LITERATURE REVIEW Since its inception with Frank Parsons in 1908, members of the vocational guidance (now called career counseling) field have aspired to use theory-driven, evidenced-based interventions to help individuals select an occupation. Although a very appropriate developmental struggle, career indecision affects a client’s life profoundly as long periods of preparation and instruction are undertaken before an individual becomes employed in an occupation (Creed, Patton, & Prideaux, 2006). Career indecision has been suggested to resolve with access to information and clarification of interests (Creed et al., 2006; Ihle-Helledy, Zytowski, & Fouad, 2004). As such, researchers are challenged with improving the theory and science behind interest inventories currently used to predict an individual’s career interest or specialty preference within an occupation. Psychometrically sound interest inventories can save time, money, and frustration for individuals and institutions. The present research investigated a new psychometric scoring methodology to improve the predictive validity of career interest inventories. This was accomplished through the examination of predictive hit rates achieved between two psychometric scoring methodologies: the empirical method of Strong (developed in 1927) and the person matching method of Kuder (developed in 1977). Unlike Strong’s interest inventories, which were comprised of occupational titles with the final score comparing individuals to occupational groups, Kuder developed interest inventories made up of occupational activities with scores being matched to individuals from the reference group 1


database. Through the elimination of occupational groups, person matching could eliminate the sex bias favoring men currently found in standard psychometric scoring methodologies. To achieve these goals, Kuder created a pool of people who were enthusiastic about their work and then compared the test-taker’s scores to the reference group to offer the test-taker biographies (including current occupations, past occupations, lifestyles, and descriptions of what they like best and least about their occupation) of the 20 closest matches from the reference group (Kuder, 1977a). With person matching, the scoring report provides more extensive and detailed narrative information to consider themes when making career decisions as compared to standard psychometric scoring methodologies which provide lists of occupations (Kuder, 1977b). Kuder called this psychometric scoring methodology person matching. In order to make the comparison between the two psychometric scoring methodologies, the career interest inventory that would be selected needed to include a large sample with item scores, longitudinal data, and demographic data. While any profession could have been utilized to empirically examine the benefits and weaknesses of the person matching psychometric scoring methodology, medicine was used because of the many specialty choice problems that arise for medical students. Specifically, the present research investigated predictive hit rates for the Medical Specialty Preference Inventory-Revised (MSPI-R) to determine if they would increase by changing its current modernist scoring system, based upon the occupational groups of Strong, to a post-modernist scoring system, based upon Kuder’s person matching model. In addition to potentially increasing predictive hit rates for the MSPI-R, person matching could also


deliver results for every medical specialty instead of only the 16 offered by the standard scoring methodology used today. This would allow medical students to have the possibility to be person matched to over 100 different medical specialties to help foster medical specialty exploration, which has been called for in the literature. Ultimately, this study examines the benefits and weaknesses of utilizing person matching as a psychometric scoring methodology to predict specialty selection for a profession and is only the second attempt in the literature to use person matching as a psychometric scoring methodology. Further, this study was unique in how it used the scores of test-takers with no experience in the field who took an interest inventory. The empirical method of Strong (as is currently used with the MSPI-R) compares the scores of a test-taker with no experience in the field to the scores of professionals already engaged in the occupation. By using the professionals’ scores to develop occupational scales for the MSPI-R, the student with no experience in the field can be compared to professionals already engaged in the occupation. This method directly compares those with no experience in the field to those practicing in a specialty. Using Kuder’s person matching method, the scores of students with no experience in the field who took the MSPI-R formed the criterion pool after enough time had passed for them to enter into their medical residency. In this study, Kuder’s person matching method compared the scores of test-takers with no experience in the field to the scores of those with no experience in the field who later went on to practice in a specific specialty.
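The person matching procedure described above can be illustrated with a minimal sketch. It assumes a small, invented criterion pool of item scores with known residency specialties, and it uses the Pearson correlation as a stand-in similarity measure. The names `pearson` and `person_match`, the data, and the choice of similarity measure are all hypothetical and are not the procedure used in this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def person_match(test_scores, pool, top_n=20):
    """Return the specialties of the top_n pool members most similar to the test-taker."""
    ranked = sorted(pool, key=lambda m: pearson(test_scores, m["scores"]), reverse=True)
    return [member["specialty"] for member in ranked[:top_n]]

# Invented criterion pool: item scores recorded before residency, specialty recorded after.
pool = [
    {"specialty": "Pediatrics", "scores": [5, 4, 1, 2, 5]},
    {"specialty": "Surgery", "scores": [1, 2, 5, 5, 1]},
    {"specialty": "Psychiatry", "scores": [4, 5, 2, 1, 4]},
    {"specialty": "Radiology", "scores": [2, 1, 4, 5, 2]},
]
test_taker = [5, 5, 1, 1, 4]

top_match = person_match(test_taker, pool, top_n=1)
top_two = person_match(test_taker, pool, top_n=2)
```

In this toy example, a test-taker who later entered Pediatrics would be a miss for the top match but a hit within the top two matches, which mirrors the reason for examining more than one match.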


History and Scoring Methodologies of Interest Inventories

Importance of Interest Inventories

Before a discussion can begin on the scoring methodologies used in interest inventories, there must first be an examination of the history of interest inventories and a definition of what constitutes interests. The birth of the profession of vocational guidance has been credited to Frank Parsons in 1908 (Zytowski, 2001). At that time, the United States was making a rapid transformation from an agricultural to an industrialized economy (Zytowski, 2001). As such, schools were being created to sustain occupations in the new industrialized economy. After speaking to public groups about the need to assist youth in making choices among occupations, Parsons was asked to provide individual vocational guidance sessions (Zytowski, 2001). Parsons believed that vocational guidance would aid in choosing a vocation, preparing for the vocation, finding a job in the field, and being successful in that chosen field (Zytowski, 2001). Parsons’ vision of vocational guidance involved assisting individuals to make satisfying career choices by applying true reasoning to match self-knowledge to information about the world of work so that individuals could be fully empowered within their occupational and social roles (Hartung & Blustein, 2002; O'Brien, 2001). Parsons assessed interest by interviewing his clients. Parsons’ work formed the first formal model of career decision making (the matching model of person-environment fit), which has led to current models of career theory, assessment, and development (Hartung & Blustein, 2002; O'Brien, 2001).


The Concept of Interest

The study of interests led to the development of inventories to assist with career decision making. Currently, there are two ways to formulate how interests are constructed by an individual. The first conceptualization of interest comes from a modernist approach, which originated over 300 years ago and dominated the foundation of industrialized nations (MacDonald, 2008). Modernism matches the individual’s interests with specific occupational roles through prediction (Bujold, 2004) based upon the accumulation of information about the self and the world, which are both viewed as static and separate concepts that can be objectively measured (Brott, 2001; Cohen, Duberley, & Mallon, 2004). Using a modernist philosophy, predicting occupational choice can be performed because objective truth can be obtained through the fixed, correct answers to questions on interest inventories, which can then be matched to the world of work (Bujold, 2004; Cohen et al., 2004). Through these tenets of modernism, individuals looking to overcome career indecision are described as “patients” who take on a passive role by waiting for answers to be provided to them by assessments, others, or the environment so that they can make an occupational selection (Brott, 2004; Bujold, 2004). The modernist approach developed several common understandings of interests and how they relate to career decisions. Strong defined vocational interests as occurring when an individual automatically or effortlessly placed their attention on an activity that was subjectively judged as being beneficial to the future self and felt pleasurable (Savickas, 1999; Strong, 1943). Interests result from needs, values, motivations, and goals and are attempts to express the


individual’s personality and self-concept (Savickas, 1999; Strong, 1943). An individual will organize their interests into a hierarchy of interests to determine which of many possible selves to actualize (Savickas, 1999). Interests are an adaptive effort to use the environment to satisfy needs and fulfill values to integrate the self successfully into society (Savickas, 1999; Strong, 1943). Individuals have enduring interest in activities where they know that they can be successful and anticipate positive results (Savickas, 1999). This leads to persistence in pursuing activities of interest even in the face of obstacles and avoidance of activities deemed unimportant or unpleasant (Strong, 1943). As such, occupational interests are often defined as patterns of subjective likes, dislikes, and indifferences regarding occupational preferences (Savickas, 1999). It is therefore more important to focus on the function that the interest serves rather than on the interest itself (Savickas, 1999). Along these lines, occupational interests are not the outcome of experience or training in the occupation and are present to a large degree prior to entrance into the occupation (Strong, 1943). Research has demonstrated that 10 years of experience in an occupation does not have any effect upon interest scores in that same occupation (Strong, 1943). In fact, occupational interest patterns are well-established in many individuals by 15 years of age (Strong, 1943; Su, Rounds, & Armstrong, 2009) and become more stable as individuals grow older (Kuder, 1977a; Su et al., 2009). As such, interest profiles of students and adults in the same occupation are quite similar (Strong, 1943). These similarities in preferences between a person and an occupational group were found to extend beyond the occupation and into lifestyle activities as well (Donnay, 1997). This led to the belief


that a profession may represent a kind of lifestyle (Donnay, 1997). Therefore, an occupational group’s characteristic interests are presumably a factor in the selection of an occupation and provide a working environment where those interests may be satisfied (Savickas, 1999). This allows the individual to identify with and derive satisfaction from occupational group membership (Savickas, 1999; Strong, 1943). While abilities and interests contribute to ultimate success in an occupation, there is no relationship between aptitudes and interests (Strong, 1943). Therefore, interests result not from actual abilities, but from the individual’s perception of their abilities (Savickas, 1999). Interest and a feeling of self-efficacy are necessary before an individual will pursue an occupation (Campbell & Borgen, 1999; Donnay, 1997). Interest, as a result, is an indicator of satisfaction in an occupation, but not necessarily of success (Strong, 1943). Success is judged by others, but satisfaction is the individual’s estimate of their performance, which often involves a desire for social approval (Strong, 1943). As such, occupational interests are not primarily associated with intelligence (Strong, 1943). The second conceptualization of interest is based upon a postmodernist approach, which was developed in the middle of the 20th century to challenge the tenets of modernism (MacDonald, 2008). In this paradigm, it is believed that individuals make meaning through social experiences (Brott, 2001; Cohen et al., 2004). In addition, postmodernism assumes that (a) multiple realities and truths exist, (b) each person’s life is a story being written and not a set of traits, (c) the context is needed to understand individuals and their behaviors, (d) individuals constantly change, (e) individuals make


meaning out of daily activities, and (f) reality is something that individuals co-construct with their environments (Brott, 2001; Peavy, 1995). Therefore, an individual’s interests would not be the result of internal processes as believed in modernist thinking, but would instead be the result of social and environmental systems (Brott, 2001; Cohen et al., 2004). Further, assessments and career counseling would not offer the individual a specific occupation to pursue, but would rather have the individual constructing their reality by taking action to make their preferred future a reality (Brott, 2004; Bujold, 2004). This philosophy shifts the modernist approach of using assessments to help individuals match their interests to occupations to the postmodernist approach of using assessments to help individuals define their preferred self and preferred work environment to construct their preferred future (Brott, 2004; Cohen et al., 2004). As will be discussed later in this chapter, the work of Frederic Kuder would change the goals of interest assessments from the modernist tradition of the report offering a list of occupations to the postmodernist tradition of offering the individual lifestyle narratives describing the lives of individuals who scored similarly on the interest assessment. This change in interest assessments from a listing of occupations to helping individuals come into contact with narrative data that will help them construct their preferred future is the basis of the research of this dissertation. Next, the value of interest inventory usage will be documented.

Value of Interest Inventory Usage

Most often, an individual’s interest in specific occupations is based upon misunderstandings, misconceptions, and glamorized stereotypes (Kuder, 1977a). These


stereotypes and assumptions must be replaced by carefully considered measures of an individual’s abilities and interests (Strong, 1943) to yield useful, unbiased occupational suggestions (Kuder, 1977a). Interest inventories have been the primary means of addressing the question of occupational interest for over 80 years (Donnay, 1997; Savickas, 1999). Interest inventory items stimulate feelings of like and dislike about engaging in work, and the responses yield a measure of the degree of fit between an individual’s interest patterns and the interest patterns of occupational groups (Donnay, 1997; Savickas, 1999; Strong, 1943). Successful interest inventories help the individual to make better occupational decisions than the individual would have made alone and narrow down the exploration of occupations (Strong, 1943). The value of an interest inventory lies in its ability to predict satisfying occupations so that an individual’s interests are in harmony with the interests of adults found in that occupation (Donnay, 1997; Strong, 1943). Prediction is the assignment of an individual to an occupation based upon an individual’s interest inventory scores (Strong, 1943). A young person who enters an occupation consistent with his or her interests is more likely to be satisfied than a person who does not enter into such an occupation (Kuder, 1977a). Interest inventories have been based on empirical approaches to assess the occupational preferences of individuals (Savickas, 1999). Because of the rigor involved in interest inventory development, there are two measures of the validity of an interest inventory (Strong, 1943). First, the validity of an interest inventory rests on its ability to


differentiate between specific occupational groups (Kuder, 1977a; Strong, 1943). Second, the interest inventory needs to assign individuals to membership in one or more occupational groups based upon their interest inventory scores (Strong, 1943). Research for over 60 years has shown that entry into occupations can be predicted from scores on interest inventories at a rate better than chance (Donnay, 1997). Follow-up studies ranging from 3 to 18 years suggest predictive hit rates for interest inventories can be obtained between 32% and 69%, which is well above the chance expectancy rate (Donnay, 1997). Ultimately, interests are permanent enough (0.75) and sufficiently unaffected by vocational training and experience to furnish a basis for the prediction of future behavior (Strong, 1943).

Sex Differences in Interest Inventories

While research has demonstrated that sex differences are minimal to absent for most psychological variables, interest inventories are an exception to this rule (Su et al., 2009). As interest is influenced greatly by parents’ expectations, societal values, culture, and the child’s exposure to permissible activities, it comes as no surprise that sex differences play an important role in interest inventory development and occupational selection (Su et al., 2009). Strong (1943) noted that sex differences in interest are apparent by 15 years of age and are never unlearned. In general, men prefer to work with things over people and women prefer to work with people over things (Strong, 1943). Strong’s (1943) research suggested that women’s interests appear to be universalized and that the heterogeneity in female interests makes it harder to specify occupational interest patterns for women. In addition, the duties of the average female worker


are not the same as those of the average male worker in the same occupation (Kuder, 1977a; Strong, 1943). A meta-analysis by Su, Rounds, and Armstrong (2009) suggests that sex differences in interest inventories are still a concern as females still prefer social, artistic, and conventional interest categories (aligning with an interest in people) and males still prefer realistic and investigative categories (aligning with an interest in things) (Su et al., 2009). These differences remain consistent despite changes in age and life-span development (Su et al., 2009). Ultimately, sex differences in interest inventories explain trends in females refraining from occupations in science, technology, engineering, and mathematics (Su et al., 2009). Further creating a gap, interest inventories may be initially designed for testing men (with the later inclusion of women into the already developed instrument), and often occupational scales do not include women because of few women working in an occupation (Kuder, 1977a; Strong, 1943). Thus, research has suggested that for interest inventories to offer meaningful results to both sexes there must be women’s scales for women and men’s scales for men as combined sex scales are less valid predictors of occupational choice than single-sex scales (Donnay, 1997; Kuder, 1977a; Strong, 1943). Further, studies of interest inventories have revealed that each sex resembles others of the same sex in a different occupation more than members of the opposite sex in the same occupation (Kuder, 1977a). As such, there has remained a considerable sex bias in interest inventory development and scoring.


Current research has considered possibilities for reducing sex differences, which can be detected at the item and scale levels (Su et al., 2009). Research exploring sex balancing in interest inventories suggests incorporating only those items in interest inventories that have little to no disparity in response rates between the sexes. However, this approach has led to interest inventories with reduced construct validity rates, and hence a still unsettled debate about society’s needs versus construct validity in the development of interest inventories (Su et al., 2009).

Three Major Psychometric Scoring Systems in Interest Inventories

Parsons’ concern about career decision making became the impetus for many interest inventories to be developed during the 20th century. James Burt Minor was the first to create a rationally-based interest inventory in 1915; however, the instrument was never scored and was instead used with a counselor to promote the discussion of interests in the counseling session (Strong, 1943; Walsh & Savickas, 2005). Since then, three psychometric traditions have dominated the field of interest inventory construction (Donnay, 1997). Strong’s empirically-based group criterion related measurement of interest, Holland’s theoretically-based prototype criterion related measurement of interest, and Kuder’s rationally-based measurement of interest serve as the foundation of interest measurement (Donnay, 1997). Strong’s approach uses likes and dislikes to discriminate among occupational groups (Donnay, 1997). Holland’s approach uses likes and dislikes to match individuals to prototypes of six vocational personalities. Lastly, Kuder’s approach directly measures homogeneous dimensions of interest (Donnay, 1997). The following discussion on the psychometric scoring methodologies of interest


inventories will focus on the work of Strong’s empirical method, Holland’s theoretical method, and Kuder’s rational and person matching methods.

Empirical scoring methodology. Strong’s work in the field is so extensive (for example, the Strong Interest Inventory has the longest history of any currently used psychological test) that it is hard to find any interest inventory that does not have some background in Strong’s work (Donnay, 1997). Strong began his development of interest inventories while working with the United States Military in World War I when he set out to find a method to distinguish experienced carpenters and tradespeople from inexperienced carpenters and tradespeople (Strong, 1943). By 1927, he had published his first inventory, the Strong Vocational Interest Blank (SVIB) (Strong, 1943). Strong discovered empirically that occupational groups produced an average pattern of responses on an interest inventory, which could then differentiate between the average member of that occupational group and the average individual in general (Crites, 1969; Strong, 1943). This became the basis for Strong’s underlying philosophy that interests common to everyone were of little significance (Ihle-Helledy et al., 2004; Strong, 1943). Strong’s occupational scales eliminated similarities in interests and only used the differences in interest to set members of an occupation apart from general interest patterns (Campbell & Borgen, 1999; Case & Blackwell, 2008; Strong, 1943). This led to his development of occupational scales based upon contrasted groups that were derived from a primitive form of discriminant function analysis (Donnay, 1997). He believed that creating occupational scales for each occupation was invaluable


because interest in certain kinds of activities linked the test-taker to real-world behavior in that occupation (Campbell & Borgen, 1999). Strong collected data from satisfied members of an occupation who were acceptably performing the work to establish occupational scoring scales and norms (Campbell & Borgen, 1999). His research suggested that a large reference group was better for the development of sound occupational scales (Strong, 1943). Strong looked at interest inventory scores from 250 to 500 criterion group members made up of individuals from a specific occupation and then contrasted those scores with the scores of nearly 5,000 men-in-general and women-in-general by weighting the items in terms of importance to the occupational group (Campbell & Borgen, 1999; Kuder, 1977a; Strong, 1943). By weighting an individual’s rating of activities as liked or disliked on the interest inventory in terms of how it resembles the answers from an occupational group, the final score on the interest inventory could classify an individual as a member of this or that occupational group by factoring out common interests (Campbell & Borgen, 1999; Kuder, 1977a; Strong, 1943). For example, if 180 of the criterion group members of an occupation marked that they liked the item on the inventory, 75 marked that they were indifferent to the item on the inventory, and 45 marked that they disliked the item on the inventory, then the interest inventory item for that occupational scale would be scored 180 for like, 75 for indifferent, and 45 for dislike (Strong, 1943). The more homogeneous an occupational group, the easier it is to differentiate them from other occupational groups, which makes it more reliable and valid when


developing occupational scales (Kuder, 1977a; Kuder, 1977b; Strong, 1943). This occurs because the greater degree of specialization in an occupation, the more homogeneous the group, which allows for a greater degree of confidence in assessing the similarity of interests in an area of specialization within an occupation (Kuder, 1977b). For example, the interests of clinical psychologists are more homogeneous than those of psychologists in general (Kuder, 1977b). In addition, social work, medicine, engineering, and the skilled trades also find their respective specialties more homogeneous than the occupation in general (Kuder, 1977b; Strong, 1943). However, when the occupational group is heterogeneous, it closely resembles the general reference group and results in a lack of good discrimination for that occupational scale (Kuder, 1977a). The profile of the interest inventory taker in Strong’s interest inventories is compared with each occupational scale on the inventory to see which occupation fits best (Strong, 1943). The higher the score on any occupational interest scale, the more the individual’s interests resemble the interests of individuals in that occupation as contrasted to individuals in general (Strong, 1943). The final score informed the individual of the occupational group that they most resembled based upon how their responses matched the pattern of distinctive responses found within each occupational group (Ihle-Helledy et al., 2004) to suggest if the candidate possessed those characteristics which were essential for satisfaction in those occupations (Kuder, 1977a; Strong, 1943). Hence, Strong’s work was an empirical method that used inventory items that were heterogeneous and based upon occupational titles.
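The item-weighting example given earlier (180 like, 75 indifferent, 45 dislike) can be sketched in a few lines. The sketch follows the simplified counts in that example and assumes that a scale score is the sum of the weights attached to a respondent's answers. The second item, the function name `scale_score`, and the respondents' answers are hypothetical, and the contrast with the men-in-general and women-in-general reference groups is omitted.

```python
# Response counts from a criterion group of 300 members.
# item_1 uses the counts from the example in the text; item_2 is invented.
occupational_scale = {
    "item_1": {"like": 180, "indifferent": 75, "dislike": 45},
    "item_2": {"like": 60, "indifferent": 90, "dislike": 150},
}

def scale_score(responses, scale):
    """Sum the weight attached to each of the respondent's answers."""
    return sum(scale[item][answer] for item, answer in responses.items())

# A respondent who answers as most criterion group members did.
typical = {"item_1": "like", "item_2": "dislike"}
# A respondent who answers against the criterion group's pattern.
atypical = {"item_1": "dislike", "item_2": "like"}

typical_score = scale_score(typical, occupational_scale)    # 180 + 150
atypical_score = scale_score(atypical, occupational_scale)  # 45 + 60
```

Under this simplification, the respondent whose answers resemble those of the criterion group receives the higher score on that occupational scale.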


While Strong’s work made great strides in empirically solving the problem of interest measurement, Guilford (1952) sharply criticized Strong for using factor analysis to generate occupational scales for interest inventories where the same items are scored for more than one scale. Guilford believed that the results of the factor analysis of ipsative scores were likely to be misleading because the correlation between any two scales is determined largely by the scoring system and could be regarded as an artifact (Kuder, 1977a). Even Strong noted that the correlations among his occupational scales changed dramatically when different individuals made up the men-in-general and the women-in-general reference groups (Kuder, 1977a). Strong’s empirical work improved the quality of interest inventories and would influence both Holland’s and Kuder’s development of interest inventories.

Theoretical scoring methodology. Holland believed that interest inventories based upon Strong’s occupational scales were difficult to interpret because the scores did not indicate interests, but rather indicated resemblances to occupational groups, and were based upon only a handful of occupational scales (Nauta, 2010). Additionally, he was frustrated that there was no simple way to link the client’s interest inventory scores to their work environment (Nauta, 2010). He believed that most psychological tests were ambiguous, required that the individual be dependent upon the counselor to score the test, and required the client to wait for the results (Holland, 1971; Nauta, 2010). Holland wanted to develop inexpensive and practical interest inventories that had a high degree of scientific validity and client effectiveness (Holland, 1971). He succeeded in his goal by developing simple, inexpensive, self-help interest inventories that could be scored


immediately (Holland, 1961) to provide career guidance inexpensively and with minimal to no counselor contact (Gottfredson & Johnstun, 2009). Holland used his clinical experience with vocational clients, his desire to construct a personality inventory from interest materials, scoring keys for the SVIB, and a factor analysis of existing instruments to create six scales that grouped occupations together by a personality prototype (Holland, 1966b; Nauta, 2010). This was the first time that an interest inventory was created out of a theory instead of the accumulation of data (Nauta, 2010). Holland ultimately used a vector of resemblance to six personality prototypes to create a typical example of a worker in the environment of an occupation. While Strong’s scales indicated resemblances to occupational groups, Holland’s scales indicated resemblance to six theoretical personality types in order of most to least resemblance. Hence, Holland’s work was a theoretical method based upon inventory items that were homogeneous and included only occupational titles.

Interest and personality. As a classification interviewer in the Army, Holland noted that people tended to exemplify one of only a few vocational personality types and came to the conclusion that vocational interest was an aspect of the personality (Holland, 1966b; Nauta, 2010). Further, he surmised that interest inventories were personality inventories because personality inventories reveal how the individual perceives the self and their environment (Holland, 1958). Holland believed that vocational choice was an expression of an individual’s motivation, knowledge, personality, and ability (Holland, 1958; Holland, 1966b). Working in an occupation meant having a certain status, community role, and pattern of living (Holland, 1958). As such, Holland stated that


satisfaction, stability, and success in an occupation depended upon congruence between the individual’s personality type and the personality of the occupational environment (Furnham, 2001; Holland, 1966b). In 1959, Holland proposed an occupational classification system consisting of six categories: Realistic, Intellectual, Artistic, Social, Enterprising, and Conventional (Holland, Whitney, Cole, & Richards, 1969). From this, he developed a subtle coding system to describe a typical prototype for both persons and environments based upon initials from the six scales (Holland, 1961; Holland et al., 1969). A Holland code (which is created using the initials of the highest three scoring scales in order of most to least resemblance) can be generated on the basis of the scores from an interest inventory (Nauta, 2010). This coding scheme allows for 720 different variations of personality (Holland, 1966b). To empirically validate the six categories, a factor analysis was performed in 1968, which suggested that each of the six scales measured something different from the other five scales and that there are at a minimum six different kinds of people (Holland et al., 1969). All six of his scales were statistically significant for men, yet women’s data yielded only four scales: Intellectual, Artistic, Social, and Conventional (Holland et al., 1969). In addition to classifying individuals, Holland characterized work environments by using a census of their inhabitants (Gottfredson & Johnstun, 2009). Holland classified all occupations into a code based upon the empirical procedure of averaging the scores of workers in an occupation on his six personality scales (Holland, 1966a). Holland’s ability to assess both individuals and work environments via stereotypical


examples (or prototypes) offers a parallel way to link the two together and increases the effectiveness of career interventions (Holland, 1966b; Nauta, 2010). By understanding the code of the individual in relation to the code of the working environment, predictions could be made on the outcome of the pairing (Holland, 1966b). Holland’s theory thus served to provide a single vocabulary to describe people and occupations. Prediction. Historically, interest inventories were viewed as the only way to predict what would happen in an individual’s vocational future (Holland, 1967). To empirically validate that assumption, Holland examined the predictive validity of students’ expressed choice for a vocation compared to the predictive validity of scores on his Vocational Preference Inventory (VPI) (Holland, 1967). The results suggested that counselors should make greater use of a person’s expressed vocational choice as the VPI performed with a lower predictive hit rate (Holland, 1967). Additionally, the Strong Vocational Interest Blank (SVIB) was not as efficient as expressed choice given that predictive hit rates equaled 28.2% for the SVIB and 56.3% for a student’s expressed choice (Holland, 1967). From his research findings, Holland suggested that interest inventories would not be able to surpass the expressed choice predictive hit rate of 56.3% (Holland, 1967). Holland’s research proposed that future vocational choices are most accurately obtained from a person’s stated occupational choice, current occupation, or work history rather than from any interest inventory (Holland, 1971). Holland recommended giving an interest inventory only to students who were undecided or to help individuals resolve conflicts about vocational choices (Holland, 1967).
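A predictive hit rate of the kind reported above is the share of individuals whose predicted occupation matches the occupation they later entered. The following minimal sketch shows the arithmetic on invented data; the function name `hit_rate` and the records are hypothetical and do not reproduce Holland's (1967) results.

```python
def hit_rate(predictions, outcomes, top_n=1):
    """Percentage of cases whose actual outcome is among the first top_n predictions."""
    hits = sum(actual in ranked[:top_n] for ranked, actual in zip(predictions, outcomes))
    return 100.0 * hits / len(outcomes)

# Invented ranked predictions (best match first) for four individuals.
predictions = [
    ["Engineer", "Chemist", "Teacher"],
    ["Teacher", "Nurse", "Lawyer"],
    ["Lawyer", "Engineer", "Nurse"],
    ["Nurse", "Teacher", "Chemist"],
]
# Invented occupations the same four individuals later entered.
outcomes = ["Engineer", "Nurse", "Chemist", "Teacher"]

top1 = hit_rate(predictions, outcomes, top_n=1)  # 1 hit out of 4
top2 = hit_rate(predictions, outcomes, top_n=2)  # 3 hits out of 4
```

Counting a hit whenever the actual occupation appears anywhere among the first few predictions can only raise the rate, so hit rates should always be compared at the same number of predictions.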


Conclusion. Holland’s classification system pervades career counseling research and practice today (Nauta, 2010). Holland grounded his theory in research and spent much of his time testing his theory and his interest inventories (Nauta, 2010). Holland’s classification system helps when conducting research because it frames the research in a way that simplifies the data and makes the research more understandable, while at the same time retaining complexity (Campbell & Borgen, 1999). Extensive research has supported many aspects of Holland’s theory; however, a meta-analysis in 2002 by Larson, Rottinghaus, and Borgen suggests that most vocational interests are distinct from personality. At the same time, the meta-analysis also suggested that incorporating both personality and vocational interests into interest inventories allowed for a more comprehensive and defined picture of the client. One major criticism of Holland’s theory is that it is entirely based upon studies of men and is less useful for understanding the vocational behavior of women (Holland, 1966b). Holland called for research to create a vocational decision making theory for women as he had done for men (Holland, 1966b). Another criticism of Holland’s theory notes that it is unable to differentiate between jobs categorized within the same three-letter Holland code, which results in the inability of his prototypal coding system to distinguish between specialties within an occupation (Furnham, 2001). This is especially problematic when individuals are interested in specializing in a field of work, such as medicine, business, law, or engineering. Both Strong’s and Holland’s work in interest inventories would inspire and provoke Kuder’s work in the psychometrics of interest inventories.
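The three-letter Holland code discussed above can be derived mechanically from six scale scores. The sketch below uses Holland's original six category names; the scores and the function name `holland_code` are hypothetical. It also counts the possible codes, on the assumption that the figure of 720 cited earlier refers to complete orderings of all six types (6! = 720), whereas distinct three-letter codes number 6 × 5 × 4 = 120.

```python
from itertools import permutations
from math import factorial

TYPES = ["Realistic", "Intellectual", "Artistic", "Social", "Enterprising", "Conventional"]

def holland_code(scores, length=3):
    """Initials of the highest-scoring scales, from most to least resemblance."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return "".join(name[0] for name in ranked[:length])

# Invented scale scores for one test-taker.
scores = {"Realistic": 12, "Intellectual": 31, "Artistic": 18,
          "Social": 27, "Enterprising": 9, "Conventional": 22}
code = holland_code(scores)  # Intellectual, Social, Conventional

# Number of ordered three-letter codes versus complete orderings of all six types.
three_letter_codes = len(list(permutations(TYPES, 3)))
full_orderings = factorial(len(TYPES))
```

Because the code keeps only a rank order of types, any two specialties whose members share the same three highest scales receive the same code, which is the limitation Furnham (2001) points to.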

Rational and person matching scoring methodologies. Unlike Strong and Holland, who remained focused on a specific psychometric scoring methodology during their careers, Kuder’s contributions over the course of 70 years to interest inventory psychometric development would change significantly. Kuder’s work was based upon his observation that individuals rarely stayed in the same job role their entire lives and rarely had the same tasks or the same career path even if the individual stayed in the same occupation (Kuder, 1977a). Further, he noted that some occupations, such as clerks or tradesmen, had such a wide range of potential job activities that it became difficult to create reliable and valid occupational scales for those occupational groups (Kuder, 1977b). Strong himself even noted difficulties in creating occupational scales for many occupations (Strong, 1943). Kuder rejected Holland’s premise of using occupational titles in interest inventories, stating that the purpose of an interest inventory was to give the person an occupational title (Kuder, 1977a). As such, he believed that occupational titles should not be used as items on interest inventories because it makes them susceptible to faking and reflecting the false assumptions/stereotypes a test-taker has about occupations (Kuder, 1977a; Strong, 1943). Kuder instead called for interest inventories to focus on the specific tasks of a job to determine interest and confidence in the activities of the job itself (Betz & Rottinghaus, 2006; Zytowski, 1992). As such, items on his interest inventories are well distributed throughout the world of work, based upon activities that are generally well understood, written clearly, and free of bias (Kuder, 1977a).

Kuder also noted that each occupation included individuals from all personality types and therefore saw little value in matching occupations to individuals based upon personality (Kuder, 1977a). Ultimately, Kuder’s work would improve upon both of the interest inventory scoring systems of Strong’s occupational groups and Holland’s ideal personality prototypes (Zytowski, 1992). Initially, Kuder challenged Strong’s methods by proving that the fundamental validity of an interest inventory could be substantially improved by eliminating the use of men-in-general and women-in-general reference groups when developing occupational scales (Kuder, 1977a). His method measured similarity with both the general and the unique interests of an occupational group at the same time (Donnay, 1997; Ihle-Helledy et al., 2004). This was accomplished by using the lambda coefficient of correlation to deal with the problem of homogeneity in interest inventories. The lambda coefficient expresses the correlation between all of the test-taker’s responses on an interest inventory to all of the responses of the members of an occupational group without imposing the requirement of a normal distribution (Kuder, 1977a). Research suggested that Kuder’s use of the lambda coefficient combined with the scoring of all the items on the interest inventory improved predictive accuracy over Strong’s general reference group method. When the two were compared, Strong’s method produced 47% more errors than Kuder’s method (Kuder, 1977a). In addition, research suggested that the lambda coefficient was better at discriminating similar occupations from each other, which had previously been an issue with Strong’s general reference group method (Kuder, 1977a). Further, the lambda coefficient offered the advantage of rank-ordering interests and occupations,
which helped to solve the problem of indexing a person’s similarity to a criterion group (Donnay, 1997). During this time, Kuder’s work was a rational method based upon inventory items that were homogeneous and based upon occupational activities. This focus would change in the 1970s. In the 1970s, Kuder began challenging Holland’s theory of categorizing people into stereotypical prototypes. Kuder believed that people were not just looking for a job, but were instead looking for a career, which was a highly individualized matter (Kuder, 1977a). As no two people ever performed exactly the same activities in a job, and no two careers were exactly alike, Kuder believed that there was a potential to better match people to jobs if the test-taker was matched to other people who were enthusiastic about their work and scored the interest inventory in the same way (Kuder, 1977a). Kuder called this technique person matching, and it ended two dilemmas in interest inventory construction. First, it eliminated the problem of creating occupational scales based upon a small group of individuals from an occupation, which was the basis of Strong’s work (Kuder, 1977a). Second, it ended trying to make people fit a stereotypic mold, which was the basis of Holland’s work (Kuder, 1977a). Unlike the interest inventories of Strong (where the final score compared the individual to normed occupational groups) and Holland (where the final score would offer a personality code to be matched to occupational environments), Kuder wanted clients to take an interest inventory and then be matched to the narrative biography of individuals who had answered the interest inventory in a similar manner.
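
The idea of matching a test-taker to the people who answered the inventory most similarly can be sketched as a nearest-neighbor search over item responses. The Python sketch below is illustrative only: the function name `person_match`, the tiny four-item pool, and the use of Euclidean distance as the similarity measure are all assumptions made for the example, not the calculation Kuder or this study used.

```python
import math


def person_match(test_taker, criterion_pool, k=3):
    """Return the labels of the k pool members with the closest responses.

    test_taker: list of numeric item responses.
    criterion_pool: list of (label, responses) pairs, where label stands in
        for a member's biography or occupation (hypothetical structure).
    Similarity is Euclidean distance, an assumption made for illustration.
    """
    scored = []
    for label, responses in criterion_pool:
        if len(responses) != len(test_taker):
            raise ValueError("all response vectors must have the same length")
        scored.append((math.dist(test_taker, responses), label))
    # Smallest distance first; ties keep their original pool order.
    scored.sort(key=lambda pair: pair[0])
    return [label for _, label in scored[:k]]


# Invented four-item responses for a four-member criterion pool.
pool = [
    ("member A", [5, 1, 4, 2]),
    ("member B", [1, 5, 2, 4]),
    ("member C", [4, 2, 5, 1]),
    ("member D", [2, 4, 1, 5]),
]
closest = person_match([5, 1, 4, 1], pool, k=2)
```

In this sketch the test-taker would be shown the biographies of the returned members, nearest first. Swapping in a different similarity measure only requires changing the `math.dist` call.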

To perform person matching, a large criterion pool would need to be generated. Kuder estimated 5,000 cases to be sufficient for the criterion pool for an interest inventory utilizing person matching as a psychometric scoring methodology (Kuder, 1977b). Kuder created a criterion pool of people who were enthusiastic about their work from every occupation, which could not be accomplished by Strong’s use of occupational group scales. Next, he compared the test-takers’ scores to the reference group to offer the test-taker biographies (including current occupations, past occupations, lifestyles, and descriptions of what they like best and least about their occupation) of the closest 20 reference group members (Kuder, 1977a). In this way, the scoring report provided the test-taker with more extensive and detailed information than what was received from occupational scale scores and three-digit personality codes (Kuder, 1977b). Person matching offered the test-taker the ability to consider career themes found within the narratives to improve career decision making. Further, person matching overcomes the problem of sex bias in norms or prototypes since the criterion pool includes members of both sexes in a wide variety of occupations representing a great diversity of backgrounds (Kuder, 1977b). As such, person matching may be one way to overcome sex bias found in the psychometric scoring methodologies of Strong and Holland (Kuder, 1977a). Additionally, new occupations can become part of the interest inventory as soon as the data from a few people in the occupation are collected without waiting for a large number of cases from which to build a new occupational scale or measure the stereotypic personality type of the individuals who comprise the new occupation (Kuder, 1977b). Moreover, person matching does not
assume stable occupations (as is the case in Strong’s empirical method and Holland’s theoretical method), which is important in a global economy demanding flexibility and evolution in the career paths of individuals due to more outsourcing and contractual work. By the end of his career, Kuder’s focus was a person matching model based upon inventory items that were homogeneous and based upon occupational activities. Kuder’s use of homogeneous item pools, lambda coefficient scoring, and person matching has not been given enough attention, especially with the technology available today to perform complex person-to-person matching (Donnay, 1997). Currently, Kuder’s person matching model is the least used of the three psychometric scoring methodologies for interest inventories. Additionally, Kuder’s person matching model may be one way to improve upon the validity of interest inventories making specialty predictions.

Summary. No matter the psychometric source, interest inventories have become popular tools career counselors use to help clients overcome career indecision and increase job satisfaction (Ihle-Helledy et al., 2004; Savickas, Taber, & Spokane, 2002). Ultimately, interest inventories provide clients with information about their interests and then match those interests to specific occupations (Borges, 2007; Ihle-Helledy et al., 2004). In this way, interest inventories offer a projective quality to career counseling interventions by allowing clients to explore career options and test out various possible futures in a way that is less likely to activate a client’s defense mechanisms (Ihle-Helledy et al., 2004; McGuire, 1961).

There is growing support for integrating traditional psychometric quantitative approaches (which can be found in the work of Strong and Holland) with qualitative data (which can be found through Kuder’s person matching model) into interest inventory psychometrics to assess multiple aspects of individuals and contexts to improve the career decision making process (Hartung, 2005). However, person matching has little to no empirical evidence of effectiveness and has not been studied and evaluated for validity, which may be explained by (a) the large criterion pool required to perform person matching successfully and (b) the longitudinal data needed to document the criterion pool members’ final occupational choice for the match to result in an occupational prediction. Fortunately, there is an existing pool of data (based upon the occupational specialties found within the career of medicine) that includes a large criterion pool of students’ scores on an interest inventory. Additionally, these students are also linked via longitudinal tracking to their chosen occupation to allow for person matching to be successfully studied as a psychometric scoring methodology. The medical career data provided a context in which all the parameters were met to successfully study person matching as a psychometric scoring methodology for interest inventories. To help readers understand the complexities of the occupational specialties found within the career of medicine, a discussion of the factors involved in medical specialty selection and the history of interest inventories utilized to help medical students select their medical specialty will follow.

Specialty Selection Within a Profession

Many individuals are familiar with using interest inventories to select a profession. However, many professions have profoundly different specialties offering divergent occupational roles within the profession, which requires yet another level of career decision making (Rottinghaus, 2009; Sodano & Richard, 2009b; Stratton, Witzke, Elam, & Cheever, 2005). While any profession could have been utilized to empirically examine the benefits and weaknesses of the person matching psychometric scoring methodology, medicine was used because of the many specialty choice problems that arise for medical students. Therefore, the issue of medical specialty selection will be explored. As noted earlier, medicine has the most complex and diverse array of specialties from which to choose, which require decidedly different abilities, skills, and talents (Rogers, Creed, & Searle, 2009; Sodano & Richard, 2009b; Stratton et al., 2005). A physician’s specialty choice has important personal, economic, and societal consequences (Savickas, Brizzi, Brisbin, & Pethtel, 1988). Each medical specialty is a distinct occupation, which has divergent experiences, lifestyles, skill sets, and income levels. With over 100 medical specialties to choose from, medical specialty selection is a crucial part of a medical student’s career development. It is the biggest and most enduring decision made during their tenure at medical school, and is on par with choosing medicine as a career in general (Borges, 2007; Reed, Jernstedt, & Reber, 2001). Medical students receive similar core classes while in medical school to prepare them to ultimately engage in their chosen medical specialty. In addition, developments in
medical education (which includes three-year medical school programs, all electives for the fourth year, the substitution of an internship for fourth-year electives, specialty programs in the third and fourth year of medical school, specialty tracks beginning in the first year of medical school, and elimination or modification of the internship requirement) have resulted in placing pressure upon students for earlier consideration of specialty choice (Zimny & Senturia, 1974). Medical students quickly find themselves confronted with the task of medical specialty selection by family, classmates, medical school staff, and medical school professors (Borges, 2007; Sodano & Richard, 2009b; Stratton et al., 2005). Medical students commonly change their minds about their specialty (Scott, Gowans, Wright, & Brenneis, 2007) during the first two years of medical school while they are studying the basic sciences and have many questions about the different medical specialties (Gough, 1979; Reed et al., 2001). Longitudinal studies have suggested that between 60% and 75% of medical students change their specialty while still in medical school (Markert, 1983; Savickas, Alexander, Jonas, & Wolf, 1986). Zimny and Senturia (1974) performed a longitudinal study to determine when the initial specialty choice was made during medical school and the consistency of students’ choices. Their study suggested that few medical students manifest a high degree of consistency in medical specialty choice (Zimny & Senturia, 1974). Only 1% of the medical students in the study chose the same specialty each of the four years of medical school (Zimny & Senturia, 1974). Additionally, the results suggested that many students have difficulty in attempting to resolve the question of specialty choice during medical
school and that the earlier the question is faced, the greater the number of students who have difficulty (Zimny & Senturia, 1974). It is not surprising that medical students become more certain of their medical specialty as they approach their final year of medical school (Bhattacharya, 2005; Borges & Savickas, 2002; Reed et al., 2001). This suggests that choosing a medical specialty is a multifaceted and developmental process that begins with the medical student entering medical school and ends with the medical student completing their residency program and hence entering their medical specialty (Reed et al., 2001). Many students seek counseling when making this choice because they must choose a specialty before they have sufficient experience and information (Savickas et al., 1988). It has been suggested that first-year medical students need help with decision-making processes, second-year medical students need help exploring their abilities and interests along with medical specialty options, third-year medical students need to relate their interests to medical specialties and explore medical specialties that match their interests, and fourth-year medical students need to decide upon their preferred medical specialty (Savickas et al., 1986). The fourth and last year of medical school involves the medical student having contact with various medical specialties through clinics and ultimately making the decision to choose and apply for a residency and hence selecting their medical specialty (Bhattacharya, 2005; Reed et al., 2001; Richard et al., 2007). The residency matching process involves connecting a residency program’s preferences with a medical student’s preferences, and involves at least a 3 year learning commitment on the part of the medical
student depending upon the requirements of the medical specialty (Bhattacharya, 2005). The National Intern and Resident Matching Program (NIRMP) is a complex computer-based process in which the ranked preferences of medical students for a particular medical specialty are matched with the ranked preferences of residency programs for particular students (Zimny, 1980). The matching process is carried out midway through the last year of medical school by a vast majority of all prospective medical school graduates (Zimny, 1980). The NIRMP results pertain to the first postgraduate year of medical school, which requires the least commitment to a specialty. However, medical students are required to commit to a medical specialty for the second postgraduate year (Zimny, 1979; Zimny, 1980). Some physicians decide after the first residency year that they do not want to continue their training in that specialty and thus must select a different specialty (Zimny, 2002). Zimny (1980) studied changes medical students made between their first and second year of their postgraduate residency program. The data suggested that 76% had the same specialty in both the first and second postgraduate years (Zimny, 1980). Current research suggests that approximately 10% to 27% of residents change their residency (Borges, Savickas, & Jones, 2004; Borges, Gibson, & Karnani, 2005; Reed et al., 2001; Richard et al., 2007; Savickas et al., 1986), which requires finding a residency opening mid-stream at another site in another medical specialty and then completing the new residency. Although medical students can swap or change residencies at any time if they can find a vacancy in a new residency program, changing a residency costs hospitals time and money in training the resident. Furthermore, the resident becomes frustrated
and loses time and money while finding and then completing a second residency (Borges et al., 2005). At the conclusion of the full term of the residency, the medical student is able to complete testing by their medical specialty’s board. Passing the board examination transforms the medical student resident into a physician practicing in their medical specialty (Bhattacharya, 2005). At this time, the new practicing board-licensed physician can practice in their chosen specialty or they may take on another sub-specialty residency to enter a more specialized field such as cardiology or neurosurgery (Bhattacharya, 2005). Once a physician has entered into a medical specialty, it becomes a large burden financially and chronologically to change medical specialties by taking on a new residency program and completing a new board examination (Rogers et al., 2009). It has been established that between 13% and 26% of practicing physicians who became board-licensed in a medical specialty changed their medical specialty and thus undertook a new residency (Jarecky, Schwartz, Haley, & Donnelly, 1991; Savickas et al., 1986; Tardiff, Cella, Seiferth, & Perry, 1986). Further, one research study suggested that 40% of physicians were uncertain if they would pick the same medical specialty again (Leong, Hardin, & Gaylor, 2005). It is unknown how many physicians find themselves unsatisfied in their medical specialty, yet continue to practice in that medical specialty due to time and financial constraints. Zimny and Senturia’s (1973) study of 76 medical schools suggested that medical schools provided little formal, organized help to medical students when selecting a medical residency. While medical schools are giving students opportunities for early
specialty experiences, they also need to help students with the problem of medical specialty choice (Zimny & Senturia, 1974). Ultimately, medical students need help planning electives, selecting a residency, and applying for the NIRMP (Zimny, 1979; Zimny, 2002). The data reported in this section demonstrate the importance for medical students to select their preferred medical specialty before a large investment of time and money is spent in a residency program. Because of the diversity in medical specialties, the medical profession is one area where the accurate prediction of medical specialty preference can save time and money not only for medical students and their matched residency sites, but also for state-subsidized medical schools (Glavin, Richard, & Savickas, 2007; Rottinghaus, 2009). In addition, there is concern about a surplus of Internists, Surgeons, Psychiatrists, Pathologists, and Pediatricians and a shortage in other important specialties such as Primary Care (Leong et al., 2005). A discussion of the factors that impact medical specialty selection will follow to assist the reader in understanding the complexities that exist in trying to assist medical students with career decision making when choosing a medical specialty.

Factors Influencing Medical Specialty Choice

There is an abundance of research which has tried to determine the various factors that impact the medical student as they enter medical school and ultimately graduate and enter their medical specialty (Bhattacharya, 2005; Borges et al., 2005; Richard et al., 2007; Rogers et al., 2009; Scott et al., 2007; Sodano & Richard, 2009b). These factors have included academic performance, role models, family history, values, personality,
interests, anticipated insurance problems, gender, culture, work-life balance, malpractice risk, prestige, and federal support for the medical specialty (Leong et al., 2005). Upon reflection of the research, it appears that two major themes emerge. The first theme is based upon the medical student’s perception of the medical specialty, which includes the lifestyle that the medical specialty would provide the medical student. The second theme is based on the demographic qualities, values, and interpersonal struggles of the medical student. The first theme’s factors include the medical student’s perception of the medical specialty in terms of the lifestyle experienced while engaged in the medical specialty (such as being on call, work hours, set schedules, income, flexibility, etc.). This set of factors can be influenced by the student’s medical school characteristics, encouragement by the student’s faculty and mentors, information provided to the student on over 100 medical specialties, economics/politics associated with the medical specialty, concerns about working relationships within the specialty, perceived enjoyment of the medical specialty, suitability for the work involved, positive and negative clinical exposure to the medical specialty, the perception of the skills required for the medical specialty, and ease of obtaining residency in that medical specialty. Lambert, Davidson, Evans, and Goldacre (2003) found that the perceptions of working hours, workload, and working conditions made the biggest impact on a medical student rejecting a medical specialty. Women were more likely than men to cite quality of life reasons as the primary reason for rejecting a medical specialty. However, males also listed quality of life issues as the most important factor when rejecting a medical specialty.

The second theme’s factors include demographic qualities, values, and interpersonal struggles of the medical student. One study found that students who selected primary care as a medical specialty were more likely to be married, and placed a greater value on people skills over technical skills (Newton, Grayson, & Whitley, 1998). A study by Reed, Jernstedt, and Reber (2001) suggested that the medical student’s self-efficacy (as defined by the medical student’s grades) mitigated medical specialty choice. Medical students receiving poor grades selected less rigorous and less competitive residency programs than medical students in the top of their class. This suggests that medical students are not selecting medical specialties based upon personal interest, but upon grades and perceived success. A study by Borges, Stratton, Wagner, and Elam (2009) suggested that emotional intelligence does not predict medical specialty preference. Lastly, Lambert, Goldacre, Davidson, and Parkhouse (2001) studied the impact of age at entry to medical school on medical specialty selection. However, their study was not able to predict a student’s medical specialty based upon age at admission. Zimny and Shelton (1982) investigated differences between men and women medical students and their preferences for medical specialties. Results of the study suggested that male students had a significantly higher preference for Internal Medicine and Surgery while female students had a preference for Obstetrics and Gynecology, Pediatrics, and Psychiatry (Zimny & Shelton, 1982). The male medical students scored significantly higher on the factors relating to serious complex medical problems, the knowledge base employed in medicine, complex technical procedures, and ancillary services (Zimny & Shelton, 1982). The female medical students scored significantly
higher on preventative care and patient participation (Zimny & Shelton, 1982). Overall results suggested that men may be more cognitively and technically oriented while women may be more patient oriented (Zimny & Shelton, 1982). A separate study demonstrated that gender plays a role in medical specialty selection as 70% of females expressed an interest in primary care, while males were three times more likely to express an interest in surgery (Reed et al., 2001). Additionally, the same study found that males were more often influenced in medical specialty selection by objective scores on exams, while women were more influenced in medical specialty selection by subjective faculty evaluations (Reed et al., 2001). The hypothesis that personality types predict medical specialty preference began in 1957. Since that time, there have been attempts to create a personality profile for each medical specialty (Borges & Osmon, 2001; Stratton et al., 2005). However, using personality traits or demographic characteristics produces conflicting medical specialty profiles (Savickas, Alexander, Osipow, & Wolf, 1985). This premise was supported through research suggesting that all personality types were found equally in medical school admissions (Stilwell & Wallick, 2000). Further, a variety of personality types can be found in each medical specialty, and research has demonstrated that utilizing personality inventories alone cannot predict a medical student’s medical specialty (Borges & Savickas, 2002; Gough, 1979). A 2009 review of the literature between personality-type variables and medical specialty choice suggested that personality-type was not a dependable predictor of medical specialty (Borges, Stratton, Wagner, & Elam, 2009). In addition, a 2009 review of the literature suggested that lifestyle preferences,
financial incentives, role-models, and gender, although not conclusive factors, were more robust factors in medical specialty choice than personality variables (Borges et al., 2009). It is interesting to note, however, that many practicing physicians believe that the fit between the medical specialty and their personality was the most important factor in selecting that medical specialty (Leong et al., 2005). Although attempts have been made to predict medical specialty preference based upon factors that include the qualities and perceptions of the medical student, none of these factors has demonstrated a consistent ability to help medical students predict their medical specialty (Borges & Savickas, 2002; Borges, 2007). One may conclude that a variety of these factors impact and influence medical students to some degree when selecting a medical specialty. Helping medical students to understand their assets, knowledge, feelings, relationships, and interpersonal capacities can promote an ideal fit to a medical specialty (Reed et al., 2001) and can ultimately lead to higher job satisfaction (Borges et al., 2005). As such, interest inventories focusing on task performance were suggested to be more helpful in predicting a medical specialty than personality inventories, self-reported qualities, and perceptions of the medical student (Borges & Savickas, 2002; Gough, 1979). The measurement of interests has been the most widely used approach to help medical students select their medical specialty (Sodano & Richard, 2009b). Interest inventories promote self-knowledge and an understanding of how interests match the demands of differing medical specialties in an effort to increase sound decision making (Leong et al., 2005). It therefore becomes crucial that medical students are given the best
interest inventories possible to assist them in making medical specialty decisions (Borges et al., 2005; Borges et al., 2005; Glavin et al., 2007; Richard, 2005). General interest inventories (such as the Strong Vocational Interest Blank or Self-Directed Search) demonstrate very limited success in helping medical students to select their medical specialty since the generic interests measured in these types of inventories share similarities across all the medical specialties (Glavin, Richard, & Porfeli, 2009; Sodano & Richard, 2009b). A review of the literature has suggested that instruments like the Medical Specialty Preference Inventory-Revised (MSPI-R), which was created to help medical students select their medical specialty based upon task performance, have had more success in accurately helping medical students select their medical specialty than instruments like the Medical Specialty Preference Scales, which perform in the same range as the chance expectancy rate (Glavin et al., 2009; Sodano & Richard, 2009b). Due to its superior performance in research, the MSPI-R has become the preferred interest inventory for medical specialty selection. To clarify these findings, a review of the psychometric scoring methodologies of medical specialty inventories will follow.

Psychometric Scoring Methodologies of Medical Specialty Inventories

Strong and Tucker

Strong and Tucker used the Strong Vocational Interest Blank and the Medical Specialist Reference Blank in 1952 with 3,600 physicians to construct a new scale for physicians and the first medical specialty scales for Internists, Surgeons, Pathologists, and Psychiatrists (Strong & Tucker, 1952). Unfortunately, in the first analysis of their
data, Strong and Tucker were only able to differentiate Psychiatry from the other three medical specialties (Strong & Tucker, 1952). A 10-year follow-up on Strong and Tucker’s medical specialty scales suggested that they entirely failed to predict the eventual choice of medical specialty, even Psychiatry (Tucker & Strong, 1962). In 1966, Campbell reevaluated the Strong and Tucker study by rescoring and recomputing the findings; however, the results remained the same and there was no ability to predict which specialty a medical student would enter (Campbell, 1966). Campbell’s reevaluation did suggest that well-established medical professionals answered the interest inventory very differently from students preparing for entry into the same medical specialty. He believed that this difference explained the failure of the scales (Campbell, 1966). Campbell theorized that measurable specialty interests for physicians may not appear until they have practiced in a specialty for several years (Savickas et al., 1988).

Holland

Holland did not attempt to create a medical specialty inventory. However, Holland’s inventories are currently used in medical schools as part of career counseling. Research has suggested that the data gathered from Holland’s interest inventories are not as helpful once the medical student is looking for their medical specialty since the majority of medical specialties classify with the same Holland code as the profession of medicine in general (Borges et al., 2004).

Zimny

Although Strong and Tucker were unsuccessful in their attempts to create a medical specialty inventory, Zimny took Strong’s empirical method and created the
Medical Specialty Preference Inventory (MSPI). The MSPI was developed to provide information about specialty preferences to medical students facing the question of choosing a medical residency (Zimny, 1979; Zimny, 1980). The MSPI compares the medical student’s score to a normed occupational group comprised of physicians who practice a specific medical specialty. However, instead of following in Strong’s footsteps and looking at the generic concepts of an occupation for the MSPI’s items, Zimny based the MSPI’s items on specific medical activities performed in specific medical settings (Gough, 1979; Savickas et al., 1988). The basic rationale for the instrument was to compare the preferences of medical students for certain factors in the practice of medicine with the description given by physicians about how characteristic those factors were in the practice of their medical specialty (Zimny, 2002). Ultimately, the MSPI connected what students wanted in medical practice to the tasks of medical specialties (Zimny, 2002). The first version of the MSPI consisted of 199 items, which were answered by students using a seven-point scale to indicate the degree of preference for each item (Zimny, 2002). The MSPI was able to garner reliability for six medical specialties (Internal Medicine, Family Medicine, Obstetrics and Gynecology, Surgery, Psychiatry, and Pediatrics) in the .70 to .90 range (Richard, 2005). In addition, predictive validity was tested by comparing the overall specialty preference scores of medical students voluntarily taking the MSPI to the first-year graduate program they later obtained in the National Intern and Resident Matching Program (Zimny, 1980). A specialty was predicted for a student when the overall specialty score on the MSPI was at or above the


lower cutoff score of 70 and was the highest of the six overall scores (Zimny, 1979; Zimny, 1980). In four different examinations of medical students, the MSPI achieved a 51% predictive accuracy rate, which is greater than the conservative chance expectancy hit rate of 1 out of 6, or 17% (Zimny, 1979; Zimny, 1980; Zimny, 2002). The 51% predictive hit rate of the MSPI compared very favorably with the predictive validity of the Strong Vocational Interest Blank, and was a vast improvement over Strong’s initial attempts at predicting medical specialty preference (Zimny, 1980). After development and successful validation of the instrument, Zimny and Senturia (1973) next sought to determine the number of medical students who would, on a voluntary basis, utilize a source of information about their medical specialty preferences. Half of the students in the study returned a completed MSPI in order to obtain an MSPI report, which suggested that specialty decision was a substantial concern among students (Zimny & Senturia, 1973). The two main reasons cited for not completing an MSPI were that a specialty had already been chosen and that doubt existed that the instrument could actually produce useful information (Zimny & Senturia, 1973). Further, 68% of students in the study said that taking the MSPI was worthwhile and 78% said it should be offered to students the next year (Zimny & Senturia, 1973).

Gough

Gough was not satisfied with a medical specialty inventory that could only be used by individuals in medical school. Like Zimny, he also based the construction of his scales on the pioneering work of Strong and Tucker (Gough, 1979). Gough utilized the Strong-Campbell Interest Inventory (SCII) as the basis for his attempt at medical


specialty preference scales designed for individuals prior to entering medical school (Gough, 1979). The items on the SCII are made up of school subjects, occupational titles, and types of people (Savickas et al., 1988). To obtain Gough’s medical specialty scales, an individual would first have to pay for the initial general scoring of the SCII and then pay extra to have Gough’s medical specialty scales calculated from the general SCII scores. To develop his scales, Gough administered the SCII to freshman medical students and analyzed the responses after they had begun practicing in their medical specialty to identify items that differentiated medical students who had entered different medical specialties (Savickas et al., 1988). His Medical Specialty Preference Scales (MSPS) were created to predict 10 medical specialties (Internal Medicine, Obstetrics and Gynecology, Pediatrics, Psychiatry, Surgery, Family Medicine, Anesthesiology, Otolaryngology, Pathology, and Radiology) (Gough, 1979). Ultimately, the results of Gough’s MSPS resembled the failed outcome of Strong and Tucker (Gough, 1979). One important contribution should be noted: Gough’s research into developing medical specialty scales did suggest the prevalence of sex differences in the selection of medical specialties (Gough, 1979).

Comparison of the MSPS and the MSPI

A study by Savickas, Brizzi, Brisbin, and Pethtel (1988) examined the predictive validity of the Medical Specialty Preference Scales (MSPS) and the Medical Specialty Preference Inventory (MSPI). To test the predictive validity of both inventories, each student’s predicted specialty on the inventory was compared to the specialty entered by


the student. The study used an overall hit rate and Cohen’s kappa to determine agreement between the predicted and actual choices for each inventory. For this study, hit rate was defined as the proportion of students who actually entered the specialty that received the highest score on the inventory. The maximum value of kappa is 1.0, and occurs when there is perfect agreement between predicted and actual outcomes. A kappa value of .20 or less represents poor agreement; between .21 and .40 represents fair agreement; between .41 and .60 represents moderate agreement; between .61 and .80 represents good agreement; and between .81 and 1.0 represents very good agreement beyond chance (Landis & Koch, 1977). The MSPS yielded a 19% hit rate and an overall kappa of .15. Overall, the MSPS displayed poor predictive validity, little beyond what one would expect simply by chance. The MSPI yielded a 59% hit rate and an overall kappa of .48. The MSPI displayed moderate predictive validity and accurate predictions that were well beyond chance. Based upon predictive validity, the results suggested that the MSPI appeared to be more useful than the MSPS for counseling medical students about specialty choice.

The MSPI’s Current Role in Medical Specialty Selection

To help medical students select their preferred medical specialty, and to help alleviate shortages in medical specialty staffing, medical schools now invest time and money in career education and counseling programs that offer interest inventories, values surveys, occupational information, and other resources (Borges et al., 2004). The Association of American Medical Colleges (AAMC) has been constantly improving Careers in Medicine (CiM). This student-driven program helps medical students


throughout the United States with medical specialty decision making through understanding their career values and medical specialty preferences. The CiM program is a four stage process that includes understanding self, exploring options, choosing a specialty, and getting into residency (Richard et al., 2007). The CiM program can be self-administered by a student on-line, or used in conjunction with the career guidance programs offered at the student’s medical college. While all aspects of the CiM program are helpful to medical students in selecting their medical specialty, each assessment in the CiM program has its limits in terms of helpfulness. For example, inventories such as the Self-Directed Search (SDS) (Form R, 4th ed.) (Holland, Powell, & Fritzsche, 1994) are not as effective in determining medical specialty preferences because medical students often receive a primary code of “Investigative” (Borges et al., 2004). Borges et al. (2004) noted that medical students with a well-defined primary code of “Investigative” (where the highest score was significantly distant from the second and third-highest score) were more likely to pursue technique-oriented and service specialties. Further, medical students with a well defined “Investigative-Social” code (where the two highest scores were significantly distant from the third-highest score) were more likely to pursue patient-oriented specialties (Borges et al., 2004). Inventories such as the SDS can be helpful to a medical student working to define their medical specialty preference by directing the student toward technique-oriented or patient oriented specialties. However, the SDS has not been found to be statistically significant in helping medical students select their medical specialty.


The Medical Specialty Preference Inventory (MSPI) was purchased by the Association of American Medical Colleges (AAMC) to be used as part of its Careers in Medicine (CiM) program to specifically help medical students select their medical specialty (Richard, 2005; Sodano, Savickas, & Richard, 2007). The MSPI was updated and revised in 2001 by George Richard and Mark Savickas. At that time, the number of items was reduced, the content of items was changed, and the instrument was made accessible for on-line administration and scoring (Glavin et al., 2007). Preliminary studies of the MSPI-2 demonstrated good internal consistency (Sodano et al., 2007). Richard (2005) also noted that reliability and validity remained consistent between the original and revised versions of the MSPI. Glavin, Richard, and Savickas (2007) performed a study on the predictive validity of the MSPI-2. Results suggested an overall predictive hit rate of 45% when looking at a medical student’s highest overall MSPI-2 score (Glavin et al., 2007). Results also suggested that when a student’s score on a medical specialty was 72 or higher, the MSPI-2 accurately predicted that student’s medical specialty 59% of the time (Glavin et al., 2007). Borges, Gibson, and Karnani (2005) performed a second study of the predictive validity of the MSPI-2 and found a 33% predictive hit rate. In addition, their results suggested that physicians who had picked the medical specialty predicted by the MSPI-2 as medical students were happier with their work than those physicians who had not picked the predicted medical specialty. Further, Borges et al. (2005) noted that the revised MSPI only predicted scores for the “big six” medical specialties (Internal Medicine, Family Practice, Pediatrics, Obstetrics and Gynecology, Psychiatry, and


Surgery) when there were well over 100 medical specialties. They called for an increase in the number of medical specialties from six to include at least eight or nine to help improve the predictive hit rates for medical students. The past and more recent studies suggest a consistency in terms of predictive validity between the first and second revision of the MSPI. Research has consistently demonstrated that the MSPI-2 offers medical students a valuable resource to help them select a medical specialty that is well beyond the chance expectancy rate. Since then, improvements to the MSPI-2 have resulted in a new version: the Medical Specialty Preference Inventory-Revised (MSPI-R). However, it has been suggested that further research needs to be conducted to improve the MSPI’s predictive hit rate accuracy. Specifically, Borges and Savickas (2002) called for the MSPI-R to incorporate the psychometric scoring methodology of Kuder’s person matching model in order to increase predictive hit rates and emphasize the concept of individuality by providing the medical student with the biographies of practicing physicians who share similar profile patterns (Borges & Savickas, 2002). If utilized, the medical student would receive narrative biographical information that would illuminate the types of individuals who practice a medical specialty to ensure a better fit between the medical student and the medical specialty (Borges & Savickas, 2002). In addition, by using person matching as the psychometric scoring methodology, the MSPI-R may be able to overcome the sexism found in the psychometric scoring methodologies of Strong and Holland because there are no occupational group norms or ideal prototypes developed. Finally, person matching could also eliminate the need to


develop group norms for each medical specialty, which is important as new medical specialties are being created routinely. As soon as a practicing physician would take the MSPI-R and become part of the reference group, a medical student could be matched to that medical specialty instead of waiting for enough physicians to become part of a medical specialty to create a scale. If it can be demonstrated that Kuder’s person matching model is a successful psychometric scoring property for the MSPI-R, the next step would be to obtain biographies for physicians in the MSPI-R’s reference group. The collected biographies would allow medical students taking the MSPI-R to garner more data than a specialty title, which could be useful to assist medical students further in making medical specialty decisions. If Kuder’s person matching model is ineffective as the psychometric scoring methodology for the MSPI-R, then the AAMC would not be motivated to collect the biographies of physicians in the reference group so that the full person matching model could be offered to medical students. As Kuder’s person matching model is examined by this research, a discussion of the literature based upon person matching as a psychometric scoring methodology will be offered.

Literature Review of Kuder’s Person Matching Model for Prediction

To help explore the question of utilizing Kuder’s person matching model as the psychometric scoring property for the MSPI-R, a literature review was conducted using the databases Academic Search Complete, Academic Search Premier, Education Research Complete, ERIC, Psychology and Behavioral Sciences Collection, and Vocational and Career Collection for the terms Kuder, person matching, predictive,


career assessment, career, vocation, specialty, and choice. A combination of twenty-one journal articles, monographs and dissertations mentioned Kuder’s efforts in some way. Four discussed the history of Kuder’s work, four discussed the creation of Kuder’s own interest inventories, four examined the validity of his interest inventories, four discussed the interpretation of his interest inventories, four discussed the validity of the Kuder-Richardson coefficient, one discussed how scores on the Kuder Career Interest Assessment changed with a career intervention, and two researched using the person matching model for prediction. The dissertation and journal article that utilized person matching for prediction are reviewed below as they pertain directly to this line of research. A doctoral dissertation by Seling (1979) compared the person matching model to a standard model based upon the work of Strong to evaluate psychometric scoring methodologies. The researcher studied 192 undergraduates who took the Jackson Personality Inventory (JPI) and the Kuder Occupational Interest Survey (KOIS) twice over a three-week period. Results of the study noted that the person matching model reliability (0.79) was nearly equal to the standard scoring model reliability (0.80) in career inventory data. A second part of the dissertation looked at validity by comparing the person matching model to the standard scoring model in determining if the similarities between clients and counselors accounted for outcomes in counseling. Twenty-one graduate student counselors and clients were randomly paired and took the JPI and the KOIS before engaging in counseling. Results of the second phase of the dissertation noted that similar person matching scoring on the KOIS between the counselor and client


predicted counseling outcomes, while similar standard scoring on the JPI between the counselor and client did not predict counseling outcomes. Results suggested that the person matching model had enough preliminary reliability and validity to warrant further research as a psychometric scoring methodology. Unfortunately, since 1979, modest research has been performed utilizing person matching as a psychometric scoring methodology in interest inventories. A second study utilized the person matching model with scores on the Sixteen Personality Factor Questionnaire (16PF) to predict the medical specialty of 420 medical students (Hartung, Borges, & Jones, 2005). Participants took the 16PF their first year in medical school between the years 1995 and 2000. After graduation and selection of their medical residency, the 16PF scores were matched to 358 members of a reference group who had taken the 16PF and had graduated from medical school between the years 1995 and 1999. By using the person matching model with medical students’ scores on the 16PF, results suggested 43% to 60% predictive accuracy. This study supports this dissertation’s use of person matching for medical specialty prediction by noting a moderate predictive hit rate between 16PF scores and the medical student’s later chosen medical specialty. The two sources reviewed suggest that Kuder’s person matching model has enough preliminary reliability and validity to warrant further research to improve the predictive hit rates of the MSPI-R. As very little research has been performed utilizing Kuder’s person matching model as a psychometric scoring methodology, the current study attempts to provide additional data to consider the use of person matching to


predict specialization in an occupation by utilizing inventory items based upon work activities.

Research Problems

This research addresses five research problems. First, there is the problem of choosing interest inventory item, scale, or factor scores to use to perform person matching. Second, there is the problem of how many close person matches are enough to achieve robust predictive hit rates. Third, there is the problem of comparing the predictive hit rate achieved by standard scoring on an interest inventory to the predictive hit rate achieved by person matching on the same inventory. Fourth, there is the problem of potential gender differences in predictive hit rates for standard scoring and person matching methodologies. Fifth, there is the problem of understanding if a combination of standard scoring and person matching psychometric scoring methodologies on the same inventory would improve predictive hit rates.

Research Questions

Kuder’s person matching model (stemming from a postmodernist theoretical orientation) has received scant research attention and use in the field of career counseling and interest inventories. This dissertation uses longitudinal data from one career field, medicine, to study and evaluate the use of Kuder’s person matching model as a psychometric scoring methodology in interest inventories. As person matching has had little psychometric research since its inception in 1977, a fundamental need exists to determine how best to use person matching as a


psychometric scoring methodology. For example, the scores from an interest inventory can be matched on two different levels in Kuder’s person matching model: (a) the item level, where the scores are matched person-to-person on each item separately, and (b) the profile level, where the scores from scales or factors within the instrument are matched person-to-person for each scale. The first research question of this dissertation asks: “How predictive is Kuder’s person matching model when using items, scales, and factors?” Next, person matching takes each test-taker’s scores and compares them to a criterion pool composed of thousands of people. As such, person matching results in ranking the entire criterion pool from the individual with the closest score to the test-taker to the individual with the score furthest from the test-taker. A second need exists to determine how many of the closest matches are needed to make the most accurate prediction when using person matching as a psychometric scoring methodology. The second research question of this dissertation asks: “How predictive is Kuder’s person matching model when using the top 1, top 5, top 10, and top 20 matches?” More than four years’ worth of longitudinal data exist for medical students’ MSPI-R scores and their selected residency. Specifically, the AAMC wants to know if predictive hit rates increase when using Kuder’s person matching model to interpret MSPI-R scores as compared to using the standard scoring paradigm. The third research question asks: “How do the predictive hit rates compare between Kuder’s person matching model and Zimny’s norm-referenced group matching model when examining medical students’ scores on the MSPI-R?” The fourth research question of this dissertation asks, “How do


gender differences impact predictive hit rates for standard scoring and person matching methodologies on the MSPI-R?” In addition, supplemental scoring could be useful in increasing predictive hit rates. This would adopt a fractured foundationalism as an epistemology and combine elements of Zimny’s modernist model of scoring the MSPI-R with Kuder’s postmodernist model of scoring. Here the longitudinal data would be scored through a combination of discriminant function analysis (Zimny) and person matching (Kuder) to see if utilizing both psychometric scoring methodologies would increase the predictive hit rates over either alone. Therefore, the fifth research question asks, “Does the predictive hit rate of medical students’ MSPI-R scores increase if the data is scored by combining the person matching model with aspects of Zimny’s norm-referenced group matching model?” The research questions raised in this dissertation are of interest to all developers and users of interest inventories along with supporters of Kuder’s work (such as Zytowski and Rottinghaus). Because the sample utilized in the research was composed of medical students, the AAMC, medical colleges in the United States, teaching hospitals in the United States, medical students in the United States, and medical educators in the United States are interested in this line of research. Further, if Kuder’s postmodernist tradition of person matching is found to be more predictive than the current modernist tradition of matching to occupational group or prototype, it could herald changes in the scoring of other interest inventories. Alternatively, if combining both the person matching model and the group matching model increases medical specialty predictions


for medical students, it could also lead to changes in the way that interest inventories are scored.

Hypotheses

H1: Person matching achieves the highest hit rates on the MSPI-R when scored utilizing 150 items.
H2: Person matching achieves the highest hit rates on the MSPI-R when scored utilizing the 20 closest person matches.
H3: Person matching achieves higher hit rates on the MSPI-R than standard scoring.
H4: Gender differences in hit rates are less pronounced for person matching on the MSPI-R than standard scoring.
H5: Predictive hit rates are at their highest on the MSPI-R when combining standard scoring and person matching psychometric scoring methodologies.

Conclusion

Career counseling has aspired to use theory-driven, evidence-based interventions to help individuals select an occupation. Career indecision has been suggested to resolve with access to information and assistance with clarifying interests (Creed et al., 2006; Ihle-Helledy et al., 2004). As such, researchers are challenged with improving the theory and science behind interest inventories currently used to predict an individual’s career interest or specialty preference within an occupation. Strong’s empirical method, Holland’s theoretical method, and Kuder’s rational and person matching methods have been discussed and evaluated in terms of their


successful use as psychometric scoring methodologies for interest inventories. Person matching (an underutilized and scantily researched psychometric scoring methodology) presents an innovative alternative, and may solve several problems inherently found in previous psychometric scoring methodologies. Namely, person matching (a) dramatically increases the number of occupations included in the scoring of an interest inventory, (b) may overcome the sex bias found in traditional psychometric scoring methodologies, (c) allows for the scores of students taking the interest inventory to be compared to the scores of students who took the interest inventory and have since entered an occupation or a specialty, (d) offers test-takers the ability to receive biographic data which may offer a more robust career exploratory experience than receiving an occupational title alone, and (e) does not assume stable occupations in a global economy demanding flexibility and evolution in the career paths of individuals facing outsourcing and contractual work. Specifically, the research investigated if the predictive hit rates for the Medical Specialty Preference Inventory-Revised (MSPI-R) would increase by changing its current modernist scoring system, based upon the occupational groups of Strong, to a post-modernist scoring system, based upon Kuder’s person matching model. To explore this possibility, the next chapter will discuss the participants, measure, data collection, data analysis procedures, delimitations, and expected results of the research.
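The two agreement statistics used in the studies reviewed in this chapter, the overall hit rate and Cohen's kappa, can be illustrated with a minimal sketch. The function names and the toy data below are hypothetical and are not drawn from any of the cited studies; the sketch only shows how the statistics are defined, including the chance expectancy of 1 out of 6 (about 17%) when six specialties are possible.

```python
from collections import Counter


def hit_rate(predicted, actual):
    """Proportion of students whose predicted specialty equals the one entered."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def cohens_kappa(predicted, actual):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(actual)
    p_o = hit_rate(predicted, actual)  # observed agreement
    predicted_counts = Counter(predicted)
    actual_counts = Counter(actual)
    # Chance agreement: sum over categories of the product of marginal proportions.
    p_e = sum(predicted_counts[c] * actual_counts[c] for c in actual_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category occurs")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data (assumed for illustration only).
predicted = ["IM", "IM", "Peds", "Surg", "Peds", "IM"]
actual = ["IM", "Peds", "Peds", "Surg", "IM", "IM"]

print(f"hit rate = {hit_rate(predicted, actual):.2f}")  # 4 of 6 predictions correct
print(f"kappa    = {cohens_kappa(predicted, actual):.2f}")
print(f"chance hit rate with six specialties = {1 / 6:.0%}")
```

Because kappa subtracts the agreement expected from the marginal frequencies alone, it is lower than the raw hit rate whenever some agreement would occur by chance.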

CHAPTER II

METHODOLOGY

The current study investigated (a) how to best perform the person matching psychometric scoring methodology and (b) the comparison of predictive hit rates of the Medical Specialty Preference Inventory-Revised (MSPI-R) utilizing two different psychometric scoring methodologies. This chapter explains the methodology used to collect and study the data and describes the participants, measure, data collection procedures, data analyses, delimitations, and expected results.

Participants

The participants in this study voluntarily took the MSPI-R as medical students and have since entered their residency. The participants were 5,143 (2,898 female and 2,245 male) medical students enrolled in medical schools across the United States who took the MSPI-R between January 2005 and April 2008. Participants identified with several ethnicities: White (3,447), Asian (767), African American (343), Hispanic (250), Other (53), American Indian/Alaska Native (27), and Native Hawaiian/Pacific Islander (6), with 250 participants not identifying an ethnicity. Ages ranged between 24 and 57 at the time of taking the MSPI-R with a median age of 30. Medical schools are assigned to one of four regions in the United States by the AAMC. In this study, 1,706 participants were enrolled in Southern medical schools; 1,461 participants were enrolled in Midwestern medical schools; 1,425 participants were enrolled in Northern medical schools; and 551 participants were enrolled in Western medical schools. Participants took the MSPI-R during all four years of medical school: 340 as first year medical students, 1,045 as


second year medical students, 3,411 as third year medical students, and 347 as fourth year medical students. Each medical student who took the MSPI-R was matched in their fourth year of medical school to a residency program. The following list includes the number of participants who were matched to each residency program: Internal Medicine (1,007); Pediatrics (695); Emergency Medicine (514); Family Medicine (507); Obstetrics and Gynecology (379); Surgery (330); Anesthesiology (297); Psychiatry (245); Orthopedic Surgery (201); Radiology (194); Pathology (149); Internal Medicine Pediatrics (139); Otolaryngology (94); Ophthalmology (65); Neurology (64); Dermatology (58); Physical Medicine and Rehabilitation (55); Urology (33); Neurological Surgery (27); Radiation Oncology (18); Plastic Surgery (16); Pediatrics/Child and Adolescent Psychiatry (11); Internal Medicine/Psychiatry (4); Preventive Medicine (4); Psychiatry Family Practice (4); Pediatrics/Physical Medicine and Rehabilitation (3); Child and Adolescent Psychiatry (2); Child Neurology (2); Infectious Disease (2); Internal Medicine/Dermatology (2); Internal Medicine/Emergency Medicine (2); Internal Medicine/Family Practice (2); Internal Medicine/Neurology (2); Nuclear Medicine (2); Pediatric Emergency Medicine (2); Pediatric Hematology/Oncology (2); Pediatrics/Medical Genetics (2); Vascular Surgery (2); Cardiovascular Disease (1); Gastroenterology (1); Medical Toxicology (1); Pediatrics/Dermatology (1); Psychiatry/Neurology (1); and Sports Medicine (1). All 5,143 medical students served as reference group members since all had raw scores, scale scores, and a selected medical specialty. This information allowed the researcher to perform person matching and standard scoring methodologies.
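As a rough illustration of how a reference group of this kind supports person matching, the sketch below ranks reference group members by their distance to a test-taker and checks whether the test-taker's entered specialty appears among the closest matches. Several choices here are assumptions made only for the example and should not be read as the procedure used in this study: the Euclidean distance, the exclusion of the test-taker from his or her own matches, the rule that a hit occurs when the entered specialty appears among the top k matches, and the names `Member`, `closest_matches`, and `is_hit`. The scores are toy values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    member_id: int
    scores: tuple  # item-level or scale-level scores (toy values here)
    specialty: str  # specialty the member later entered


def distance(a, b):
    """Euclidean distance between two score profiles (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_matches(test_taker, pool, k):
    """Rank the pool from closest to furthest and return the top k members."""
    others = [m for m in pool if m.member_id != test_taker.member_id]
    ranked = sorted(others, key=lambda m: distance(test_taker.scores, m.scores))
    return ranked[:k]


def is_hit(test_taker, pool, k):
    """Assumed hit rule: the entered specialty appears among the top k matches."""
    return any(m.specialty == test_taker.specialty
               for m in closest_matches(test_taker, pool, k))


# Hypothetical reference group of five members with three scores each.
pool = [
    Member(1, (7, 1, 4), "Pediatrics"),
    Member(2, (6, 2, 4), "Pediatrics"),
    Member(3, (1, 7, 2), "Surgery"),
    Member(4, (2, 6, 3), "Surgery"),
    Member(5, (5, 3, 4), "Internal Medicine"),
]

for k in (1, 2):
    hits = sum(is_hit(m, pool, k) for m in pool)
    print(f"top {k}: {hits} of {len(pool)} members matched their entered specialty")
```

In this toy pool the only Internal Medicine member can never be a hit, since no other member entered that specialty; with a real reference group, the same situation would arise for specialties with very few members.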


Random Sample

To test both psychometric scoring methodologies, a subset of the full reference group had to be selected. A stratified random sample of 500 medical students was chosen from the reference group of 5,143 medical students. To define the medical students making up the sample, the researcher formulated a selection rubric to aid in understanding variances in the different psychometric scoring methodologies. First, the researcher wanted to observe possible gender variances in the methodologies. Therefore, half of the stratified random sample members were female and half were male. Second, the researcher wanted to observe possible variances in predictive hit rates for the different medical specialties. Twenty-two medical specialties [Internal Medicine (1,007); Pediatrics (695); Emergency Medicine (514); Family Medicine (507); Obstetrics and Gynecology (379); Surgery (330); Anesthesiology (297); Psychiatry (245); Orthopedic Surgery (201); Radiology (194); Pathology (149); Internal Medicine Pediatrics (139); Otolaryngology (94); Ophthalmology (65); Neurology (64); Dermatology (58); Physical Medicine and Rehabilitation (55); Urology (33); Neurological Surgery (27); Radiation Oncology (18); Plastic Surgery (16); and Pediatrics/Child and Adolescent Psychiatry (11)] were selected to be part of the stratified random sample because each contained at least 10 medical students who had entered the medical specialty. Using 10 as the cutoff gave greater assurance that each psychometric scoring methodology would have a chance at producing accurate predictive hit rates. Third, the researcher wanted to match the proportion of medical students in each medical specialty in the total sample to the proportion of medical students in a medical specialty in the stratified random sample


while at the same time providing a robust examination of the accuracy of the psychometric scoring methodologies for medical specialties with smaller memberships. Based upon this premise, the researcher determined the proportionate size of each of the 22 medical specialties, making sure that the total could be divided by two into whole numbers to accommodate an equal number of males and females in each specialty. The following reports each specialty and the number of members included in the random sample: Internal Medicine (56), Pediatrics (44), Emergency Medicine (40), Family Medicine (40), Obstetrics and Gynecology (30), Surgery (30), Anesthesiology (24), Psychiatry (24), Orthopedic Surgery (24), Radiology (20), Pathology (20), Internal Medicine Pediatrics (20), Otolaryngology (14), Ophthalmology (14), Neurology (14), Dermatology (14), Physical Medicine and Rehabilitation (14), Urology (14), Neurological Surgery (14), Radiation Oncology (10), Plastic Surgery (10), and Pediatrics/Child and Adolescent Psychiatry (10). The random selection for each stratum was conducted by sorting the database of 5,143 medical students first by their chosen medical specialty, and then by gender. Next, a medical specialty was selected. Pediatrics serves as the example here. It was noted that female Pediatricians were found between rows 3,374 and 3,911. An on-line random number generator based upon atmospheric noise (http://www.random.org) was accessed, which allowed for a minimum and maximum number to be entered (3,374 and 3,911 for female Pediatricians) and then a button was pushed to generate a true random number between the stated minimum and maximum range. For female Pediatricians, a total of 22 stratum members were necessary, and this procedure was performed until 22


different female medical students who entered Pediatrics were randomly selected. It was noted that male Pediatricians were found between rows 3,912 and 4,070. The minimum and maximum numbers were entered (3,912 and 4,070 for male Pediatricians) into the on-line random number generator, and then a button was pushed to generate a true random number between the stated minimum and maximum range. For male Pediatricians, a total of 22 stratum members were necessary, and this procedure was performed until 22 different male medical students who entered Pediatrics were randomly selected. After randomly selecting equal numbers of female and male participants from a medical specialty, the researcher had randomly selected the entire sample for that stratum. This procedure was duplicated for the remaining 21 medical specialties to generate the stratified random sample of 500 medical students.

Measures

The Medical Specialty Preference Inventory-Revised (MSPI-R) is an inventory that measures interest in 18 areas of medical practice and predicts entrance into 16 major medical specialties. The MSPI-R provides information to medical students to help them choose a medical specialty appropriate to their interests following graduation from medical school. Using the MSPI-R, medical students can match their interests to the daily tasks of physicians in 16 specialties.

Early Versions of the MSPI

The Medical Specialty Preference Inventory (MSPI) was originally developed in 1979 (Zimny, 1979). It was subsequently updated in 2000, acquired by the Association of American Medical Colleges (AAMC) in 2003, and released on the Careers in


Medicine (CiM) website in January, 2005 (Richard, 2010). The original MSPI consisted of 199 items describing medical tasks, which were answered by students using a seven-point desirability scale representing degrees of low (1, 2), moderate (3, 4, 5) and high (6, 7) preference for each item (Zimny, 1979). The stem for each item is “A practice in which I…” Two examples of items are “A practice in which I can make precise diagnoses,” and “A practice in which I discuss death and dying with patients.” The original MSPI provided scores on 40 factors, or areas of practice in medicine. Those 40 factors were used to calculate preference scores for six major medical specialties (Family Medicine, Internal Medicine, Pediatrics, Psychiatry, Obstetrics and Gynecology, and Surgery) (Zimny, 1979). The second edition, MSPI-2, released in 2000, utilized the 150 items to calculate 38 factors and six major medical specialties (Family Medicine, Internal Medicine, Pediatrics, Psychiatry, Obstetrics and Gynecology, and Surgery) (Zimny, 2002). Early research indicated support for the MSPI and MSPI-2’s validity and use with medical student populations. The MSPI was able to garner reliability for six medical specialties (Internal Medicine, Family Medicine, Obstetrics and Gynecology, Surgery, Psychiatry, and Pediatrics) in the .70 to .90 range (Richard, 2005). In addition, four different examinations of medical students taking the MSPI suggested a 51% level of predictive accuracy, which is greater than the conservative chance expectancy of hit rate 1 out of 6 or 17% (Zimny, 1979; Zimny, 1980; Zimny, 2002). Unfortunately, half of the 38 factor scales used in the MSPI and MSPI-2 were composed of only two items and another eleven scales consisted of three items (the remaining eight scales contained four


to five items per scale) (Richard, 2010). It is preferable to have a minimum of four items per scale, as a small number of items generally yields low reliability estimates (Sodano & Richard, 2009a). Therefore, current standards in factor analytic research would not support the existence of thirty-eight factors (Floyd & Widaman, 1995). Further, the original factors were grouped logically into five broad content themes with little to no empirical support (Richard, 2010). As such, the original MSPI would not be supported using today’s methodological standards (Richard, 2010). There is a growing call for the MSPI to provide broader coverage of the specialty options available in medicine (Borges & Savickas, 2002). Currently, applicants to United States residency training programs can choose from among 21 medical specialties (Richard, 2010). The MSPI and MSPI-2 offered medical specialty preference scores for only six of those specialties (Family Medicine, Internal Medicine, Obstetrics and Gynecology, Pediatrics, Psychiatry, and Surgery), which represented only 45% of the active physicians in the United States (Richard, 2010). Hence, medical students were on their own to determine how the 38 interest factors and six preference scores may or may not align with other medical specialties (Richard, 2010). This limited the potential value of the MSPI-2 in assisting students to make important decisions in their professional lives (Richard, 2010). MSPI-R The MSPI-R was developed to address the limitations listed above, and is the result of work performed using more advanced technology and updated statistical methodologies to yield a valid measure of medical student interests (Richard, 2010).


There are 150 items included in the MSPI-R; however, only 102 items are used to score the instrument (Richard, 2010). Of those, 88 items are used to score the 18 Medical Interest Scales, and 30 items are used to score the 16 Specialty Choice Probabilities (Richard, 2010). Sixteen of the items are scored in both the Medical Interest Scales and the Specialty Choice Probabilities (Richard, 2010). The remaining 48 items are not scored, and may be used in the future for possible replacement of items as needed to improve the predictive ability of the instrument and to support the development of new specialties (Richard, 2010). Two of the 150 items were revised in the MSPI-R to reflect current terminology in medicine (“use the results of a proctoscopic examination” was changed to “use the results of endoscopic examinations,” and “use the results of arteriograms” was changed to “use the results of vascular imaging”) (Richard, 2010). Research by Sodano and Richard (2009) noted that a confirmatory factor analysis did not support Zimny’s original 38 factors for the MSPI-R. Instead, their research suggested an 18 sub-scale model, which was created by conducting a factor analysis of all medical students who completed the MSPI-R between 2005 and 2008. The result was a homogeneous grouping of items that represent 18 Medical Interest Scales: Complex Problems, Comprehensive Care, Diagnostic Precision, Emergency-Critical Care, History Taking, Home Health Care, Immediate Results, Knowledge of Anatomical Structures, Knowledge of Organ Systems, Laboratory Results, Palliative Care, Patient Counseling, Prevention and Education, Procedural Care, Psychological Care, Reproductive Care, Social Context, and Technology in Medicine (Richard, 2010). In addition, the MSPI-R allows for calculating preferences for 16 medical specialties (Anesthesiology,


Dermatology, Emergency Medicine, Family Medicine, Internal Medicine, Neurology, Obstetrics and Gynecology, Orthopedic Surgery, Otolaryngology, Pathology, Pediatrics, Physical Medicine and Rehabilitation, Psychiatry, Radiology, Surgery, and Urology) to offer more accurate predictions of specialty choice, with the potential to add more specialties over time (Richard, 2010).

Further improving the functionality of the MSPI-R, Porfeli, Richard, and Savickas (2010) researched an empirical measurement model (using discriminant function analysis to determine the test-taker’s pattern of responses to occupations later entered) in addition to the inductive measurement model (using confirmatory factor analysis to match a test-taker’s interests with the presumed interests of an occupational environment) explored by Sodano and Richard (2009). Their results suggested that discriminant function analysis achieved a 53.6% predictive hit rate by using only 30 of the 150 MSPI-R items. This improved upon Sodano and Richard’s (2009) predictive hit rate of 46.0% using 18 factors derived from confirmatory factor analysis (Porfeli, Richard, & Savickas, 2010). Based on the current research undertaken with the MSPI-R, medical students now receive scores based upon both empirical and inductive measures.

Today, the MSPI-R is garnering better validity and reliability data than its predecessors. Cronbach’s alpha was calculated to determine reliabilities for each of the 18 Medical Interest Scales. Reliability coefficients ranged from a low of .77 for History Taking and Diagnostic Precision to a high of .94 for Psychological Care, indicating good internal consistency for all scales (Richard, 2010). Information about the reliability of the 16 Specialty Choice Probabilities is not currently available (Richard, 2010). The


predictive validity of the MSPI-R has been analyzed by calculating the percentage of successful matches between the chosen specialty and the 16 Specialty Choice Probabilities with an overall predictive hit rate of 52% (Richard, 2010). In terms of administration and scoring, the MSPI-R was adapted for the World Wide Web as part of the AAMC’s CiM Program in 2005, where it is now available to medical students in schools throughout the United States free of charge (Richard, 2005). To take the MSPI-R, a medical student selects one of seven scale points to indicate the degree of desirability for each item on the inventory; the next item is displayed until the MSPI-R is completed. Medical students instantaneously receive a report of results, including the 16 Medical Specialty Choice Probabilities along with the 18 Medical Interest Scales (Richard, 2010). For each of the 16 medical specialties, a percentage score is reported that indicates the likelihood that the student will enter into the specialty. The 16 medical specialty percentages combined total 100 percent and are presented in order from the highest to the lowest likelihood that the student would enter each of the 16 medical specialties (Richard, 2010). Students are instructed to select the two or three specialties with the highest probabilities to explore further. Next, students receive their Medical Interest Scale scores to identify their highest and lowest scoring interests in 18 areas of medical practice that are experienced in varying degrees in each medical specialty. The 18 areas involve knowledge and information, services and procedures, and types of problems that are important in understanding medical specialties (Richard, 2010). The scales can be used to help clarify and describe medical interests in more


detail, compare interest profiles with the profiles of various specialties, and explore other specialties not included in the MSPI-R (Richard, 2010).

Data Collection

This study is an ex post facto non-experimental quantitative study, meaning that the study involved data that existed prior to the study’s conception. As such, study participants were not contacted. The MSPI-R was taken by 5,143 medical students between 2005 and 2008 on the Association of American Medical Colleges’ (AAMC) Careers in Medicine (CiM) website. In addition to offering the inventory, the AAMC tracks medical students’ residency matches to measure medical students’ full development into the medical profession. The anonymity of study participants was maintained by the AAMC removing study participants’ names and replacing them with code numbers in the database used by the researcher. As such, the data contained unidentifiable demographic information, raw scores, scale scores, and residency matches for all 5,143 study participants. Research procedures complied with APA ethical guidelines and Kent State University’s Institutional Review Board guidelines (see Appendix A).

Data Analysis Procedures

This study controlled variance by being a non-experimental ex post facto quantitative study. Since the scores on the MSPI-R were collected at one time from the study subjects, only the psychometric scoring system changed, which allowed for a direct comparison. The only difference in the study was how the raw MSPI-R scores were utilized to determine a medical student’s preference for a medical specialty.

Career interest inventories commonly take the participant’s scores and compare them to a standardized norm based upon an occupational group, which was derived from a large subject sample. In Kuder’s person matching model, the participant’s scores on the career interest inventory are matched directly to the scores of others who have taken the same interest inventory. Further, the scores from an inventory can be matched on two different levels in Kuder’s person matching model: (a) the item level, where the scores are matched person-to-person on each item separately, and (b) the profile level, where the scores from scales within the instrument are matched person-to-person for each scale. This study is concerned with analyzing the MSPI-R’s scores in three different ways (all 150 items on the inventory, all 18 Medical Interest Scales scored on the inventory, and the 30 items from the inventory that were used to calculate the 16 Medical Specialty Probabilities for standard scoring) to see how Kuder’s person matching method of scoring compares to the current norm-based scoring method.

All person matching analyses used Cronbach and Gleser’s (1953) difference squared (D2) values, a person matching statistic, to determine the linear distance of profile similarity. The D2 statistic is a simple and effective measure to compare the differences between individuals’ scores on the same inventory (Cronbach & Gleser, 1953). D2, the sum of squared Euclidean distances between self- and other-ratings of traits, reflects differences in elevation, scatter, and shape (Cronbach & Gleser, 1953). In addition, the D2 coefficient is preferred because it computes profile similarity in score levels, while correlational measures only demonstrate parallels in shape (Cronbach & Gleser, 1953). The D2 statistic is a descriptive statistic and as such does not include the


concepts of statistical power or statistical precision (Cronbach & Gleser, 1953). When using the D2 statistic, score 1 (from a test-taker) is subtracted from score 2 (an individual from a reference group) with the resulting difference being squared (Hartung et al., 2005). When the differences between the two scores are squared, the result becomes normally distributed (Cronbach & Gleser, 1953). In the D2 statistic, there is no upper limit on the distance between two scores (Cronbach & Gleser, 1953). However, the closer the D2 calculation is to 0, the more similarly the two individuals scored on the inventory, which signifies a close person-to-person match (Cronbach & Gleser, 1953).

The D2 coefficient (the square of score 1 minus score 2) was calculated in this study to represent the degree of profile similarity between one member of the stratified random sample who took the MSPI-R (score 1) and, separately, each member of the reference group who took the MSPI-R (score 2) to determine the predicted medical specialty for that member of the stratified random sample. The D2 coefficient was computed in this manner in three different ways for all 500 members of the stratified random sample to find the most accurate hit rates for person matching based upon the MSPI-R’s (a) 150 items, (b) 18 scales, and (c) 30 items.
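
As an illustration of the calculation just described, the following is a minimal Python sketch, not the procedure actually used in the study. It assumes each profile is a plain list of numeric scores, computes D2 as the sum over scores of (score 1 minus score 2) squared, and ranks a reference group from the smallest D2 (closest match) to the largest. The function names, the three-score profiles, and the specialty labels are hypothetical and chosen only to keep the example small.

```python
def d_squared(scores_a, scores_b):
    """Sum of squared differences between two score profiles of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("profiles must contain the same number of scores")
    return sum((a - b) ** 2 for a, b in zip(scores_a, scores_b))


def rank_reference_group(target_scores, reference_group):
    """Return (D2, specialty) pairs ordered from closest to most distant match.

    reference_group is a list of (specialty, scores) pairs (hypothetical layout).
    """
    ranked = [
        (d_squared(target_scores, scores), specialty)
        for specialty, scores in reference_group
    ]
    ranked.sort(key=lambda pair: pair[0])  # smallest D2 first
    return ranked


# Toy data: one test-taker and three reference-group members.
target = [7, 1, 4]
reference = [
    ("Psychiatry", [6, 2, 4]),
    ("Surgery", [1, 7, 4]),
    ("Pediatrics", [7, 1, 5]),
]
ranked = rank_reference_group(target, reference)
top_d2, predicted_specialty = ranked[0]
```

In this toy case the Pediatrics profile differs from the target by a single point on one score, so it has the lowest D2 and supplies the predicted specialty for the top match.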


The 150 Items In the first analysis, D2 values were calculated by subtracting the 150 MSPI-R item scores for each member of the stratified random sample from the 150 MSPI-R item scores of each individual in the reference group pool with the resulting differences being squared (Figure 1 Step 2A). The 150 squared differences comparing the two individuals were summed to obtain a final score. This allowed each member of the stratified random sample to be compared to all 5,142 members of the reference group. Scores from each of the 5,142 members of the reference group were then placed into rank order from the smallest score (indicating the lowest D2 value or closest match to the member of the stratified random sample) to the largest score (indicating the largest D2 value or most distant match from the member of the stratified random sample). This generated a list of scores ranking the 5,142 person matches for that member of the stratified random sample in descending order from the closest/top match to the most distant/differing match. Several types of documentations were made based upon the rank ordered list comparing them to the 5,142 members of the reference group. Those documentations are operationalized below. Recording singular hit rates for the top match. First, the lowest D2 value was considered the top match (Figure 1 Step 3A). Then, the lowest D2 value and that reference group member’s medical specialty (which became the predicted medical specialty for that member of the stratified random sample based upon the top match utilizing the 150 items) were recorded. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the predicted

specialty based upon the lowest D2 value. Lastly, it was documented if the predicted medical specialty matched the actual medical specialty for that member of the stratified random sample. This was performed 499 more times for all members of the stratified random sample.

Recording singular hit rates for the top 5 matches. First, the five lowest D2 values were considered the top 5 matches (Figure 1 Step 4A). Then, a record was made of the D2 values for the top 5 matches and those five reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the five reference group members. Lastly, if one of those five reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon a singular match within the top 5 matches utilizing the 150 items. This was performed 499 more times for all members of the stratified random sample.

Recording singular hit rates for the top 10 matches. First, the 10 lowest D2 values were considered the top 10 matches (Figure 1 Step 5A). Then, a record was made of the D2 values for the top 10 matches and those 10 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 10 reference group members. Lastly, if one of those 10 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon a singular match within the top 10 matches


utilizing the 150 items. This was performed 499 more times for all members of the stratified random sample. Recording singular hit rates for the top 20 matches. First, the 20 lowest D2 values were considered the top 20 matches (Figure 1 Step 6A). Then, a record was made of the D2 values for the top 20 matches and those 20 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 20 reference group members. Lastly, if one of those 20 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon a singular match within the top 20 matches utilizing the 150 items. This was performed 499 more times for all members of the stratified random sample. Recording dominant hit rates for the top 5 matches. First, the five lowest D2 values for a member of the stratified random sample were documented (Figure 1 Step 7A). Then, a record was made of the D2 values for the top 5 matches and those five reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the five reference group members. Lastly, if a majority of those five reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon dominance in the top 5 matches utilizing the 150 items. This was performed 499 more times for all members of the stratified random sample.

Recording dominant hit rates for the top 10 matches. First, the 10 lowest D2 values for a member of the stratified random sample were documented (Figure 1 Step 8A). Then, a record was made of the D2 values for the top 10 matches and those 10 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 10 reference group members. Lastly, if a majority of those 10 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon dominance in the top 10 matches utilizing the 150 items. This was performed 499 more times for all members of the stratified random sample.

Recording dominant hit rates for the top 20 matches. First, the 20 lowest D2 values for a member of the stratified random sample were documented (Figure 1 Step 9A). Then, a record was made of the D2 values for the top 20 matches and those 20 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 20 reference group members. Lastly, if a majority of those 20 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon dominance in the top 20 matches utilizing the 150 items. This was performed 499 more times for all members of the stratified random sample.
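
The singular and dominant recording rules above can be sketched in Python as follows. This is an illustrative reading rather than the study's own tooling. It assumes the reference group's specialties are already listed from closest to most distant match. The text says "a majority" without stating whether that means more than half of the matches or simply the most frequent specialty; the sketch assumes the latter and counts a tie for most frequent as a miss. All function names and the toy list are hypothetical.

```python
from collections import Counter


def singular_hit(ranked_specialties, actual, k):
    """True if at least one of the k closest matches entered the actual specialty."""
    return actual in ranked_specialties[:k]


def dominant_hit(ranked_specialties, actual, k):
    """True if the actual specialty is the single most frequent one in the top k.

    Assumption: "majority" is read as "most frequent"; ties count as a miss.
    A stricter reading would require a count greater than k / 2.
    """
    counts = Counter(ranked_specialties[:k])
    highest = max(counts.values())
    winners = [spec for spec, count in counts.items() if count == highest]
    return winners == [actual]


def hit_rate(hits):
    """Percentage of sample members for whom a hit was recorded."""
    return 100.0 * sum(hits) / len(hits)


# Toy ranked list for one sample member (closest match first).
ranked = ["Pediatrics", "Psychiatry", "Pediatrics", "Surgery", "Pediatrics"]

psychiatry_top1 = singular_hit(ranked, "Psychiatry", 1)
psychiatry_top5 = singular_hit(ranked, "Psychiatry", 5)
pediatrics_dominant = dominant_hit(ranked, "Pediatrics", 5)
psychiatry_dominant = dominant_hit(ranked, "Psychiatry", 5)
```

Repeating either check for every member of the sample and passing the resulting list of true/false values to `hit_rate` gives the percentage reported for that cutoff.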


Recording cases where the medical specialty entered is dominant for a member of the stratified random sample across the top 1, 5, 10, and 20 matches. First, the recorded hit rates for the top match, dominant in the top 5, dominant in the top 10, and dominant in the top 20 utilizing the 150 items were examined for the 500 members of the stratified random sample (Figure 1 Step 10A). Then, only those members of the stratified random sample who had the medical specialty that they entered predicted as the top match while simultaneously dominant in the top 5, 10, and 20 utilizing the 150 items were identified and placed into a new file. For example, a member of the stratified random sample with a specialty of Psychiatry might have Psychiatry predicted as the top match, Psychiatry predicted as the dominant match in the top 5, Psychiatry predicted as the dominant match in the top 10, and Psychiatry predicted as the dominant match in the top 20. This allowed the researcher to gather valuable descriptive data about members of the stratified random sample who had the specialty that they entered be simultaneously dominant across the top 1, 5, 10 and 20 matches. Recording cases where the medical specialty entered for a member of the stratified random sample is different from the medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches. First, the recorded hit rates for the top match, dominant in the top 5, dominant in the top 10, and dominant in the top 20 utilizing the 150 items were examined for the 500 members of the stratified random sample (Figure 1 Step 11A). Then, only those members of the stratified random sample who had a medical specialty predicted as the top match while simultaneously dominant in the top 5, 10, and 20 utilizing the 150 items be different from the medical specialty that


they entered were identified and placed into a new file. For example, a member of the stratified random sample with a specialty of Urology might have Pathology predicted as the top match, Pathology predicted as the dominant match in the top 5, Pathology predicted as the dominant match in the top 10, and Pathology predicted as the dominant match in the top 20. This allowed the researcher to gather valuable descriptive data about members of the stratified random sample who had the specialty that they entered be different from the simultaneously dominant predicted medical specialty found across the top 1, 5, 10 and 20 matches. Recording cases where there is a different medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches. First, the recorded hit rates for the top match, dominant in the top 5, dominant in the top 10, and dominant in the top 20 utilizing the 150 items were examined for the 500 members of the stratified random sample (Figure 1 Step 12A). Then, only those members of the stratified random sample who had four different medical specialties predicted as dominant across the top 1, 5, 10, and 20 utilizing the 150 items that did not predict the medical specialty entered were identified and placed into a new file. For example, a member of the stratified random sample with a specialty of Radiation Oncology might have Pediatrics predicted as the top match, Surgery predicted as the dominant match in the top 5, Internal Medicine predicted as the dominant match in the top 10, and Family Medicine predicted as the dominant match in the top 20. This allowed the researcher to gather valuable descriptive data about members of the stratified random sample who had four different predicted medical


specialties across the top 1, 5, 10 and 20 where none of them predicted the correct medical specialty. The 18 Scales In the second analysis, D2 values were calculated by subtracting the 18 MSPI-R scale scores for each member of the stratified random sample from the 18 MSPI-R scale scores of each individual in the reference group pool with the resulting differences being squared (Figure 1 Step 2B). The 18 squared differences comparing the two individuals were summed to obtain a final score. This allowed each member of the stratified random sample to be compared to all 5,142 members of the reference group. Scores from each of the 5,142 members of the reference group were then placed into rank order from the smallest score (indicating the lowest D2 value or closest match to the member of the stratified random sample) to the largest score (indicating the largest D2 value or most distant match from the member of the stratified random sample). This generated a list of scores ranking the 5,142 person matches for that member of the stratified random sample in descending order from the closest/top match to the most distant/differing match. Several types of documentations were made based upon the rank ordered list comparing them to the 5,142 members of the reference group. Those documentations are operationalized below. Recording singular hit rates for the top match. First, the lowest D2 value was considered the top match (Figure 1 Step 3B). Then, the lowest D2 value and that reference group member’s medical specialty (which became the predicted medical specialty for that member of the stratified random sample based upon the top match


utilizing the 18 scales) were recorded. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the predicted specialty based upon the lowest D2 value. Lastly, it was documented if the predicted medical specialty matched the actual medical specialty for that member of the stratified random sample. This was performed 499 more times for all members of the stratified random sample. Recording singular hit rates for the top 5 matches. First, the five lowest D2 values were considered the top 5 matches (Figure 1 Step 4B). Then, a record was made of the D2 values for the top 5 matches and those five reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the five reference group members. Lastly, if one of those five reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon a singular match within the top 5 matches utilizing the 18 scales. This was performed 499 more times for all members of the stratified random sample. Recording singular hit rates for the top 10 matches. First, the 10 lowest D2 values were considered the top 10 matches (Figure 1 Step 5B). Then, a record was made of the D2 values for the top 10 matches and those 10 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 10 reference group members. Lastly, if one of those 10 reference group members had the same medical


specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon a singular match within the top 10 matches utilizing the 18 scales. This was performed 499 more times for all members of the stratified random sample. Recording singular hit rates for the top 20 matches. First, the 20 lowest D2 values were considered the top 20 matches (Figure 1 Step 6B). Then, a record was made of the D2 values for the top 20 matches and those 20 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 20 reference group members. Lastly, if one of those 20 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon a singular match within the top 20 matches utilizing the 18 scales. This was performed 499 more times for all members of the stratified random sample. Recording dominant hit rates for the top 5 matches. First, the five lowest D2 values for a member of the stratified random sample were documented (Figure 1 Step 7B). Then, a record was made of the D2 values for the top 5 matches and those five reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the five reference group members. Lastly, if a majority of those five reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon


dominance in the top 5 matches utilizing the 18 scales. This was performed 499 more times for all members of the stratified random sample. Recording dominant hit rates for the top 10 matches. First, the 10 lowest D2 values for a member of the stratified random sample were documented (Figure 1 Step 8B). Then, a record was made of the D2 values for the top 10 matches and those 10 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 10 reference group members. Lastly, if a majority of those 10 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon dominance in the top 10 matches utilizing the 18 scales. This was performed 499 more times for all members of the stratified random sample. Recording dominant hit rates for the top 20 matches. First, the 20 lowest D2 values for a member of the stratified random sample were documented (Figure 1 Step 9B). Then, a record was made of the D2 values for the top 20 matches and those 20 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 20 reference group members. Lastly, if a majority of those 20 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon dominance in the top 20 matches utilizing the 18 scales. This was performed 499 more times for all members of the stratified random sample.
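
The three cross-list case types recorded in Steps 10 through 12 of Figure 1 (described earlier for the 150 items and repeated for the 18 scales) can be sketched as a small classifier. This is a hedged Python illustration, assuming the "dominant" specialty at a cutoff is the most frequent specialty among the closest matches; how ties were resolved is not stated in the text, so the sketch simply takes whichever tied specialty appears first. The function names and category labels are hypothetical.

```python
from collections import Counter

CUTOFFS = (1, 5, 10, 20)


def dominant_specialty(ranked_specialties, k):
    """Most frequent specialty among the k closest matches (assumed definition)."""
    return Counter(ranked_specialties[:k]).most_common(1)[0][0]


def classify_case(ranked_specialties, actual):
    """Sort one sample member into the case types used for descriptive review."""
    predicted = [dominant_specialty(ranked_specialties, k) for k in CUTOFFS]
    distinct = set(predicted)
    if distinct == {actual}:
        return "entered specialty dominant at every cutoff"
    if len(distinct) == 1:
        return "one other specialty dominant at every cutoff"
    if len(distinct) == len(CUTOFFS) and actual not in distinct:
        return "four different specialties, none correct"
    return "mixed"


# Toy ranked list mirroring the Radiation Oncology example in the text.
ranked = (
    ["Pediatrics"]
    + ["Surgery"] * 3
    + ["Internal Medicine"] * 6
    + ["Family Medicine"] * 10
)
case = classify_case(ranked, "Radiation Oncology")
```

Members falling into the "mixed" category belong to none of the three files described in the text and would simply be left out of them.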


Recording cases where the medical specialty entered is dominant for a member of the stratified random sample across the top 1, 5, 10, and 20 matches. First, the recorded hit rates for the top match, dominant in the top 5, dominant in the top 10, and dominant in the top 20 utilizing the 18 scales were examined for the 500 members of the stratified random sample (Figure 1 Step 10B). Then, only those members of the stratified random sample who had the medical specialty that they entered predicted as the top match while simultaneously dominant in the top 5, 10, and 20 utilizing the 18 scales were identified and placed into a new file. For example, a member of the stratified random sample with a specialty of Psychiatry might have Psychiatry predicted as the top match, Psychiatry predicted as the dominant match in the top 5, Psychiatry predicted as the dominant match in the top 10, and Psychiatry predicted as the dominant match in the top 20. This allowed the researcher to gather valuable descriptive data about members of the stratified random sample who had the specialty that they entered be simultaneously dominant across the top 1, 5, 10 and 20 matches. Recording cases where the medical specialty entered for a member of the stratified random sample is different from the medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches. First, the recorded hit rates for the top match, dominant in the top 5, dominant in the top 10, and dominant in the top 20 utilizing the 18 scales were examined for the 500 members of the stratified random sample (Figure 1 Step 11B). Then, only those members of the stratified random sample who had a medical specialty predicted as the top match while simultaneously dominant in the top 5, 10, and 20 utilizing the 18 scales be different from the medical specialty that


they entered were identified and placed into a new file. For example, a member of the stratified random sample with a specialty of Urology might have Pathology predicted as the top match, Pathology predicted as the dominant match in the top 5, Pathology predicted as the dominant match in the top 10, and Pathology predicted as the dominant match in the top 20. This allowed the researcher to gather valuable descriptive data about members of the stratified random sample who had the specialty that they entered be different from the simultaneously dominant predicted medical specialty found across the top 1, 5, 10 and 20 matches. Recording cases where there is a different medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches. First, the recorded hit rates for the top match, dominant in the top 5, dominant in the top 10, and dominant in the top 20 utilizing the 18 scales were examined for the 500 members of the stratified random sample (Figure 1 Step 12B). Then, only those members of the stratified random sample who had four different medical specialties predicted as dominant across the top 1, 5, 10, and 20 utilizing the 18 scales that did not predict the medical specialty entered were identified and placed into a new file. For example, a member of the stratified random sample with a specialty of Radiation Oncology might have Pediatrics predicted as the top match, Surgery predicted as the dominant match in the top 5, Internal Medicine predicted as the dominant match in the top 10, and Family Medicine predicted as the dominant match in the top 20. This allowed the researcher to gather valuable descriptive data about members of the stratified random sample who had four different predicted medical


specialties across the top 1, 5, 10 and 20 where none of them predicted the correct medical specialty.

The 30 Items

In the third analysis, D2 values were calculated by subtracting the 30 MSPI-R item scores for each member of the stratified random sample from the 30 MSPI-R item scores of each individual in the reference group pool and squaring the resulting differences (Figure 1 Step 2C). The 30 items, selected from the 150 total items on the MSPI-R, are those used in the standard scoring procedure to calculate probability scores for the 16 medical specialties. This third analysis addressed the fractured foundationalism hypothesis, in which elements of standard scoring and person matching were combined to determine the hit rate obtained when utilizing elements from both methodologies. The 30 squared differences comparing the two individuals were summed to obtain a final score. This allowed each member of the stratified random sample to be compared to all 5,142 members of the reference group. Scores from each of the 5,142 members of the reference group were then placed into rank order from the smallest score (indicating the lowest D2 value or closest match to the member of the stratified random sample) to the largest score (indicating the largest D2 value or most distant match from the member of the stratified random sample). This generated a list ranking the 5,142 person matches for that member of the stratified random sample in ascending order of D2, from the closest/top match to the most distant/differing match. Several types of documentation were made based upon the rank-ordered list for each member of the stratified random sample. Those documentations are operationalized below.
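The D2 calculation and ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the researcher's actual implementation: the function names, the representation of each reference group member as a (specialty, scores) pair, and the three-item example profiles are all hypothetical.

```python
def d_squared(profile_a, profile_b):
    """Sum of squared differences between two equally long score profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must contain the same number of scores")
    return sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))


def rank_matches(target_scores, reference_group):
    """Rank reference group members from closest to most distant match.

    reference_group is a list of (specialty, scores) pairs (an assumed
    layout). Returns (d2, specialty) pairs sorted by ascending D2; members
    with equal D2 keep their original order because sorted() is stable.
    """
    scored = [
        (d_squared(target_scores, scores), specialty)
        for specialty, scores in reference_group
    ]
    return sorted(scored, key=lambda pair: pair[0])


# Hypothetical three-item profiles; the study used 30, 18, or 150 scores.
reference = [
    ("Pediatrics", [1, 2, 3]),
    ("Surgery", [3, 3, 3]),
    ("Psychiatry", [1, 2, 2]),
]
ranking = rank_matches([1, 2, 2], reference)
top_match_specialty = ranking[0][1]
print(ranking)
```

The first entry of the ranking corresponds to the top match, and the first 5, 10, or 20 entries correspond to the match sets used in the recording steps.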

Recording singular hit rates for the top match. First, the lowest D2 value was considered the top match (Figure 1 Step 3C). Then, the lowest D2 value and that reference group member’s medical specialty (which became the predicted medical specialty for that member of the stratified random sample based upon the top match utilizing the 30 items) were recorded. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the predicted specialty based upon the lowest D2 value. Lastly, it was documented if the predicted medical specialty matched the actual medical specialty for that member of the stratified random sample. This was performed 499 more times for all members of the stratified random sample.

Recording singular hit rates for the top 5 matches. First, the five lowest D2 values were considered the top 5 matches (Figure 1 Step 4C). Then, a record was made of the D2 values for the top 5 matches and those five reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the five reference group members. Lastly, if one of those five reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon a singular match within the top 5 matches utilizing the 30 items. This was performed 499 more times for all members of the stratified random sample.

Recording singular hit rates for the top 10 matches. First, the 10 lowest D2 values were considered the top 10 matches (Figure 1 Step 5C). Then, a record was made

of the D2 values for the top 10 matches and those 10 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 10 reference group members. Lastly, if one of those 10 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon a singular match within the top 10 matches utilizing the 30 items. This was performed 499 more times for all members of the stratified random sample.

Recording singular hit rates for the top 20 matches. First, the 20 lowest D2 values were considered the top 20 matches (Figure 1 Step 6C). Then, a record was made of the D2 values for the top 20 matches and those 20 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 20 reference group members. Lastly, if one of those 20 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon a singular match within the top 20 matches utilizing the 30 items. This was performed 499 more times for all members of the stratified random sample.

Recording dominant hit rates for the top 5 matches. First, the five lowest D2 values for a member of the stratified random sample were documented (Figure 1 Step 7C). Then, a record was made of the D2 values for the top 5 matches and those five reference group members’ medical specialties. Next, the actual medical specialty for the


member of the stratified random sample was recorded. This was compared to the specialties of the five reference group members. Lastly, if a majority of those five reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon dominance in the top 5 matches utilizing the 30 items. This was performed 499 more times for all members of the stratified random sample. Recording dominant hit rates for the top 10 matches. First, the 10 lowest D2 values for a member of the stratified random sample were documented (Figure 1 Step 8C). Then, a record was made of the D2 values for the top 10 matches and those 10 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 10 reference group members. Lastly, if a majority of those 10 reference group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon dominance in the top 10 matches utilizing the 30 items. This was performed 499 more times for all members of the stratified random sample. Recording dominant hit rates for the top 20 matches. First, the 20 lowest D2 values for a member of the stratified random sample were documented (Figure 1 Step 9C). Then, a record was made of the D2 values for the top 20 matches and those 20 reference group members’ medical specialties. Next, the actual medical specialty for the member of the stratified random sample was recorded. This was compared to the specialties of the 20 reference group members. Lastly, if a majority of those 20 reference


group members had the same medical specialty as the member of the stratified random sample, it was recorded that an accurate prediction had been obtained based upon dominance in the top 20 matches utilizing the 30 items. This was performed 499 more times for all members of the stratified random sample. Recording cases where the medical specialty entered is dominant for a member of the stratified random sample across the top 1, 5, 10, and 20 matches. First, the recorded hit rates for the top match, dominant in the top 5, dominant in the top 10, and dominant in the top 20 utilizing the 30 items were examined for the 500 members of the stratified random sample (Figure 1 Step 10C). Then, only those members of the stratified random sample who had the medical specialty that they entered predicted as the top match while simultaneously dominant in the top 5, 10, and 20 utilizing the 30 items were identified and placed into a new file. For example, a member of the stratified random sample with a specialty of Psychiatry might have Psychiatry predicted as the top match, Psychiatry predicted as the dominant match in the top 5, Psychiatry predicted as the dominant match in the top 10, and Psychiatry predicted as the dominant match in the top 20. This allowed the researcher to gather valuable descriptive data about members of the stratified random sample who had the specialty that they entered be simultaneously dominant across the top 1, 5, 10 and 20 matches. Recording cases where the medical specialty entered for a member of the stratified random sample is different from the medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches. First, the recorded hit rates for the top match, dominant in the top 5, dominant in the top 10, and dominant in the top 20


utilizing the 30 items were examined for the 500 members of the stratified random sample (Figure 1 Step 11C). Then, only those members of the stratified random sample who had a medical specialty predicted as the top match while simultaneously dominant in the top 5, 10, and 20 utilizing the 30 items be different from the medical specialty that they entered were identified and placed into a new file. For example, a member of the stratified random sample with a specialty of Urology might have Pathology predicted as the top match, Pathology predicted as the dominant match in the top 5, Pathology predicted as the dominant match in the top 10, and Pathology predicted as the dominant match in the top 20. This allowed the researcher to gather valuable descriptive data about members of the stratified random sample who had the specialty that they entered be different from the consistently dominant predicted medical specialty found across the top 1, 5, 10 and 20 matches. Recording cases where there is a different medical specialty predicted as dominant across the top 1, 5, 10, and 20 matches. First, the recorded hit rates for the top match, dominant in the top 5, dominant in the top 10, and dominant in the top 20 utilizing the 30 items were examined for the 500 members of the stratified random sample (Figure 1 Step 12C). Then, only those members of the stratified random sample who had four different medical specialties predicted as dominant across the top 1, 5, 10, and 20 utilizing the 30 items that did not predict the medical specialty entered were identified and placed into a new file. For example, a member of the stratified random sample with a specialty of Radiation Oncology might have Pediatrics predicted as the top match, Surgery predicted as the dominant match in the top 5, Internal Medicine predicted


as the dominant match in the top 10, and Family Medicine predicted as the dominant match in the top 20. This allowed the researcher to gather valuable descriptive data about members of the stratified random sample who had four different predicted medical specialties across the top 1, 5, 10 and 20 where none of them predicted the correct medical specialty. Standard Scoring Hit Rates Specialty choice probabilities for the 16 medical specialties were generated for each member of the stratified random sample by using beta weights (which were derived from multinomial logistic regression analysis) with 30 out of 150 MSPI-R items (Figure 1 Step 2D). The highest probability score was documented as the first match-predicted specialty (also known as the top match) based upon standard scoring (Figure 1 Step 3D). The second-highest probability score was documented as the second match predicted specialty based upon standard scoring (Figure 1 Step 4D). The third-highest probability score was documented as the third match predicted specialty based upon standard scoring (Figure 1 Step 5D). The fourth-highest probability score was documented as the fourth match predicted specialty based upon standard scoring (Figure 1 Step 6D). The fifth-highest probability score was documented as the fifth match predicted specialty based upon standard scoring (Figure 1 Step 7D). This was performed 499 more times for all members of the stratified random sample. Standard Scoring Kappa Coefficients The kappa coefficient can determine the amount of agreement between two raters categorizing nominal data (Cohen, 1960). Kappa coefficient values can range from zero


when agreement is at the level expected by chance to 1.0 when there is perfect agreement beyond chance. A kappa value of less than .20 represents poor agreement; between .21 and .40 represents fair agreement; between .41 and .60 represents moderate agreement; between .61 and .80 represents good agreement; and between .81 and 1.0 represents very good agreement beyond chance (Landis & Koch, 1977). If the agreement is less than chance, kappa coefficient values can be negative. In this analysis, the researcher wanted to determine the amount of agreement between the observed standard scoring hit rates for members of the random sample versus the expected hit rates published in the MSPI-R manual. First, the researcher totaled the number of observed accurate first, second, third, fourth, fifth, and combined five probabilities calculated utilizing the standard scoring method (Figure 1 Step 8D). Then, the researcher located in the MSPI-R manual the expected hit rates for the first, second, third, fourth, fifth, and combined five probabilities. Next, the kappa coefficient was calculated. Finally, the results were recorded along with the assigned narrative definition. Kappa Coefficients for the Top Match Reporting simple hit rates may be misleading as an accurate prediction may take place simply by chance. When general hit rates are reported, therefore, the percentages include chance agreement. The kappa coefficient can signify the agreement observed between two ratings beyond chance. Coefficient kappa was calculated to determine the interrater agreement between the predicted and the actual medical specialty selection beyond chance for the top match for each of the four scoring methods (Cohen, 1960).
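The kappa calculation for the top match can be sketched as follows, using the standard formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected by chance from the marginal frequencies. This is an illustrative sketch and not the software the researcher used. The function names and the small example data are hypothetical, and the handling of values that fall exactly on a boundary between narrative categories is an assumption.

```python
from collections import Counter


def cohens_kappa(predicted, actual):
    """Cohen's kappa for two equally long sequences of nominal categories."""
    n = len(predicted)
    if n == 0 or n != len(actual):
        raise ValueError("sequences must be non-empty and equally long")
    observed = sum(p == a for p, a in zip(predicted, actual)) / n
    predicted_counts = Counter(predicted)
    actual_counts = Counter(actual)
    # Chance agreement from the product of the marginal proportions.
    expected = sum(
        predicted_counts[c] * actual_counts[c] for c in predicted_counts
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1.0")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Narrative label using the cut points quoted in the text.

    Assumption: a value exactly on a cut point takes the lower label.
    """
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical predicted and actual specialties for four students.
predicted = ["Surgery", "Surgery", "Pediatrics", "Pediatrics"]
actual = ["Surgery", "Pediatrics", "Pediatrics", "Pediatrics"]
kappa = cohens_kappa(predicted, actual)
print(kappa, agreement_label(kappa))
```

In this example three of four predictions agree (p_o = .75) while the marginals give p_e = .50, which illustrates how a simple hit rate can overstate agreement beyond chance.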


Kappa coefficient values can range from zero when agreement is at the level expected by chance to 1.0 when there is perfect agreement beyond chance. A kappa value of less than .20 represents poor agreement; between .21 and .40 represents fair agreement; between .41 and .60 represents moderate agreement; between .61 and .80 represents good agreement; and between .81 and 1.0 represents very good agreement beyond chance (Landis & Koch, 1977). If the agreement is less than chance, kappa coefficient values can be negative. First, the researcher recorded the predicted medical specialty calculated utilizing the standard scoring method and person matching via the 150 items, the 18 scales, and the 30 items for the stratified random sample (Figure 1 Step 13A-D). Then, the researcher recorded the actual medical specialty of all members of the stratified random sample. Next, the kappa coefficient was calculated. Finally, the results were recorded along with the associated narrative definition. Chance Expectancy Hit Rates Chance expectancy hit rates were calculated to determine the expected hit rates that would be achieved by chance alone for each specialty. This measure serves as a second means to determine if the four psychometric scoring methodologies performed better than what we would expect by chance for the top match. First, the researcher divided the number of members of the stratified random sample in a specialty (44 in Pediatrics) by the total number in the stratified random sample (500) and recorded the chance expectancy hit rate (44/500 =.09 for Pediatrics) for each of the 22 medical specialties (Figure 1 Step 14A-D). Then, the researcher compared the top match hit rate


for standard scoring and person matching calculated utilizing the 150 items, the 18 scales, and the 30 items to the chance expectancy rate.

Limitations and Delimitations

The results of this study are bound by the study’s limitations and delimitations. The study’s limitations include a sample that is derived from medical students who voluntarily took the MSPI-R between the years 2005 and 2008. It is impossible to require every medical student in the United States to take the MSPI-R, and as such the sample may be made up of a specific type of medical student that is not representative of all medical students. Therefore, the sample in this study is not a random sample of medical students. Further, the ex post facto study cannot directly control for the inclusion of minority students. Additionally, only medical students who attended a medical college that is part of the AAMC were able to access the MSPI-R and therefore participate in the study. Consequently, the results may not pertain to medical students attending medical schools outside of the United States.

The study’s delimitations include comparing only the predictive hit rates between the MSPI-R’s traditional scoring method and Kuder’s person matching method. The study does not examine the vast differences in the reports generated by standard scoring (which includes 18 Medical Interest Scales and 16 Medical Specialty Probabilities) versus the person matching report (which includes the medical specialty and the narrative biography of the closest matches). The study does not attempt to examine the differences in information provided to participants taking the MSPI-R to see if the report generated by each of the two methods would be more meaningful than the other. This question is too


problematic for this study to research because there is no criterion database that includes biographical data already in existence to run Kuder’s complete person matching protocol for medical students. It is the task of this study to determine if the AAMC would benefit by investing time and resources to provide students with Kuder’s full person matching model to further assist medical specialty decision making for medical students. Expected Results The expected result of this study was to find that predictive hit rates increase for medical students taking the MSPI-R when the instrument is scored utilizing person matching. Chapter Summary This chapter detailed the methodology used to investigate predictive hit rates of the Medical Specialty Preference Inventory-Revised utilizing different psychometric scoring methodologies. This chapter explained the methodology used to collect and study the data, and described the participants, measure, data collection procedures, data analyses, delimitations, and expected results. The participants section described the 5,143 medical students who have taken the MSPI-R and have been matched to a medical residency. The measures section described the MSPI-R. The data collection procedures section described how the researcher gathered and coded the data. The data analysis procedures section described the D2 statistic guiding the study and how the researcher analyzed the data using the D2 statistic three different ways. In addition, this section described the standard scoring methodology, along with the necessity of calculating kappa coefficients and chance expectancy rates. The limitations and delimitations section


noted characteristics that limited the scope of the study as determined by conscious exclusionary and inclusionary decisions. Finally, the chapter concluded with the expected results. This chapter defined the foundation for the actual study itself. The following chapter presents the results of the study to either accept or reject the null hypothesis.

CHAPTER III

RESULTS

The current study investigated (a) how to best perform the person matching psychometric scoring methodology and (b) the comparison of predictive hit rates of the Medical Specialty Preference Inventory-Revised (MSPI-R) utilizing two different psychometric scoring methodologies. This chapter presents results of the descriptive and inferential analyses. Descriptive measures include hit rates, mean scores, standard deviations, increments of change, and low and high top match scores. Inferential analyses examined kappa coefficients for standard scoring to determine the amount of agreement between expected and actual hit rates, kappa coefficients for the top match, and chance expectancy hit rates.

Descriptive Analyses

Comparison of Person Matching Singular Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender

Table 1 displays singular hit rates for person matching based on the calculations of the 150 items, the 18 scales, and the 30 items for the 500 members of the random sample and separately for the 250 females and 250 males. These singular hit rates were documented for the four criteria of (a) the top match representing the specialty entered by the medical student, (b) at least one match in the top 5 matches representing the specialty entered by the medical student, (c) at least one match in the top 10 matches representing the specialty entered by the medical student, and (d) at least one match in the top 20 matches representing the specialty entered by the medical student.


When identifying the specialty entered with the top match for the 500 in the random sample, the 30 items was the most accurate with 115 (23%) correct matches, the 150 items came in second with 111 (22%) correct matches, and the 18 scales came in third with 90 (18%) correct matches. When identifying the specialty entered with at least one match in the top 5, the 150 items and the 30 items tied with 251 (50%) correct matches and the 18 scales came in second with 218 (44%) correct matches. When identifying the specialty entered with at least one match in the top 10, the 30 items was the most accurate with 315 (63%) correct matches, the 150 items came in second with 304 (61%) correct matches, and the 18 scales came in third with 298 (60%) correct matches. When identifying the specialty entered with at least one match in the top 20, the 30 items was the most accurate with 369 (74%) correct matches, the 150 items came in second with 359 (72%) correct matches, and the 18 scales came in third with 354 (71%) correct matches. Overall for this random sample of 500 medical students, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates for the top match and at least one correct match in the top 5, 10, and 20, obtaining 25 more hits than the 150 items and 90 more hits than the 18 scales when the counts in Table 1 are summed across the four criteria.

When identifying the specialty entered with the top match for the 250 females in the random sample, the 150 items and the 30 items tied as the most accurate with 56 (22%) correct matches and the 18 scales came in second with 42 (17%) correct matches. When identifying the specialty entered with at least one match in the top 5, the 150 items was the most accurate with 123 (49%) correct matches, the 30 items came in second with 122 (49%) correct matches, and the 18 scales came in third with 102 (41%) correct


matches. When identifying the specialty entered with at least one match in the top 10, the 30 items was the most accurate with 155 (62%) correct matches, the 150 items came in second with 149 (60%) correct matches, and the 18 scales came in third with 145 (58%) correct matches. When identifying the specialty entered with at least one match in the top 20, the 30 items was the most accurate with 182 (73%) correct matches, the 150 items came in second with 177 (71%) correct matches, and the 18 scales came in third with 176 (70%) correct matches. Overall, for the 250 females in the random sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates for the top match and at least one correct match in the top 5, 10, and 20 obtaining 10 more hits than the 150 items and 50 more hits than the 18 scales. When identifying the specialty entered with the top match for the 250 males in the random sample, the 30 items was the most accurate with 59 (24%) correct matches, the 150 items came in second with 55 (22%) correct matches, and the 18 scales came in third with 48 (19%) correct matches. When identifying the specialty entered with at least one match in the top 5, the 30 items was the most accurate with 129 (52%) correct matches, the 150 items came in second with 128 (51%) correct matches, and the 18 scales came in third with 116 (46%) correct matches. When identifying the specialty entered with at least one match in the top 10, the 30 items was the most accurate with 160 (64%) correct matches, the 150 items came in second with 155 (62%) correct matches, and the 18 scales came in third with 153 (61%) correct matches. When identifying the specialty entered with at least one match in the top 20, the 30 items was the most accurate with 187 (75%) correct matches, the 150 items came in second with 182 (73%) correct matches, and the


18 scales came in third with 178 (71%) correct matches. Overall for the 250 males in the random sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates for the top match and at least one correct match in the top 5, 10, and 20, obtaining 15 more hits than the 150 items and 40 more hits than the 18 scales.

When comparing the hit rates of the 250 females and the 250 males, scores suggest that females received lower hit rates than males. The most gender-balanced performance was obtained when calculating the top match based upon the 150 items, where females obtained one additional hit when compared to males. The least balanced performance was obtained when identifying the specialty entered with at least one match in the top 5 based upon the 18 scales, where females scored 14 fewer hits when compared to males. When utilizing the 150 items, females had the closest hit rates, scoring 15 fewer hits across the four criteria when compared to males. When utilizing the 30 items, females had the second closest hit rates, scoring 20 fewer hits when compared to males. When calculating person matching utilizing the 18 scales, females had the lowest hit rates and scored 30 fewer hits when compared to males. Data suggest that calculations based upon the 150 items allow for more even hit rates between males and females.


Table 1

Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Criteria      Calculation    n          Hit Rate

Overall
Top Match     150            111/500    22%
              18             90/500     18%
              30             115/500    23%
Top 5         150            251/500    50%
              18             218/500    44%
              30             251/500    50%
Top 10        150            304/500    61%
              18             298/500    60%
              30             315/500    63%
Top 20        150            359/500    72%
              18             354/500    71%
              30             369/500    74%

Females
Top Match     150            56/250     22%
              18             42/250     17%
              30             56/250     22%
Top 5         150            123/250    49%
              18             102/250    41%
              30             122/250    49%
Top 10        150            149/250    60%
              18             145/250    58%
              30             155/250    62%
Top 20        150            177/250    71%
              18             176/250    70%
              30             182/250    73%

Males
Top Match     150            55/250     22%
              18             48/250     19%
              30             59/250     24%
Top 5         150            128/250    51%
              18             116/250    46%
              30             129/250    52%
Top 10        150            155/250    62%
              18             153/250    61%
              30             160/250    64%
Top 20        150            182/250    73%
              18             178/250    71%
              30             187/250    75%
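The comparisons of "more hits" and "fewer hits" drawn from Table 1 rest on summing the hit counts across the four criteria for each calculation. The following sketch shows that arithmetic. The counts are transcribed from Table 1; the dictionary layout and function names are hypothetical and serve only as an illustration.

```python
# Hit counts from Table 1, ordered as top match, top 5, top 10, top 20.
TABLE_1 = {
    "overall": {
        "150 items": [111, 251, 304, 359],
        "18 scales": [90, 218, 298, 354],
        "30 items": [115, 251, 315, 369],
    },
    "females": {
        "150 items": [56, 123, 149, 177],
        "18 scales": [42, 102, 145, 176],
        "30 items": [56, 122, 155, 182],
    },
    "males": {
        "150 items": [55, 128, 155, 182],
        "18 scales": [48, 116, 153, 178],
        "30 items": [59, 129, 160, 187],
    },
}


def total_hits(group, calculation):
    """Sum of hits across the four criteria for one group and calculation."""
    return sum(TABLE_1[group][calculation])


def gender_gap(calculation):
    """Male total minus female total for one calculation."""
    return total_hits("males", calculation) - total_hits("females", calculation)


for calculation in ("150 items", "18 scales", "30 items"):
    print(calculation, total_hits("overall", calculation), gender_gap(calculation))
```

Because the female and male counts in each row add up to the overall count, the overall totals equal the sum of the two gender totals for every calculation.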


Comparison of Person Matching Singular Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender for the 12 Groups With Over 100 in the Sample Table 2 displays singular hit rates for person matching for the 12 groups with over 100 in the sample based on the calculations of the 150 items, the 18 scales, and the 30 items for the total sample and by gender. The 12 medical specialties with a total number in the reference group greater than 100 comprised 372 medical students from the random sample and included Internal Medicine (56), Pediatrics (44), Emergency Medicine (40), Family Medicine (40), Obstetrics/Gynecology (30), Surgery (30), Anesthesiology (24), Psychiatry (24), Orthopedic Surgery (24), Radiology (20), Pathology (20), and Internal Medicine Pediatrics (20). Hit rates were documented for the four criteria of (a) the top match representing the specialty entered by the medical student, (b) at least one match in the top 5 matches representing the specialty entered by the medical student, (c) at least one match in the top 10 matches representing the specialty entered by the medical student, and (d) at least one match in the top 20 matches representing the specialty entered by the medical student. For the 12 groups with over 100 in the sample when identifying the top match, the 30 items was the most accurate with 103 (28%) correct matches, the 150 items came in second with 102 (27%) correct matches, and the 18 scales came in third with 86 (23%) correct matches. When identifying the specialty entered with at least one match in the top 5, the 30 items was the most accurate with 230 (62%) correct matches, the 150 items came in second with 229 (62%) correct matches, and the 18 scales came in third with 204


(55%) correct matches. When identifying the specialty entered with at least one match in the top 10, the 30 items was the most accurate with 280 (75%) correct matches, the 150 items came in second with 277 (74%) correct matches, and the 18 scales came in third with 272 (73%) correct matches. When identifying the specialty entered with at least one match in the top 20, the 150 items was the most accurate with 320 (86%) correct matches, the 30 items came in second with 318 (85%) correct matches, and the 18 scales came in third with 316 (85%) correct matches. Overall, for medical students in the 12 groups with over 100 in the sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates for the top match and at least one correct match in the top 5, 10, and 20 obtaining three more hits than the 150 items and 53 more hits than the 18 scales. For the 186 females in the 12 groups with over 100 in the sample when identifying the top match, the 30 items was the most accurate with 53 (28%) correct matches, the 150 items came in second with 52 (28%) correct matches, and the 18 scales came in third with 40 (22%) correct matches. When identifying the specialty entered with at least one match in the top 5, the 150 items was the most accurate with 115 (62%) correct matches, the 30 items came in second with 112 (60%) correct matches, and the 18 scales came in third with 95 (51%) correct matches. When identifying the specialty entered with at least one match in the top 10, the 30 items was the most accurate with 139 (75%) correct matches, the 150 items came in second with 137 (74%) correct matches, and the 18 scales came in third with 131 (70%) correct matches. When identifying the specialty entered with at least one match in the top 20, the 150 items and the 30 items

were the most accurate with 160 (86%) correct matches and the 18 scales came in second with 157 (84%) correct matches. Overall, for the 186 females in the 12 groups with over 100 in the sample, calculating person matching utilizing the 150 items and the 30 items appears to provide the most accurate hit rates for the top match and at least one correct match in the top 5, 10, and 20 obtaining 41 more hits than the 18 scales. For the 186 males in the 12 groups with over 100 in the sample when identifying the top match, the 150 items and the 30 items tied as the most accurate with 50 (27%) correct matches and the 18 scales came in second with 46 (25%) correct matches. When identifying the specialty entered with at least one match in the top 5, the 30 items was the most accurate with 118 (63%) correct matches, the 150 items came in second with 114 (61%) correct matches, and the 18 scales came in third with 109 (59%) correct matches. When identifying the specialty entered with at least one match in the top 10, the 18 scales and the 30 items tied with 141 (76%) correct matches and the 150 items came in second with 140 (75%) correct matches. When identifying the specialty entered with at least one match in the top 20, the 150 items was the most accurate with 160 (86%) correct matches, the 18 scales came in second with 159 (85%) correct matches, and the 30 items came in third with 158 (85%) correct matches. Overall, for the 186 males in the 12 groups with over 100 in the sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates for the top match and at least one correct match in the top 5, 10, and 20 obtaining three more hits than the 150 items and 12 more hits than the 18 scales.
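
The "more hits" comparisons in this section appear to be obtained by summing the correct matches over the four criteria for each calculation. The following Python sketch, which is not part of the original analysis, reproduces that arithmetic for the overall counts reported in Table 2. The dictionary layout and the function name summed_hits are illustrative assumptions.

```python
# Overall hit counts copied from Table 2 (n = 372), ordered as
# top match, top 5, top 10, top 20.
TABLE2_OVERALL = {
    "150 items": [102, 229, 277, 320],
    "18 scales": [86, 204, 272, 316],
    "30 items": [103, 230, 280, 318],
}


def summed_hits(counts_by_method):
    """Sum the hits over all criteria for each calculation method.

    Assumption: the narrative's "more hits" figures are differences
    between these sums; the source does not state the formula explicitly.
    """
    return {method: sum(counts) for method, counts in counts_by_method.items()}


totals = summed_hits(TABLE2_OVERALL)
lead_over_150 = totals["30 items"] - totals["150 items"]
lead_over_18 = totals["30 items"] - totals["18 scales"]
print(totals)
print(lead_over_150, lead_over_18)
```

Under this reading, the 30 items total 931 hits, the 150 items 928, and the 18 scales 878, which agrees with the reported leads of three and 53 hits.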

When comparing the hit rates of females and males in the 12 groups with over 100 in the sample, scores suggest that females received lower hit rates than males. The most balanced performance was obtained when calculating at least one correct match in the top 20 based upon the 150 items where females and males scored equally. The least balanced performance was obtained when identifying the specialty entered with at least one match in the top 5 based upon the 18 scales where females scored 14 fewer hits when compared to males. When utilizing the 150 items, females had the closest hit rates scoring at the same rate when compared to males. When utilizing the 30 items, females had the second closest hit rates scoring three fewer hits when compared to males. When calculated utilizing the 18 scales, females had the lowest hit rates scoring 32 fewer hits when compared to males. Data suggest that calculations based upon the 150 items allow for more even hit rates between males and females.

Comparing Table 1 with Table 2. When comparing the hit rates obtained for all 500 medical students in the sample in Table 1 to the hit rates of the 372 medical students in the 12 groups with over 100 in the sample in Table 2, the hit rates for the 12 groups with over 100 in the sample are higher no matter how they were calculated (the 150 items, the 18 scales, and the 30 items). For the 12 groups with over 100 in the sample, there is an average observed increase of 11% in hit rates across all four criteria when calculated utilizing the 150 items and the 18 scales and an average observed increase of 10% in hit rates across all four criteria when calculated utilizing the 30 items. Increases can be seen in the hit rates for the four criteria (top match and at least one match in the top 5, 10, and 20). When averaging hit rates calculating the top match

utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 5% in hit rates was observed for the 12 groups with over 100 in the sample. When averaging hit rates calculated based upon at least one match in the top 5 utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 12% in hit rates was observed. When averaging hit rates calculated based upon at least one match in the top 10 utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 13% in hit rates was observed. When averaging hit rates calculated based upon at least one match in the top 20 utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 13% in hit rates was observed. Data suggest that medical students in the 12 groups with over 100 in the sample receive an increase in hit rate accuracy over the total sample of 500 medical students, which includes those from smaller specialty groups. When comparing the hit rates obtained for all 250 female medical students in the sample in Table 1 to the hit rates of the 186 female medical students in the 12 groups with over 100 in the sample in Table 2, the hit rates for the 12 groups with over 100 in the sample are higher no matter how they were calculated (the 150 items, the 18 scales, and the 30 items). For the 12 groups with over 100 in the sample, there is an average observed increase of 11% in hit rates across all four criteria when calculated utilizing the 150 items and an average observed increase of 10% in hit rates across all four criteria when calculated utilizing the 18 scales and the 30 items. Increases can be seen in the hit rates for the four criteria (top match and at least one match in the top 5, 10, and 20). When averaging hit rates for females calculating the top match utilizing the 150 items, the 18 scales, and the 30 items, an average increase of

5% in hit rates was observed for the females in the 12 groups with over 100 in the sample. When averaging hit rates for females calculated based upon at least one match in the top 5 utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 12% in hit rates was observed. When averaging hit rates for females calculated based upon at least one match in the top 10 utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 13% in hit rates was observed. When averaging hit rates for females calculated based upon at least one match in the top 20 utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 14% in hit rates was observed. Data suggest that female medical students in the 12 groups with over 100 in the sample receive an increase in hit rate accuracy over the total sample of 250 female medical students, which includes those from smaller specialty groups. When comparing the hit rates obtained for all 250 male medical students in the sample in Table 1 to the hit rates of the 186 male medical students in the 12 groups with over 100 in the sample in Table 2, the hit rates for the 12 groups with over 100 in the sample are higher no matter how they were calculated (the 150 items, the 18 scales, and the 30 items). For the 12 groups with over 100 in the sample, there is an average observed increase of 10% in hit rates across all four criteria when calculated utilizing the 150 items, an average observed increase of 11% in hit rates across all four criteria when calculated utilizing the 18 scales, and an average observed increase of 9% in hit rates across all four criteria when calculated utilizing the 30 items. Increases can be seen in the hit rates for the four criteria (top match and at least one match in the top 5, 10, and 20). When averaging hit rates for males calculating the

top match utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 5% in hit rates was observed for the males in the 12 groups with over 100 in the sample. When averaging hit rates for males calculated based upon at least one match in the top 5 utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 11% in hit rates was observed. When averaging hit rates for males calculated based upon at least one match in the top 10 utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 13% in hit rates was observed. When averaging hit rates for males calculated based upon at least one match in the top 20 utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 12% in hit rates was observed. Data suggest that male medical students in the 12 groups with over 100 in the sample receive an increase in hit rate accuracy over the total sample of 250 male medical students, which includes those from smaller specialty groups.

For both sexes, choosing a medical specialty in the 12 groups with over 100 in the sample garnered an increase in hit rates no matter how person matching was calculated. Females obtained a 3% increase when compared to males for the top match, even hit rates when compared to males for at least one match in the top 5, a 1% decrease when compared to males for at least one match in the top 10, and a 5% increase when compared to males for at least one match in the top 20. Females obtained a 7% increase when compared to males for the 150 items, a 7% decrease when compared to males for the 18 scales, and a 7% increase when compared to males for the 30 items. Data suggest that, overall, females in the 12 groups with over 100 in the sample receive an increase in hit rate accuracy when compared to males.

Table 2

Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender for the 12 Groups With Over 100 in the Sample

Group     Criteria    Calculation   Hits/n    Hit Rate
Overall   Top Match   150           102/372   27%
Overall   Top Match   18            86/372    23%
Overall   Top Match   30            103/372   28%
Overall   Top 5       150           229/372   62%
Overall   Top 5       18            204/372   55%
Overall   Top 5       30            230/372   62%
Overall   Top 10      150           277/372   74%
Overall   Top 10      18            272/372   73%
Overall   Top 10      30            280/372   75%
Overall   Top 20      150           320/372   86%
Overall   Top 20      18            316/372   85%
Overall   Top 20      30            318/372   85%
Female    Top Match   150           52/186    28%
Female    Top Match   18            40/186    22%
Female    Top Match   30            53/186    28%
Female    Top 5       150           115/186   62%
Female    Top 5       18            95/186    51%
Female    Top 5       30            112/186   60%
Female    Top 10      150           137/186   74%
Female    Top 10      18            131/186   70%
Female    Top 10      30            139/186   75%
Female    Top 20      150           160/186   86%
Female    Top 20      18            157/186   84%
Female    Top 20      30            160/186   86%
Male      Top Match   150           50/186    27%
Male      Top Match   18            46/186    25%
Male      Top Match   30            50/186    27%
Male      Top 5       150           114/186   61%
Male      Top 5       18            109/186   59%
Male      Top 5       30            118/186   63%
Male      Top 10      150           140/186   75%
Male      Top 10      18            141/186   76%
Male      Top 10      30            141/186   76%
Male      Top 20      150           160/186   86%
Male      Top 20      18            159/186   85%
Male      Top 20      30            158/186   85%
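
The singular hit criterion behind Table 2 (the specialty entered is found at least once among the top k person matches) can be sketched in Python as follows. This is a minimal illustration rather than the scoring procedure used in the study: the function names, the representation of a student's matches as a ranked list of specialty labels, and the two example cases are all hypothetical.

```python
def singular_hit(ranked_specialties, entered, k):
    """Return True if the entered specialty appears at least once
    among the top k person matches (k = 1 is the top-match criterion)."""
    return entered in ranked_specialties[:k]


def singular_hit_rate(cases, k):
    """Count hits over (ranked_specialties, entered) pairs and return
    the hit count together with the rounded percentage."""
    hits = sum(singular_hit(ranked, entered, k) for ranked, entered in cases)
    return hits, round(100 * hits / len(cases))


# Hypothetical cases for illustration only; not data from the study.
example_cases = [
    (["Pediatrics", "Surgery", "Psychiatry"], "Surgery"),
    (["Radiology", "Pathology", "Surgery"], "Psychiatry"),
]
print(singular_hit_rate(example_cases, 1))
print(singular_hit_rate(example_cases, 2))
```

Applying the same rounding to a reported count, for example 102 hits out of 372 students, gives the 27% shown for the 150 items in the top match row.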

Comparison of Person Matching Singular Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender for the 10 Groups With Under 100 in the Sample

Table 3 displays hit rates for person matching for the 10 groups with under 100 in the sample based on the calculations of the 150 items, the 18 scales, and the 30 items for the total sample and by gender. The 10 medical specialties with a total number in the reference group less than 100 comprised 128 medical students from the random sample and included: Otolaryngology (14), Ophthalmology (14), Neurology (14), Dermatology (14), Physical Medicine and Rehabilitation (14), Urology (14), Neurological Surgery (14), Radiation Oncology (10), Plastic Surgery (10), and Pediatrics/Child and Adolescent Psychiatry (10). Hit rates were documented for the four criteria of (a) the top match representing the specialty entered by the medical student, (b) at least one match in the top 5 matches representing the specialty entered by the medical student, (c) at least one match in the top 10 matches representing the specialty entered by the medical student, and (d) at least one match in the top 20 matches representing the specialty entered by the medical student.

For the 10 groups with under 100 in the sample when identifying the top match, the 30 items was the most accurate with 12 (9%) correct matches, the 150 items came in second with 9 (7%) correct matches, and the 18 scales came in third with 4 (3%) correct matches. When identifying the specialty entered with at least one match in the top 5, the 150 items was the most accurate with 22 (17%) correct matches, the 30 items came in second with 21 (16%) correct matches, and the 18 scales came in third with 14 (11%)

correct matches. When identifying the specialty entered with at least one match in the top 10, the 30 items was the most accurate with 35 (27%) correct matches, the 150 items came in second with 27 (21%) correct matches, and the 18 scales came in third with 26 (20%) correct matches. When identifying the specialty entered with at least one match in the top 20, the 30 items was the most accurate with 51 (40%) correct matches, the 150 items came in second with 39 (30%) correct matches, and the 18 scales came in third with 38 (30%) correct matches. For medical students in the 10 groups with under 100 in the sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates for the top match and at least one correct match in the top 5, 10, and 20 obtaining 22 more hits than the 150 items and 37 more hits than the 18 scales.

For the 64 females in the 10 groups with under 100 in the sample when identifying the top match, the 150 items was the most accurate with 4 (6%) correct matches, the 30 items came in second with 3 (5%) correct matches, and the 18 scales came in third with 2 (3%) correct matches. When identifying the specialty entered with at least one match in the top 5, the 30 items was the most accurate with 10 (16%) correct matches, the 150 items came in second with 8 (13%) correct matches, and the 18 scales came in third with 7 (11%) correct matches. When identifying the specialty entered with at least one match in the top 10, the 30 items was the most accurate with 16 (25%) correct matches, the 18 scales came in second with 14 (22%) correct matches, and the 150 items came in third with 12 (19%) correct matches. When identifying the specialty entered with at least one match in the top 20, the 30 items was the most accurate with 22 (34%) correct matches, the 18 scales came in second with 19 (30%) correct matches, and the

150 items came in third with 17 (27%) correct matches. Overall, for the 64 females in the 10 groups with under 100 in the sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates for the top match and at least one correct match in the top 5, 10, and 20 obtaining nine more hits than the 18 scales and 10 more hits than the 150 items. For the 64 males in the 10 groups with under 100 in the sample when identifying the top match, the 30 items was the most accurate with 9 (14%) correct matches, the 150 items came in second with 5 (8%) correct matches, and the 18 scales came in third with 2 (3%) correct matches. When identifying the specialty entered with at least one match in the top 5, the 150 items was the most accurate with 14 (22%) correct matches, the 30 items came in second with 11 (17%) correct matches, and the 18 scales came in third with 7 (11%) correct matches. When identifying the specialty entered with at least one match in the top 10, the 30 items was the most accurate with 19 (30%) correct matches, the 150 items came in second with 15 (23%) correct matches, and the 18 scales came in third with 12 (19%) correct matches. When identifying the specialty entered with at least one match in the top 20, the 30 items was the most accurate with 29 (45%) correct matches, the 150 items came in second with 22 (34%) correct matches, and the 18 scales came in third with 19 (30%) correct matches. Overall, for the 64 males in the 10 groups with under 100 in the sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates for the top match and at least one correct match in the top 5, 10, and 20 obtaining 12 more hits than the 150 items and 28 more hits than the 18 scales.

When comparing the hit rates of females and males in the 10 groups with under 100 in the sample, scores suggest that females received lower hit rates than males. The most balanced performance was obtained when calculating the top match, at least one correct match in the top 5, and at least one correct match in the top 20, based upon the 18 scales where hit rates were the same for both genders. The largest difference in performance was obtained when identifying the specialty entered with at least one match in the top 20 based upon the 30 items where females scored seven fewer hits when compared to males. When utilizing the 18 scales, females had the closest hit rates scoring two more hits when compared to males. When utilizing the 150 items, females had the second closest hit rates scoring 15 fewer hits when compared to males. When utilizing the 30 items, females had the lowest hit rates scoring 17 fewer hits when compared to males. Data suggest that calculations based upon the 18 scales allow for more even hit rates between males and females.

Comparing Table 1 with Table 3. When comparing the hit rates obtained for all 500 medical students in the sample in Table 1 to the hit rates of the 128 medical students in the 10 groups with under 100 in the sample in Table 3, the hit rates for the 128 medical students in the 10 groups with under 100 in the sample are lower no matter how they were calculated (the 150 items, the 18 scales, and the 30 items). For the 10 groups with under 100 in the sample, there is an average observed decrease of 33% in hit rates across all four criteria when calculated utilizing the 150 items, an average observed decrease of 32% in hit rates across all four criteria when calculated utilizing the 18 scales, and an

average observed decrease of 30% in hit rates across all four criteria when calculated utilizing the 30 items. Decreases can be seen in the hit rates for the four criteria (top match and at least one match in the top 5, 10, and 20). When averaging hit rates calculating the top match utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 15% in hit rates was observed for the 10 groups with under 100 in the sample. When averaging hit rates calculated based upon at least one match in the top 5 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 33% in hit rates was observed. When averaging hit rates calculated based upon at least one match in the top 10 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 39% in hit rates was observed. When averaging hit rates calculated based upon at least one match in the top 20 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 39% in hit rates was observed. Data suggest that medical students in the 10 groups with under 100 in the sample receive a decrease in hit rate accuracy over the total sample of 500 medical students, which includes those from larger specialty groups. When comparing the hit rates obtained for all 250 female medical students in the sample in Table 1 to the hit rates of the 64 female medical students in the 10 groups with under 100 in the sample in Table 3, the hit rates for the 10 groups with under 100 in the sample are lower no matter how they were calculated (the 150 items, the 18 scales, and the 30 items). For the 10 groups with under 100 in the sample, there is an average observed decrease of 34% in hit rates across all four criteria when calculated utilizing the 150 items, an average observed decrease of 30% in hit rates across all four criteria when

calculated utilizing the 18 scales, and an average observed decrease of 32% in hit rates across all four criteria when calculated utilizing the 30 items. Decreases can be seen in the hit rates for the four criteria (top match and at least one match in the top 5, 10, and 20). When averaging hit rates for females calculating the top match utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 16% in hit rates was observed for the females in the 10 groups with under 100 in the sample. When averaging hit rates for females calculated based upon at least one match in the top 5 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 33% in hit rates was observed. When averaging hit rates for females calculated based upon at least one match in the top 10 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 38% in hit rates was observed. When averaging hit rates for females calculated based upon at least one match in the top 20 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 41% in hit rates was observed. Data suggest that female medical students in the 10 groups with under 100 in the sample receive a decrease in hit rate accuracy over the total sample of 250 female medical students, which includes those from larger specialty groups.

When comparing the hit rates obtained for all 250 male medical students in the sample in Table 1 to the hit rates of the 64 male medical students in the 10 groups with under 100 in the sample in Table 3, the hit rates for the 10 groups with under 100 in the sample are lower no matter how they were calculated (the 150 items, the 18 scales, and the 30 items). For the 10 groups with under 100 in the sample, there is an average observed decrease of 30% in hit rates across all four criteria when calculated utilizing the

150 items, an average observed decrease of 34% in hit rates across all four criteria when calculated utilizing the 18 scales and an average observed decrease of 27% in hit rates across all four criteria when calculated utilizing the 30 items. Decreases can be seen in the hit rates for the four criteria (top match and at least one match in the top 5, 10, and 20). When averaging hit rates for males calculating the top match utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 13% in hit rates was observed for the males in the 10 groups with under 100 in the sample. When averaging hit rates for males calculated based upon at least one match in the top 5 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 33% in hit rates was observed. When averaging hit rates for males calculated based upon at least one match in the top 10 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 38% in hit rates was observed. When averaging hit rates for males calculated based upon at least one match in the top 20 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 37% in hit rates was observed. Data suggest that male medical students in the 10 groups with under 100 in the sample receive a decrease in hit rate accuracy over the total sample of 250 male medical students, which includes those from larger specialty groups. For both sexes, choosing a medical specialty in the 10 groups with under 100 in the sample garnered a decrease in hit rates no matter how person matching was calculated. Females obtained a 7% decrease when compared to males for the top match, a 14% decrease when compared to males for at least one match in the top 5, a 1% increase when compared to males for at least one match in the top 10, and a 13%

decrease when compared to males for at least one match in the top 20. Females were observed to score a 16% decrease when compared to males for the 150 items, a 4% increase when compared to males for the 18 scales, and a 21% decrease when compared to males for the 30 items. Data suggest that, overall, females in the 10 groups with under 100 in the sample receive a decrease in hit rate accuracy when compared to males.

Table 3

Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender for the 10 Groups With Under 100 in the Sample

Group     Criteria    Calculation   Hits/n    Hit Rate
Overall   Top Match   150           9/128     7%
Overall   Top Match   18            4/128     3%
Overall   Top Match   30            12/128    9%
Overall   Top 5       150           22/128    17%
Overall   Top 5       18            14/128    11%
Overall   Top 5       30            21/128    16%
Overall   Top 10      150           27/128    21%
Overall   Top 10      18            26/128    20%
Overall   Top 10      30            35/128    27%
Overall   Top 20      150           39/128    30%
Overall   Top 20      18            38/128    30%
Overall   Top 20      30            51/128    40%
Female    Top Match   150           4/64      6%
Female    Top Match   18            2/64      3%
Female    Top Match   30            3/64      5%
Female    Top 5       150           8/64      13%
Female    Top 5       18            7/64      11%
Female    Top 5       30            10/64     16%
Female    Top 10      150           12/64     19%
Female    Top 10      18            14/64     22%
Female    Top 10      30            16/64     25%
Female    Top 20      150           17/64     27%
Female    Top 20      18            19/64     30%
Female    Top 20      30            22/64     34%
Male      Top Match   150           5/64      8%
Male      Top Match   18            2/64      3%
Male      Top Match   30            9/64      14%
Male      Top 5       150           14/64     22%
Male      Top 5       18            7/64      11%
Male      Top 5       30            11/64     17%
Male      Top 10      150           15/64     23%
Male      Top 10      18            12/64     19%
Male      Top 10      30            19/64     30%
Male      Top 20      150           22/64     34%
Male      Top 20      18            19/64     30%
Male      Top 20      30            29/64     45%
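
The female and male counts in Table 3 can be compared per calculation by summing hits over the four criteria and taking the difference. The Python sketch below is an illustrative check rather than part of the original analysis. The dictionary layout and the function name gender_gap are assumptions, and a negative value means females obtained fewer hits than males.

```python
# Hit counts copied from Table 3 (64 females, 64 males), ordered as
# top match, top 5, top 10, top 20.
TABLE3_FEMALE = {
    "150 items": [4, 8, 12, 17],
    "18 scales": [2, 7, 14, 19],
    "30 items": [3, 10, 16, 22],
}
TABLE3_MALE = {
    "150 items": [5, 14, 15, 22],
    "18 scales": [2, 7, 12, 19],
    "30 items": [9, 11, 19, 29],
}


def gender_gap(female, male):
    """Female minus male summed hits for each calculation method."""
    return {method: sum(female[method]) - sum(male[method]) for method in female}


gaps = gender_gap(TABLE3_FEMALE, TABLE3_MALE)
print(gaps)
```

With these counts the gap is 15 fewer hits for females on the 150 items, two more on the 18 scales, and 17 fewer on the 30 items, so the 18 scales show the smallest difference between genders.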

Comparison of Person Matching Dominant Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender

Table 4 displays dominant hit rates for person matching based on the calculations of the 150 items, the 18 scales, and the 30 items for the 500 members of the random sample and separately for the 250 females and 250 males. These dominant hit rates were documented for the three criteria of (a) a majority of the matches in the top 5 representing the specialty entered by the medical student, (b) a majority of the matches in the top 10 representing the specialty entered by the medical student, and (c) a majority of the matches in the top 20 representing the specialty entered by the medical student.

When identifying a majority of the matches in the top 5 representing the specialty entered by the medical student, the 30 items was the most accurate with 135 (27%) dominant matches, the 150 items came in second with 132 (26%) dominant matches, and the 18 scales came in third with 103 (21%) dominant matches. When identifying a majority of the matches in the top 10 representing the specialty entered by the medical student, the 30 items was the most accurate with 170 (34%) dominant matches, the 150 items came in second with 160 (32%) dominant matches, and the 18 scales came in third with 137 (27%) dominant matches. When identifying a majority of the matches in the top 20 representing the specialty entered by the medical student, the 150 items was the most accurate with 156 (31%) dominant matches, the 30 items came in second with 155 (31%) dominant matches, and the 18 scales came in third with 133 (27%) dominant matches. When identifying a majority of the matches in the top 5, 10, and 20 that represent the specialty entered by the random sample of 500 medical students, calculating

person matching utilizing the 30 items appears to provide the most accurate hit rates by obtaining 12 more hits than the 150 items and 87 more hits than the 18 scales. When identifying a majority of the matches in the top 5 representing the specialty that the 250 females in the random sample entered, the 150 items was the most accurate with 67 (27%) correct matches, the 30 items came in second with 65 (26%) correct matches, and the 18 scales came in third with 50 (20%) correct matches. When identifying a majority of the matches in the top 10 representing the specialty that the 250 females entered, the 30 items was the most accurate with 84 (34%) correct matches, the 150 items came in second with 76 (30%) correct matches, and the 18 scales came in third with 70 (28%) correct matches. When identifying a majority of the matches in the top 20 representing the specialty that the 250 females entered, the 150 items was the most accurate with 78 (31%) correct matches, the 30 items came in second with 75 (30%) correct matches, and the 18 scales came in third with 67 (27%) correct matches. When identifying a majority of the matches in the top 5, 10, and 20 that represent the specialty entered by the 250 females in the random sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates by obtaining three more hits than the 150 items and 37 more hits than the 18 scales. When identifying a majority of the matches in the top 5 representing the specialty that the 250 males in the random sample entered, the 30 items was the most accurate with 70 (28%) correct matches, the 150 items came in second with 65 (26%) correct matches, and the 18 scales came in third with 53 (21%) correct matches. When identifying a majority of the matches in the top 10 representing the specialty that the 250 males

entered, the 30 items was the most accurate with 86 (34%) correct matches, the 150 items came in second with 84 (34%) correct matches, and the 18 scales came in third with 67 (27%) correct matches. When identifying a majority of the matches in the top 20 representing the specialty that the 250 males entered, the 30 items was the most accurate with 80 (32%) correct matches, the 150 items came in second with 78 (31%) correct matches, and the 18 scales came in third with 66 (26%) correct matches. Overall, when identifying a majority of the matches in the top 5, 10, and 20 that represent the specialty entered by the 250 males in the random sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates by obtaining nine more hits than the 150 items and 50 more hits than the 18 scales. When comparing the dominant hit rates of females and males, scores suggest that females overall received lower hit rates than males. The most balanced performance was obtained when identifying a majority of the matches in the top 20 representing the specialty entered by the medical student based upon the 150 items where females and males obtained the same hit rate. The least balanced performance was obtained when identifying a majority of the matches in the top 10 representing the specialty entered by the medical student based upon the 150 items where females scored eight fewer hits when compared to males. When utilizing the 18 scales, females had the closest hit rates compared to males scoring 1 additional hit. When utilizing the 150 items, females had the second closest hit rates compared to males scoring six fewer hits. When utilizing the 30 items, females had the third closest hit rates compared to males scoring 12 fewer hits.

Data suggest that calculations based upon the 18 scales allow for more even dominant hit rates between males and females.
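
The dominant hit criterion described in this section (a majority of the top k matches representing the specialty entered) can be sketched in Python as follows. The sketch assumes that "majority" means strictly more than half of the top k matches; a reading in which the entered specialty only needs to be the most frequent one would require a different test. The function name, the ranked-list representation, and the example data are hypothetical.

```python
from collections import Counter


def dominant_hit(ranked_specialties, entered, k):
    """Return True if the entered specialty accounts for more than half
    of the top k person matches (strict-majority assumption)."""
    top_k = ranked_specialties[:k]
    # Counter returns 0 for a specialty that does not appear in top_k.
    return Counter(top_k)[entered] > len(top_k) / 2


# Hypothetical ranked matches for one student; not data from the study.
ranked = ["Surgery", "Surgery", "Pediatrics", "Surgery", "Radiology"]
print(dominant_hit(ranked, "Surgery", 5))     # 3 of 5 matches
print(dominant_hit(ranked, "Pediatrics", 5))  # 1 of 5 matches
```

Under this assumption a student can register a singular hit in the top 5 without registering a dominant hit, which fits the lower rates reported for the dominant criteria.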

Table 4

Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10, or 20; Overall and by Gender

Group     Criteria   Calculation   Hits/n    Hit Rate
Overall   Top 5      150           132/500   26%
Overall   Top 5      18            103/500   21%
Overall   Top 5      30            135/500   27%
Overall   Top 10     150           160/500   32%
Overall   Top 10     18            137/500   27%
Overall   Top 10     30            170/500   34%
Overall   Top 20     150           156/500   31%
Overall   Top 20     18            133/500   27%
Overall   Top 20     30            155/500   31%
Female    Top 5      150           67/250    27%
Female    Top 5      18            50/250    20%
Female    Top 5      30            65/250    26%
Female    Top 10     150           76/250    30%
Female    Top 10     18            70/250    28%
Female    Top 10     30            84/250    34%
Female    Top 20     150           78/250    31%
Female    Top 20     18            67/250    27%
Female    Top 20     30            75/250    30%
Male      Top 5      150           65/250    26%
Male      Top 5      18            53/250    21%
Male      Top 5      30            70/250    28%
Male      Top 10     150           84/250    34%
Male      Top 10     18            67/250    27%
Male      Top 10     30            86/250    34%
Male      Top 20     150           78/250    31%
Male      Top 20     18            66/250    26%
Male      Top 20     30            80/250    32%
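
As a consistency check on Table 4, the female and male counts should add up to the overall counts, and each percentage should equal the hit count divided by the group size. The Python sketch below performs both checks on the reported values. It is an illustration only; the nested dictionary layout and the function names are assumptions.

```python
# Dominant hit counts copied from Table 4, ordered as top 5, top 10, top 20.
TABLE4 = {
    "150 items": {"overall": [132, 160, 156], "female": [67, 76, 78], "male": [65, 84, 78]},
    "18 scales": {"overall": [103, 137, 133], "female": [50, 70, 67], "male": [53, 67, 66]},
    "30 items": {"overall": [135, 170, 155], "female": [65, 84, 75], "male": [70, 86, 80]},
}
GROUP_SIZE = {"overall": 500, "female": 250, "male": 250}


def genders_sum_to_overall(table):
    """True if female + male hits equal the overall hits in every cell."""
    return all(
        f + m == o
        for groups in table.values()
        for f, m, o in zip(groups["female"], groups["male"], groups["overall"])
    )


def percentages(table):
    """Rounded hit-rate percentages for every method, group, and criterion."""
    return {
        method: {
            group: [round(100 * hits / GROUP_SIZE[group]) for hits in counts]
            for group, counts in groups.items()
        }
        for method, groups in table.items()
    }


print(genders_sum_to_overall(TABLE4))
print(percentages(TABLE4)["30 items"])
```

Both checks pass for the values reported in Table 4.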

Comparison of Person Matching Dominant Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender for the 12 Groups With Over 100 in the Sample

Table 5 displays dominant hit rates for person matching for the 12 groups with over 100 in the sample based on the calculations of the 150 items, the 18 scales, and the 30 items for the total sample and by gender. The 12 medical specialties with a total number in the reference group greater than 100 comprised 372 medical students from the random sample and included Internal Medicine (56), Pediatrics (44), Emergency Medicine (40), Family Medicine (40), Obstetrics/Gynecology (30), Surgery (30), Anesthesiology (24), Psychiatry (24), Orthopedic Surgery (24), Radiology (20), Pathology (20), and Internal Medicine Pediatrics (20). These dominant hit rates were documented for the three criteria of (a) a majority of the matches in the top 5 representing the specialty entered by the medical student, (b) a majority of the matches in the top 10 representing the specialty entered by the medical student, and (c) a majority of the matches in the top 20 representing the specialty entered by the medical student.

When identifying a majority of the matches in the top 5 representing the specialty entered by the medical student for the 12 groups with over 100 in the sample, the 30 items was the most accurate with 130 (35%) dominant matches, the 150 items came in second with 127 (34%) dominant matches, and the 18 scales came in third with 101 (27%) dominant matches. When identifying a majority of the matches in the top 10 representing the specialty entered by the medical student, the 30 items was the most accurate with 162 (44%) dominant matches, the 150 items came in second with 155


(42%) dominant matches, and the 18 scales came in third with 133 (36%) dominant matches. When identifying a majority of the matches in the top 20 representing the specialty entered by the medical student, the 150 items was the most accurate with 156 (42%) dominant matches, the 30 items came in second with 148 (40%) dominant matches, and the 18 scales came in third with 132 (35%) dominant matches. When identifying a majority of the matches in the top 5, 10, and 20 that represent the specialty that the 372 medical students entered in the 12 groups with over 100 in the sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates by obtaining two more hits than the 150 items and 74 more hits than the 18 scales. When identifying a majority of the matches in the top 5 representing the specialty entered by the 186 females in the 12 groups with over 100 in the sample, the 150 items was the most accurate with 65 (35%) correct matches, the 30 items came in second with 62 (33%) correct matches, and the 18 scales came in third with 49 (26%) correct matches. When identifying a majority of the matches in the top 10 representing the specialty that the 186 females entered, the 30 items was the most accurate with 80 (43%) correct matches, the 150 items came in second with 74 (40%) correct matches, and the 18 scales came in third with 67 (36%) correct matches. When identifying a majority of the matches in the top 20 representing the specialty that the 186 females entered, the 150 items was the most accurate with 78 (42%) correct matches, the 30 items came in second with 72 (39%) correct matches, and the 18 scales came in third with 66 (35%) correct matches. Overall, when identifying a majority of the matches in the top 5, 10, and 20 that


represent the specialty that the 186 females entered for the 12 groups with over 100 in the sample, calculating person matching utilizing the 150 items appears to provide the most accurate hit rates by obtaining three more hits than the 30 items and 35 more hits than the 18 scales. When identifying a majority of the matches in the top 5 representing the specialty entered by the 186 males in the 12 groups with over 100 in the sample, the 30 items was the most accurate with 68 (37%) correct matches, the 150 items came in second with 62 (33%) correct matches, and the 18 scales came in third with 52 (28%) correct matches. When identifying a majority of the matches in the top 10 representing the specialty that the 186 males entered, the 30 items was the most accurate with 82 (44%) correct matches, the 150 items came in second with 81 (44%) correct matches, and the 18 scales came in third with 66 (35%) correct matches. When identifying a majority of the matches in the top 20 representing the specialty that the 186 males entered, the 150 items was the most accurate with 78 (42%) correct matches, the 30 items came in second with 76 (41%) correct matches, and the 18 scales came in third with 66 (35%) correct matches. Overall, when identifying a majority of the matches in the top 5, 10, and 20 that represent the specialty that the 186 males entered for the 12 groups with over 100 in the sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates by obtaining five more hits than the 150 items and 42 more hits than the 18 scales. When comparing the dominant hit rates of females and males in the 12 groups with over 100 in the sample, scores suggest that females overall received lower hit rates


than males. The most balanced performance was obtained when identifying a majority of the matches in the top 20 representing the specialty entered by the medical student based upon the 150 items and the 18 scales where females and males obtained equal hit rates. The least balanced performance was obtained when identifying a majority of the matches in the top 5 representing the specialty entered by the medical student based upon the 30 items where females scored six fewer hits when compared to males. When utilizing the 18 scales, females had the closest hit rates compared to males scoring two fewer hits. When utilizing the 150 items, females had the second closest hit rates compared to males scoring four fewer hits. When utilizing the 30 items, females had the third closest hit rates compared to males scoring 12 fewer hits. Data suggest that calculations based upon the 18 scales allow for more even dominant hit rates between males and females in the 12 groups with over 100 in the sample.

Comparing Table 4 with Table 5. When comparing the dominant hit rates obtained for all 500 medical students in the sample in Table 4 to the dominant hit rates of the 372 medical students in the 12 groups with over 100 in the sample in Table 5, the hit rates for the 12 groups with over 100 in the sample are higher no matter how they were calculated (the 150 items, the 18 scales, and the 30 items). When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student utilizing the 150 items for the 12 groups with over 100 in the sample, there is an observed average increase of 10% in hit rates when compared to the total random sample of 500. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student utilizing the 18 scales for the 12 groups with


over 100 in the sample, there is an observed average increase of 8% in hit rates when compared to the total random sample of 500. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student utilizing the 30 items for the 12 groups with over 100 in the sample, there is an observed average increase of 9% in hit rates when compared to the total random sample of 500. Increases can be seen in the hit rates for the three criteria (a majority of the matches in the top 5 representing the specialty entered by the medical student, a majority of the matches in the top 10 representing the specialty entered by the medical student, and a majority of the matches in the top 20 representing the specialty entered by the medical student). When identifying a majority of the matches in the top 5 representing the specialty entered by the medical student utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 7% in hit rates was observed for the 12 groups with over 100 in the sample. When scoring dominance in the top 10 utilizing the 150 items, the 18 scales, and the 30 items; an average increase of 10% in hit rates was observed for the 12 groups with over 100 in the sample. When scoring dominance in the top 20 utilizing the 150 items, the 18 scales, and the 30 items; an average increase of 9% in hit rates was observed for the 12 groups with over 100 in the sample. Data suggest that medical students in the 12 groups with over 100 in the sample receive an increase in dominant hit rate accuracy over medical students in the random sample of 500, which includes those from smaller specialty groups. When comparing the dominant hit rates obtained for all 250 females in the sample in Table 4 to the dominant hit rates of the 186 females in the 12 groups with over 100 in


the sample in Table 5, the hit rates for the 12 groups with over 100 in the sample are higher no matter how they were calculated (the 150 items, the 18 scales, and the 30 items). When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 186 females in the 12 groups with over 100 in the sample utilizing the 150 items, there is an observed average increase of 10% in hit rates when compared to the total random sample of 250 females. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 186 females in the 12 groups with over 100 in the sample utilizing the 18 scales, there is an observed average increase of 7% in hit rates when compared to the total random sample of 250 females. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 186 females in the 12 groups with over 100 in the sample utilizing the 30 items, there is an observed average increase of 8% in hit rates when compared to the total random sample of 250 females. Increases can be seen in the hit rates for the three criteria (a majority of the matches in the top 5 representing the specialty entered by the medical student, a majority of the matches in the top 10 representing the specialty entered by the medical student, and a majority of the matches in the top 20 representing the specialty entered by the medical student). When identifying a majority of the matches in the top 5 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 7% in hit rates was observed for females in the 12 groups with over 100 in the sample. When identifying a majority of the matches in the top 10 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average


increase of 9% in hit rates was observed. When identifying a majority of the matches in the top 20 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 9% in hit rates was observed. When comparing the dominant hit rates obtained for all 250 males in the sample in Table 4 to the dominant hit rates of the 186 males in the 12 groups with over 100 in the sample in Table 5, the hit rates for the 12 groups with over 100 in the sample are higher no matter how they were calculated (the 150 items, the 18 scales, and the 30 items). When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 186 males in the 12 groups with over 100 in the sample utilizing the 150 items, there is an observed average increase of 9% in hit rates when compared to the total random sample of 250 males. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 186 males in the 12 groups with over 100 in the sample utilizing the 18 scales, there is an observed average increase of 8% in hit rates when compared to the total random sample of 250 males. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 186 males in the 12 groups with over 100 in the sample utilizing the 30 items, there is an observed average increase of 9% in hit rates when compared to the total random sample of 250 males. Increases can be seen in the hit rates for the three criteria (a majority of the matches in the top 5 representing the specialty entered by the medical student, a majority of the matches in the top 10 representing the specialty entered by the medical student, and a majority of the matches in the top 20 representing the specialty entered by the medical


student). When identifying a majority of the matches in the top 5 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 8% in hit rates was observed for males in the 12 groups with over 100 in the sample. When identifying a majority of the matches in the top 10 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 9% in hit rates was observed. When identifying a majority of the matches in the top 20 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average increase of 10% in hit rates was observed. For both sexes, choosing a medical specialty in the 12 groups with over 100 in the sample garnered an increase in dominant hit rates no matter how person matching was calculated. Females obtained a 2% decrease when compared to males for dominance in the top 5 and a 1% decrease when compared to males for dominance in the top 10 and 20. Females obtained a 1% increase when compared to males for the 150 items, a 2% decrease when compared to males for the 18 scales, and a 3% decrease when compared to males for the 30 items. Data suggest that females in the 12 groups with over 100 in the sample generally obtained lower increases in hit rate accuracy when compared to males.
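The overall average increases reported above can be reproduced from the rounded hit rates in Tables 4 and 5. The sketch below is illustrative only: the dictionary layout and names are assumptions, the percentages are the overall rates copied from the two tables, and each difference is a percentage-point difference between rounded rates.

```python
# Overall dominant hit rates (%) copied from Table 4 (n = 500) and Table 5 (n = 372).
# Keys are (criterion, calculation); this layout is an illustrative assumption.
table4 = {
    ("top5", "150"): 26, ("top5", "18"): 21, ("top5", "30"): 27,
    ("top10", "150"): 32, ("top10", "18"): 27, ("top10", "30"): 34,
    ("top20", "150"): 31, ("top20", "18"): 27, ("top20", "30"): 31,
}
table5 = {
    ("top5", "150"): 34, ("top5", "18"): 27, ("top5", "30"): 35,
    ("top10", "150"): 42, ("top10", "18"): 36, ("top10", "30"): 44,
    ("top20", "150"): 42, ("top20", "18"): 35, ("top20", "30"): 40,
}


def mean_increase(keys):
    """Average percentage-point increase from Table 4 to Table 5, rounded."""
    diffs = [table5[k] - table4[k] for k in keys]
    return round(sum(diffs) / len(diffs))


# Average across the three criteria for each calculation.
by_calculation = {
    calc: mean_increase([k for k in table4 if k[1] == calc])
    for calc in ("150", "18", "30")
}
# Average across the three calculations for each criterion.
by_criterion = {
    crit: mean_increase([k for k in table4 if k[0] == crit])
    for crit in ("top5", "top10", "top20")
}

print(by_calculation)  # {'150': 10, '18': 8, '30': 9}
print(by_criterion)    # {'top5': 7, 'top10': 10, 'top20': 9}
```

Because the inputs are already rounded to whole percentages, averages computed from the raw counts could differ by a point.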


Table 5
Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10, or 20; Overall and by Gender for the 12 Groups With Over 100 in the Sample

Criteria     Calculation    n          Hit Rate
Overall
  Top 5      150            127/372    34%
             18             101/372    27%
             30             130/372    35%
  Top 10     150            155/372    42%
             18             133/372    36%
             30             162/372    44%
  Top 20     150            156/372    42%
             18             132/372    35%
             30             148/372    40%
Female
  Top 5      150            65/186     35%
             18             49/186     26%
             30             62/186     33%
  Top 10     150            74/186     40%
             18             67/186     36%
             30             80/186     43%
  Top 20     150            78/186     42%
             18             66/186     35%
             30             72/186     39%
Male
  Top 5      150            62/186     33%
             18             52/186     28%
             30             68/186     37%
  Top 10     150            81/186     44%
             18             66/186     35%
             30             82/186     44%
  Top 20     150            78/186     42%
             18             66/186     35%
             30             76/186     41%
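The overall comparisons between calculations in the text sum the hit counts across the three criteria. A minimal sketch of that tally follows, using the overall counts from Table 5; the variable names are illustrative assumptions.

```python
# Overall dominant hit counts from Table 5 (each out of 372),
# ordered as top 5, top 10, top 20.
hits = {
    "150 items": [127, 155, 156],
    "18 scales": [101, 133, 132],
    "30 items": [130, 162, 148],
}

# Total hits per calculation across the three criteria.
totals = {calc: sum(counts) for calc, counts in hits.items()}

# Calculation with the most hits, and its margin over each of the others.
best = max(totals, key=totals.get)
margins = {calc: totals[best] - total for calc, total in totals.items() if calc != best}

print(totals)   # {'150 items': 438, '18 scales': 366, '30 items': 440}
print(best)     # 30 items
print(margins)  # {'150 items': 2, '18 scales': 74}
```

The same tally applied to the female and male rows gives the gender-specific margins reported in the text.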


Comparison of Person Matching Dominant Hit Rates Utilizing the 150 Items, the 18 Scales, and the 30 Items Overall and by Gender for the 10 Groups With Under 100 in the Sample

Table 6 displays dominant hit rates for person matching for the 10 groups with under 100 in the sample based on the calculations of the 150 items, the 18 scales, and the 30 items for the total sample and by gender. The 10 medical specialties with a total number in the reference group under 100 comprised 128 medical students from the random sample and included Otolaryngology (14), Ophthalmology (14), Neurology (14), Dermatology (14), Physical Medicine and Rehabilitation (14), Urology (14), Neurological Surgery (14), Radiation Oncology (10), Plastic Surgery (10), and Pediatrics/Child and Adolescent Psychiatry (10). These dominant hit rates were documented for the three criteria of (a) a majority of the matches in the top 5 representing the specialty entered by the medical student, (b) a majority of the matches in the top 10 representing the specialty entered by the medical student, and (c) a majority of the matches in the top 20 representing the specialty entered by the medical student.

When identifying a majority of the matches in the top 5 representing the specialty entered by the medical student for the 10 groups with under 100 in the sample, the 150 items and the 30 items were the most accurate with 5 (4%) dominant matches and the 18 scales came in second with 2 (2%) dominant matches. When identifying a majority of the matches in the top 10 representing the specialty entered by the medical student, the 30 items was the most accurate with 8 (6%) dominant matches and the 150 items and the 18 scales came in second with 4 (3%) dominant matches. When identifying a majority of


the matches in the top 20 representing the specialty entered by the medical student, the 30 items was the most accurate with 7 (5%) dominant matches, the 18 scales came in second with 1 (1%) dominant match, and the 150 items came in third with 0 (0%) dominant matches. Overall, when identifying a majority of the matches in the top 5, 10, and 20 that represent the specialty that the 128 medical students entered in the 10 groups with under 100 in the sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates by obtaining 11 more hits than the 150 items and 13 more hits than the 18 scales.

When identifying a majority of the matches in the top 5 representing the specialty entered by the 64 females in the 10 groups with under 100 in the sample, the 30 items was the most accurate with 3 (5%) correct matches, the 150 items came in second with 2 (3%) correct matches, and the 18 scales came in third with 1 (2%) correct match. When identifying a majority of the matches in the top 10 representing the specialty that the 64 females entered, the 30 items was the most accurate with 4 (6%) correct matches, the 18 scales came in second with 3 (5%) correct matches, and the 150 items came in third with 1 (2%) correct match. When identifying a majority of the matches in the top 20 representing the specialty that the 64 females entered, the 30 items was the most accurate with 3 (5%) correct matches, the 18 scales came in second with 1 (2%) correct match, and the 150 items came in third with 0 (0%) correct matches. Overall, when identifying a majority of the matches in the top 5, 10, and 20 that represent the specialty that the 64 females entered for the 10 groups with under 100 in the sample, calculating person


matching utilizing the 30 items appears to provide the most accurate hit rates by obtaining five more hits than the 18 scales and seven more hits than the 150 items.

When identifying a majority of the matches in the top 5 representing the specialty entered by the 64 males in the 10 groups with under 100 in the sample, the 150 items was the most accurate with 3 (5%) correct matches, the 30 items came in second with 2 (3%) correct matches, and the 18 scales came in third with 1 (2%) correct match. When identifying a majority of the matches in the top 10 representing the specialty that the 64 males entered, the 30 items was the most accurate with 4 (6%) correct matches, the 150 items came in second with 3 (5%) correct matches, and the 18 scales came in third with 1 (2%) correct match. When identifying a majority of the matches in the top 20 representing the specialty that the 64 males entered, the 30 items was the most accurate with 4 (6%) correct matches and the 150 items and the 18 scales came in second with 0 (0%) correct matches. Overall, when identifying a majority of the matches in the top 5, 10, and 20 that represent the specialty that the 64 males entered for the 10 groups with under 100 in the sample, calculating person matching utilizing the 30 items appears to provide the most accurate hit rates by obtaining four more hits than the 150 items and eight more hits than the 18 scales.

When comparing the dominant hit rates of females and males in the 10 groups with under 100 in the sample, scores suggest females often received lower hit rates than males. The most balanced performance was obtained when identifying a majority of the matches in the top 5 representing the specialty entered by the medical student based upon the 18 scales, when identifying a majority of the matches in the top 10 representing


the specialty entered by the medical student based upon the 30 items, and when identifying a majority of the matches in the top 20 representing the specialty entered by the medical student based upon the 150 items where females and males obtained the same rates. The least balanced performance was obtained when identifying a majority of the matches in the top 10 representing the specialty entered by the medical student based upon the 150 items where females scored two fewer hits when compared to males and when identifying a majority of the matches in the top 10 representing the specialty entered by the medical student based upon the 18 scales where females scored two additional hits when compared to males. When utilizing the 30 items, females had the closest hit rates compared to males scoring equally. When utilizing the 18 scales and the 150 items, females were tied for the second closest hit rates compared to males, scoring three additional hits with the 18 scales and three fewer hits with the 150 items. Data suggest that calculations based upon the 30 items allow for more even dominant hit rates between males and females in the 10 groups with under 100 in the sample.

Comparing Table 4 with Table 6. When comparing the dominant hit rates obtained for all 500 medical students in the sample in Table 4 to the dominant hit rates of the 128 medical students in the 10 groups with under 100 in the sample in Table 6, the hit rates for the 10 groups with under 100 in the sample are lower no matter how they were calculated (the 150 items, the 18 scales, or the 30 items). When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student utilizing the 150 items for the 10 groups with under 100 in the sample, there is an


observed average decrease of 27% in hit rates when compared to the total random sample of 500. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student utilizing the 18 scales for the 10 groups with under 100 in the sample, there is an observed average decrease of 23% in hit rates when compared to the total random sample of 500. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student utilizing the 30 items for the 10 groups with under 100 in the sample, there is an observed average decrease of 26% in hit rates when compared to the total random sample of 500. Decreases can be seen in the hit rates for the three criteria (a majority of the matches in the top 5 representing the specialty entered by the medical student, a majority of the matches in the top 10 representing the specialty entered by the medical student, and a majority of the matches in the top 20 representing the specialty entered by the medical student). When identifying a majority of the matches in the top 5 representing the specialty entered by the medical student utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 21% in hit rates was observed for the 10 groups with under 100 in the sample. When scoring dominance in the top 10 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 27% in hit rates was observed. When scoring dominance in the top 20 utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 28% in hit rates was observed. Data suggest that medical students in the 10 groups with under 100 in the sample receive a decrease in dominant hit rate accuracy over medical students in the random sample of 500, which includes those from larger specialty groups.


When comparing the dominant hit rates obtained for all 250 females in the sample in Table 4 to the dominant hit rates of the 64 females in the 10 groups with under 100 in the sample in Table 6, the hit rates for the 10 groups with under 100 in the sample are lower no matter how they were calculated (the 150 items, the 18 scales, or the 30 items). When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 64 females in the 10 groups with under 100 in the sample utilizing the 150 items, there is an observed average decrease of 28% in hit rates when compared to the total random sample of 250 females. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 64 females in the 10 groups with under 100 in the sample utilizing the 18 scales, there is an observed average decrease of 22% in hit rates when compared to the total random sample of 250 females. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 64 females in the 10 groups with under 100 in the sample utilizing the 30 items, there is an observed average decrease of 25% in hit rates when compared to the total random sample of 250 females. Decreases can be seen in the hit rates for the three criteria (a majority of the matches in the top 5 representing the specialty entered by the medical student, a majority of the matches in the top 10 representing the specialty entered by the medical student, and a majority of the matches in the top 20 representing the specialty entered by the medical student). When identifying a majority of the matches in the top 5 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 21% in hit rates was observed for females in the 10 groups with under 100 in


the sample. When identifying a majority of the matches in the top 10 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 26% in hit rates was observed. When identifying a majority of the matches in the top 20 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 27% in hit rates was observed. When comparing the dominant hit rates obtained for all 250 males in the sample in Table 4 to the dominant hit rates of the 64 males in the 10 groups with under 100 in the sample in Table 6, the hit rates for the 10 groups with under 100 in the sample are lower no matter how they were calculated (the 150 items, the 18 scales, or the 30 items). When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 64 males in the 10 groups with under 100 in the sample utilizing the 150 items, there is an observed average decrease of 27% in hit rates when compared to the total random sample of 250 males. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 64 males in the 10 groups with under 100 in the sample utilizing the 18 scales, there is an observed average decrease of 23% in hit rates when compared to the total random sample of 250 males. When identifying a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the 64 males in the 10 groups with under 100 in the sample utilizing the 30 items, there is an observed average decrease of 26% in hit rates when compared to the total random sample of 250 males. Decreases can be seen in the hit rates for the three criteria (a majority of the matches in the top 5 representing the specialty entered by the medical student, a majority


of the matches in the top 10 representing the specialty entered by the medical student, and a majority of the matches in the top 20 representing the specialty entered by the medical student). When identifying a majority of the matches in the top 5 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 22% in hit rates was observed for males in the 10 groups with under 100 in the sample. When identifying a majority of the matches in the top 10 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 27% in hit rates was observed. When identifying a majority of the matches in the top 20 representing the specialty entered utilizing the 150 items, the 18 scales, and the 30 items, an average decrease of 28% in hit rates was observed. For both sexes, choosing a medical specialty in the 10 groups with under 100 in the sample garnered a decrease in dominant hit rates no matter how person matching was calculated. Females obtained a 5% increase when compared to males for dominance in the top 5, a 3% increase when compared to males for dominance in the top 10, and a 2% increase when compared to males for dominance in the top 20. Females obtained a 1% increase when compared to males for the 150 items. Females obtained a 1% decrease when compared to males for the 18 scales. Females obtained a 2% increase when compared to males for the 30 items. Data suggest that females in the 10 groups with under 100 in the sample generally scored increases in hit rate accuracy when compared to males.


Table 6
Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10, or 20; Overall and by Gender for the 10 Groups With Under 100 in the Sample

Criteria     Calculation    n          Hit Rate
Overall
  Top 5      150            5/128      4%
             18             2/128      2%
             30             5/128      4%
  Top 10     150            4/128      3%
             18             4/128      3%
             30             8/128      6%
  Top 20     150            0/128      0%
             18             1/128      1%
             30             7/128      5%
Female
  Top 5      150            2/64       3%
             18             1/64       2%
             30             3/64       5%
  Top 10     150            1/64       2%
             18             3/64       5%
             30             4/64       6%
  Top 20     150            0/64       0%
             18             1/64       2%
             30             3/64       5%
Male
  Top 5      150            3/64       5%
             18             1/64       2%
             30             2/64       3%
  Top 10     150            3/64       5%
             18             1/64       2%
             30             4/64       6%
  Top 20     150            0/64       0%
             18             0/64       0%
             30             4/64       6%


Comparison of Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is the Only One Dominant Across the Top 1, 5, 10, and 20

Table 7 displays consistency in dominant hit rates for person matching based on the calculations of the 150 items, the 18 scales, and the 30 items for the total random sample and by gender for medical students who had the same specialty predicted for the top 1, 5, 10, and 20, which matched the specialty entered by the medical student. When identifying consistency in a medical specialty that was entered by the medical student across all calculations of dominant hit rates (top 1, 5, 10, and 20), the 150 items obtained 62 (12%) hits, the 30 items obtained 60 (12%) hits, and the 18 scales obtained 43 (9%) hits. Data suggest that more medical students were observed to enter the same specialty that was dominant across all calculations (top 1, 5, 10, and 20) when utilizing the 150 items.

When identifying the same specialty as dominant across all calculations (top 1, 5, 10, and 20) for the 250 females in the random sample who entered the predicted specialty, the 150 items obtained 31 (12%) hits, the 30 items obtained 29 (12%) hits, and the 18 scales obtained 18 (7%) hits. When identifying the same specialty as dominant across all calculations (top 1, 5, 10, and 20) for the 250 males in the random sample who entered the predicted specialty, the 150 items and the 30 items obtained 31 (12%) hits and the 18 scales obtained 25 (10%) hits. Females received equal hit rates to males when calculated using the 150 items, seven fewer hits when calculated using the 18 scales (18 vs. 25), and two fewer hits when calculated using the 30 items (29 vs. 31).


Table 7
Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Specialty Entered Is the Only One Dominant Across the Top 1, 5, 10, and 20; Overall and by Gender

             Calculation    n          Hit Rate
Overall      150            62/500     12%
             18             43/500     9%
             30             60/500     12%
Female       150            31/250     12%
             18             18/250     7%
             30             29/250     12%
Male         150            31/250     12%
             18             25/250     10%
             30             31/250     12%


Comparison of Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Student Had Only One Specialty Dominant Across the Top 1, 5, 10, and 20, but Did Not Enter the Medical Specialty

Table 8 displays consistency in dominant hit rates for person matching based on the calculations of the 150 items, the 18 scales, and the 30 items for the total random sample and by gender for medical students who had the same specialty predicted for the top 1, 5, 10, and 20, but did not enter that medical specialty. When identifying consistency across all calculations of dominant hit rates (top 1, 5, 10, and 20) where the predicted medical specialty was not entered by the medical student, the 30 items obtained 76 (15%) hits, the 150 items obtained 74 (15%) hits, and the 18 scales obtained 60 (12%) hits. Data suggest that the 30 items calculates higher rates of medical students observed to enter a different specialty than was observed as dominant across all calculations (top 1, 5, 10, and 20).

When identifying the 250 females in the random sample who had the same medical specialty dominant across the top 1, 5, 10, and 20, but entered a different medical specialty, the 30 items obtained 42 (17%) hits, the 150 items obtained 37 (15%) hits, and the 18 scales obtained 35 (14%) hits. When identifying consistency for the 250 males in the random sample who had the same medical specialty dominant across the top 1, 5, 10, and 20, but entered a different medical specialty, the 150 items obtained 37 (15%) hits, the 30 items obtained 34 (14%) hits, and the 18 scales obtained 25 (10%) hits. For medical students in the random sample who had the same medical specialty dominant across the top 1, 5, 10, and 20, but entered a different medical specialty, females received


the same number of hits as males when calculated using the 150 items, eight more hits when calculated using the 30 items, and 10 more hits when calculated using the 18 scales.


Table 8
Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items With the Same Medical Specialty Dominant Across the Top 1, 5, 10, and 20, but the Individual Did Not Go Into That Medical Specialty; Overall and by Gender

Calculation     n         Hit Rate
Overall
  150           74/500    15%
  18            60/500    12%
  30            76/500    15%
Female
  150           37/250    15%
  18            35/250    14%
  30            42/250    17%
Male
  150           37/250    15%
  18            25/250    10%
  30            34/250    14%
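The gender comparison for Table 8 reduces to subtracting the male hit count from the female hit count for each calculation. The sketch below shows only that subtraction; the variable and function names are chosen for illustration.

```python
# Hit counts from Table 8 (each out of 250 per gender).
female_hits = {"150 items": 37, "18 scales": 35, "30 items": 42}
male_hits = {"150 items": 37, "18 scales": 25, "30 items": 34}


def hit_difference(female: dict, male: dict) -> dict:
    """Female minus male hits per calculation; a positive value means more female hits."""
    return {calculation: female[calculation] - male[calculation] for calculation in female}


differences = hit_difference(female_hits, male_hits)
print(differences)  # {'150 items': 0, '18 scales': 10, '30 items': 8}
```

With these counts, the difference is zero for the 150 items and positive for the 18 scales and the 30 items.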


Comparison of Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where the Medical Student Had a Different Specialty Dominant Across the Top 1, 5, 10, and 20

Table 9 displays inconsistency in dominant hit rates for person matching based on the calculations of the 150 items, the 18 scales, and the 30 items for the total random sample and by gender for medical students who were observed as having four different specialties dominant across the top 1, 5, 10, and 20 that did not match the medical specialty entered by the medical student. When identifying medical students who were observed as having four different specialties dominant across the top 1, 5, 10, and 20 that did not match the medical specialty entered by the medical student, the 18 scales obtained the highest number with 11 (2%), the 30 items obtained the second-highest number with 8 (2%), and the 150 items obtained the third-highest number with 7 (1%). The data suggest that calculations based upon the 18 scales allow for more inconsistent dominant hit rates. For the 250 females in the random sample who were observed as having four different specialties dominant across the top 1, 5, 10, and 20 that did not match their chosen medical specialty, the 18 scales obtained the highest number with 6 (2%), the 30 items obtained the second-highest number with 3 (1%), and the 150 items obtained the third-highest number with 2 (1%). For the 250 males in the random sample who were observed as having four different specialties dominant across the top 1, 5, 10, and 20 that did not match their chosen medical specialty, the 150 items, the 18 scales, and the 30 items obtained the same number with 5 (2%) hits. For medical students who were observed as having four different specialties dominant across the top 1, 5, 10, and 20 that


did not match their chosen medical specialty, females received one additional hit when compared to males when using the 18 scales, two fewer hits when calculated using the 30 items, and three fewer hits when calculated using the 150 items.


Table 9
Person Matching Hit Rates for the 150 Items, the 18 Scales, and the 30 Items Where There Is a Different Medical Specialty Dominant Across the Top 1, 5, 10, and 20; Overall and by Gender

Calculation     n         Hit Rate
Overall
  150           7/500     1%
  18            11/500    2%
  30            8/500     2%
Female
  150           2/250     1%
  18            6/250     2%
  30            3/250     1%
Male
  150           5/250     2%
  18            5/250     2%
  30            5/250     2%
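Tables 7 through 9 sort medical students by whether one specialty stays dominant across the top 1, 5, 10, and 20 person matches. The sketch below shows one way such a classification could be coded. It assumes that "dominant" means the most frequent specialty among the top k matches and that ties go to the specialty appearing earliest in the ranking; both are assumptions made for illustration, as are the function names, the category labels, and the example rankings.

```python
from collections import Counter

CUTOFFS = (1, 5, 10, 20)


def dominant_specialty(ranked: list, k: int) -> str:
    """Most frequent specialty among the top k matches.

    Assumed tie-break: the tied specialty that appears earliest in the ranking.
    """
    top = ranked[:k]
    counts = Counter(top)
    best = max(counts.values())
    return next(s for s in top if counts[s] == best)


def classify(ranked: list, entered: str) -> str:
    """Label a student by the consistency of the dominant specialty across cutoffs."""
    dominants = [dominant_specialty(ranked, k) for k in CUTOFFS]
    distinct = set(dominants)
    if len(distinct) == 1:
        # Same specialty dominant at every cutoff (Tables 7 and 8).
        return "dominant match" if dominants[0] == entered else "dominant miss"
    if len(distinct) == len(CUTOFFS) and entered not in distinct:
        # Four different dominant specialties, none of them entered (Table 9).
        return "different across"
    return "other"


# Hypothetical rankings of the specialties of a student's 20 closest matches.
consistent = ["Pediatrics"] * 12 + ["Surgery"] * 8
shifting = ["A"] + ["B"] * 4 + ["C"] * 5 + ["D"] * 10

print(classify(consistent, "Pediatrics"))  # dominant match
print(classify(consistent, "Surgery"))     # dominant miss
print(classify(shifting, "E"))             # different across
```

The "other" label covers students who fall into none of the three tabulated categories, which is consistent with the three tables together accounting for only part of the 500-student sample.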


Person Matching Means, Standard Deviations, and Top Match Scores for the 150 Items; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20

Table 10 displays means, standard deviations, and low and high top match scores based on the calculations of the 150 items (a) in general, (b) for each gender, (c) with the top match correct, and (d) with the top match incorrect three ways: for the entire random sample, for groups with over 100 in the sample, and for groups with under 100 in the sample. Additionally, means, standard deviations, and closest top match scores are listed for (a) medical students who had the same specialty dominant in the top 1, 5, 10, and 20, which matched the specialty entered by the medical student; (b) medical students who had the same specialty dominant in the top 1, 5, 10, and 20, which did not match the specialty entered by the medical student; and (c) medical students who were observed as having four different specialties dominant across the top 1, 5, 10, and 20.

Overview. The mean score for the top match in the sample of 500 was 245.93 with a standard deviation of 89.09. The mean score for the 20th match in the sample of 500 was 301.83 with a standard deviation of 101.08. Incremental changes between the 1st and 20th mean scores were 55.90 and incremental changes between the 1st and 20th standard deviations were 11.99. The closest top match score for the 500 medical students in the sample was 64 and the highest top match score was 775.

The mean score for the top match in the sample of 372 medical students in the 12 groups with over 100 in the sample was 243.23 with a standard deviation of 85.12. The


mean score for the 20th match in the sample of 372 was 299.16 with a standard deviation of 97.81. Incremental changes between the 1st and 20th mean scores were 55.93 and incremental changes between the 1st and 20th standard deviations were 12.69. The closest top match score for the 372 medical students in the 12 groups with over 100 in the sample was 64 and the highest top match score was 564.

The mean score for the top match in the sample of 128 medical students in the 10 groups with under 100 in the sample was 253.75 with a standard deviation of 99.68. The mean score for the 20th match in the sample of 128 was 309.59 with a standard deviation of 110.06. Incremental changes between the 1st and 20th mean scores were 55.84 and incremental changes between the 1st and 20th standard deviations were 10.38. The closest top match score for the 128 medical students in the 10 groups with under 100 in the sample was 76 and the highest top match score was 775. When reviewing the differences in means, standard deviations, and top match scores between the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means, standard deviations, and top match score came from the medical students in the 12 groups with over 100 in the sample.

Females. The mean score for the top match in the sample of 250 females was 257.19 with a standard deviation of 97.28. The mean score for the 20th match in the sample of 250 females was 315.00 with a standard deviation of 108.81. Incremental changes between the 1st and 20th mean scores were 57.81 and incremental changes between the 1st and 20th standard deviations were 11.53. The closest top match score for the 250 females in the sample was 76 and the highest top match score was 775.


The mean score for the top match in the sample of 186 females in the 12 groups with over 100 in the sample was 250.47 with a standard deviation of 91.91. The mean score for the 20th match in the sample of 186 was 307.51 with a standard deviation of 103.92. Incremental changes between the 1st and 20th mean scores were 57.04 and incremental changes between the 1st and 20th standard deviations were 12.01. The closest top match score for the 186 females in the 12 groups with over 100 in the sample was 80 and the highest top match score was 564.

The mean score for the top match in the sample of 64 females in the 10 groups with under 100 in the sample was 276.70 with a standard deviation of 109.88. The mean score for the 20th match in the sample of 64 females was 336.77 with a standard deviation of 120.16. Incremental changes between the 1st and 20th mean scores were 60.07 and incremental changes between the 1st and 20th standard deviations were 10.28. The closest top match score for the 64 females in the 10 groups with under 100 in the sample was 76 and the highest top match score was 775. When reviewing the differences in means, standard deviations, and top match scores between females in the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means and standard deviations were found in the females in the 12 groups with over 100 in the sample. The closest top match score came from a female in the 10 groups with under 100 in the sample.

Males. The mean score for the top match in the sample of 250 males was 234.66, with a standard deviation of 78.67. The mean score for the 20th match in the sample of 250 males was 288.66, with a standard deviation of 91.02. Incremental changes between


the 1st and 20th mean scores were 54.00 and changes between the 1st and 20th standard deviations were 12.35. The closest top match score for the 250 males in the sample was 64 and the highest top match score was 547.

The mean score for the top match in the sample of 186 males in the 12 groups with over 100 in the sample was 235.99, with a standard deviation of 77.32. The mean score for the 20th match in the sample of 186 was 290.81, with a standard deviation of 90.81. Incremental changes between the 1st and 20th mean scores were 54.82 and changes between the 1st and 20th standard deviations were 13.49. The closest top match score for the 186 males in the 12 groups with over 100 in the sample was 64 and the highest top match score was 545.

The mean score for the top match in the sample of 64 males in the 10 groups with under 100 in the sample was 230.80, with a standard deviation of 82.97. The mean score for the 20th match in the sample of 64 males was 282.41, with a standard deviation of 92.08. Incremental changes between the 1st and 20th mean scores were 51.61 and changes between the 1st and 20th standard deviations were 9.11. The closest top match score for the 64 males in the 10 groups with under 100 in the sample was 119 and the highest top match score was 547. When reviewing the differences in means, standard deviations, and top match scores between males in the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest standard deviations and top match score were found in the males in the 12 groups with over 100 in the sample. The lowest means were found in the males in the 10 groups with under 100 in the sample.


Comparison of females and males. Males had lower means, standard deviations, and top match scores than females in the total random sample, in the 12 groups with over 100 in the sample, and in the 10 groups with under 100 in the sample, with one exception. In the 10 groups with under 100 in the sample, females had the closest top match score when compared to males in the same sample.

Top match predicted specialty entered. The mean score for the 108 top matches that correctly identified the specialty the medical student entered was 264.88, with a standard deviation of 92.60. The mean score for the 20th match in the sample of 108 was 325.96, with a standard deviation of 106.35. Incremental changes between the 1st and 20th mean scores were 61.08 and changes between the 1st and 20th standard deviations were 13.75. The closest top match score for the 108 top matches that correctly identified the specialty the medical student entered was 119 and the highest top match score was 564.

The mean score for the 99 top matches that correctly identified the specialty the medical student entered for the 12 groups with over 100 in the sample was 267.60, with a standard deviation of 91.18. The mean score for the 20th match in the sample of 99 was 329.18, with a standard deviation of 104.80. Incremental changes between the 1st and 20th mean scores were 61.58 and changes between the 1st and 20th standard deviations were 13.62. The closest top match score for the 99 top matches that correctly identified the specialty the medical student entered for the 12 groups with over 100 in the sample was 129 and the highest top match score was 564.


The mean score for the nine top matches that correctly identified the specialty the medical student entered for the 10 groups with under 100 in the sample was 234.11, with a standard deviation of 108.49. The mean score for the 20th match in the sample of nine was 289.56, with a standard deviation of 123.38. Incremental changes between the 1st and 20th mean scores were 55.45 and changes between the 1st and 20th standard deviations were 14.89. The closest top match score for the nine top matches that correctly identified the specialty the medical student entered for the 10 groups with under 100 in the sample was 119 and the highest top match score was 434. When reviewing the differences in means, standard deviations, and top match scores that correctly identified the specialty the medical student entered between the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means and top match score came from the medical students in the 10 groups with under 100 in the sample. The lowest standard deviations came from the 12 groups with over 100 in the sample.

Top match did not predict specialty entered. The mean score for the 392 top matches that did not correctly identify the specialty the medical student entered was 240.52, with a standard deviation of 87.43. The mean score for the 20th match in the sample of 392 was 294.94, with a standard deviation of 98.58. Incremental changes between the 1st and 20th mean scores were 54.42 and changes between the 1st and 20th standard deviations were 11.15. The closest top match score for the 392 top matches that did not correctly identify the specialty the medical student entered was 64 and the highest top match score was 775.


The mean score for the 273 top matches that did not correctly identify the specialty the medical student entered for the 12 groups with over 100 in the sample was 234.03, with a standard deviation of 81.01. The mean score for the 20th match in the sample of 273 was 287.82, with a standard deviation of 92.73. Incremental changes between the 1st and 20th mean scores were 53.79 and changes between the 1st and 20th standard deviations were 11.72. The closest top match score for the 273 top matches that did not correctly identify the specialty the medical student entered for the 12 groups with over 100 in the sample was 64 and the highest top match score was 512.

The mean score for the 119 top matches that did not correctly identify the specialty the medical student entered for the 10 groups with under 100 in the sample was 255.24, with a standard deviation of 99.31. The mean score for the 20th match in the sample of 119 was 311.10, with a standard deviation of 109.42. Incremental changes between the 1st and 20th mean scores were 55.86 and changes between the 1st and 20th standard deviations were 10.11. The closest top match score for the 119 top matches that did not correctly identify the specialty the medical student entered for the 10 groups with under 100 in the sample was 76 and the highest top match score was 775. When reviewing the differences in means, standard deviations, and top match scores that did not correctly identify the specialty the medical student entered between the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means, standard deviations, and top match score came from the medical students in the 12 groups with over 100 in the sample.


Actual specialty was predicted as dominant across the top 1, 5, 10, and 20. The mean score for the top match of the 62 medical students whose entered medical specialty was dominant across the top 1, 5, 10, and 20 was 271.23, with a standard deviation of 86.13. The mean score for the 20th match in the sample of 62 was 339.40, with a standard deviation of 98.52. Incremental changes between the 1st and 20th mean scores were 68.17 and changes between the 1st and 20th standard deviations were 12.39. The closest top match score for the 62 medical students whose entered medical specialty was dominant across the top 1, 5, 10, and 20 was 136 and the highest top match score was 564.

Specialty predicted across the top 1, 5, 10, and 20 was not the specialty entered. The mean score for the top match of the 74 medical students who had one medical specialty dominant across the top 1, 5, 10, and 20, which was different from the medical specialty that they entered, was 268.54, with a standard deviation of 103.86. The mean score for the 20th match in the sample of 74 was 324.09, with a standard deviation of 116.21. Incremental changes between the 1st and 20th mean scores were 55.55 and changes between the 1st and 20th standard deviations were 12.35. The closest top match score for the 74 medical students who had one medical specialty dominant across the top 1, 5, 10, and 20 that was different from the medical specialty that they entered was 94 and the highest top match score was 775.

Different specialty predicted across the top 1, 5, 10, and 20. The mean score for the top match of the seven medical students that had four different medical specialties dominant across the top 1, 5, 10 and 20 that did not match the medical specialty that they


entered was 210.14, with a standard deviation of 42.38. The mean score for the 20th match in the sample of seven was 254.43, with a standard deviation of 49.67. Incremental changes between the 1st and 20th mean scores were 44.00 and changes between the 1st and 20th standard deviations were 7.29. The closest top match score for the seven medical students that had four different medical specialties dominant across the top 1, 5, 10 and 20 that did not match the medical specialty that they entered was 129 and the highest top match score was 260. Comparison. When reviewing the differences in means, standard deviations, and top match scores for the 62 medical students who entered the specialty that was dominant, the 74 medical students who did not enter the specialty that was dominant, and the seven medical students with four different medical specialties dominant, the lowest means and standard deviations came from the seven medical students with four different medical specialties dominant across the top 1, 5, 10, and 20 that did not match the medical specialty that the student entered. The lowest top match score came from the 74 medical students who did not enter the specialty that was dominant across the top 1, 5, 10, and 20.


Table 10
Person Matching Means, Standard Deviations, and Top Match Scores for the 150 Items; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20

                  First Match         Twentieth Match
Variable          M         SD        M         SD        M ∆       SD ∆      Low Score High Score
Overall           245.93    89.09     301.83    101.08    55.90     11.99     64        775
  12 +100         243.23    85.12     299.16    97.81     55.93     12.69     64        564
  10 -100         253.75    99.68     309.59    110.06    55.84     10.38     76        775
Females           257.19    97.28     315.00    108.81    57.81     11.53     76        775
  12 +100         250.47    91.91     307.51    103.92    57.04     12.01     80        564
  10 -100         276.70    109.88    336.77    120.16    60.07     10.28     76        775
Males             234.66    78.67     288.66    91.02     54.00     12.35     64        547
  12 +100         235.99    77.32     290.81    90.81     54.82     13.49     64        545
  10 -100         230.80    82.97     282.41    92.08     51.61     9.11      119       547
Top 1 Correct     264.88    92.60     325.96    106.35    61.08     13.75     119       564
  12 +100         267.60    91.18     329.18    104.80    61.58     13.62     129       564
  10 -100         234.11    108.49    289.56    123.38    55.45     14.89     119       434
Top 1 Incorrect   240.52    87.43     294.94    98.58     54.42     11.15     64        775
  12 +100         234.03    81.01     287.82    92.73     53.79     11.72     64        512
  10 -100         255.24    99.31     311.10    109.42    55.86     10.11     76        775
Dominant Miss     268.54    103.86    324.09    116.21    55.55     12.35     94        775
Dominant Match    271.23    86.13     339.40    98.52     68.17     12.39     136       564
Different Across  210.14    42.38     254.43    49.67     44.00     7.29      129       260

Note. 12 +100 = the 12 groups with over 100 in the sample; 10 -100 = the 10 groups with under 100 in the sample.
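The two ∆ columns in Table 10 are the twentieth-match value minus the first-match value, for the mean and for the standard deviation. The sketch below reproduces that subtraction for three rows of the table; the tuple layout and the function name `increments` are illustrative assumptions.

```python
# (first-match M, first-match SD, twentieth-match M, twentieth-match SD) from Table 10.
ROWS = {
    "Overall": (245.93, 89.09, 301.83, 101.08),
    "Females": (257.19, 97.28, 315.00, 108.81),
    "Males": (234.66, 78.67, 288.66, 91.02),
}


def increments(m1: float, sd1: float, m20: float, sd20: float) -> tuple:
    """Return (change in mean, change in SD) from the 1st to the 20th match."""
    return round(m20 - m1, 2), round(sd20 - sd1, 2)


for label, values in ROWS.items():
    delta_m, delta_sd = increments(*values)
    print(f"{label:8s} mean change = {delta_m:.2f}, SD change = {delta_sd:.2f}")
```

For the Overall row this gives a mean change of 55.90 and a standard deviation change of 11.99, matching the tabled values.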


Person Matching Means, Standard Deviations, and Top Match Scores for the 18 Scales; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20

Table 11 displays means, standard deviations, and low and high top match scores based on the calculations of the 18 scales (a) in general, (b) for each gender, (c) with the top match correct, and (d) with the top match incorrect three ways: for the entire random sample, for groups with over 100 in the sample, and for groups with under 100 in the sample. Additionally, means, standard deviations, and closest top match scores are listed for (a) medical students who had the same specialty dominant in the top 1, 5, 10, and 20, which matched the specialty entered by the medical student; (b) medical students who had the same specialty dominant in the top 1, 5, 10, and 20, which did not match the specialty entered by the medical student; and (c) medical students who were observed as having four different specialties dominant across the top 1, 5, 10, and 20.

Overview. The mean score for the top match in the sample of 500 was 7.45, with a standard deviation of 3.77. The mean score for the 20th match in the sample of 500 was 12.61, with a standard deviation of 5.51. Incremental changes between the 1st and 20th mean scores were 5.16 and changes between the 1st and 20th standard deviations were 1.74. The closest top match score for the 500 medical students in the sample was 1.43 and the highest top match score was 34.85.

The mean score for the top match in the sample of 372 medical students in the 12 groups with over 100 in the sample was 7.38, with a standard deviation of 3.63. The


mean score for the 20th match in the sample of 372 was 12.55, with a standard deviation of 5.41. Incremental changes between the 1st and 20th mean scores were 5.17 and changes between the 1st and 20th standard deviations were 1.78. The closest top match score for the 372 medical students in the 12 groups with over 100 in the sample was 1.43 and the highest top match score was 24.71.

The mean score for the top match in the sample of 128 medical students in the 10 groups with under 100 in the sample was 7.64, with a standard deviation of 4.19. The mean score for the 20th match in the sample of 128 was 12.77, with a standard deviation of 5.81. Incremental changes between the 1st and 20th mean scores were 5.13 and changes between the 1st and 20th standard deviations were 1.62. The closest top match score for the 128 medical students in the 10 groups with under 100 in the sample was 1.71 and the highest top match score was 34.85. When reviewing the differences in means, standard deviations, and scores between the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means, standard deviations, and top match score came from the medical students in the 12 groups with over 100 in the sample.

Females. The mean score for the top match in the sample of 250 females was 7.96, with a standard deviation of 4.16. The mean score for the 20th match in the sample of 250 females was 13.40, with a standard deviation of 6.32. Incremental changes between the 1st and 20th mean scores were 5.44 and changes between the 1st and 20th standard deviations were 2.16. The closest top match score for the 250 females in the sample was 1.86 and the highest top match score was 34.85.


The mean score for the top match in the sample of 186 females in the 12 groups with over 100 in the sample was 7.80, with a standard deviation of 3.94. The mean score for the 20th match in the sample of 186 was 13.21, with a standard deviation of 6.22. Incremental changes between the 1st and 20th mean scores were 5.41 and changes between the 1st and 20th standard deviations were 2.28. The closest top match score for the 186 females in the 12 groups with over 100 in the sample was 1.86 and the highest top match score was 24.71.

The mean score for the top match in the sample of 64 females in the 10 groups with under 100 in the sample was 8.42, with a standard deviation of 4.72. The mean score for the 20th match in the sample of 64 females was 13.95, with a standard deviation of 6.60. Incremental changes between the 1st and 20th mean scores were 5.53 and changes between the 1st and 20th standard deviations were 1.88. The closest top match score for the 64 females in the 10 groups with under 100 in the sample was 2.57 and the highest top match score was 34.85. When reviewing the differences in means, standard deviations, and scores between females in the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means, standard deviations, and top match score were all found in the females in the 12 groups with over 100 in the sample.

Males. The mean score for the top match in the sample of 250 males was 6.94, with a standard deviation of 3.28. The mean score for the 20th match in the sample of 250 males was 11.81, with a standard deviation of 4.43. Incremental changes between the 1st and 20th mean scores were 4.87 and changes between the 1st and 20th standard


deviations were 1.15. The closest top match score for the 250 males in the sample was 1.43 and the highest top match score was 21.02.

The mean score for the top match in the sample of 186 males in the 12 groups with over 100 in the sample was 6.97, with a standard deviation of 3.24. The mean score for the 20th match in the sample of 186 was 11.89, with a standard deviation of 4.37. Incremental changes between the 1st and 20th mean scores were 4.92 and changes between the 1st and 20th standard deviations were 1.13. The closest top match score for the 186 males in the 12 groups with over 100 in the sample was 1.43 and the highest top match score was 21.02.

The mean score for the top match in the sample of 64 males in the 10 groups with under 100 in the sample was 6.85, with a standard deviation of 3.44. The mean score for the 20th match in the sample of 64 males was 11.58, with a standard deviation of 4.65. Incremental changes between the 1st and 20th mean scores were 4.73 and changes between the 1st and 20th standard deviations were 1.21. The closest top match score for the 64 males in the 10 groups with under 100 in the sample was 1.71 and the highest top match score was 15.89. When reviewing the differences in means, standard deviations, and scores between males in the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest standard deviations and closest top match score were found in the males in the 12 groups with over 100 in the sample. The lowest means were found in the males in the 10 groups with under 100 in the sample.


Comparison of females and males. Males had lower means, standard deviations, and top match scores than females in the total random sample, the 12 groups with over 100 in the sample, and the 10 groups with under 100 in the sample.

Top match predicted specialty entered. The mean score for the 90 top matches that correctly identified the specialty the medical student entered was 8.04, with a standard deviation of 4.45. The mean score for the 20th match in the sample of 90 was 13.59, with a standard deviation of 6.12. Incremental changes between the 1st and 20th mean scores were 5.55 and changes between the 1st and 20th standard deviations were 1.67. The closest top match score for the 90 top matches that correctly identified the specialty the medical student entered was 2.13 and the highest top match score was 34.85.

The mean score for the 86 top matches that correctly identified the specialty the medical student entered for the 12 groups with over 100 in the sample was 7.73, with a standard deviation of 3.46. The mean score for the 20th match in the sample of 86 was 13.23, with a standard deviation of 4.86. Incremental changes between the 1st and 20th mean scores were 5.50 and changes between the 1st and 20th standard deviations were 1.40. The closest top match score for the 86 top matches that correctly identified the specialty the medical student entered for the 12 groups with over 100 in the sample was 2.13 and the highest top match score was 19.88.

The mean score for the four top matches that correctly identified the specialty the medical student entered for the 10 groups with under 100 in the sample was 14.56, with a standard deviation of 13.69. The mean score for the 20th match in the sample of four


was 21.45, with a standard deviation of 18.89. Incremental changes between the 1st and 20th mean scores were 6.89 and changes between the 1st and 20th standard deviations were 5.20. The closest top match score for the four top matches that correctly identified the specialty the medical student entered for the 10 groups with under 100 in the sample was 5.88 and the highest top match score was 34.85. When reviewing the differences in means, standard deviations, and scores for top matches that correctly identified the specialty the medical student entered between the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means, standard deviations, and top match scores came from the medical students in the 12 groups with over 100 in the sample.

Top match did not predict specialty entered. The mean score for the 410 top matches that did not correctly identify the specialty the medical student entered was 7.32, with a standard deviation of 3.60. The mean score for the 20th match in the sample of 410 was 12.39, with a standard deviation of 5.35. Incremental changes between the 1st and 20th mean scores were 5.07 and changes between the 1st and 20th standard deviations were 1.75. The closest top match score for the 410 top matches that did not correctly identify the specialty the medical student entered was 1.43 and the highest top match score was 24.71.

The mean score for the 286 top matches that did not correctly identify the specialty the medical student entered for the 12 groups with over 100 in the sample was 7.28, with a standard deviation of 3.67. The mean score for the 20th match in the sample of 286 was 12.35, with a standard deviation of 5.56. Incremental changes between the


1st and 20th mean scores were 5.07 and changes between the 1st and 20th standard deviations were 1.89. The closest top match score for the 286 top matches that did not correctly identify the specialty the medical student entered for the 12 groups with over 100 in the sample was 1.43 and the highest top match score was 24.71.

The mean score for the 124 top matches that did not correctly identify the specialty the medical student entered for the 10 groups with under 100 in the sample was 7.41, with a standard deviation of 3.45. The mean score for the 20th match in the sample of 124 was 12.49, with a standard deviation of 4.86. Incremental changes between the 1st and 20th mean scores were 5.08 and changes between the 1st and 20th standard deviations were 1.41. The closest top match score for the 124 top matches that did not correctly identify the specialty the medical student entered for the 10 groups with under 100 in the sample was 1.71 and the highest top match score was 15.89. When reviewing the differences in means, standard deviations, and scores for top matches that did not correctly identify the specialty the medical student entered between the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means and closest top match score came from the medical students in the 12 groups with over 100 in the sample. The lowest standard deviations came from the medical students in the 10 groups with under 100 in the sample.

Actual specialty was predicted as dominant across the top 1, 5, 10, and 20. The mean score for the top match of the 43 medical students whose entered medical specialty was dominant across the top 1, 5, 10, and 20 was 8.26, with a standard deviation of 3.65. The mean score for the 20th match in the sample of 43 was


14.62, with a standard deviation of 4.92. Incremental changes between the 1st and 20th mean scores were 6.36 and changes between the 1st and 20th standard deviations were 1.27. The closest top match score for the 43 medical students whose entered medical specialty was dominant across the top 1, 5, 10, and 20 was 2.50 and the highest top match score was 19.88.

Specialty predicted across the top 1, 5, 10, and 20 was not the specialty entered. The mean score for the top match of the 60 medical students who had one medical specialty dominant across the top 1, 5, 10, and 20 that was different from the medical specialty that they entered was 8.23, with a standard deviation of 3.95. The mean score for the 20th match in the sample of 60 was 13.61, with a standard deviation of 6.24. Incremental changes between the 1st and 20th mean scores were 5.38 and changes between the 1st and 20th standard deviations were 2.29. The closest top match score for the 60 medical students who had one medical specialty dominant across the top 1, 5, 10, and 20 that was different from the medical specialty that they entered was 1.86 and the highest top match score was 18.63.

Different specialty predicted across the top 1, 5, 10, and 20. The mean score for the top match of the 11 medical students who had four different medical specialties dominant across the top 1, 5, 10, and 20 that did not match the medical specialty that they entered was 6.14, with a standard deviation of 4.31. The mean score for the 20th match in the sample of 11 was 11.08, with a standard deviation of 6.15. Incremental changes between the 1st and 20th mean scores were 4.94 and changes between the 1st and 20th standard deviations were 1.84. The closest top match score for the 11 medical students


that had four different medical specialties dominant across the top 1, 5, 10 and 20 that did not match the medical specialty that they entered was 3.60 and the highest top match score was 18.58. Comparison. When reviewing the differences in means, standard deviations, and scores for the 43 medical students that entered the specialty that was dominant across the top 1, 5, 10 and 20; the 60 medical students that did not enter the specialty that was dominant across the top 1, 5, 10, and 20; and the 11 medical students with four different medical specialties dominant across the top 1, 5, 10, and 20 that did not match the medical specialty that the student entered; the lowest means came from the 11 medical students with four different medical specialties dominant across the top 1, 5, 10, and 20 that did not match the medical specialty that the student entered. The lowest standard deviations came from the 43 medical students who entered the specialty that was dominant across the top 1, 5, 10, and 20. The closest top match score came from the 60 medical students that did not enter the specialty that was dominant across the top 1, 5, 10, and 20.


Table 11

Person Matching Means, Standard Deviations, and Top Match Scores for the 18 Scales; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20

                     First Match      Twentieth Match
Variable             M       SD       M       SD       M ∆     SD ∆    Low Score   High Score
Overall              7.45    3.77     12.61   5.51     5.16    1.74    1.43        34.85
  12 +100            7.38    3.63     12.55   5.41     5.17    1.78    1.43        24.71
  10 -100            7.64    4.19     12.77   5.81     5.13    1.62    1.71        34.85
Females              7.96    4.16     13.40   6.32     5.44    2.16    1.86        34.85
  12 +100            7.80    3.94     13.21   6.22     5.41    2.28    1.86        24.71
  10 -100            8.42    4.72     13.95   6.60     5.53    1.88    2.57        34.85
Males                6.94    3.28     11.81   4.43     4.87    1.15    1.43        21.02
  12 +100            6.97    3.24     11.89   4.37     4.92    1.13    1.43        21.02
  10 -100            6.85    3.44     11.58   4.65     4.73    1.21    1.71        15.89
Top 1 Correct        8.04    4.45     13.59   6.12     5.55    1.67    2.13        34.85
  12 +100            7.73    3.46     13.23   4.86     5.50    1.40    2.13        19.88
  10 -100            14.56   13.69    21.45   18.89    6.89    5.20    5.88        34.85
Top 1 Incorrect      7.32    3.60     12.39   5.35     5.07    1.75    1.43        24.71
  12 +100            7.28    3.67     12.35   5.56     5.07    1.89    1.43        24.71
  10 -100            7.41    3.45     12.49   4.86     5.08    1.41    1.71        15.89
Dominant Miss        8.23    3.95     13.61   6.24     5.38    2.29    1.86        18.63
Dominant Match       8.26    3.65     14.62   4.92     6.36    1.27    2.50        19.88
Different Across     6.14    4.31     11.08   6.15     4.94    1.84    3.60        18.58
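The two ∆ columns in Table 11 are the differences between the twentieth-match and first-match statistics. A minimal sketch in Python of that arithmetic, using three rows copied from the table; the function name `incremental_change` and the row layout are illustrative assumptions and not part of the original analysis:

```python
def incremental_change(first, twentieth):
    """Return (mean change, SD change) between the 20th and 1st match, rounded to 2 places."""
    m1, sd1 = first
    m20, sd20 = twentieth
    return round(m20 - m1, 2), round(sd20 - sd1, 2)

# (first-match (M, SD), twentieth-match (M, SD)) copied from Table 11
rows = {
    "Overall": ((7.45, 3.77), (12.61, 5.51)),
    "Females": ((7.96, 4.16), (13.40, 6.32)),
    "Males": ((6.94, 3.28), (11.81, 4.43)),
}

deltas = {name: incremental_change(first, twentieth)
          for name, (first, twentieth) in rows.items()}
for name, (d_mean, d_sd) in deltas.items():
    print(f"{name}: mean change {d_mean:.2f}, SD change {d_sd:.2f}")
```

Each computed pair agrees with the M ∆ and SD ∆ entries reported for the same row.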


Person Matching Means, Standard Deviations, and Top Match Scores for the 30 Items; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20

Table 12 displays means, standard deviations, and low and high top match scores based on the calculations of the 30 items (a) in general, (b) for each gender, (c) with the top match correct, and (d) with the top match incorrect three ways; for the entire random sample, for groups with over 100 in the sample, and for groups with under 100 in the sample. Additionally, means, standard deviations, and closest top match scores are listed for (a) medical students who had the same specialty dominant in the top 1, 5, 10, and 20, which matched the specialty entered by the medical student; (b) medical students who had the same specialty dominant in the top 1, 5, 10, and 20, which did not match the specialty entered by the medical student; and (c) medical students who were observed as having four different specialties dominant across the top 1, 5, 10, and 20.

Overview. The mean score for the top match in the sample of 500 was 35.22, with a standard deviation of 13.66. The mean score for the 20th match in the sample of 500 was 50.98, with a standard deviation of 18.00. Incremental changes between the 1st and 20th mean scores were 15.76 and changes between the 1st and 20th standard deviations were 4.34. The closest top match score for the 500 medical students in the sample was 9 and the highest top match score was 94.

The mean score for the top match in the sample of 372 medical students in the 12 groups with over 100 in the sample was 35.30, with a standard deviation of 13.86. The


mean score for the 20th match in the sample of 372 was 50.98, with a standard deviation of 18.07. Incremental changes between the 1st and 20th mean scores were 15.68 and changes between the 1st and 20th standard deviations were 4.21. The closest top match score for the 372 medical students in the 12 groups with over 100 in the sample was 9 and the highest top match score was 94.

The mean score for the top match in the sample of 128 medical students in the 10 groups with under 100 in the sample was 34.98, with a standard deviation of 13.12. The mean score for the 20th match in the sample of 128 was 50.95, with a standard deviation of 17.88. Incremental changes between the 1st and 20th mean scores were 15.97 and changes between the 1st and 20th standard deviations were 4.76. The closest top match score for the 128 medical students in the 10 groups with under 100 in the sample was 13 and the highest top match score was 80.

When reviewing the differences in means, standard deviations, and scores between the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means and standard deviations came from the medical students in the 10 groups with under 100 in the sample. The closest top match score came from the medical students in the 12 groups with over 100 in the sample.

Females. The mean score for the top match in the sample of 250 females was 36.93, with a standard deviation of 14.46. The mean score for the 20th match in the sample of 250 females was 53.83, with a standard deviation of 19.49. Incremental changes between the 1st and 20th mean scores were 16.90 and changes between the 1st


and 20th standard deviations were 5.03. The closest top match score for the 250 females in the sample was 11 and the highest top match score was 94.

The mean score for the top match in the sample of 186 females in the 12 groups with over 100 in the sample was 36.60, with a standard deviation of 14.84. The mean score for the 20th match in the sample of 186 was 53.24, with a standard deviation of 19.69. Incremental changes between the 1st and 20th mean scores were 16.64 and changes between the 1st and 20th standard deviations were 4.85. The closest top match score for the 186 females in the 12 groups with over 100 in the sample was 11 and the highest top match score was 94.

The mean score for the top match in the sample of 64 females in the 10 groups with under 100 in the sample was 37.89, with a standard deviation of 13.38. The mean score for the 20th match in the sample of 64 females was 55.55, with a standard deviation of 18.93. Incremental changes between the 1st and 20th mean scores were 17.66 and changes between the 1st and 20th standard deviations were 5.55. The closest top match score for the 64 females in the 10 groups with under 100 in the sample was 15 and the highest top match score was 80.

When reviewing the differences in means, standard deviations, and scores between females in the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means and closest top match score were found in the females in the 12 groups with over 100 in the sample. The lowest standard deviations were found in the females in the 10 groups with under 100 in the sample.


Males. The mean score for the top match in the sample of 250 males was 33.50, with a standard deviation of 12.61. The mean score for the 20th match in the sample of 250 males was 48.12, with a standard deviation of 15.92. Incremental changes between the 1st and 20th mean scores were 14.62 and changes between the 1st and 20th standard deviations were 3.31. The closest top match score for the 250 males in the sample was 9 and the highest top match score was 88.

The mean score for the top match in the sample of 186 males in the 12 groups with over 100 in the sample was 33.99, with a standard deviation of 12.71. The mean score for the 20th match in the sample of 186 was 48.73, with a standard deviation of 16.02. Incremental changes between the 1st and 20th mean scores were 14.74 and changes between the 1st and 20th standard deviations were 3.31. The closest top match score for the 186 males in the 12 groups with over 100 in the sample was 9 and the highest top match score was 88.

The mean score for the top match in the sample of 64 males in the 10 groups with under 100 in the sample was 32.08, with a standard deviation of 12.28. The mean score for the 20th match in the sample of 64 males was 46.36, with a standard deviation of 15.59. Incremental changes between the 1st and 20th mean scores were 14.28 and changes between the 1st and 20th standard deviations were 3.31. The closest top match score for the 64 males in the 10 groups with under 100 in the sample was 13 and the highest top match score was 70.

When reviewing the differences in means, standard deviations, and scores between males in the 12 groups with over 100 in the sample and the 10 groups with under


100 in the sample, the lowest means and standard deviations were found in the males in the 10 groups with under 100 in the sample. The closest top match score was found in the males in the 12 groups with over 100 in the sample. Comparison of females and males. Males had lower means, standard deviations, and top match scores than females in the total sample, in the 12 groups with over 100 in the sample, and the 10 groups with under 100 in the sample. Top match predicted specialty entered. The mean score for the 111 top matches that correctly identified the specialty the medical student entered was 38.16, with a standard deviation of 14.63. The mean score for the 20th match in the sample of 111 was 54.59, with a standard deviation of 18.48. Incremental changes between the 1st and 20th mean scores were 16.43 and changes between the 1st and 20th standard deviations were 3.85. The closest top match score for the 111 top matches that correctly identified the specialty the medical student entered was 9 and the highest top match score was 94. The mean score for the 99 top matches that correctly identified the specialty the medical student entered for the 12 groups with over 100 in the sample was 38.42, with a standard deviation of 15.03. The mean score for the 20th match in the sample of 99 was 54.63, with a standard deviation of 18.86. Incremental changes between the 1st and 20th mean scores were 16.21 and changes between the 1st and 20th standard deviations were 3.83. The closest top match score for the 99 top matches that correctly identified the specialty the medical student entered for the 12 groups with over 100 in the sample was 9 and the highest top match score was 94.


The mean score for the 12 top matches that correctly identified the specialty the medical student entered for the 10 groups with under 100 in the sample was 35.92, with a standard deviation of 10.77. The mean score for the 20th match in the sample of 12 was 54.25, with a standard deviation of 15.43. Incremental changes between the 1st and 20th mean scores were 18.33 and changes between the 1st and 20th standard deviations were 4.66. The closest top match score for the 12 top matches that correctly identified the specialty the medical student entered for the 10 groups with under 100 in the sample was 19 and the highest top match score was 54. When reviewing the differences in means, standard deviations, and scores for top matches that correctly identified the specialty the medical student entered between the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means and standard deviations came from the medical students in the 10 groups with under 100 in the sample. The closest top match score came from the medical students in the 12 groups with over 100 in the sample. Top match did not predict specialty entered. The mean score for the 389 top matches that did not correctly identify the specialty the medical student entered was 34.34, with a standard deviation of 13.25. The mean score for the 20th match in the sample of 389 was 49.90, with a standard deviation of 17.74. Incremental changes between the 1st and 20th mean scores were 15.56 and changes between the 1st and 20th standard deviations were 4.49. The closest top match score for the 389 top matches that did not correctly identify the specialty the medical student entered was 11 and the highest top match score was 80.


The mean score for the 273 top matches that did not correctly identify the specialty the medical student entered for the 12 groups with over 100 in the sample was 34.10, with a standard deviation of 13.22. The mean score for the 20th match in the sample of 273 was 49.59, with a standard deviation of 17.59. Incremental changes between the 1st and 20th mean scores were 15.49 and changes between the 1st and 20th standard deviations were 4.37. The closest top match score for the 273 top matches that did not correctly identify the specialty the medical student entered for the 12 groups with over 100 in the sample was 11 and the highest top match score was 77. The mean score for the 116 top matches that did not correctly identify the specialty the medical student entered for the 10 groups with under 100 in the sample was 34.89, with a standard deviation of 13.38. The mean score for the 20th match in the sample of 116 was 50.61, with a standard deviation of 18.14. Incremental changes between the 1st and 20th mean scores were 15.72 and changes between the 1st and 20th standard deviations were 4.76. The closest top match score for the 116 top matches that did not correctly identify the specialty the medical student entered for the 10 groups with under 100 in the sample was 13 and the highest top match score was 80. When reviewing the differences in means, standard deviations, and scores for top matches that did not correctly identify the specialty the medical student entered between the 12 groups with over 100 in the sample and the 10 groups with under 100 in the sample, the lowest means, standard deviations, and top match scores came from the medical students in the 12 groups with over 100 in the sample.


Actual specialty was predicted as dominant across the top 1, 5, 10, and 20. The mean score for the top match of the 60 medical students that had the medical specialty that they entered be dominant in the top 1, 5, 10 and 20 was 39.73, with a standard deviation of 12.44. The mean score for the 20th match in the sample of 60 was 57.92, with a standard deviation of 15.44. Incremental changes between the 1st and 20th mean scores were 18.19 and changes between the 1st and 20th standard deviations were 3.00. The closest top match score for the 60 medical students that had the medical specialty that they entered be dominant in the top 1, 5, 10 and 20 was 18 and the highest top match score was 82. Specialty predicted across the top 1, 5, 10, and 20 was not the specialty entered. The mean score for the top match of the 76 medical students that had one medical specialty dominant across the top 1, 5, 10 and 20 that was different from the medical specialty that they entered was 36.20, with a standard deviation of 13.65. The mean score for the 20th match in the sample of 76 was 51.89, with a standard deviation of 18.30. Incremental changes between the 1st and 20th mean scores were 15.69 and changes between the 1st and 20th standard deviations were 4.65. The closest top match score for the 76 medical students that had one medical specialty dominant across the top 1, 5, 10 and 20 that was different from the medical specialty that they entered was 11 and the highest top match score was 75. Different specialty predicted across the top 1, 5, 10, and 20. The mean score for the top match of the eight medical students that had four different medical specialties dominant across the top 1, 5, 10 and 20 that did not match the medical specialty that they


entered was 29.63, with a standard deviation of 12.21. The mean score for the 20th match in the sample of eight was 44.75, with a standard deviation of 16.59. Incremental changes between the 1st and 20th mean scores were 15.12 and changes between the 1st and 20th standard deviations were 4.38. The closest top match score for the eight medical students that had four different medical specialties dominant across the top 1, 5, 10 and 20 that did not match the medical specialty that they entered was 14 and the highest top match score was 49.

Comparison. When reviewing the differences in means, standard deviations, and scores for the 60 medical students that entered the specialty that was dominant, the 76 medical students that did not enter the specialty that was dominant, and the eight medical students with four different medical specialties dominant, the lowest means and 1st match standard deviation came from the eight medical students with four different medical specialties dominant across the top 1, 5, 10, and 20 that did not match the medical specialty that the student entered. The closest top match score came from the 76 medical students that did not enter the specialty that was dominant across the top 1, 5, 10, and 20. The lowest twentieth match standard deviation came from the 60 medical students who entered the specialty that was dominant across the top 1, 5, 10, and 20.


Table 12

Person Matching Means, Standard Deviations, and Top Match Scores for the 30 Items; Overall, Groups With Over 100, Groups With Under 100, Top Match Correct, Top Match Incorrect, by Gender, and by Dominance Across the Top 1, 5, 10, and 20

                     First Match      Twentieth Match
Variable             M       SD       M       SD       M ∆     SD ∆    Low Score   High Score
Overall              35.22   13.66    50.98   18.00    15.76   4.34    9           94
  12 +100            35.30   13.86    50.98   18.07    15.68   4.21    9           94
  10 -100            34.98   13.12    50.95   17.88    15.97   4.76    13          80
Females              36.93   14.46    53.83   19.49    16.90   5.03    11          94
  12 +100            36.60   14.84    53.24   19.69    16.64   4.85    11          94
  10 -100            37.89   13.38    55.55   18.93    17.66   5.55    15          80
Males                33.50   12.61    48.12   15.92    14.62   3.31    9           88
  12 +100            33.99   12.71    48.73   16.02    14.74   3.31    9           88
  10 -100            32.08   12.28    46.36   15.59    14.28   3.31    13          70
Top 1 Correct        38.16   14.63    54.59   18.48    16.43   3.85    9           94
  12 +100            38.42   15.03    54.63   18.86    16.21   3.83    9           94
  10 -100            35.92   10.77    54.25   15.43    18.33   4.66    19          54
Top 1 Incorrect      34.34   13.25    49.90   17.74    15.56   4.49    11          80
  12 +100            34.10   13.22    49.59   17.59    15.49   4.37    11          77
  10 -100            34.89   13.38    50.61   18.14    15.72   4.76    13          80
Dominant Miss        36.20   13.65    51.89   18.30    15.69   4.65    11          75
Dominant Match       39.73   12.44    57.92   15.44    18.19   3.00    18          82
Different Across     29.63   12.21    44.75   16.59    15.12   4.38    14          49
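Because the 12 groups with over 100 (n = 372) and the 10 groups with under 100 (n = 128) together make up the sample of 500, the overall first-match mean should equal the size-weighted average of the two subgroup means. The following Python sketch runs that consistency check on the Table 12 values; the helper name `pooled_mean` is hypothetical and introduced only for illustration:

```python
def pooled_mean(groups):
    """Size-weighted mean of subgroup means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

# First-match means from Table 12: 12 groups over 100, then 10 groups under 100
first_match = [(372, 35.30), (128, 34.98)]
overall = pooled_mean(first_match)
print(round(overall, 2))
```

The weighted mean rounds to 35.22, which agrees with the Overall row. Because the subgroup means are themselves rounded to two decimals, a check of this kind can differ from a reported value in the last digit.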


Inferential Analyses

Standard Scoring Hit Rates Including Kappa Coefficients; Overall and by Gender

Table 13 displays standard scoring hit rates including kappa coefficients for the total random sample and for the 250 females and 250 males that comprised the random sample. These hit rates document how often (a) the highest probability score selected the specialty entered by the medical student, (b) the second-highest probability score selected the specialty entered by the medical student, (c) the third-highest probability score selected the specialty entered by the medical student, (d) the fourth-highest probability score selected the specialty entered by the medical student and (e) the fifth-highest probability score selected the specialty entered by the medical student. In addition, kappa coefficients were calculated to determine the amount of agreement between the hit rates that were expected based upon the MSPI-R manual and the hit rates that were actually observed from the random sample of 500 medical students.

The highest probability score accurately identified the medical specialty that was entered by medical students 165 (33%) times. This was lower than the expected hit rate of 260 (52%), which resulted in a kappa of .62 (Good). The second-highest probability score accurately identified the medical specialty that was entered by medical students 67 (13%) times. This was lower than the expected hit rate of 75 (15%), which resulted in a kappa of .93 (Very Good). The third-highest probability score accurately identified the medical specialty that was entered by medical students 41 (8%) times. This was lower than the expected hit rate of 45 (9%), which resulted in a kappa of .95 (Very Good). The fourth-highest probability score accurately identified the medical specialty that was


entered by medical students 36 (7%) times. This was higher than the expected hit rate of 25 (5%), which resulted in a kappa of .81 (Very Good). The fifth-highest probability score accurately identified the medical specialty that was entered by medical students 26 (5%) times. This was higher than the expected hit rate of 15 (3%), which resulted in a kappa of .72 (Good). When combining the hit rates of the five-highest probability scores in the sample of 500, standard scoring identified the medical specialty entered 335 (67%) times. This was lower than the expected hit rate of 420 (84%), which resulted in a kappa of .56 (Moderate). For the 250 females in the sample, the highest probability score accurately identified the medical specialty entered 74 (30%) times. This was lower than the expected hit rate of 130 (52%), which resulted in a kappa of .56 (Moderate). The second-highest probability score accurately identified the medical specialty entered by females 34 (14%) times. This was lower than the expected hit rate of 38 (15%), which resulted in a kappa of .94 (Very Good). The third-highest probability score accurately identified the medical specialty entered by females 24 (10%) times. This was higher than the expected hit rate of 23 (9%), which resulted in a kappa of .98 (Very Good). The fourth-highest probability score accurately identified the medical specialty entered by females 18 (7%) times. This was higher than the expected hit rate of 13 (5%), which resulted in a kappa of .83 (Very Good). The fifth-highest probability score accurately identified the medical specialty entered by females 17 (7%) times. This was higher than the expected hit rate of 8 (3%), which resulted in a kappa of .62 (Good). When combining the hit rates of the five highest probability scores for the sample of 250


females, standard scoring identified the medical specialty entered 167 (67%) times. This was lower than the expected hit rate of 210 (84%), which resulted in a kappa of .55 (Moderate). For the 250 males in the sample, the highest probability score accurately identified the medical specialty entered 91 (36%) times. This was lower than the expected hit rate of 130 (52%), which resulted in a kappa of .69 (Good). The second-highest probability score accurately identified the medical specialty entered by males 33 (13%) times. This was lower than the expected hit rate of 38 (15%), which resulted in a kappa of .90 (Very Good). The third-highest probability score accurately identified the medical specialty entered by males 17 (7%) times. This was lower than the expected hit rate of 23 (9%), which resulted in a kappa of .84 (Very Good). The fourth-highest probability score accurately identified the medical specialty entered by males 18 (7%) times. This was higher than the expected hit rate of 13 (5%), which resulted in a kappa of .83 (Very Good). The fifth-highest probability score accurately identified the medical specialty entered by males 9 (4%) times. This was higher than the expected hit rate of 8 (3%), which resulted in a kappa of .94 (Very Good). When combining the hit rates of the five highest probability scores in the sample of 250 males, standard scoring calculated the medical specialty entered 168 (67%) times. This was lower than the expected hit rate of 210 (84%), which resulted in a kappa of .56 (Moderate). When comparing the hit rates of females and males, scores suggest that females overall received one fewer hit when compared to males. Females received higher hit


rates when the selected specialty was identified by standard scoring as the second-, third-, or fifth-highest probability score. The fourth-highest probability score was a tie. Males far exceeded females on the highest probability score, where females obtained 17 fewer hits than males. Kappa coefficients suggest that overall there is .56 (Moderate) agreement between the observed hit rates for the random sample and the expected hit rates published in the MSPI-R manual. The same moderate agreement was found when comparing observed and expected hit rates for females and males.


Table 13

Standard Scoring Hit Rates for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match               n          Observed   Expected   Kappa
Overall
  First             165/500    33%        52%        .62   Good
  Second            67/500     13%        15%        .93   Very Good
  Third             41/500     8%         9%         .95   Very Good
  Fourth            36/500     7%         5%         .81   Very Good
  Fifth             26/500     5%         3%         .72   Good
  Total of Top 5    335/500    67%        84%        .56   Moderate
Females
  First             74/250     30%        52%        .56   Moderate
  Second            34/250     14%        15%        .94   Very Good
  Third             24/250     10%        9%         .98   Very Good
  Fourth            18/250     7%         5%         .83   Very Good
  Fifth             17/250     7%         3%         .62   Good
  Total of Top 5    167/250    67%        84%        .55   Moderate
Males
  First             91/250     36%        52%        .69   Good
  Second            33/250     13%        15%        .90   Very Good
  Third             17/250     7%         9%         .84   Very Good
  Fourth            18/250     7%         5%         .83   Very Good
  Fifth             9/250      4%         3%         .94   Very Good
  Total of Top 5    168/250    67%        84%        .56   Moderate
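The text does not state how the kappa coefficients were computed. One reading that reproduces the Table 13 values to within rounding is Cohen's kappa on a 2 x 2 hit/miss table in which the smaller of the observed and expected hit counts is assumed to overlap completely with the larger. The Python sketch below works under that assumption; the function name `kappa_from_hits` and the overlap assumption are introduced here and do not come from the source.

```python
def kappa_from_hits(observed, expected, n):
    """Cohen's kappa for hit/miss agreement between observed and expected hits.

    Assumes the smaller hit count is fully contained in the larger one
    (an assumption made for this sketch, not stated in the source).
    """
    both_hit = min(observed, expected)
    both_miss = n - max(observed, expected)
    p_agree = (both_hit + both_miss) / n
    # Chance agreement from the marginal hit and miss proportions
    p_chance = (observed * expected + (n - observed) * (n - expected)) / n**2
    return (p_agree - p_chance) / (1 - p_chance)

# (observed hits, expected hits) for the overall sample of 500 in Table 13
table_13 = {
    "Second": (67, 75),
    "Third": (41, 45),
    "Fourth": (36, 25),
    "Total of Top 5": (335, 420),
}
for match, (obs, exp) in table_13.items():
    print(match, round(kappa_from_hits(obs, exp, 500), 2))
```

Under the same assumption the First row (165 observed, 260 expected) evaluates to about .625, slightly above the reported .62, so the original calculation may have differed in detail.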


Standard Scoring Hit Rates Including Kappa Coefficients; Overall and by Gender for the 12 Groups With Over 100 in the Sample

Table 14 displays standard scoring hit rates including kappa coefficients for the 372 medical students in the 12 groups with over 100 in the sample and for the 186 females and 186 males who comprised the sample. These hit rates document how often (a) the highest probability score selected the specialty entered by the medical student, (b) the second-highest probability score selected the specialty entered by the medical student, (c) the third-highest probability score selected the specialty entered by the medical student, (d) the fourth-highest probability score selected the specialty entered by the medical student and (e) the fifth-highest probability score selected the specialty entered by the medical student. In addition, kappa coefficients were calculated to determine the amount of agreement between the hit rates that were expected based upon the MSPI-R manual and the hit rates that were actually observed from the 372 medical students in the 12 groups with over 100 in the sample and for the 186 females and 186 males that comprised the sample.

The highest probability score accurately identified the medical specialty that was entered by medical students in the 12 groups with over 100 in the sample 155 (42%) times. This was lower than the expected hit rate of 193 (52%), which resulted in a kappa of .80 (Good). The second-highest probability score accurately identified the medical specialty that was entered 57 (15%) times. This was equal to the expected hit rate of 57 (15%), which resulted in a kappa of 1.0 (Very Good). The third-highest probability score accurately identified the medical specialty that was entered 34 (9%) times. This was


equal to the expected hit rate of 34 (9%), which resulted in a kappa of 1.0 (Very Good). The fourth-highest probability score accurately identified the medical specialty that was entered 28 (8%) times. This was higher than the expected hit rate of 19 (5%), which resulted in a kappa of .80 (Good). The fifth-highest probability score accurately identified the medical specialty that was entered 20 (5%) times. This was higher than the expected hit rate of 11 (3%), which resulted in a kappa of .70 (Good). When combining the hit rates of the five highest probability scores in the sample of 372 medical students in the 12 groups with over 100 in the sample, standard scoring identified the medical specialty entered 294 (79%) times. This was lower than the expected hit rate of 312 (84%), which resulted in a kappa of .84 (Very Good). For the 186 females in the 12 groups with over 100 in the sample, standard scoring calculated the highest probability score for the medical specialty that was entered 70 (38%) times. This was lower than the expected hit rate of 97 (52%), which resulted in a kappa of .71 (Good). Standard scoring calculated the second-highest probability score for the medical specialty that was entered by females 30 (16%) times. This was higher than the expected hit rate of 28 (15%), which resulted in a kappa of .96 (Very Good). Standard scoring calculated the third-highest probability score for the medical specialty that was entered by females 21 (11%) times. This was higher than the expected hit rate of 17 (9%), which resulted in a kappa of .88 (Very Good). Standard scoring calculated the fourth-highest probability score for the medical specialty that was entered by females 15 (8%) times. This was higher than the expected hit rate of 9 (5%), which resulted in a kappa of .73 (Good). Standard scoring calculated the fifth-highest probability score for


the medical specialty that was entered by females 12 (6%) times. This was higher than the expected hit rate of 6 (3%), which resulted in a kappa of .65 (Good). When combining the hit rates of the five highest probability scores in the sample of 186 females in the 12 groups with over 100 in the sample, standard scoring calculated the probability score for the medical specialty that was entered 148 (80%) times. This was lower than the expected hit rate of 156 (84%), which resulted in a kappa of .86 (Very Good). For the 186 males in the 12 groups with over 100 in the sample, standard scoring calculated the highest probability score for the medical specialty that was entered 85 (46%) times. This was lower than the expected hit rate of 97 (52%), which resulted in a kappa of .87 (Very Good). Standard scoring calculated the second-highest probability score for the medical specialty that was entered by males 27 (15%) times. This was equal to the expected hit rate of 27 (15%), which resulted in a kappa of 1.0 (Very Good). Standard scoring calculated the third-highest probability score for the medical specialty that was entered by males 13 (7%) times. This was lower than the expected hit rate of 17 (9%), which resulted in a kappa of .86 (Very Good). Standard scoring calculated the fourth-highest probability score for the medical specialty that was entered by males 13 (7%) times. This was higher than the expected hit rate of 9 (5%), which resulted in a kappa of .81 (Good). Standard scoring calculated the fifth-highest probability score for the medical specialty that was entered by males 8 (4%) times. This was higher than the expected hit rate of 6 (3%), which resulted in a kappa of .85 (Very Good). When combining the hit rates of the five highest probability scores in the sample of 186 males in the 12 groups with over 100 in the sample, standard scoring calculated the probability


score for the medical specialty that was entered 146 (78%) times. This was lower than the expected hit rate of 156 (84%), which resulted in a kappa of .82 (Very Good).

When comparing the hit rates of females and males in the 12 groups with over 100 in the sample, scores suggest that females overall received two more hits than males. Females received higher hit rates when the selected specialty was identified by standard scoring as the second-, third-, fourth-, or fifth-highest probability score. However, males far exceeded females on the highest probability score, where females obtained 15 fewer hits than males. Kappa coefficients suggest that overall there is .84 (Very Good) agreement between the observed hit rates for the 12 groups with over 100 in the sample and the expected hit rates published in the MSPI-R manual. The same very good agreement was found when comparing observed and expected hit rates for females and males.


Table 14
Standard Scoring Hit Rates for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender for the 12 Groups With Over 100 in the Sample

Match             n         Observed   Expected   Kappa
Overall
  First           155/372   42%        52%        .80 Good
  Second          57/372    15%        15%        1.0 Very Good
  Third           34/372    9%         9%         1.0 Very Good
  Fourth          28/372    8%         5%         .80 Good
  Fifth           20/372    5%         3%         .70 Good
  Total of Top 5  294/372   79%        84%        .84 Very Good
Females
  First           70/186    38%        52%        .71 Good
  Second          30/186    16%        15%        .96 Very Good
  Third           21/186    11%        9%         .88 Very Good
  Fourth          15/186    8%         5%         .73 Good
  Fifth           12/186    6%         3%         .65 Good
  Total of Top 5  148/186   80%        84%        .86 Very Good
Males
  First           85/186    46%        52%        .87 Very Good
  Second          27/186    15%        15%        1.0 Very Good
  Third           13/186    7%         9%         .86 Very Good
  Fourth          13/186    7%         5%         .81 Good
  Fifth           8/186     4%         3%         .85 Very Good
  Total of Top 5  146/186   78%        84%        .82 Very Good


Standard Scoring Hit Rates Including Kappa Coefficients; Overall and by Gender for the 10 Groups With Under 100 in the Sample

Table 15 displays standard scoring hit rates including kappa coefficients for the 128 medical students in the 10 groups with under 100 in the sample and for the 64 females and 64 males who comprised the sample. These hit rates document how often (a) the highest probability score selected the specialty entered by the medical student, (b) the second-highest probability score selected the specialty entered by the medical student, (c) the third-highest probability score selected the specialty entered by the medical student, (d) the fourth-highest probability score selected the specialty entered by the medical student, and (e) the fifth-highest probability score selected the specialty entered by the medical student. In addition, kappa coefficients were calculated to determine the amount of agreement between the hit rates that were expected based upon the MSPI-R manual and the hit rates that were actually observed from the 128 medical students in the 10 groups with under 100 in the sample and for the 64 females and 64 males who comprised the sample.

The highest probability score accurately identified the medical specialty that was entered by medical students in the 10 groups with under 100 in the sample 10 (8%) times. This was lower than the expected hit rate of 67 (52%), which resulted in a kappa of .14 (Poor). The second-highest probability score accurately identified the medical specialty that was entered 10 (8%) times. This was lower than the expected hit rate of 19 (15%), which resulted in a kappa of .65 (Good). The third-highest probability score accurately identified the medical specialty that was entered 7 (5%) times. This was lower than the


expected hit rate of 12 (9%), which resulted in a kappa of .72 (Good). The fourth-highest probability score accurately identified the medical specialty that was entered 8 (6%) times. This was higher than the expected hit rate of 6 (5%), which resulted in a kappa of .85 (Very Good). The fifth-highest probability score accurately identified the medical specialty that was entered 6 (5%) times. This was higher than the expected hit rate of 4 (3%), which resulted in a kappa of .79 (Good). When combining the hit rates of the five highest probability scores in the sample of 128 medical students in the 10 groups with under 100 in the sample, standard scoring identified the medical specialty entered 41 (32%) times. This was lower than the expected hit rate of 108 (84%), which resulted in a kappa of .16 (Poor).

For the 64 females in the 10 groups with under 100 in the sample, standard scoring calculated the highest probability score for the medical specialty that was entered 4 (6%) times. This was lower than the expected hit rate of 33 (52%), which resulted in a kappa of .12 (Poor). Standard scoring calculated the second-highest probability score for the medical specialty that was entered by females 4 (6%) times. This was lower than the expected hit rate of 10 (15%), which resulted in a kappa of .53 (Moderate). Standard scoring calculated the third-highest probability score for the medical specialty that was entered by females 3 (5%) times. This was lower than the expected hit rate of 6 (9%), which resulted in a kappa of .64 (Good). Standard scoring calculated the fourth-highest probability score for the medical specialty that was entered by females 3 (5%) times. This was equal to the expected hit rate of 3 (5%), which resulted in a kappa of 1.0 (Very Good). Standard scoring calculated the fifth-highest probability score for the medical


specialty that was entered by females 5 (8%) times. This was higher than the expected hit rate of 2 (3%), which resulted in a kappa of .55 (Moderate). When combining the hit rates of the five highest probability scores in the sample of 64 females in the 10 groups with under 100 in the sample, standard scoring calculated the probability score for the medical specialty that was entered 19 (30%) times. This was lower than the expected hit rate of 54 (84%), which resulted in a kappa of .15 (Poor). For the 64 males in the 10 groups with under 100 in the sample, standard scoring calculated the highest probability score for the medical specialty that was entered 6 (9%) times. This was lower than the expected hit rate of 33 (52%), which resulted in a kappa of .18 (Poor). Standard scoring calculated the second-highest probability score for the medical specialty that was entered by males 6 (9%) times. This was lower than the expected hit rate of 10 (15%), which resulted in a kappa of .72 (Good). Standard scoring calculated the third-highest probability score for the medical specialty that was entered by males 4 (6%) times. This was lower than the expected hit rate of 6 (9%), which resulted in a kappa of .78 (Good). Standard scoring calculated the fourth-highest probability score for the medical specialty that was entered by males 5 (8%) times. This was higher than the expected hit rate of 3 (5%), which resulted in a kappa of .73 (Good). Standard scoring calculated the fifth-highest probability score for the medical specialty that was entered by males 1 (2%) times. This was lower than the expected hit rate of 2 (3%), which resulted in a kappa of .66 (Good). When combining the hit rates of the five highest probability scores in the sample of 64 males in the 10 groups with under 100 in the sample, standard scoring calculated the probability score for the medical specialty


that was entered 22 (34%) times. This was lower than the expected hit rate of 54 (84%), which resulted in a kappa of .18 (Poor).

When comparing the hit rates of females and males in the 10 groups with under 100 in the sample, scores suggest that females overall received three fewer hits than males. Females received higher hit rates only when the selected specialty was identified by standard scoring as the fifth-highest probability score. Kappa coefficients suggest that overall there is .16 (Poor) agreement between the observed hit rates for the random sample and the expected hit rates published in the MSPI-R manual. The same poor agreement was found when comparing observed and expected hit rates for females and males.


Table 15
Standard Scoring Hit Rates for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender for the 10 Groups With Under 100 in the Sample

Match             n        Observed   Expected   Kappa
Overall
  First           10/128   8%         52%        .14 Poor
  Second          10/128   8%         15%        .65 Good
  Third           7/128    5%         9%         .72 Good
  Fourth          8/128    6%         5%         .85 Very Good
  Fifth           6/128    5%         3%         .79 Good
  Total of Top 5  41/128   32%        84%        .16 Poor
Females
  First           4/64     6%         52%        .12 Poor
  Second          4/64     6%         15%        .53 Moderate
  Third           3/64     5%         9%         .64 Good
  Fourth          3/64     5%         5%         1.0 Very Good
  Fifth           5/64     8%         3%         .55 Moderate
  Total of Top 5  19/64    30%        84%        .15 Poor
Males
  First           6/64     9%         52%        .18 Poor
  Second          6/64     9%         15%        .72 Good
  Third           4/64     6%         9%         .78 Good
  Fourth          5/64     8%         5%         .73 Good
  Fifth           1/64     2%         3%         .66 Good
  Total of Top 5  22/64    34%        84%        .18 Poor
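The expected counts reported for Table 15 appear to be the MSPI-R manual percentages applied to each subsample size and rounded to a whole number of students. The sketch below illustrates that arithmetic only. The function names are hypothetical, and rounding to the nearest whole count is an assumption that reproduces the figures for the 128-student and 64-student samples; it is not a procedure stated in the text.

```python
# Expected hit rates as quoted from the MSPI-R manual in this chapter.
MANUAL_RATES = {
    "First": 0.52,
    "Second": 0.15,
    "Third": 0.09,
    "Fourth": 0.05,
    "Fifth": 0.03,
    "Total of Top 5": 0.84,
}


def expected_hits(rate, n):
    """Expected number of hits for a sample of n students.

    Assumption: the expected count is the manual rate times n,
    rounded to the nearest whole student.
    """
    return round(rate * n)


def observed_percent(hits, n):
    """Observed hit rate as a whole-number percentage."""
    return round(100 * hits / n)


if __name__ == "__main__":
    # Observed overall hits for the 10 groups with under 100 (Table 15).
    observed = {"First": 10, "Second": 10, "Third": 7,
                "Fourth": 8, "Fifth": 6, "Total of Top 5": 41}
    for match, rate in MANUAL_RATES.items():
        print(match, observed_percent(observed[match], 128),
              expected_hits(rate, 128))
```

Under these assumptions, the 128-student sample yields expected counts of 67, 19, 12, 6, 4, and 108, which agree with the values reported in the text.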


Kappa Coefficients for the Top Match

The kappa coefficient signifies the agreement observed beyond chance between two ratings. Table 16 displays kappa coefficients for top match hit rate accuracy between actual versus predicted medical specialty for standard scoring and person matching calculated utilizing the 150 items, the 18 scales, and the 30 items. A kappa value of less than .20 represents poor agreement; between .21 and .40 represents fair agreement; between .41 and .60 represents moderate agreement; between .61 and .80 represents good agreement; and between .81 and 1.0 represents very good agreement beyond chance (Landis & Koch, 1977).

The interrater reliability for standard scoring was found to be kappa = 0.33 (p < 0.001), 95% CI (0.28, 0.38). The interrater reliability for the 150 items utilizing person matching was found to be kappa = 0.18 (p < 0.001), 95% CI (0.14, 0.22). The interrater reliability for the 18 scales utilizing person matching was found to be kappa = 0.12 (p < 0.001), 95% CI (0.08, 0.16). The interrater reliability for the 30 items utilizing person matching was found to be kappa = 0.18 (p < 0.001), 95% CI (0.14, 0.22). As compared to any calculation of person matching, standard scoring obtained the highest kappa coefficient values.
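As an illustration of the statistic described above, the following minimal sketch computes Cohen's kappa between actual and predicted specialties and maps a kappa value onto the agreement bands listed in this section. It is not the software used in the study. The function names and the toy data are hypothetical, and treating a value of exactly .20 as Poor is an assumption, since the text leaves that boundary unspecified.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement beyond chance between two label sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(actual)
    # Observed proportion of agreement.
    p_observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_chance = sum(actual_counts[c] * predicted_counts[c]
                   for c in actual_counts) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


def agreement_band(kappa):
    """Map kappa to the verbal bands used in this chapter.

    Assumption: boundary values (.20, .40, .60, .80) fall in the lower band.
    """
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very Good"


if __name__ == "__main__":
    # Toy data, not taken from the study.
    actual = ["IM", "IM", "Peds", "Peds"]
    predicted = ["IM", "Peds", "Peds", "Peds"]
    k = cohen_kappa(actual, predicted)
    print(k, agreement_band(k))
```

Applied to the reported top match values, the band function labels .33 as Fair and .18 and .12 as Poor, which agrees with the labels given for the four scoring methods.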


Table 16
Kappa Coefficients for the Top Match

Criteria           n     Kappa      SE     p
Standard Scoring   500   .33 Fair   .026   .000*
150 Items          500   .18 Poor   .021   .000*
18 Scales          500   .12 Poor   .019   .000*
30 Items           500   .18 Poor   .021   .000*

*p < .001
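The 95% confidence intervals reported in the text are consistent with the usual normal approximation, kappa plus or minus 1.96 standard errors, applied to the values in Table 16. The sketch below shows that check. The function name is hypothetical, and the use of this approximation is an inference from the reported numbers rather than a procedure stated in the source.

```python
def kappa_confidence_interval(kappa, standard_error, z=1.96):
    """Approximate 95% CI as kappa +/- z * SE, rounded to two decimals.

    Assumption: a normal-approximation interval; the source does not
    state how its intervals were computed.
    """
    lower = round(kappa - z * standard_error, 2)
    upper = round(kappa + z * standard_error, 2)
    return lower, upper


if __name__ == "__main__":
    # Kappa and SE values as listed in Table 16.
    table_16 = {
        "Standard Scoring": (0.33, 0.026),
        "150 Items": (0.18, 0.021),
        "18 Scales": (0.12, 0.019),
        "30 Items": (0.18, 0.021),
    }
    for method, (kappa, se) in table_16.items():
        print(method, kappa_confidence_interval(kappa, se))
```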


Chance Expectancy Hit Rates for the Top Match by Specialty Group for Standard Scoring and Person Matching Via the 150 Items, the 18 Scales, and the 30 Items

Table 17 displays the expected hit rates that would be achieved by chance for each specialty and the hit rates obtained for the top match when calculated by standard scoring and person matching utilizing the three calculations of the 150 items, the 18 scales, and the 30 items.

Person matching calculated utilizing the 30 items obtained 19 out of 22 medical specialties with hit rates that were greater than chance expectancy rates. The three medical specialties below chance expectancy rates for the 30 items included Urology, Plastic Surgery, and Pediatrics/Child and Adolescent Psychiatry. Standard scoring obtained 16 out of 22 medical specialties with hit rates that were higher than chance. The six medical specialties below chance expectancy rates for standard scoring included Internal Medicine Pediatrics, Ophthalmology, Neurological Surgery, Radiation Oncology, Plastic Surgery, and Pediatrics/Child and Adolescent Psychiatry. As standard scoring does not calculate scores for six of the 22 medical specialties used in the study, it is understandable why standard scoring fell below the chance expectancy hit rates for those six listed specialties. Person matching calculated utilizing the 150 items obtained 16 out of 22 medical specialties with hit rates that were greater than chance expectancy rates. The six medical specialties below chance expectancy rates for the 150 items included Ophthalmology, Physical Medicine and Rehabilitation, Neurological Surgery, Radiation Oncology, Plastic Surgery, and Pediatrics/Child and Adolescent Psychiatry. Person matching calculated utilizing the 18 scales obtained 15 out of 22 medical specialties with hit rates that were greater than chance expectancy rates. The


seven medical specialties below chance expectancy rates for the 18 scales included Radiology, Otolaryngology, Dermatology, Urology, Radiation Oncology, Plastic Surgery, and Pediatrics/Child and Adolescent Psychiatry. Additionally, Plastic Surgery and Pediatrics/Child and Adolescent Psychiatry were the only two medical specialties that did not score hit rates higher than chance expectancy rates for any of the four scoring methods. All four of the scoring methods either performed better than chance expectancy rates or did not match an individual to a specific medical specialty because when any calculation fell below chance expectancy rates, the score was always 0.


Table 17
Chance Expectancy Hit Rates for Top Match by Specialty Group for Standard Scoring and Person Matching Via the 150 Items, the 18 Scales, and the 30 Items

                                                                 Person Matching
Medical Specialty                     n        Chance   Standard   150    18     30
Internal Medicine                     56/500   .11      .63        .30    .27    .36
Pediatrics                            44/500   .09      .59        .30    .25    .41
Emergency Medicine                    40/500   .08      .43        .38    .25    .25
Family Medicine                       40/500   .08      .53        .35    .25    .28
Obstetrics/Gynecology                 30/500   .06      .40        .27    .27    .33
Surgery                               30/500   .06      .47        .27    .20    .27
Anesthesiology                        24/500   .05      .25        .08    .21    .21
Psychiatry                            24/500   .05      .42        .25    .29    .33
Orthopedic Surgery                    24/500   .05      .29        .46    .29    .25
Radiology                             20/500   .04      .15        .15    .00    .10
Pathology                             20/500   .04      .20        .20    .15    .10
Internal Medicine Pediatrics          20/500   .04      .00        .05    .20    .15
Otolaryngology                        14/500   .03      .29        .21    .00    .14
Ophthalmology                         14/500   .03      .00        .00    .07    .07
Neurology                             14/500   .03      .07        .14    .07    .07
Dermatology                           14/500   .03      .14        .21    .00    .14
Physical Medicine & Rehabilitation    14/500   .03      .14        .00    .07    .21
Urology                               14/500   .03      .07        .07    .00    .00
Neurological Surgery                  14/500   .03      .00        .00    .07    .14
Radiation Oncology                    10/500   .02      .00        .00    .00    .10
Plastic Surgery                       10/500   .02      .00        .00    .00    .00
Peds/Child & Adolescent Psychiatry    10/500   .02      .00        .00    .00    .00
Number of Specialties Below
Chance Expectancy Rate                                  6          6      7      3
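The Chance column in Table 17 is consistent with each specialty's share of the 500-student sample (for example, 56/500 rounds to .11). The sketch below recomputes the below-chance counts in the table's final row under that reading. Treating chance expectancy as group size divided by total sample size is an inference from the table, not a definition quoted from the text, and the names used are hypothetical.

```python
SAMPLE_SIZE = 500
METHODS = ("Standard", "150 Items", "18 Scales", "30 Items")

# (specialty, group n, Standard, 150 Items, 18 Scales, 30 Items) from Table 17.
ROWS = [
    ("Internal Medicine", 56, .63, .30, .27, .36),
    ("Pediatrics", 44, .59, .30, .25, .41),
    ("Emergency Medicine", 40, .43, .38, .25, .25),
    ("Family Medicine", 40, .53, .35, .25, .28),
    ("Obstetrics/Gynecology", 30, .40, .27, .27, .33),
    ("Surgery", 30, .47, .27, .20, .27),
    ("Anesthesiology", 24, .25, .08, .21, .21),
    ("Psychiatry", 24, .42, .25, .29, .33),
    ("Orthopedic Surgery", 24, .29, .46, .29, .25),
    ("Radiology", 20, .15, .15, .00, .10),
    ("Pathology", 20, .20, .20, .15, .10),
    ("Internal Medicine Pediatrics", 20, .00, .05, .20, .15),
    ("Otolaryngology", 14, .29, .21, .00, .14),
    ("Ophthalmology", 14, .00, .00, .07, .07),
    ("Neurology", 14, .07, .14, .07, .07),
    ("Dermatology", 14, .14, .21, .00, .14),
    ("Physical Medicine & Rehabilitation", 14, .14, .00, .07, .21),
    ("Urology", 14, .07, .07, .00, .00),
    ("Neurological Surgery", 14, .00, .00, .07, .14),
    ("Radiation Oncology", 10, .00, .00, .00, .10),
    ("Plastic Surgery", 10, .00, .00, .00, .00),
    ("Peds/Child & Adolescent Psychiatry", 10, .00, .00, .00, .00),
]


def chance_rate(group_n, total_n=SAMPLE_SIZE):
    """Assumed chance expectancy: the specialty's share of the sample."""
    return group_n / total_n


def below_chance(rows=ROWS):
    """List, per scoring method, the specialties whose hit rate is below chance."""
    result = {method: [] for method in METHODS}
    for name, group_n, *rates in rows:
        for method, rate in zip(METHODS, rates):
            if rate < chance_rate(group_n):
                result[method].append(name)
    return result


if __name__ == "__main__":
    for method, specialties in below_chance().items():
        print(method, len(specialties), specialties)
```

Under this reading, every below-chance entry is a hit rate of .00, which matches the observation that a calculation falling below chance always scored 0.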


Descriptive and Inferential Statistics for the 22 Medical Specialties

Appendix B contains 88 tables, which display the hit rates, means, standard deviations, top match scores, and kappa coefficients overall and by gender for the 22 medical specialties. This section will provide a brief overview of the information in those tables including the accuracy for each calculation of person matching (the 150 items, the 18 scales, and the 30 items) for singular (at least one match in the top 1, 5, 10 and 20 matches representing the specialty entered by the medical student) and dominant (a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student) hit rates as well as the hit rates for standard scoring.

The most accurate singular hit rate for the 24 medical students who entered Anesthesiology was calculated utilizing the 18 scales and the top 20 matches where 19 (79%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 30 items and the top 10 matches where 9 (38%) had a majority of the matches representing Anesthesiology. Standard scoring obtained a hit rate of 17 (71%) when placing Anesthesiology in one of the top 5 calculated percentages.

The most accurate singular hit rate for the 14 medical students who entered Dermatology was calculated utilizing the 150 items and the top 20 matches where 10 (71%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 150 items and the top 5 matches where 4 (29%) had a majority of the matches representing Dermatology. Standard scoring obtained a hit rate of 9 (64%) when placing Dermatology in one of the top 5 calculated percentages.
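The singular and dominant hit rates defined at the start of this section can be sketched as two simple rules over a student's ranked list of person matches. This is an illustration only. The function names and the toy data are hypothetical, the ranked matches are taken as already computed, and reading "majority" as more than half of the top k matches is an assumption that the text does not confirm.

```python
def singular_hit(matched_specialties, entered, k):
    """True if at least one of the top-k person matches entered the same specialty."""
    return entered in matched_specialties[:k]


def dominant_hit(matched_specialties, entered, k):
    """True if a majority of the top-k person matches entered the same specialty.

    Assumption: "majority" means strictly more than half of the top k.
    """
    top_k = matched_specialties[:k]
    return top_k.count(entered) > len(top_k) / 2


def hit_rate(students, k, rule):
    """Count and proportion of students for whom the rule produces a hit.

    students: list of (entered_specialty, ranked_match_specialties) pairs.
    """
    hits = sum(rule(matches, entered, k) for entered, matches in students)
    return hits, hits / len(students)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    ranked = ["Surgery", "Radiology", "Surgery", "Pathology", "Surgery"]
    students = [("Surgery", ranked), ("Radiology", ranked)]
    print(hit_rate(students, 5, singular_hit))
    print(hit_rate(students, 5, dominant_hit))
```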


The most accurate singular hit rate for the 40 medical students who entered Emergency Medicine was calculated utilizing the 18 scales and the top 20 matches where 40 (100%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 150 items and the top 20 matches where 22 (55%) had a majority of the matches representing Emergency Medicine. Standard scoring obtained a hit rate of 34 (85%) when placing Emergency Medicine in one of the top 5 calculated percentages. The most accurate singular hit rate for the 40 medical students who entered Family Medicine was calculated utilizing the 30 items and the top 20 matches where 37 (93%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 30 items and the top 20 matches where 22 (55%) had a majority of the matches representing Family Medicine. Standard scoring obtained a hit rate of 35 (88%) when placing Family Medicine in one of the top 5 calculated percentages. The most accurate singular hit rate for the 56 medical students who entered Internal Medicine was calculated utilizing the 18 scales and the top 20 matches as well as the 30 items and the top 20 matches where 56 (100%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 150 items and the top 10 matches as well as the 150 items and the top 20 matches where 38 (68%) had a majority of the matches representing Internal Medicine. Standard scoring obtained a hit rate of 55 (98%) when placing Internal Medicine in one of the top 5 calculated percentages. The most accurate singular hit rate for the 14 medical students who entered Neurological Surgery was calculated utilizing the 30 items and the top 20 matches where 9 (64%) had at least one match. The most accurate dominant hit rate was calculated


utilizing the 18 scales and the top 10 matches, the 30 items and the top 10 matches, as well as the 30 items and the top 20 matches where 1 (7%) had a majority of the matches representing Neurological Surgery. Standard scoring obtained a hit rate of 0 (0%) when placing Neurological Surgery in one of the top 5 calculated percentages. The most accurate singular hit rate for the 14 medical students who entered Neurology was calculated utilizing the 30 items and the top 20 matches where 6 (43%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 150 items and the top 5 matches, the 18 scales and the top 5 matches, the 30 items and the top 5 matches, the 18 scales and the top 10 matches, the 30 items and the top 10 matches, as well as the 30 items and the top 20 matches where 1 (7%) had a majority of the matches representing Neurology. Standard scoring obtained a hit rate of 7 (50%) when placing Neurology in one of the top 5 calculated percentages. The most accurate singular hit rate for the 30 medical students who entered Obstetrics and Gynecology was calculated utilizing the 30 items and the top 20 matches where 28 (93%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 18 scales and the top 10 matches, the 30 items and the top 10 matches, as well as the 30 items and the top 20 matches where 14 (47%) had a majority of the matches representing Obstetrics and Gynecology. Standard scoring obtained a hit rate of 23 (77%) when placing Obstetrics and Gynecology in one of the top 5 calculated percentages. The most accurate singular hit rate for the 14 medical students who entered Ophthalmology was calculated utilizing the 18 scales and the top 20 matches as well as


the 30 items and the top 20 matches where 6 (43%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 30 items and the top 5 matches, the 30 items and the top 10 matches, as well as the 30 items and the top 20 matches where 1 (7%) had a majority of the matches representing Ophthalmology. Standard scoring obtained a hit rate of 0 (0%) when placing Ophthalmology in one of the top 5 calculated percentages. The most accurate singular hit rate for the 24 medical students who entered Orthopedic Surgery was calculated utilizing the 18 scales and the top 10 matches as well as the 150 items and the top 20 matches where 23 (96%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 150 items and the top 20 matches where 13 (54%) had a majority of the matches representing Orthopedic Surgery. Standard scoring obtained a hit rate of 17 (71%) when placing Orthopedic Surgery in one of the top 5 calculated percentages. The most accurate singular hit rate for the 14 medical students who entered Otolaryngology was calculated utilizing the 18 scales and the top 20 matches where 8 (57%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 30 items and the top 5 matches, the 150 items and the top 10 matches, the 18 scales and the top 10 matches, the 30 items and the top 10 matches, the 18 scales and the top 20 matches, as well as the 30 items and the top 20 matches where 1 (7%) had a majority of the matches representing Otolaryngology. Standard scoring obtained a hit rate of 6 (43%) when placing Otolaryngology in one of the top 5 calculated percentages.


The most accurate singular hit rate for the 20 medical students who entered Pathology was calculated utilizing the 18 scales and the top 20 matches where 13 (65%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 150 items and the top 10 matches as well as the 18 scales and the top 20 matches where 5 (25%) had a majority of the matches representing Pathology. Standard scoring obtained a hit rate of 11 (55%) when placing Pathology in one of the top 5 calculated percentages. The most accurate singular hit rate for the 44 medical students who entered Pediatrics was calculated utilizing the 30 items and the top 20 matches where 42 (95%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 30 items and the top 20 matches where 26 (59%) had a majority of the matches representing Pediatrics. Standard scoring obtained a hit rate of 41 (93%) when placing Pediatrics in one of the top 5 calculated percentages. The most accurate singular hit rate for the 14 medical students who entered Physical Medicine was calculated utilizing the 30 items and the top 20 matches where 7 (50%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 30 items and the top 10 matches where 3 (21%) had a majority of the matches representing Physical Medicine. Standard scoring obtained a hit rate of 12 (86%) when placing Physical Medicine in one of the top 5 calculated percentages. The most accurate singular hit rate for the 10 medical students who entered Plastic Surgery was calculated utilizing the 18 scales and the top 10 matches as well as the 18 scales and the top 20 matches where 1 (10%) had at least one match. No accurate dominant hit rate was calculated by any person matching method. Standard scoring


obtained a hit rate of 0 (0%) when placing Plastic Surgery in one of the top 5 calculated percentages. The most accurate singular hit rate for the 24 medical students who entered Psychiatry was calculated utilizing the 150 items and the top 20 matches as well as the 18 scales and the top 20 matches where 18 (75%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 18 scales and the top 5 matches as well as the 30 items and the top 10 matches where 10 (42%) had a majority of the matches representing Psychiatry. Standard scoring obtained a hit rate of 20 (83%) when placing Psychiatry in one of the top 5 calculated percentages. The most accurate singular hit rate for the 20 medical students who entered Radiology was calculated utilizing the 150 items and the top 20 matches where 18 (90%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 150 items and the top 10 matches where 6 (30%) had a majority of the matches representing Radiology. Standard scoring obtained a hit rate of 15 (75%) when placing Radiology in one of the top 5 calculated percentages. The most accurate singular hit rate for the 10 medical students who entered Radiation Oncology was calculated utilizing the 18 scales and the top 20 matches where 2 (20%) had at least one match. No accurate dominant hit rate was calculated by any person matching method. Standard scoring obtained a hit rate of 0 (0%) when placing Radiation Oncology in one of the top 5 calculated percentages. The most accurate singular hit rate for the 30 medical students who entered Surgery was calculated utilizing the 150 items and the top 20 matches as well as the 30


items and the top 20 matches where 28 (93%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 30 items and the top 10 matches as well as the 30 items and the top 20 matches where 12 (40%) had a majority of the matches representing Surgery. Standard scoring obtained a hit rate of 26 (87%) when placing Surgery in one of the top 5 calculated percentages. The most accurate singular hit rate for the 14 medical students who entered Urology was calculated utilizing the 30 items and the top 20 matches where 5 (36%) had at least one match. No accurate dominant hit rate was calculated by any person matching method. Standard scoring obtained a hit rate of 7 (50%) when placing Urology in one of the top 5 calculated percentages. The most accurate singular hit rate for the 20 medical students who entered Internal Medicine Pediatrics was calculated utilizing the 150 items and the top 20 matches as well as the 30 items and the top 20 matches where 15 (75%) had at least one match. The most accurate dominant hit rate was calculated utilizing the 30 items and the top 5 matches where 3 (15%) had a majority of the matches representing Internal Medicine Pediatrics. Standard scoring obtained a hit rate of 0 (0%) when placing Internal Medicine Pediatrics in one of the top 5 calculated percentages. The most accurate singular hit rate for the 10 medical students who entered Pediatrics/Child and Adolescent Psychiatry was calculated utilizing the 150 items and the top 10 matches as well as the 150 items and the top 20 matches where 2 (20%) had at least one match. No accurate dominant hit rate was calculated by any person matching


method. Standard scoring obtained a hit rate of 0 (0%) when placing Pediatrics/Child and Adolescent Psychiatry in one of the top 5 calculated percentages.

When utilizing person matching to maximize singular (at least one match in the top 1, 5, 10 and 20 matches representing the specialty entered by the medical student) hit rates, the 30 items and the top 20 matches were the most accurate. The second most accurate singular hit rates were achieved with the 18 scales and the top 20 matches. When utilizing person matching to maximize dominant (a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student) hit rates, the 30 items and the top 10 matches were the most accurate. The second most accurate dominant hit rates were achieved with the 30 items and the top 20 matches.

Summary

This chapter presented the results of the descriptive and inferential analyses for the study. The descriptive analysis suggests that the highest singular hit rates for person matching were calculated when utilizing the 30 items for the 500 medical students in the random sample and the 250 females and 250 males comprising the sample. For the 372 medical students in the 12 groups with over 100 in the sample and the 186 males in this sample, the highest singular hit rates for person matching were calculated when utilizing the 30 items. However, for the 186 females in the 12 groups with over 100 in the sample, the 150 items obtained the highest singular hit rates. For the 128 medical students in the 10 groups with under 100 in the sample and the 64 females and 64 males comprising the sample, the highest singular hit rates for person matching were calculated when utilizing the 30 items.


The descriptive analysis suggests that the highest dominant hit rates for person matching were calculated when utilizing the 30 items for the 500 medical students in the random sample and the 250 females and 250 males comprising the sample. For the 372 medical students in the 12 groups with over 100 in the sample and the 186 males in this sample, the highest dominant hit rates for person matching were calculated when utilizing the 30 items. However, for the 186 females in the 12 groups with over 100 in the sample, the 150 items obtained the highest dominant hit rates. For the 128 medical students in the 10 groups with under 100 in the sample and the 64 females and 64 males comprising the sample, the highest dominant hit rates for person matching were calculated when utilizing the 30 items. Males had lower means, standard deviations, and top match scores than females in the total random sample, in the 12 groups with over 100 in the sample, and the 10 groups with under 100 in the sample when calculated utilizing the 150 items, the 18 scales, and the 30 items with only one exception. When calculated utilizing the 150 items in the 10 groups with under 100 in the sample, females had the closest top match score when compared to males in the same sample. When comparing the hit rates of females and males for standard scoring, scores suggest that females overall received three additional hits when compared to males. Females received higher hit rates when the selected specialty was identified by standard scoring as the second, third, fourth, or fifth-highest probability score. However, males far exceeded females when the highest probability score identified the chosen medical specialty where females obtained 16 fewer hits when compared to males. Kappa


coefficients suggest that overall there is .54 (Moderate) agreement between the observed hit rates for the random sample and the expected hit rates published in the MSPI-R manual. The same moderate agreement was found when comparing observed and expected hit rates for females and males.

When comparing the hit rates of females and males for standard scoring in the 12 groups with over 100 in the sample, scores suggest that females overall received two additional hits when compared to males. Females received higher hit rates when the selected specialty was identified by standard scoring as the second, third, fourth, or fifth-highest probability score. However, males far exceeded females when the highest probability score identified the chosen medical specialty, where females obtained 15 fewer hits than males. Kappa coefficients suggest that overall there is .84 (Very Good) agreement between the observed hit rates for the 12 groups with over 100 in the sample and the expected hit rates published in the MSPI-R manual. The same very good agreement was found when comparing observed and expected hit rates for females and males.

When comparing the hit rates of females and males in the 10 groups with under 100 in the sample for standard scoring, scores suggest that females overall received three fewer hits than males. Females received higher hit rates only when the selected specialty was identified by standard scoring as the fifth-highest probability score. Kappa coefficients suggest that overall there is .16 (Poor) agreement between the observed hit rates for the 10 groups with under 100 in the sample and the expected hit


rates published in the MSPI-R manual. The same poor agreement was found when comparing observed and expected hit rates for females and males. The kappa coefficient was calculated for all four methods with the top match to signify the agreement observed beyond chance between the two ratings of predicted and actual medical specialty. Standard scoring obtained a .33 (Fair) kappa value, the 150 items obtained a .18 (Poor) kappa value, the 18 scales obtained a .12 (Poor) kappa value, and the 30 items obtained a .18 (Poor) kappa value. Further, all four methods either performed better than the chance expectancy rate or did not match an individual to a specific medical specialty. When any calculation fell below the chance expectancy rate, the score was always zero. Chapter four interprets the results found in this chapter and makes recommendations based upon the data.

CHAPTER IV

DISCUSSION

Researchers are challenged with improving the theory and science behind interest inventories currently used to predict an individual’s career interest or specialty preference within an occupation. Strong’s empirical method, Holland’s theoretical method, and Kuder’s rational and person matching methods are currently used as the psychometric scoring methodologies for interest inventories. However, person matching has received negligible research attention. The current study investigated (a) how to best perform the person matching psychometric scoring methodology and (b) the comparison of predictive hit rates of the Medical Specialty Preference Inventory-Revised (MSPI-R) utilizing two different psychometric scoring methodologies. Specifically, the research investigated if the predictive hit rates for the Medical Specialty Preference Inventory-Revised (MSPI-R) would increase by changing its current modernist scoring system, based upon the occupational groups of Strong, to a post-modernist scoring system, based upon Kuder’s person matching model.

To understand how to use person matching as a psychometric scoring methodology, this study examined (a) how best to calculate person matching (item, scale, or factor level), (b) how many person matches to consider for each test-taker (1, 5, 10, or 20), (c) how person matching compares to standard scoring on the same interest inventory with the same participants, (d) how gender differences compare between person matching and standard scoring, and (e) if predictive hit rates increase if the data is scored by combining the person matching model with aspects of Strong’s occupational group matching model.

The next section will discuss the conclusions drawn about each hypothesis along with other conclusions based upon the data.

Conclusions

Hypotheses

The data suggested several conclusions to the five hypotheses presented. First, the inclusion of all 150 raw item scores when person matching produced more accurate hit rates than when using the 18 scale scores. These results suggest that person matching on the item level is more accurate than person matching on the factor or scale level. Second, using the top 20 person matches produced the highest hit rates. Third, standard scoring outperformed person matching for the top match. However, the 150 items and the 30 items were able to outperform standard scoring when looking beyond the top match to offer medical students a few medical specialties to research for further consideration. Fourth, gender differences were less pronounced for person matching than standard scoring. Fifth, the predictive hit rates were slightly higher when combining standard scoring and person matching psychometric scoring methodologies. The proposed conclusions to the five hypotheses presented above will be fully explained in the following section.

First hypothesis. The first hypothesis stated that person matching achieved the highest hit rates on the MSPI-R when scored utilizing the 150 items. The MSPI-R generates (a) raw scores for the 150 items, (b) the 18 scale scores, and (c) the 16 probability scores for each test-taker. Since the 16 probability scores are based upon the work of Strong and are not truly raw scores since they describe how certain we are that

an event will likely occur, only the 150 raw scores and the 18 scale scores test for this hypothesis with this interest inventory. Therefore, the first hypothesis compared person matching scored with the 150 items against person matching scored with the 18 scales using the MSPI-R. The researcher believed that person matching using the 150 items would achieve more accurate hit rates because there were more scores calculated. The data from this research suggested that the 150 items achieved higher hit rates than the 18 scales in 61 out of 72 calculations (three comparisons garnered equal results). There were eight exceptions where the 18 scales outperformed the 150 items: (a) singular hit rates for males in the 12 groups with over 100 in the sample when looking at the top 10 person matches, (b) singular hit rates for females in the 10 groups with under 100 in the sample when looking at the top 10 and top 20 person matches, (c) dominant hit rates overall in the 10 groups with under 100 in the sample when looking at the top 20 person matches, (d) dominant hit rates for females in the 10 groups with under 100 in the sample when looking at the top 10 and 20 person matches, and (e) overall and for females where the four different medical specialties are dominant across the top 1, 5, 10, and 20. Data suggest that overall and for both genders, the inclusion of all 150 raw scores when person matching produces more accurate hit rates than when using the 18 scale scores. Second hypothesis. The second hypothesis stated that person matching achieved the highest hit rates on the MSPI-R when scored utilizing the top 20 person matches. The second hypothesis compared the hit rates of person matching when looking at (a) singular matches for the top match, the top 5 matches, the top 10 matches, and the top 20 matches and (b) dominance in the top 5 matches, the top 10 matches, and the top 20 matches. The

researcher believed that person matching using the top 20 matches would achieve more accurate hit rates because there were more chances for success. The data from this research suggested that for singular person matches for the 150 items, the 18 scales, and the 30 items, the highest singular hit rates were achieved using the top 20 matches. The recommendations for the number of matches to consider for dominance in the top 5, top 10, and top 20 change depending upon how they are calculated. For the 18 scales and the 30 items, dominance in the top 10 matches was the most accurate. There were significant variations in dominance for the 150 items. Overall and for males, calculating dominance in the top 10 matches using the 150 items obtains more accurate hit rates. Females had more accuracy with the top 20 dominant matches. Males in the 12 groups with over 100 in the sample had higher hit rates with dominance in the top 10 matches using the 150 items, while females had higher hit rates with dominance in the top 20 matches. The 10 groups with under 100 in the sample (overall and for both genders) had higher hit rates with dominance in the top 5 matches. A formal conclusion could not be drawn about accurate dominant hit rates calculated using the 150 items.

Third hypothesis. The third hypothesis stated that person matching achieved higher hit rates on the MSPI-R than standard scoring. The researcher believed that person matching would achieve more accurate hit rates because more of the items on the interest inventory would be utilized in scoring. In some ways, the data would suggest that person matching achieved higher hit rates than standard scoring. In other ways, the data would suggest that standard scoring achieved higher hit rates than person matching.

There are several reasons for this tenuous outcome. The two psychometric scoring methodologies take entirely different approaches to measuring people, and each method has distinct advantages and disadvantages. Standard scoring offers the test-taker the percent likelihood that they will enter one of 16 medical specialties. Person matching compares the test-taker individually to all 5,142 members of the reference group (who can belong to any number of medical specialties) and offers the test-taker the 5, 10, or 20 closest matches, which includes both the medical specialty and an autobiography of the individual’s career and lifestyle. While standard scoring is based upon a modernist methodology and is concerned with obtaining the one right answer, person matching is based upon a postmodernist methodology and is concerned with offering the test-taker a narrative to help them write their preferred future. In essence, is it better to be told what you will or should do, or is it better to be given information to help you write your preferred future? These two divergent ways of measuring people make it extremely difficult to make a direct comparison of which performs better in terms of predictive hit rates. Despite the difficulties in answering this hypothesis, the sections below attempt an answer by comparing each calculation of person matching (the 150 items, the 18 scales, and the 30 items) to standard scoring.

The 150 items. When comparing the 150 items to standard scoring for the random sample, standard scoring obtained 11% more hits than person matching for the top match. Standard scoring predicting the medical specialty as the two highest probability scores obtained 4% fewer hits than person matching singular hit rates for the top 5 matches. Standard scoring predicting the medical specialty as the four highest

probability scores equaled person matching singular hit rates for the top 10 matches. Standard scoring predicting the medical specialty as the five highest probability scores obtained 5% fewer hits than person matching singular hit rates for the top 20 matches. Person matching’s dominant hit rate performance fell below standard scoring’s top match predictive hit rate for the top 5 (5% below), 10 (1% below), and 20 (2% below) matches. When comparing the 150 items to standard scoring for the 12 groups with over 100 in the sample, standard scoring obtained 15% more hits than person matching for the top match. Standard scoring predicting the medical specialty as the two highest probability scores obtained 5% fewer hits than person matching singular hit rates for the top 5 matches. Standard scoring predicting the medical specialty as the four highest probability scores obtained equal hit rates when compared to person matching singular hit rates for the top 10 matches. Standard scoring predicting the medical specialty as the five highest probability scores obtained 7% fewer hits than person matching singular hit rates for the top 20 matches. Dominant hit rate performance equaled or fell below standard scoring’s top match predictive hit rate for the top 5 (8% below), 10 (equaled), and 20 (equaled) matches. When comparing the 150 items to standard scoring for the 10 groups with under 100 in the sample, standard scoring obtained 1% more hits than person matching for the top match. Standard scoring predicting the medical specialty as the two highest probability scores obtained 1% fewer hits than person matching singular hit rates for the top 5 matches. Standard scoring predicting the medical specialty as the four highest probability scores obtained 6% more hits than person matching singular hit rates for the

top 10 matches. Standard scoring predicting the medical specialty as the five highest probability scores obtained 2% fewer hits than person matching singular hit rates for the top 20 matches. Dominant hit rate performance fell below standard scoring’s top match predictive hit rate for the top 5 (4% below), 10 (5% below), and 20 (8% below) matches. The 18 scales. When comparing the 18 scales to standard scoring for the random sample, standard scoring obtained 15% more hits than person matching for the top match. Standard scoring predicting the medical specialty as the two highest probability scores obtained 2% more hits than person matching singular hit rates for the top 5 matches. Standard scoring predicting the medical specialty as the four highest probability scores obtained 1% more hits than person matching singular hit rates for the top 10 matches. Standard scoring predicting the medical specialty as the five highest probability scores obtained 4% fewer hits than person matching singular hit rates for the top 20 matches. Person matching’s dominant hit rate performance fell below standard scoring’s top match predictive hit rate for the top 5 (12% below), 10 (6% below), and 20 (7% below) matches. When comparing the 18 scales to standard scoring for the 12 groups with over 100 in the sample, standard scoring obtained 19% more hits than person matching for the top match. Standard scoring predicting the medical specialty as the two highest probability scores obtained 2% more hits than person matching singular hit rates for the top 5 matches. Standard scoring predicting the medical specialty as the four highest probability scores obtained 1% more hits than person matching singular hit rates for the top 10 matches. Standard scoring predicting the medical specialty as the five highest

probability scores obtained 6% fewer hits than person matching singular hit rates for the top 20 matches. Dominant hit rate performance fell below standard scoring’s top match predictive hit rate for the top 5 (5% below), 10 (1% below), and 20 (2% below) matches. When comparing the 18 scales to standard scoring for the 10 groups with under 100 in the sample, standard scoring obtained 5% more hits than person matching for the top match. Standard scoring predicting the medical specialty as the highest score obtained 5% more hits than person matching singular hit rates for the top 5 matches. Standard scoring predicting the medical specialty as the four highest probability scores obtained 7% more hits than person matching singular hit rates for the top 10 matches. Standard scoring predicting the medical specialty as the five highest probability scores obtained 2% more hits than person matching singular hit rates for the top 20 matches. Dominant hit rate performance fell below standard scoring’s top match predictive hit rate for the top 5 (6% below), 10 (5% below), and 20 (7% below) matches. The 30 items. When comparing the 30 items to standard scoring for the random sample, standard scoring obtained 10% more hits than person matching for the top match. Standard scoring predicting the medical specialty as the two highest probability scores obtained 4% fewer hits than person matching singular hit rates for the top 5 matches. Standard scoring predicting the medical specialty as the four highest probability scores obtained 2% fewer hits than person matching singular hit rates for the top 10 matches. Standard scoring predicting the medical specialty as the five highest probability scores obtained 7% fewer hits than person matching singular hit rates for the top 20 matches. Person matching’s dominant hit rate performance fell above or below standard scoring’s

top match predictive hit rate for the top 5 (6% below), 10 (1% above), and 20 (2% below) matches. When comparing the 30 items to standard scoring for the 12 groups with over 100 in the sample, standard scoring obtained 14% more hits than person matching for the top match. Standard scoring predicting the medical specialty as the two highest probability scores obtained 5% fewer hits than person matching singular hit rates for the top 5 matches. Standard scoring predicting the medical specialty as the four highest probability scores obtained 1% fewer hits than person matching singular hit rates for the top 10 matches. Standard scoring predicting the medical specialty as the five highest probability scores obtained 6% fewer hits than person matching singular hit rates for the top 20 matches. Dominant hit rate performance fell above or below standard scoring’s top match predictive hit rate for the top 5 (7% below), 10 (1% above), and 20 (2% below) matches. When comparing the 30 items to standard scoring for the 10 groups with under 100 in the sample, standard scoring obtained 1% fewer hits than person matching for the top match. Standard scoring predicting the medical specialty as the two highest probability scores equaled person matching singular hit rates for the top 5 matches. Standard scoring predicting the medical specialty as the four highest probability scores equaled person matching singular hit rates for the top 10 matches. Standard scoring predicting the medical specialty as the five highest probability scores obtained 8% fewer hits than person matching singular hit rates for the top 20 matches. Dominant hit rate

performance fell below standard scoring’s top match predictive hit rate for the top 5 (4% below), 10 (2% below), and 20 (3% below) matches.

Summary. In closing, standard scoring performed better than any calculation of person matching (the 150 items, the 18 scales, or the 30 items) for the top match with one exception. Standard scoring was outperformed by 1% for the top match when calculated using the 30 items for the 10 groups with under 100 in the sample. Second, standard scoring’s hit rate for the two highest probability scores obtained (a) lower hit rates than person matching’s top 5 singular hit rate for the 150 items, (b) higher hit rates than person matching’s top 5 singular hit rate for the 18 scales, and (c) hit rates lower than or equal to person matching’s top 5 singular hit rate for the 30 items. Third, standard scoring’s hit rate for the four highest probability scores obtained (a) hit rates equal to or higher than person matching’s top 10 singular hit rate for the 150 items, (b) higher hit rates than person matching’s top 10 singular hit rate for the 18 scales, and (c) hit rates lower than or equal to person matching’s top 10 singular hit rate for the 30 items. Fourth, standard scoring’s five highest probability scores were outperformed by person matching’s top 20 singular hit rate with one exception. When calculating the top 20 singular matches using the 18 scales with the 10 groups with under 100 in the sample, person matching obtained 2% fewer hits than standard scoring’s five highest scores. Lastly, dominance in the top 5, 10, and 20 matches for person matching using the 150 items performed equal to or fell below standard scoring’s top match hit rates, the 18 scales performed below standard scoring’s top match hit rates, and the 30 items performed above or below standard scoring’s top match hit rates. Overall, the results suggest that there are times when

person matching has higher singular hit rates than standard scoring and there are times when standard scoring has higher hit rates than any calculation of person matching. Person matching’s dominance in the top 5, 10, and 20 matches equates roughly to standard scoring’s top match. Fourth hypothesis. The fourth hypothesis stated that gender differences in hit rates were less pronounced for person matching on the MSPI-R than for standard scoring. The researcher believed that person matching would achieve more gender balanced hit rates because women were an equal part of the reference group for person matching and would therefore be less likely to be marginalized by the psychometric scoring methodology. This hypothesis will be examined from (a) the vantage point of the top match, (b) the combination of the two highest probability scores for standard scoring and the top 5 matches for person matching, (c) the combination of the four highest probability scores for standard scoring and the top 10 matches for person matching, and (d) the combination of the five highest probability scores for standard scoring and the top 20 matches for person matching. First, when looking at the entire random sample, standard scoring demonstrated a 6% increase in top match hit rates for males. Person matching using the 150 items demonstrated equal top match hit rates for females and males. Person matching using the 18 scales and the 30 items demonstrated a 2% increase in top match hit rates for males. When looking at the 12 groups with over 100 in the sample, standard scoring demonstrated an 8% increase in top match hit rates for males. Person matching using the 150 items and the 30 items demonstrated a 1% increase in top match hit rates for females.

Person matching using the 18 scales demonstrated a 3% increase in top match hit rates for males. When looking at the 10 groups with under 100 in the sample, standard scoring demonstrated a 3% increase in top match hit rates for males. Person matching using the 150 items demonstrated a 2% increase in top match hit rates for males. Person matching using the 18 scales demonstrated equal top match hit rates for females and males. Person matching using the 30 items demonstrated a 9% increase in top match hit rates for males. The second consideration was a comparison of hit rates for the two highest probability scores for standard scoring and top 5 singular hit rates for person matching. When looking at the entire random sample, standard scoring demonstrated a 5% increase in top match hit rates for males when the medical specialty was one of the two highest probability scores. Person matching using the 150 items demonstrated a 2% increase in top 5 singular hit rates for males. Person matching using the 18 scales demonstrated a 5% increase in top 5 singular hit rates for males. Person matching using the 30 items demonstrated a 3% increase in top 5 singular hit rates for males. When looking at the 12 groups with over 100 in the sample, standard scoring demonstrated a 7% increase in the two highest probability scores for males. Person matching using the 150 items demonstrated a 1% increase in top 5 singular hit rates for females. Person matching using the 18 scales demonstrated an 8% increase in top 5 singular hit rates for males. Lastly, person matching using the 30 items demonstrated a 3% increase in top 5 singular hit rates for males.

When looking at the 10 groups with under 100 in the sample, standard scoring demonstrated a 6% increase in the two highest probability scores for males. Person matching using the 150 items demonstrated a 9% increase in top 5 singular hit rates for males. Person matching using the 18 scales demonstrated equal top 5 singular hit rates for females and males. Person matching using the 30 items demonstrated a 1% increase in top 5 singular hit rates for males. The third comparison is the hit rates for the four highest probability scores for standard scoring and top 10 singular hit rates for person matching. When looking at the entire random sample, standard scoring demonstrated a 2% increase in top match hit rates for males when the medical specialty was one of the four highest probability scores. Person matching using the 150 items demonstrated a 2% increase in top 10 singular hit rates for males. Person matching using the 18 scales demonstrated a 3% increase in top 10 singular hit rates for males. Person matching using the 30 items demonstrated a 2% increase in top 10 singular hit rates for males. When looking at the 12 groups with over 100 in the sample, standard scoring demonstrated a 2% increase in the four highest probability scores for males. Person matching using the 150 items demonstrated a 1% increase in top 10 singular hit rates for males. Person matching using the 18 scales demonstrated a 6% increase in top 10 singular hit rates for males. Lastly, person matching using the 30 items demonstrated a 1% increase in top 10 singular hit rates for males. When looking at the 10 groups with under 100 in the sample, standard scoring demonstrated a 10% increase in the four highest probability scores for males. Person

matching using the 150 items demonstrated a 4% increase in top 10 singular hit rates for males. Person matching using the 18 scales demonstrated a 3% increase in top 10 singular hit rates for females. Person matching using the 30 items demonstrated a 5% increase in top 10 singular hit rates for males. The fourth comparison examines the hit rates for the five highest probability scores for standard scoring and top 20 singular hit rates for person matching. When looking at the entire random sample, standard scoring demonstrated equal hit rates for females and males when the medical specialty was one of the five highest probability scores. Person matching using the 150 items demonstrated a 2% increase in top 20 singular hit rates for males. Person matching using the 18 scales demonstrated a 1% increase in top 20 singular hit rates for males. Person matching using the 30 items demonstrated a 2% increase in top 20 singular hit rates for males. When looking at the 12 groups with over 100 in the sample for gender differences in hit rates, standard scoring demonstrated a 2% increase for females when the medical specialty was one of the five highest probability scores. Person matching using the 150 items demonstrated equal top 20 singular hit rates for females and males. Person matching using the 18 scales demonstrated a 1% increase in top 20 singular hit rates for males. Person matching using the 30 items demonstrated a 1% increase in top 20 singular hit rates for females. When looking at the 10 groups with under 100 in the sample for gender differences in hit rates, standard scoring demonstrated a 4% increase for males when the medical specialty was one of the five highest probability scores. Person matching using

the 150 items demonstrated a 7% increase in top 20 singular hit rates for males. Person matching using the 18 scales demonstrated equal top 20 singular hit rates for females and males. Person matching using the 30 items demonstrated an 11% increase in top 20 singular hit rates for males. The results suggest that standard scoring obtains the largest gender differences in hit rates for the highest and two highest probability scores overall, the highest probability score and the five highest probability scores for the 12 groups with over 100 in the sample, and the top four probability scores for the 10 groups with under 100 in the sample. The 150 items obtains the largest gender differences in hit rates for the top 5 matches for the 10 groups with under 100 in the sample. The 18 scales obtains the largest gender differences in hit rates for the top 5 and top 10 matches overall and for the top 5 and top 10 matches for the 12 groups with over 100 in the sample. The 30 items obtains the largest gender differences in hit rates for the top 20 matches overall and the top match and the top 20 matches for the 10 groups with under 100 in the sample. If the four calculations are rank ordered from most to least often obtaining gender differences, standard scoring obtains the most differences, the 18 scales is next, the 30 items follows, and the 150 items obtains the fewest gender differences. In conclusion, person matching is able to garner more gender balanced hit rates than standard scoring.

Fifth hypothesis. The fifth hypothesis stated that predictive hit rates are at their highest on the MSPI-R when combining standard scoring and person matching psychometric scoring methodologies. The combination of the two methodologies is found in the person matching calculation using the 30 items since those 30 items were

found through discriminant function analysis to calculate the 16 probability scores for standard scoring. Comparisons will be demonstrated between the 30 items and (a) the 150 items, (b) the 18 scales, and (c) standard scoring. The researcher believed that person matching using the 30 items would achieve more accurate hit rates because the best from both psychometric scoring methodologies would be used to calculate final scores for test-takers. The data from this research suggested that the 30 items achieved higher hit rates than the 150 items and the 18 scales in 44 out of 72 calculations (seven comparisons with the 150 items garnered equal results and one comparison with the 18 scales garnered equal results). There were 18 exceptions where the 150 items outperformed the 30 items: (a) singular hit rates for females when looking at the top 5 matches, (b) singular hit rates for the 12 groups with over 100 in the sample when looking at the top 20 person matches, (c) singular hit rates for females in the 12 groups with over 100 in the sample when looking at the top 5 matches, (d) singular hit rates for males in the 12 groups with over 100 in the sample when looking at the top 20 matches, (e) singular hit rates for the 10 groups with under 100 in the sample when looking at the top 5 matches, (f) singular hit rates for females in the 10 groups with under 100 in the sample when looking at the top match, (g) singular hit rates for males in the 10 groups with under 100 in the sample when looking at the top 5 matches, (h) dominant hit rates in the random sample when looking at the top 10 matches, (i) dominant hit rates for females in the random sample when looking at the top 5 and top 20 matches, (j) dominant hit rates for the 12 groups with over 100 in the sample when looking at the top 20 matches, (k) dominant hit rates

for females in the 12 groups with over 100 in the sample when looking at the top 5 and 20 matches, (l) dominant hit rates for males in the 12 groups with over 100 in the sample when looking at the top 20 matches, (m) dominant hit rates for males in the 10 groups with under 100 in the sample when looking at the top 5 matches, (n) overall and for females when the medical specialty entered is the only one dominant across the top 1, 5, 10, and 20, and (o) for males where there is one medical specialty dominant across the top 1, 5, 10, and 20, but the medical student entered a different specialty. There were two exceptions where the 18 scales outperformed the 30 items: overall and for females who had a different medical specialty dominant across the top 1, 5, 10, and 20. Data suggest that the 30 items outperforms the 150 items. However, the 30 items has greater gender imbalances than the 150 items and can therefore place half of the test-takers (females) at a disadvantage. The 30 items consistently outperforms the 18 scales. Singular hit rates. Overall for the random sample, the 30 items demonstrates the highest accuracy in the top match. There is a tie between the 150 items and the 30 items for the top 5 matches. The 30 items appears to make greater gains in accuracy when looking at the top 10 and top 20 singular matches. For females, there is a tie between the 150 items and the 30 items for the top match. The 150 items is most accurate for the top 5 singular matches. The 30 items is most accurate for the top 10 and 20 singular matches. For males, the 30 items consistently produces the highest singular hit rates. Overall for the 12 groups with over 100 in the sample, the 30 items appears most accurate for the top match, top 5 singular matches, and top 10 singular matches. The 150 items appears most accurate when looking at the top 20 singular matches. For females,

the top match is most accurate with the 30 items. The top 5 singular matches are most accurate with the 150 items. The 30 items is most accurate for the top 10 singular matches while the 150 items and the 30 items tie for the top 20 singular matches. For males, the top match is tied between the 150 items and the 30 items for accuracy. The top 5 singular matches are most accurate with the 30 items. The 18 scales and the 30 items tie for accurate top 10 singular matches and the 150 items appears most accurate for the top 20 singular matches. Overall for the 10 groups with under 100 in the sample, the 30 items appears to be most accurate for the top match. The 150 items appears to be most accurate for the top 5 singular matches and the 30 items appears most accurate for the top 10 and 20 singular matches. For females, the top match is most accurate with the 150 items. The 30 items is most accurate for the top 5, 10, and 20 singular matches. For males, the top match is most accurate with the 30 items. The 150 items appears to be most accurate for the top 5 singular matches and the 30 items appears most accurate for the top 10 and 20 singular matches. Dominant hit rates. Overall for the random sample, the 30 items appears to be most accurate for the top 5 and 10 dominant matches. The 150 items is most accurate in the top 20 dominant matches. For females, the top 5 and 20 dominant matches are most accurate with the 150 items and the 30 items appear most accurate for the top 10 dominant matches. For males, the top 5, 10 and 20 dominant matches are most accurate with the 30 items.

Overall for the 12 groups with over 100 in the sample, the 30 items appears most accurate for the top 5 and 10 dominant matches. The 150 items appears more accurate when looking at the top 20 dominant matches. For females, the top 5 and 20 dominant matches are most accurate with the 150 items. The 30 items appears more accurate for the top 10 dominant matches. For males, the top 5 and 10 dominant matches are most accurate with the 30 items. The 150 items is most accurate for the top 20 dominant matches. Overall for the 10 groups with under 100 in the sample, the 150 and the 30 items tie as most accurate for the top 5 dominant matches and the 30 items appears to be most accurate for the top 10 and 20 dominant matches. For females, the 30 items appears most accurate for the top 5, 10, and 20 dominant matches. For males, the 150 items appears to be most accurate for the top 5 dominant matches and the 30 items appears most accurate for the top 10 and 20 dominant matches. Standard scoring. When comparing the 30 items to standard scoring, the data from this research suggested that the 30 items achieved higher singular hit rates than standard scoring in 25 out of 36 calculations (two comparisons garnered equal results). There were nine exceptions where standard scoring outperformed the 30 items; (a) top match hit rates overall and for both genders, (b) top match hit rates overall and for both genders in the 12 groups with over 100 in the sample, (c) top match hit rates for females in the 10 groups with under 100 in the sample, and (d) top 5 and 10 singular hit rates for males in the 10 groups with under 100 in the sample. The results suggest that in general

standard scoring obtains more accurate hit rates for the top matches and that the 30 items obtains more accurate hit rates for the top 5, 10, and 20 singular matches. Summary. When comparing person matching calculations, the results suggest that the 30 items is most accurate followed by the 150 items and the 18 scales. When looking strictly at the top match, the results suggest that the most accurate predictive hit rates are achieved first by standard scoring, then the 30 items, the 150 items, and lastly the 18 scales. When looking strictly at the top 5 matches for person matching and the two highest probability scores for standard scoring, the results suggest that the most accurate predictive hit rates are achieved first by the 150 items and the 30 items, next standard scoring, and lastly the 18 scales. When looking strictly at the top 10 matches for person matching and the four highest probability scores for standard scoring, the results suggest that the most accurate predictive hit rates are achieved first by the 30 items, then standard scoring, next the 150 items, and lastly the 18 scales. When looking strictly at the top 20 matches for person matching and the five highest probability scores for standard scoring, the results suggest that the most accurate predictive hit rates are achieved first by the 30 items, next the 150 items, then the 18 scales, and lastly standard scoring. Standard scoring (6%) and the 30 items (2%) demonstrated greater gender imbalances than the 150 items, especially for the top match, which could be problematic for the half of the MSPI-R’s test-takers who are female. Additional Conclusions In addition to the five hypotheses noted above, there were other conclusions made based upon the data. First, a discussion of notable outcomes from calculating dominance

in person matching will be presented. Second, significant changes in means, standard deviations and top match scores will be reviewed. Third, important aspects of observed agreement and chance expectancy rates will be covered. Lastly, an overview of the individual medical specialties will be presented. Dominance in person matching. First, when looking at hit rates for person matching where the medical specialty entered by the medical student was dominant across the top 1, 5, 10, and 20 matches, the 150 items achieved the most hits overall and for females. The 150 items and the 30 items tied for accuracy for males. Second, when looking at hit rates for person matching where one medical specialty was dominant across the top 1, 5, 10, and 20 matches (but was not entered by the medical student) the 30 items obtained the highest hit rates overall and for females. The 150 items achieved the highest number for males. Third, when looking at hit rates for person matching where there was a different medical specialty dominant across the top 1, 5, 10, and 20 matches for a medical student, the 18 scales had the most hits overall and for females. The 150 items, the 18 scales, and the 30 items tied for the most hits for males. In conclusion, the 150 items were suggested to be the most likely to show consistency in predicting the specialty entered by a medical student across the top 1, 5, 10, and 20. At the same time, the 18 scales were the most likely to have the most scattered medical specialty prediction by having the most medical students have a different medical specialty appear as dominant across the top 1, 5, 10, and 20. The results suggest that person matching using scale scores increases the chances of diffusing test-takers’ scores.
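
The three dominance outcomes described above can be sketched as a small classifier. This is a minimal Python illustration under stated assumptions: the ranked match list holds the specialty of each matched reference group member, the dominant specialty at each depth is simply the most frequent one (the first-seen specialty wins ties), and the category labels and sample data are hypothetical.

```python
from collections import Counter

DEPTHS = (1, 5, 10, 20)


def dominant_at(matches, k):
    """Most frequent specialty among the top-k matches (first-seen wins ties)."""
    return Counter(matches[:k]).most_common(1)[0][0]


def dominance_pattern(matches, entered, depths=DEPTHS):
    """Classify how stable the dominant specialty is across the given depths."""
    winners = {dominant_at(matches, k) for k in depths}
    if winners == {entered}:
        return "entered specialty dominant at every depth"
    if len(winners) == 1:
        return "one other specialty dominant at every depth"
    if len(winners) == len(depths):
        return "different specialty dominant at each depth"
    return "mixed"


# Hypothetical ranked match lists of length 20
stable = ["IM"] * 20
scattered = ["A"] + ["B"] * 3 + ["C"] * 6 + ["D"] * 10

print(dominance_pattern(stable, "IM"))
print(dominance_pattern(stable, "FM"))
print(dominance_pattern(scattered, "A"))
```

Counting how many test-takers fall into the first and third categories for each calculation gives the kind of consistency comparison summarized in the conclusion above.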

Significance in means, standard deviations, and top match scores. Conclusions can be drawn about the means, standard deviations, and top match scores for the sample. For the random sample using the 150 items, the 18 scales, and the 30 items, an accurate top match had higher means, standard deviations, and scores than inaccurate top matches with one exception. When calculated using the 30 items, accurate top match scores had the closest top match score. Accurate top matches generally have higher scores and more diverse scores than inaccurate top matches. When looking at accurate top matches for the 18 scales, the lowest means, standard deviations, and top match scores were found in the 12 groups with over 100 in the sample. In this instance, the largest medical specialties displayed lower and more focused scores than the smaller medical specialties. For the 150 items, the lowest means and top match scores were found in the 10 groups with under 100 in the sample while the lowest standard deviations were found in the 12 groups with over 100 in the sample. In this instance, the smaller medical specialties displayed more focused scores and the largest medical specialties displayed lower scores. For the 30 items, the lowest means and standard deviations were found in the 10 groups with under 100 in the sample while the lowest top match scores were found in the larger medical specialties. In this instance, the smaller medical specialties displayed lower and more focused scores than the larger medical specialties. These results suggest that higher hit rates are achieved when there is a balance in lower and more focused scores between the larger and smaller specialties or when the smaller specialties have lower and more focused scores.
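
For readers who want to see what a top match score, mean, and standard deviation could look like in practice, here is a minimal Python sketch. It assumes, purely for illustration, that the match score is the Euclidean distance between a test-taker's item responses and those of each reference group member, so that a score of zero means identical responses. The distance measure, function names, and data are hypothetical and are not taken from the study.

```python
import math
import statistics


def match_scores(test_taker, reference, k=20):
    """Return the k closest (score, specialty) pairs, closest first."""
    scored = sorted(
        (math.dist(test_taker, items), specialty)
        for items, specialty in reference
    )
    return scored[:k]


def summarize(top):
    """Top match score plus the mean and SD of the retained match scores."""
    scores = [score for score, _ in top]
    return {
        "top_match_specialty": top[0][1],
        "top_match_score": scores[0],
        "mean": statistics.mean(scores),
        "sd": statistics.pstdev(scores),
    }


# Hypothetical three-item inventory and three reference group members
reference = [
    ([1, 2, 3], "IM"),
    ([1, 2, 5], "FM"),
    ([4, 6, 3], "PED"),
]
summary = summarize(match_scores([1, 2, 3], reference, k=2))
print(summary)
```

Under this assumed measure, "higher" scores mean matches that sit further from the test-taker, and a larger standard deviation means the retained matches are spread over a wider range of distances.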

Third, for the 150 items, the 18 scales, and the 30 items, females have higher means, standard deviations, and scores than males with one exception. When calculated using the 150 items with the 10 groups with under 100 in the sample, females had the lowest top match scores. In general, females have higher scores and more diverse scores than males. Fourth, dominance across the top 1, 5, 10, and 20 for the random sample where the medical student entered the specialty predicted has higher means, standard deviations, and top match scores than when the medical student had four different specialties picked as dominant across the top 1, 5, 10, and 20 for the 150 items and the 30 items. There were three exceptions to these findings for the 18 scales. When calculated using the 18 scales, accurate hit rates for dominance resulted in lower standard deviations and a lower top match score. Dominance across the top 1, 5, 10, and 20 where the medical student entered the specialty predicted generally has higher and more diverse scores than when four different specialties were picked as dominant across the top 1, 5, 10, and 20. There is a trend that accuracy in the top match and dominance across the top 1, 5, 10, and 20 matches results in higher means, standard deviations, and scores over incorrect predictions. Further, females generally have higher means, standard deviations, and top match scores than males no matter how person matching was calculated. Lastly, lower means, standard deviations, and top match scores in the smaller medical specialties were suggested to result in higher overall predictive accuracy. Observed agreement and chance expectancy rates. The data suggested important findings in observed agreement and chance expectancy rates. First, the

discussion will focus on observed agreement for standard scoring. When comparing the observed hit rates found by the study and the expected hit rates published in the MSPI-R manual for standard scoring, moderate kappa coefficients were suggested for the random sample (.56) and the females (.55) and males (.56) comprising the sample. Very good kappa coefficients were suggested for the 12 groups with over 100 in the sample (.84) and the females (.86) and males (.82) comprising the sample. Poor kappa coefficients were suggested for the 10 groups with under 100 in the sample (.16) and the females (.15) and males (.18) comprising the sample. Since standard scoring calculates scores for 16 of the larger medical specialties, it is no surprise that standard scoring would have greater observed agreement with the larger specialties than the smaller specialties. These results suggest that standard scoring is obtaining hit rates similar to the hit rates published in the MSPI-R manual for the medical specialties that are calculated. Next, kappa coefficients comparing observed versus predicted hit rates for the top match will be presented. These calculations demonstrate how reporting simple hit rates may be misleading as an accurate prediction may take place simply by chance. Top match hit rates for standard scoring were 33% and the kappa coefficient calculated a fair (.33) agreement between observed versus predicted top match hit rates. Top match hit rates for the 150 items were 22% and the kappa coefficient calculated a poor (.18) agreement between observed versus predicted top match hit rates. Top match hit rates for the 30 items were 23% and the kappa coefficient calculated a poor (.18) agreement between observed versus predicted top match hit rates. Top match hit rates for the 18 scales were 18% and the kappa coefficient calculated a poor (.12) agreement between

observed versus predicted top match hit rates. Significant amounts of chance agreement are suggested to happen for all four calculations and are different than the reported predictive hit rates. Finally, comparing chance expectancy rates allows for an understanding of the breadth of a psychometric scoring method. The 30 items was able to achieve hit rates beyond the chance expectancy rate for all but three (19/22) of the medical specialties used in this study. Standard scoring and the 150 items were able to achieve hit rates beyond the chance expectancy rate for all but six (16/22) of the medical specialties used in this study. The 18 scales was able to achieve hit rates beyond the chance expectancy rate for all but seven (15/22) of the medical specialties used in this study. Person matching using the 30 items was able to make predictions at a rate better than chance for the greatest number of medical specialties. Individual Medical Specialties When utilizing person matching to maximize singular hit rates (at least one match in the top 1, 5, 10, and 20 matches representing the specialty entered by the medical student) for medical specialties in the 12 groups with over 100 in the sample, the top 20 matches were the most accurate. No preference for the 150 items, the 18 scales, or the 30 items could be determined. When utilizing person matching to maximize dominant hit rates (a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student) for medical specialties in the 12 groups with over 100 in the sample, the top 10 and top 20 matches were the most accurate. No preference for the 150 items, the 18 scales, or the 30 items could be determined.
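
To show why a raw hit rate and a kappa coefficient can diverge, as in the observed agreement results reported earlier in this section, the following Python sketch computes Cohen's kappa from paired actual and predicted specialties. The data are invented. The verbal labels use assumed cut-offs that are consistent with how this chapter describes its coefficients (.12 to .18 as poor, .33 as fair, .55 to .56 as moderate, .82 to .86 as very good); the exact band edges, including the "good" band, are an assumption.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(actual)
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement from the marginal proportions of each specialty
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def kappa_label(kappa):
    """Verbal label under assumed cut-offs (see lead-in)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical example: 75% raw hit rate, but half of that is expected by chance
actual = ["IM", "IM", "FM", "FM"]
predicted = ["IM", "FM", "FM", "FM"]
kappa = cohen_kappa(actual, predicted)
print(round(kappa, 2), kappa_label(kappa))
```

Here three of four predictions are correct, yet kappa is only .50 because the marginal distributions alone would produce agreement half of the time.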

When utilizing person matching to maximize singular hit rates (at least one match in the top 1, 5, 10, and 20 matches representing the specialty entered by the medical student) for the medical specialties in the 10 groups with under 100 in the sample, the top 20 matches were the most accurate. No preference for the 150 items, the 18 scales, or the 30 items could be determined. When utilizing person matching to maximize dominant hit rates (a majority of the matches in the top 5, 10, and 20 representing the specialty entered by the medical student) for the medical specialties in the 10 groups with under 100 in the sample, the 30 items was the most accurate. No preference for the top 5, 10, or 20 dominant matches could be determined. As with the random sample, the medical specialties most often achieved the highest singular hit rates with the top 20 matches. However, no clear favorite was observed in terms of person matching calculation (the 150 items, the 18 scales, and the 30 items) for the 22 separate medical specialties. The medical specialties most often obtained the highest dominant hit rates with the 30 items. While some generalizations can be made, each medical specialty was impacted independently by each person matching calculation (the 150 items, the 18 scales, and the 30 items) and for dominance in the top 5, 10, and 20 matches. As such, each person matching calculation impacts the individual medical specialties separately. Interpretations The interpretations section will discuss the meanings proposed by the conclusions drawn at the beginning of this chapter about the data. This section begins with the meanings surrounding the comparison of standard scoring, the 150 items, the 18 scales,

and the 30 items. Next, a discussion of meanings surrounding singular hit rates will be presented. Then, a discussion of the meanings surrounding dominant hit rates will be offered. Subsequently, significant interpretations of means, standard deviations, and top match scores will be reported. Lastly, meanings surrounding the individual medical specialties will be presented. General Comparisons The direct evaluation of standard scoring and person matching mandated the development of a way to compare the results of two divergent methodologies. The data evaluating the hit rates of all four calculations suggested that standard scoring obtained higher hit rates for the top match than the 150 items, the 18 scales, and the 30 items. This result suggests that when the goal is looking for the one right answer for the test-taker (as is the case in modernist philosophy), standard scoring would be preferred over person matching. A correct prediction in the two highest probability scores for standard scoring was outperformed by the singular match hit rate in the top 5 for person matching. Person matching can obtain hits in the closest 5 matches out of 5,142 reference group members at a higher rate than standard scoring can in the highest two of its 16 percentages. This suggests that there is a very high sensitivity in person matching to accurately match test-takers to their preferred medical specialty in the top 5 matches. Further, the four highest probability scores for standard scoring were often outperformed by the singular match hit rate in the top 10 for person matching. Person matching can obtain hits in the closest 10 matches out of 5,142 reference group members as readily as standard scoring can in the top four of its 16 percentages. Lastly,

person matching singular hit rates for the top 20 matches outperformed the five highest probability scores for standard scoring. Person matching has the ability to obtain higher hit rates in the closest 20 matches out of 5,142 reference group members than standard scoring calculates in the top five out of 16 percentages. While standard scoring outperforms person matching for the top match, person matching is able to achieve hit rates that match or exceed standard scoring overall. This suggests that further research should be conducted to understand how the full person matching protocol, including the biographies of reference group members, would impact the value of the information test-takers receive from the two methodologies. It may be worth accepting slightly fewer top match hits, alongside more gender-balanced hits, so that test-takers receive valuable narrative information through person matching. The data comparing the 150 items to the 18 scales suggested that the 150 items obtained more accurate hit rates than the 18 scales. It was suggested by Kuder that person matching on the item level would obtain more accurate hit rates than calculations based upon a smaller subset of the inventory items. Because of this, he rejected Strong’s method of not scoring every item on the interest inventory. For person matching, it appears that Kuder’s hypothesis was correct because the 150 items consistently had higher hit rates than the 18 scales. The calculations for the 18 scales are based upon 88 items on the MSPI and appear to oversimplify and generalize test-takers’ answers enough to obtain lower hit rates than the 150 items, the 30 items, and standard scoring. The data comparing the 150 items and the 30 items suggested that the 30 items performed very similarly to the 150 items. The results suggest that there is merit in

blending the best of both methodologies with the 30 items. The two calculations overall performed within 1% or 2% of each other. The 18 scales, the 30 items, and standard scoring all have larger gender differences than the 150 items since these calculations are based upon only a portion of the items on the inventory. There is merit to Kuder’s assertion that only looking at interest inventory items that account for differences between groups will increase gender imbalances in interest inventories. However, it would appear that performing person matching using inventory items that are gender neutral is more important than the quantity of items person matched. When comparing the 150 items, the 18 scales, the 30 items, and standard scoring, standard scoring is most accurate for the top match. However, standard scoring’s top match hit rates demonstrate the greatest gender difference among the 150 items, the 18 scales, and the 30 items. If gender balancing is the benchmark for success for the MSPI-R, person matching using the 150 items or the 30 items would be preferable as the psychometric scoring methodology. Also, female and male singular hit rates for the 150 items and the 30 items in the top 5, 10, and 20 matches outperform standard scoring using the top two, four, and five highest probability scores respectively. If top match accuracy is not the benchmark for success for the MSPI-R, the 150 items or the 30 items would be preferable as the psychometric scoring methodology for females and males. Further, the 30 items demonstrates fewer gender imbalances in the top match than standard scoring. In addition, when looking at chance expectancy rates, the 30 items obtains predictions for the greatest number of medical specialties at a rate greater than chance for the top match as compared to standard scoring, the 150 items, and the 18 scales.
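
The breadth comparison in the preceding paragraph, counting the specialties for which a method predicts at a rate greater than chance, can be sketched in Python as follows. The sketch assumes that the chance expectancy rate for a specialty is that specialty's share of the sample, which may differ from how the study derived it. The specialty codes, group sizes, and hit rates are invented.

```python
def chance_rates(group_sizes):
    """Assumed chance expectancy: each specialty's share of the sample."""
    total = sum(group_sizes.values())
    return {specialty: n / total for specialty, n in group_sizes.items()}


def beats_chance(hit_rates, group_sizes):
    """Specialties whose top match hit rate exceeds the assumed chance rate."""
    chance = chance_rates(group_sizes)
    return sorted(s for s, rate in hit_rates.items() if rate > chance[s])


# Hypothetical sample of 100 students in three specialties
group_sizes = {"A": 50, "B": 30, "C": 20}
hit_rates = {"A": 0.60, "B": 0.10, "C": 0.25}

winners = beats_chance(hit_rates, group_sizes)
print(f"{len(winners)}/{len(group_sizes)} specialties beat chance: {winners}")
```

A tally of this kind is what lies behind statements such as 19 of 22 specialties exceeding chance for one calculation and 15 of 22 for another.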

When comparing the hit rates of smaller and larger medical specialties, the 150 items, the 18 scales, the 30 items, and standard scoring obtain higher hit rates with the 12 groups with over 100 in the sample. This could suggest that (a) the larger specialties have more differentiation in skills used in those medical specialties than the smaller specialties, (b) for both person matching and standard scoring there is something in the psychometric scoring methodology that favors larger specialties, or (c) the larger medical specialties are entered by medical students with a more focused way of thinking than the smaller medical specialties. For the 16 medical specialties that standard scoring calculates, hit rates were obtained in very good agreement with what is listed in the MSPI-R manual. This result validates that standard scoring predictive hit rates operate in a very systematic and reliable manner. Although we can compare the four separate calculations and discuss which calculation is preferable given specific parameters, it cannot be denied that there is a significant amount of chance agreement to be found in the predictive hit rates reported. Holland’s warning that interest inventories perform at lower rates than expressed interest and as such are limited in their helpfulness should be taken seriously. Singular Hit Rates The highest singular hit rates for the 150 items, the 18 scales, and the 30 items came from the top 20 matches, which consistently were able to outperform the five highest probability calculations for standard scoring. Further, the top 10 matches using the 30 items was able to outperform the four highest probability calculations for standard scoring. In addition, for the 150 items and the 30 items, the top 5 matches were able to

outperform the two highest probability calculations for standard scoring. None of the person matching calculations were able to outperform standard scoring for the top match. The 18 scales can only improve predictive hit rates over standard scoring using the top 20 matches. The 150 items improves predictive hit rates over standard scoring using the top 5 and top 20 matches (equaling standard scoring in the top 10 matches) while the 30 items outperforms standard scoring using the top 5, 10, and 20 matches. It would appear that the 18 scales oversimplifies the comparison of the test-taker to the reference group in a way that inhibits accuracy. The 150 items and the 30 items clearly offer the most singular hit rate flexibility. This tie is significant because the 30 items is based upon discriminant function analysis and therefore contradicts Kuder’s assertion that greater accuracy would be achieved by scoring every item on the interest inventory. Dominant Hit Rates Dominant hit rates were rarely able to meet even the highest probability score for standard scoring. The highest dominant hit rates came from the top 10 matches for the 18 scales and the 30 items. The 150 items were inconclusive for dominant hit rate accuracy in the top 5, 10, and 20 matches. Looking more closely at dominance, the 150 items had the highest number of medical students who had their medical specialty as dominant in their top 1, 5, 10, and 20 and at the same time had the lowest number of medical students have a different medical specialty dominant across their top 1, 5, 10, and 20 matches. The 30 items had the second highest number of medical students who had their medical specialty as dominant in their top 1, 5, 10, and 20 and at the same time had the second lowest number of medical students have a different medical specialty dominant across

their top 1, 5, 10, and 20 matches. The 18 scales had the lowest number of medical students who had their medical specialty as dominant across the top 1, 5, 10, and 20 and at the same time had the highest number of medical students have a different medical specialty dominant across their top 1, 5, 10, and 20. The 150 items has more consistency in dominant hit rates than the 18 scales and the 30 items. Data suggest that not scoring every item on the inventory causes generalizations to occur. In the case of the 18 scales, the generalization of the 88 items into 18 scores creates more opportunities to scatter scores by having more medical students show inconsistency in medical specialty preference. In a milder way, the generalizations of the 30 scores create more opportunities to (a) focus scores by having medical students show consistency in a medical specialty and (b) scatter scores by having medical students show inconsistency in medical specialty preference. The 150 items allows for subtle variations to occur in scoring; therefore, accuracy in dominance for the top 5, 10, and 20 matches appears more stable and there are fewer inconsistencies in dominance. These results defend Kuder’s assertion that not scoring every item on the interest inventory causes distortions in the data. Means, Standard Deviations, and Top Match Scores The means, standard deviations, and top match scores of the data reveal several significant pieces of information. First, accurate top matches for person matching generally have higher scores, higher means, and higher standard deviations than inaccurate top matches. Accuracy in predicting the medical student’s medical specialty from the top match is a more complex process than just getting a close score. Top match

scores closer to zero tended to be inaccurate while top match scores further from zero tended to be accurate. This same phenomenon occurred with dominance as accurate means, standard deviations, and top match scores tended to be higher than inaccurate dominant means, standard deviations, and top match scores. It would appear that there may be a difference between matching (a) to likeness in personality and (b) to a way of thinking. When using person matching, data suggest that test-takers can more closely match to another person if they share certain personality characteristics that have them answer inventory items in specific patterns. Further, data suggest that test-takers match less closely to another person when they share a way of thinking (which is often found in a profession) that has them answer inventory items in specific patterns. This conclusion is supported by test-takers who matched with several others in their own medical specialty having higher top match scores, means, and standard deviations than test-takers who matched to many different professions, yet had lower top match scores, means, and standard deviations. When test-takers match to many people in the same profession, they may be matching to a way of thinking that can be found in that profession. When test-takers match to many people in many different professions, they may be matching to similarities that they all have in personality. Second, the 18 scales result in the 10 groups with under 100 in the sample overall having higher means, standard deviations, and top match scores than the larger medical specialties. These results suggest that the 18 scales demonstrate greater effectiveness with the larger medical specialties as the larger specialties have lower top match scores that are more closely centered together. The 30 items obtained exactly the opposite effect

and resulted in the 12 groups with over 100 in the sample having overall higher means, standard deviations, and top match scores. These results suggest that the 30 items demonstrated greater effectiveness with the smaller medical specialties as the smaller specialties have lower top match scores that are more closely centered together. The 150 items obtained a split in efficacy between the larger and smaller medical specialties having higher means, standard deviations, and top match scores. These results suggest that the 150 items demonstrated greater effectiveness with the smaller medical specialties in terms of having scores that are more closely centered together and greater effectiveness with the larger medical specialties in terms of having lower top match scores. This shift in efficiency may explain why the 30 items is able to make predictions at a rate greater than chance for more of the smaller medical specialties than standard scoring, the 150 items, and the 18 scales. Third, females have higher scores, higher means, and higher standard deviations than males no matter how person matching is calculated. This data reinforces Strong’s assertion that female interests are more generalized and that heterogeneity in female interests makes it harder to specify occupational interest patterns for women (Strong, 1943). In conclusion, the means, standard deviations, and top match scores make valuable contributions to the understanding of the psychometric scoring properties of person matching. First, it is likely that accurate top match scores will be higher and more diverse than inaccurate top match scores. This may occur because of a difference between person matching capturing likenesses in personality versus person matching

capturing a way of thinking in a profession. Second, tracking closer top match scores, means, and standard deviations can demonstrate where a calculation of person matching has its strengths: larger or smaller medical specialties. Third, females appear to have more diversity in responses when taking an interest inventory and this diversity needs to be accommodated by the psychometric scoring methodology. Individual Medical Specialties When looking at the individual medical specialties for person matching, the highest singular hit rates were obtained using the top 20 matches. However, there was no clear favorite between the 150 items, the 18 scales, and the 30 items. This suggests that each medical specialty interacts with the 150 items, the 18 scales, and the 30 items independently from the other medical specialties in terms of singular matches. In terms of dominant hit rates, the larger medical specialties obtained higher hit rates with the top 10 and 20 matches with no clear preference for the 150 items, the 18 scales, or the 30 items. The smaller specialties obtained higher hit rates with the 30 items with no clear preference for the top 5, 10, or 20 matches. In conclusion for singular hit rates, there is tremendous variability in the three calculations and the accuracy increases as the number of person matches increases. For dominant hit rates, the 30 items obtains the most hits, but there is no clear suggestion for using the top 5, 10, or 20 matches. Ultimately, the medical specialties fluctuate independently from each other. Implications All recommendations are couched in the understanding that the full person matching protocol has not been performed in this study. As such, it is impossible to

make direct and definitive comparisons between the full person matching protocol and standard scoring. This research can offer only insight into the usefulness of implementing the full protocol. Theory As discussed in Chapter 1, interest inventories using group norms compare test-takers outside of a profession to occupational scales created from professionals in the field. This research used students who took the inventory in medical school and compared their scores to medical students who took the inventory in medical school and later entered a specific medical specialty. The results of the data suggest that standard scoring and person matching using the 150 items and the 30 items perform at comparable rates (the top match notwithstanding). As such, using students as the reference group for person matching appears as successful as using an occupational group in standard scoring. Although different approaches, both standard scoring and person matching compared the interests of adults to adults and achieved similar outcomes. This supports Strong’s assertion that interests show stability at age 15, become more stable over time, and are not influenced by working in an occupation. The results of this research suggest that when the goal of the interest inventory comes from the modernist tradition and uses an objective measurement to offer the test-taker one right answer for selecting their profession, standard scoring, based upon the occupational groups of Strong, is the preferred psychometric scoring methodology. This is reinforced in the data by standard scoring outperforming any calculation of person matching for the top match. Results suggest that removing interests common to everyone

is advantageous for top match accuracy. When the goal of the interest inventory comes from the postmodernist tradition and uses similarities in item responses to offer the test-taker professions of interest to explore to construct their preferred future, person matching, based upon the work of Kuder, is the preferred psychometric scoring methodology. This is reinforced in the data by person matching using the 150 items outperforming standard scoring in the top 5 and 20 matches and the 30 items outperforming any calculation of standard scoring outside of the top match. Two new contributions to person matching theory are hypothesized by the means, standard deviations, and top match scores suggested by this research. First, no matter the person matching calculation, means, standard deviations, and top match scores for accurate predictions were generally higher than inaccurate predictions. Data suggest that there are two ways to person match. First, the individuals who had inaccurate predictions had closer means, standard deviations, and top match scores. These results suggest that they may be matching to personality characteristics that account for a similar way of rating the 150 items on the inventory, which results in less deviation in how the 150 items are scored. Second, the individuals who had accurate predictions generally had higher means, standard deviations, and top match scores. The results suggest that they may be matching to a similar way of thinking found in an occupation that accounts for a similar way of rating the 150 items on the inventory, which results in greater deviation in how the 150 items are scored. Second, determining where lower means, standard deviations, and top match scores occur with the larger and smaller medical specialties may predict hit rate accuracy.


This is noteworthy because the 30 items was able to perform at better-than-chance expectancy rates for four more medical specialties than the 18 scales and three more medical specialties than standard scoring and the 150 items. Person matching using the 30 items achieves closer scores with the smaller medical specialties and thus may detect subtler nuances with the smaller medical specialties than the larger medical specialties. The data supported the theory that sex differences are evident in interest inventories and that these biases occur when creating group norms. Standard scoring for the MSPI-R uses single-sex scales to calculate the 16 probability scores and does not score every item on the inventory. Both of these actions have been identified as increasing gender differences in predictive hit rates. This may account for the largest gender differences occurring in standard scoring, which is based upon the discriminant function analysis of standard scoring to calculate occupational groups. Further, females obtained higher means, standard deviations, and top match scores than males. This supports Strong’s assertion that females exhibit more heterogeneity in interests, which is harder to capture when using group norms because not every item is used in predictive scoring. Person matching at the item level, as asserted by Kuder, clearly reduced sex bias with the MSPI-R by eliminating the use of group norms. Holland’s assertion that interest inventories would not surpass the 56.3% accuracy rate for expressed choice is suggested by this data. Standard scoring obtained a 33% top match accuracy rate, the 30 items obtained a 23% top match accuracy rate, the 150 items obtained a 22% top match accuracy rate, and the 18 scales obtained an 18% top match accuracy rate. These low top match calculations are further supported by all four


calculations obtaining fair to poor agreement observed beyond chance for actual versus predicted medical specialty. Those familiar with the MSPI may be wondering why the 33% top match hit rate for standard scoring from this research is substantially lower than the 53.5% top match hit rate reported in earlier research (Porfeli et al., 2010). The random sample from this study included medical students from 22 different medical specialties, and as such top match predictive hit rates were calculated based upon medical students being accurately matched to 22 medical specialties. The random sample from the 2010 study included medical students from the 16 different medical specialties that are predicted by the MSPI-R. Therefore, it would be expected that predictive accuracy would drop as medical students who were not from the 16 medical specialties predicted by the MSPI-R were entered into the random sample.

There is room to improve the predictive accuracy of the MSPI-R. However, given all the factors that influence occupational choice (such as gender, personality, academic performance, role models, prestige, and other considerations), it should come as no surprise that interest inventories cannot take all of these factors into account. It is possible that interest inventories are more accurate than we are aware in terms of matching a test-taker’s interest to an occupation and at the same time are unable to influence a test-taker in ignoring other factors that influence occupational choice.

Research

There are eight considerations for research from this data. When considering the validity of the four methods, the 150 items, the 18 scales, and the 30 items outperformed standard scoring as person matching demonstrated the greatest ability to differentiate


between specific occupational groups. Standard scoring could only differentiate between 16 medical specialties. All three calculations of person matching were able to accommodate every medical specialty entered by any member of the reference group. Second, the 30 items showed the greatest ability to assign medical students to membership in an occupational group. The 30 items placed medical students at a rate greater than chance in 19 medical specialties. Standard scoring and the 150 items placed medical students at a rate greater than chance in 16 medical specialties and the 18 scales placed medical students at a rate greater than chance in 15 medical specialties. Standard scoring was outperformed by some form of person matching for the MSPI-R in both measures of interest inventory validity. This research examined predictive hit rates for medical specialties. As evidenced by several failed attempts by Strong, predictive hit rates for specialties within occupations are hard to achieve. This study suggests that person matching is more sensitive to subtleties in occupations as evidenced by the 30 items scoring better than chance for the highest number of medical specialties. Person matching offers interest inventories the flexibility of accommodating a quickly changing, global workforce and at the same time a greater measurement of subtler differences between occupations and specialties. Person matching’s ability to add new occupations to the reference group without waiting to test hundreds of people to develop an occupational group is a distinct advantage. Second, when looking at singular and dominant hit rates with this data, dominant hit rates obtained at best half of the predictive accuracy of singular hit rates. Singular hit


rates for person matching outperformed standard scoring in the top 20 matches for the 18 scales, the top 5 and 20 matches for the 150 items (equaling standard scoring for the top 10 matches), and the top 5, 10, and 20 matches for the 30 items. Singular hit rates for person matching obtained higher hit rate accuracy than dominant hit rates and deserve further research attention as they can potentially mirror the predictive accuracy found in standard scoring methodologies.

Third, this research suggests that not scoring every item on an interest inventory leads to generalizations, which impacts hit rates and gender differences. In terms of hit rates, generalizations can lead to decreased accuracy (as in the case of the 18 scales) or increased accuracy (as in the case of the 30 items). This is evidenced by the 150 items (a) outperforming the 18 scales and (b) underperforming in certain situations as compared to the 30 items. Males clearly benefited more from interest inventories that used only a portion of the items on the interest inventory to make predictions. This is evidenced by standard scoring, the 18 scales, and the 30 items all obtaining higher gender differences than the 150 items. Researchers need to be aware of these hit rate and gender differences and perform separate tests to see if including all items when scoring would be beneficial for females taking the interest inventory.

All four methods obtained higher hit rates for medical specialties with larger memberships. This inherently occurs for standard scoring as the larger medical specialties were considered valuable when creating occupational group norms for the MSPI-R. For the 150 items, the 18 scales, and the 30 items, there is a vast difference in the number of medical students in the 12 groups with over 100 in the sample (an average


of 388 in each group) and the 10 groups with under 100 in the sample (an average of 44 in each group). For person matching, there may be anomalies that occur in the psychometric scoring methodology when there is an uneven number of members in each medical specialty in the reference group.

Lastly, research should take a closer look at the hypothesis that there are two ways to person match. As inaccurate predictions had closer means, standard deviations, and top match scores than accurate predictions, it is hypothesized that one can person match to personality traits more closely than a way of thinking that may be found in a profession. Not only may person matching (a) equal the predictive accuracy of standard scoring, (b) give test-takers a narrative biography that they can use to write their preferred future, (c) offer more gender-balanced hit rates, and (d) include more professions in scoring, there is a chance that person matching may also determine if the test-taker would be happier being around a certain type of person versus a certain way of thinking. If these two ways of being in an occupation can be measured through person matching, it could revolutionize how interest inventories assist test-takers.

Practice

Since medical students taking the MSPI-R are instructed to select the two or three specialties with the highest probabilities for further exploration (Richard, 2010), the recommendations for the MSPI-R in this section will be driven by this directive. Data suggest that person matching using the top 5 singular matches performs at a higher rate than the two highest probability scores for standard scoring. The 150 items and the 30 items using the top 5 singular matches (50%) outperforms standard scoring by 4% for the


two highest probabilities (46%) and obtains 4% fewer hits than standard scoring for the three highest probability scores (54%). There needs to be further research to determine the average number of medical specialties listed in the top 5 singular matches to cement this comparison as there can be between one and five different medical specialties listed in the top 5 singular matches for the test-taker. The 150 items and the 30 items consistently performs (overall, for the 12 groups with over 100 in the sample, and the 10 groups with under 100 in the sample) at exactly the same rate as the highest 2.5 probability scores for standard scoring. Person matching therefore meets the MSPI-R’s instructions to students to explore the two or three highest probability scores using the 150 items or the 30 items with the top 5 singular matches. Further, the 150 items and the 30 items obtain hit rates that are more gender balanced than standard scoring. Standard scoring for the two highest probability scores obtains a gender imbalance favoring males by 5% and for the three highest probability scores obtains a gender imbalance favoring males by 2%. The 150 items obtains gender hit rates favoring males by 2% for the top 5 singular matches and the 30 items obtains gender hit rates favoring males by 3% for the top 5 singular matches. Even without the addition of narrative biographies provided by the full person matching protocol on the MSPI-R, the 150 items and the 30 items are able to meet the predictive hit rate of standard scoring for the 2.5 highest probability scores and exceed in gender balancing when looking at the two highest probability scores. When comparing the 150 items and the 30 items, there are several reasons to choose the 30 items. First the 30 items has a slightly higher hit rate than the 150 items. Second, the 30 items is able to predict medical students at a rate greater than chance for


the greatest number of medical specialties. Third, the 30 items gains some advantage with the 10 groups with under 100 in the sample as it obtains more gender-balanced hit rates for these groups than the 150 items. For the MSPI-R, there are significant reasons to spend the time and money to (a) score the interest inventory with the 30 items using the top 5 singular matches to improve gender-balanced hit rates and (b) collect narrative biographies in order to run the full person matching protocol to study how test-takers value the standard scoring report versus the person matching scoring report. Data suggest that there is great untapped potential when using person matching as the psychometric scoring methodology for the MSPI-R. Additionally, it is hard to ignore any improvement in the MSPI-R that can dramatically improve the outcome for at least half of the test-takers immediately and possibly all of the test-takers once the full person matching protocol has been implemented.

Limitations

There are several limitations to this research. First, this study has only compared hit rates of standard scoring to person matching and in effect makes person matching compete with standard scoring in a modernist-only comparison. By performing a study using the full person matching protocol to directly assess how helpful the two different scoring reports are for test-takers, person matching will be able to demonstrate a full range of postmodernist advantages and then a true comparison of the two methodologies can be made.


Second, this research has suggested that the top 5 singular matches approximate the same hit rate as the 2.5 highest probability scores. What is missing is an understanding of how many medical specialties are listed on average for the top 5, 10, and 20 singular matches. Comparisons would be more thorough with this additional information. Third, Kuder stated that to be included in the reference group for person matching the individual would have to be matched to other people who were enthusiastic about their work and scored the interest inventory in the same way (Kuder, 1977a). This research did not ascertain if medical students in the reference group enjoyed working in their medical specialty, which may have hindered person matching’s hit rate performance. Fourth, the reference group contained a very uneven number of individuals in each medical specialty. Internal Medicine had the largest group with 1,007 members and there were six medical specialties with only one member. The extreme range of medical students in each medical specialty in the reference group may have caused imbalances with person matching as a psychometric scoring methodology. Fifth, the study’s sample is derived from medical students who voluntarily took the MSPI-R between the years 2005 and 2008. It is impossible to require every medical student in the United States to take the MSPI-R and as such the sample may be made up of a specific type of medical student that is not representative of all medical students. Therefore, this study was not a random sample. Further, the ex post facto study could not directly control for the inclusion of minority students. Additionally, only medical students who attended a medical college that is part of the AAMC were able to access the


MSPI-R and therefore participate in the study. Consequently, the results may not pertain to medical students attending medical schools outside of the United States.

Recommendations for Future Research

In addition to studying the full person matching protocol to compare test-takers’ views on the value of the two different scoring reports produced by standard scoring and person matching, there are other specific questions that have been uncovered by this research. First, Kuder found greater accuracy over Strong’s general reference group method by using the lambda coefficient, which took into account both similarities and differences in scores on the interest inventory without imposing a normal distribution. Even more importantly, Kuder found that the lambda coefficient improved upon Strong’s method with similar occupations. This could prove helpful when differentiating between medical specialties. Based upon the success of combining person matching and standard scoring with the 30 items in certain conditions, it may be helpful to explore using the lambda coefficient with the MSPI-R and then performing person matching to check for improvement upon predictive hit rates. Separately, it may be helpful to explore if using the lambda coefficient would improve upon standard scoring predictive hit rates with the MSPI-R even without person matching.

Second, this research has suggested that the top 5 singular matches approximate the same hit rate as the 2.5 highest probability scores, the top 10 singular matches approximate the same hit rate as the four highest probability scores, and the top 20 singular matches approximate the same hit rate as the five highest probability scores.


What was not studied by this research was the average number of medical specialties listed as part of the top 5, 10, and 20 singular matches for the 150 items, the 18 scales, and the 30 items. This would help to make comparisons between standard scoring and person matching for the MSPI-R.

Third, Kuder stated that to be included in the reference group the individual would have to be matched to other people who were enthusiastic about their work and scored the interest inventory in the same way (Kuder, 1977a). This research did not ascertain if medical students in the reference group enjoyed working in their medical specialty. Since we know that many individuals make compromises when making career decisions, it may or may not be critical for the person matching protocol if medical students were enthusiastic about their work as criteria for including them in the reference group. As such, running the full person matching protocol to compare hit rates for person matching with all individuals in general in the reference group versus only individuals enthusiastic about their work would help to verify Kuder’s assertion about satisfaction and predictive hit rate accuracy.

Fourth, research needs to investigate if having large differences in the number of members of an occupation in the reference group negatively impacts person matching. For example, in this research one medical specialty had 1,007 members of an occupation in the reference group out of a total number of 5,143, while six other medical specialties had one member each. Further research is needed to provide an understanding of how large differences in the number of members of each occupation in the reference group impact person matching as a psychometric scoring methodology.
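The open question in the second recommendation above, namely how many different medical specialties appear on average among a test-taker's top 5, 10, or 20 singular matches, could be explored with a sketch like the following. It is illustrative Python only: the function names and the sample rankings are hypothetical, and it assumes each test-taker's matches are already available as a list of the specialties entered by the matched reference group members, ordered from closest to farthest match.

```python
def distinct_specialties(ranked_specialties, k):
    """Count how many different specialties appear among the top k matches."""
    return len(set(ranked_specialties[:k]))


def mean_distinct(all_rankings, k):
    """Average number of different specialties in the top k across test-takers."""
    return sum(distinct_specialties(r, k) for r in all_rankings) / len(all_rankings)


# Hypothetical rankings for two test-takers (not data from this study).
rankings = [
    ["Pediatrics", "Pediatrics", "Family Medicine", "Internal Medicine", "Pediatrics"],
    ["Internal Medicine"] * 5,
]

print(mean_distinct(rankings, 5))  # average over 3 and 1 distinct specialties
```

A figure of this kind would indicate how many specialties a top 5 list actually asks a student to explore, which is the quantity needed to compare it with the two or three highest probability scores from standard scoring.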


Fifth, future research needs to be performed on two possible ways to person match. Inaccurate predictions resulted in closer means, standard deviations, and top match scores than accurate predictions. A hypothesis was developed that test-takers can person match to personality traits more closely than a way of thinking that may be found in a profession. Unlike standard scoring, person matching may be able to determine if the test-taker would be happier being around a certain type of person versus a certain way of thinking, which could revolutionize how interest inventories assist test-takers.

Lastly, this study did not examine how standard scoring and person matching hit rates compared for medical students of different ethnicities and cultures. Further research needs to be conducted to examine the impact that different psychometric scoring methodologies have with different ethnicities and cultures.

Summary

Data suggest the following conclusions to the five hypotheses. First, the inclusion of all 150 raw item scores when person matching produced more accurate hit rates than when using the 18 scale scores. These results suggest that person matching on the item level is more accurate than person matching on the factor or scale level. Second, using the top 20 person matches produced the highest hit rates. Third, standard scoring outperformed person matching for the top match. The 150 items and the 30 items were able to outperform standard scoring when looking beyond the top match to offer medical students a few medical specialties to research for further consideration. Fourth, gender differences were less pronounced for person matching than standard scoring. Fifth, the


predictive hit rates were slightly higher when combining standard scoring and person matching psychometric scoring methodologies.

The task of this study was to determine if the Association of American Medical Colleges (AAMC) would benefit by investing time and resources to provide students with Kuder’s full person matching model with the MSPI-R to further assist medical specialty decision making for medical students. This study has suggested that the AAMC would benefit from investing time and money into person matching as a psychometric scoring methodology for the MSPI-R. Namely, person matching (a) was able to dramatically increase the number of occupations included in the scoring of the MSPI-R, (b) was able to overcome much of the sex bias found in standard scoring, (c) allowed for the scores of students taking the interest inventory to be compared to the scores of students who took the interest inventory and have since entered a specialty, (d) could offer test-takers the ability to receive autobiographic data which may offer a more robust career exploratory experience than receiving an occupational title alone, (e) does not assume stable occupations in a global economy demanding flexibility and evolution in the career paths of individuals facing outsourcing and contractual work, and (f) may be able to determine if the test-taker is more attracted to certain personality types or certain ways of thinking.

Given the instructions to students taking the MSPI-R to explore the two or three highest probability calculations, person matching using the 30 items and the top 5 singular matches (which does not equate to a list of five different medical specialties) was able to perform at a rate equaling the 2.5 highest matches for standard scoring. Additionally, the 30 items offered test-takers more gender-balanced predictive hit rates


than standard scoring. This result occurred despite (a) having medical specialties in the reference group with numbers ranging from 1,007 to one, (b) not ascertaining if reference group members were enthusiastic about their medical specialty, and (c) not being able to compare the divergent scoring reports produced by the two distinct psychometric scoring methodologies. In spite of all of these potential negative influences on Kuder’s person matching methodology, the 150 items and the 30 items both showed great promise when compared to the standard scoring methodology used today. While this research has suggested that performing person matching using the scales of an interest inventory produces weak results, scoring all items on the inventory does suggest greater gender balancing in predictive hit rates. Additionally, there is promise in using aspects of standard scoring and person matching as performed in this study using the 30 items. What this research can emphatically conclude is that person matching is worthy of considerable research attention in interest inventories as the benefits to women (half of the test-takers) and a quickly changing, global workforce could be immense.

APPENDICES

APPENDIX A IRB APPROVAL FOR PROTOCOL #10-382

from KIEHL, LAURIE
to Stephanie Burns & Mark Savickas
date Tue, Nov 30, 2010 at 11:42 AM
subject IRB approval for protocol #10-382 - retain this email for your records

Hello,

I am pleased to inform you that the Kent State University Institutional Review Board reviewed and approved your Application for Approval to Use Human Research Participants. Approval is effective for a twelve-month period: November 29, 2010 through November 28, 2011.

Federal regulations and Kent State University IRB policy require that research be reviewed at intervals appropriate to the degree of risk, but not less than once per year. The IRB has determined that this protocol requires an annual review and progress report. The IRB tries to send you annual review reminder notice to by email as a courtesy. However, please note that it is the responsibility of the principal investigator to be aware of the study expiration date and submit the required materials. Please submit review materials (annual review form and copy of current consent form) one month prior to the expiration date.

HHS regulations and Kent State University Institutional Review Board guidelines require that any changes in research methodology, protocol design, or principal investigator have the prior approval of the IRB before implementation and continuation of the protocol. The IRB must also be informed of any adverse events associated with the study. The IRB further requests a final report at the conclusion of the study.

Kent State University has a Federal Wide Assurance on file with the Office for Human Research Protections (OHRP); FWA Number 00001853.

If you have any questions or concerns, please contact me at 330-xxx-xxxx or [email protected].

Laurie B. Kiehl
Compliance Assistant
Research and Sponsored Programs
122 Cartwright Hall
Kent State University
Kent, OH 44242-0001

APPENDIX B HIT RATES, MEANS, STANDARD DEVIATIONS, TOP MATCH SCORES, AND KAPPA COEFFICIENTS FOR THE 22 MEDICAL SPECIALTIES

Table B1
Person Matching Hit Rates for Anesthesiology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
  150         24   2         8%         1/12 (8%)       1/12 (8%)
  18          24   5         21%        2/12 (17%)      3/12 (25%)
  30          24   5         21%        1/12 (8%)       4/12 (33%)
Top 5
  150         24   8         33%        5/12 (42%)      3/12 (25%)
  18          24   15        63%        6/12 (50%)      9/12 (75%)
  30          24   10        42%        5/12 (42%)      5/12 (42%)
Top 10
  150         24   14        58%        6/12 (50%)      8/12 (67%)
  18          24   17        71%        7/12 (58%)      10/12 (83%)
  30          24   13        54%        5/12 (42%)      8/12 (67%)
Top 20
  150         24   16        67%        8/12 (67%)      8/12 (67%)
  18          24   19        79%        9/12 (75%)      10/12 (83%)
  30          24   16        67%        8/12 (67%)      8/12 (67%)

Table B2
Person Matching Hit Rates for Anesthesiology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
  150         24   3         13%        1/12 (8%)       2/12 (17%)
  18          24   4         17%        1/12 (8%)       3/12 (25%)
  30          24   6         25%        2/12 (17%)      4/12 (33%)
Top 10
  150         24   7         29%        2/12 (17%)      5/12 (42%)
  18          24   4         17%        1/12 (8%)       3/12 (25%)
  30          24   9         38%        4/12 (33%)      5/12 (42%)
Top 20
  150         24   4         17%        1/12 (8%)       3/12 (25%)
  18          24   6         25%        2/12 (17%)      4/12 (33%)
  30          24   4         17%        1/12 (8%)       3/12 (25%)
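The two hit definitions behind Tables B1 and B2 can be illustrated with a short Python sketch. It assumes that a test-taker's person matches are already ranked from closest to farthest and reduced to the specialty each matched person entered, and that "dominant" means the entered specialty is the single most frequent specialty among the top k matches. The function names, the tie rule, and the sample ranking are hypothetical and are not taken from the study's scoring procedure.

```python
from collections import Counter


def singular_hit(ranked_specialties, entered, k):
    """Return True if the entered specialty appears at least once in the top k matches."""
    return entered in ranked_specialties[:k]


def dominant_hit(ranked_specialties, entered, k):
    """Return True if the entered specialty is the most frequent one in the top k matches.

    Assumption: a tie for most frequent does not count as a dominant hit.
    """
    counts = Counter(ranked_specialties[:k])
    if not counts:
        return False
    top_count = max(counts.values())
    leaders = [specialty for specialty, count in counts.items() if count == top_count]
    return leaders == [entered]


# Hypothetical top 5 matches for one test-taker who entered Anesthesiology.
ranked = [
    "Internal Medicine",
    "Anesthesiology",
    "Internal Medicine",
    "Family Medicine",
    "Internal Medicine",
]

print(singular_hit(ranked, "Anesthesiology", 1))  # checks the top match only
print(singular_hit(ranked, "Anesthesiology", 5))  # checks all of the top 5
print(dominant_hit(ranked, "Anesthesiology", 5))  # checks for the most frequent specialty
```

Under these assumptions a test-taker can register a singular hit without a dominant hit, which is one way to read the lower dominant hit rates in the tables.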


Table B3
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Anesthesiology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match         Twentieth Match
Calculation   M         SD        M         SD        M ∆       SD ∆      Low Score   High Score
150           242.33    81.45     297.54    90.95     55.21     9.50      109         434
18            7.37      2.84      11.97     4.26      4.6       1.42      2.94        13.93
30            34.71     12.31     51.08     17.46     16.37     5.15      19          57

Table B4
Standard Scoring Hit Rates for Anesthesiology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa           Females         Males
First         6/24    25%        41%        .64 Good        1/12 (8%)       5/12 (42%)
Second        1/24    4%         15%        .36 Moderate    1/12 (8%)       0/12 (0%)
Third         6/24    25%        11%        .60 Moderate    3/12 (25%)      3/12 (25%)
Fourth        3/24    13%        8%         .78 Good        0/12 (0%)       3/12 (25%)
Fifth         1/24    4%         7%         .65 Good        1/12 (8%)       0/12 (0%)
Total Top 5   17/24   71%        82%        .65 Good        6/12 (50%)      11/12 (92%)

Table B5
Person Matching Hit Rates for Dermatology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
  150         14   3         21%        2/7 (29%)       1/7 (14%)
  18          14   0         0%         0/7 (0%)        0/7 (0%)
  30          14   2         14%        0/7 (0%)        2/7 (29%)
Top 5
  150         14   8         57%        3/7 (43%)       5/7 (71%)
  18          14   3         21%        1/7 (14%)       2/7 (29%)
  30          14   3         21%        1/7 (14%)       2/7 (29%)
Top 10
  150         14   8         57%        3/7 (43%)       5/7 (71%)
  18          14   5         36%        2/7 (29%)       3/7 (43%)
  30          14   5         36%        2/7 (29%)       3/7 (43%)
Top 20
  150         14   10        71%        5/7 (71%)       5/7 (71%)
  18          14   8         57%        4/7 (57%)       4/7 (57%)
  30          14   9         64%        3/7 (43%)       6/7 (86%)

Table B6
Person Matching Hit Rates for Dermatology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
  150         14   4         29%        1/7 (14%)       3/7 (43%)
  18          14   0         0%         0/7 (0%)        0/7 (0%)
  30          14   0         0%         0/7 (0%)        0/7 (0%)
Top 10
  150         14   3         21%        1/7 (14%)       2/7 (29%)
  18          14   0         0%         0/7 (0%)        0/7 (0%)
  30          14   1         7%         0/7 (0%)        1/7 (14%)
Top 20
  150         14   0         0%         0/7 (0%)        0/7 (0%)
  18          14   0         0%         0/7 (0%)        0/7 (0%)
  30          14   1         7%         0/7 (0%)        1/7 (14%)

Table B7
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Dermatology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match         Twentieth Match
Calculation   M         SD        M         SD        M ∆       SD ∆      Low Score   High Score
150           212.07    66.43     256.50    69.52     44.43     3.09      119         323
18            5.72      2.40      9.48      3.15      3.76      .75       2.57        11.8
30            28.79     9.80      41.21     12.93     12.42     3.13      17          49

Table B8
Standard Scoring Hit Rates for Dermatology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa           Females         Males
First         2/14    14%        25%        .59 Moderate    1/7 (14%)       1/7 (14%)
Second        1/14    7%         13%        .63 Good        1/7 (14%)       0/7 (0%)
Third         2/14    14%        8%         .63 Good        0/7 (0%)        2/7 (29%)
Fourth        3/14    21%        17%        .76 Good        2/7 (29%)       1/7 (14%)
Fifth         1/14    7%         12%        .63 Good        1/7 (14%)       0/7 (0%)
Total Top 5   9/14    64%        75%        .66 Good        5/7 (71%)       4/7 (57%)

Table B9
Person Matching Hit Rates for Emergency Medicine Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
  150         40   14        38%        5/20 (25%)      10/20 (50%)
  18          40   10        25%        5/20 (25%)      5/20 (25%)
  30          40   10        25%        4/20 (20%)      6/20 (30%)
Top 5
  150         40   27        68%        12/20 (60%)     15/20 (75%)
  18          40   23        58%        9/20 (45%)      14/20 (70%)
  30          40   27        68%        12/20 (60%)     15/20 (75%)
Top 10
  150         40   32        80%        13/20 (65%)     19/20 (95%)
  18          40   36        90%        16/20 (80%)     20/20 (100%)
  30          40   29        73%        14/20 (70%)     15/20 (75%)
Top 20
  150         40   35        88%        16/20 (80%)     19/20 (95%)
  18          40   40        100%       20/20 (100%)    20/20 (100%)
  30          40   33        83%        16/20 (80%)     17/20 (85%)
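The Hit Rate, Females, and Males columns in these tables are simple proportions, and the gender differences discussed in Chapter 5 are gaps between the female and male rates. The Python sketch below reproduces one row of Table B9 under two assumptions that are inferred from the tabled values rather than stated in the source: percentages are rounded to whole numbers with halves rounded up (for example, 23/40 = 57.5% is reported as 58%), and the gender gap is the male rate minus the female rate in percentage points. The function names are hypothetical.

```python
def pct_half_up(numerator, denominator):
    """Whole-number percentage with halves rounded up (23/40 = 57.5% -> 58)."""
    return (200 * numerator + denominator) // (2 * denominator)


def gender_gap(f_hits, f_n, m_hits, m_n):
    """Male minus female hit rate in percentage points; positive values favor males."""
    return pct_half_up(m_hits, m_n) - pct_half_up(f_hits, f_n)


# Values from Table B9, Top 5, 18 scales: 23 of 40 correct; females 9/20, males 14/20.
overall = pct_half_up(23, 40)
gap = gender_gap(9, 20, 14, 20)
print(overall, gap)
```

Integer arithmetic is used here because Python's built-in round() rounds halves to the nearest even number, which would turn 62.5% into 62 rather than the 63% shown in Table B1.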

Table B10
Person Matching Hit Rates for Emergency Medicine Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
  150         40   17        43%        7/20 (35%)      10/20 (50%)
  18          40   13        33%        7/20 (35%)      6/20 (30%)
  30          40   16        40%        6/20 (30%)      10/20 (50%)
Top 10
  150         40   21        53%        10/20 (50%)     11/20 (55%)
  18          40   20        50%        10/20 (50%)     10/20 (50%)
  30          40   19        48%        8/20 (40%)      11/20 (55%)
Top 20
  150         40   22        55%        10/20 (50%)     12/20 (60%)
  18          40   17        43%        6/20 (30%)      11/20 (55%)
  30          40   16        40%        7/20 (35%)      9/20 (45%)

Table B11
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Emergency Medicine Utilizing 150 Items, 18 Scales, and 30 Items

              First Match         Twentieth Match
Calculation   M         SD        M         SD        M ∆       SD ∆      Low Score   High Score
150           261.93    79.19     322.63    88.80     60.70     9.61      126         449
18            7.99      3.32      13.61     5.32      5.62      2.00      2.50        19.97
30            36.70     12.95     53.05     14.85     16.35     1.90      16          69

Table B12
Standard Scoring Hit Rates for Emergency Medicine for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa           Females         Males
First         17/40   43%        58%        .71 Good        8/20 (40%)      9/20 (45%)
Second        9/40    23%        15%        .76 Good        4/20 (20%)      5/20 (25%)
Third         1/40    3%         9%         .38 Fair        1/20 (5%)       0/20 (0%)
Fourth        5/40    13%        7%         .72 Good        3/20 (15%)      2/20 (10%)
Fifth         2/40    5%         3%         .66 Good        1/20 (5%)       1/20 (5%)
Total Top 5   34/40   85%        92%        .63 Good        17/20 (85%)     17/20 (85%)

Table B13
Person Matching Hit Rates for Family Medicine Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
  150         40   14        35%        8/20 (40%)      6/20 (30%)
  18          40   10        25%        4/20 (20%)      6/20 (30%)
  30          40   11        28%        7/20 (35%)      4/20 (20%)
Top 5
  150         40   27        68%        13/20 (65%)     14/20 (70%)
  18          40   23        58%        10/20 (50%)     13/20 (65%)
  30          40   30        75%        15/20 (75%)     15/20 (75%)
Top 10
  150         40   29        73%        14/20 (70%)     15/20 (75%)
  18          40   32        80%        18/20 (90%)     14/20 (70%)
  30          40   34        85%        17/20 (85%)     17/20 (85%)
Top 20
  150         40   35        88%        18/20 (90%)     17/20 (85%)
  18          40   34        85%        19/20 (95%)     15/20 (75%)
  30          40   37        93%        19/20 (95%)     18/20 (90%)

Table B14
Person Matching Hit Rates for Family Medicine Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
  150         40   15        38%        8/20 (40%)      7/20 (35%)
  18          40   10        25%        5/20 (25%)      5/20 (25%)
  30          40   18        45%        9/20 (45%)      9/20 (45%)
Top 10
  150         40   19        48%        9/20 (45%)      10/20 (50%)
  18          40   13        33%        5/20 (25%)      8/20 (40%)
  30          40   20        50%        10/20 (50%)     10/20 (50%)
Top 20
  150         40   18        45%        8/20 (40%)      10/20 (50%)
  18          40   14        35%        6/20 (30%)      8/20 (40%)
  30          40   22        55%        10/20 (50%)     12/20 (60%)

Table B15
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Family Medicine Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           245.28   90.68    303.50   104.62    58.22   13.94   64          458
18            7.55     3.67     13.21    5.83      5.66    2.16    1.43        17.64
30            37.40    15.58    53.38    20.53     15.98   4.95    11          77

Table B16
Standard Scoring Hit Rates for Family Medicine for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         21/40   53%        59%        .85 Very Good    11/20 (55%)    10/20 (50%)
Second        9/40    23%        16%        .76 Good         4/20 (20%)     5/20 (25%)
Third         2/40    5%         11%        .64 Good         2/20 (10%)     0/20 (0%)
Fourth        2/40    5%         5%         1.0 Very Good    1/20 (5%)      1/20 (5%)
Fifth         1/40    3%         3%         1.0 Very Good    1/20 (5%)      0/20 (0%)
Total Top 5   35/40   88%        94%        .54 Moderate     19/20 (95%)    16/20 (80%)

Table B17
Person Matching Hit Rates for Internal Medicine Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           56   17        30%        9/28 (32%)      8/28 (29%)
18            56   15        27%        8/28 (29%)      7/28 (25%)
30            56   20        36%        11/28 (39%)     9/28 (32%)
Top 5
150           56   46        82%        23/28 (82%)     23/28 (82%)
18            56   43        77%        21/28 (75%)     22/28 (79%)
30            56   41        73%        18/28 (64%)     23/28 (82%)
Top 10
150           56   49        88%        25/28 (89%)     24/28 (86%)
18            56   52        93%        26/28 (93%)     26/28 (93%)
30            56   54        96%        26/28 (93%)     28/28 (100%)
Top 20
150           56   55        98%        28/28 (100%)    27/28 (96%)
18            56   56        100%       28/28 (100%)    28/28 (100%)
30            56   56        100%       28/28 (100%)    28/28 (100%)

Table B18
Person Matching Hit Rates for Internal Medicine Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           56   31        55%        17/28 (61%)     14/28 (50%)
18            56   23        41%        9/28 (32%)      14/28 (50%)
30            56   27        48%        12/28 (43%)     15/28 (54%)
Top 10
150           56   38        68%        18/28 (64%)     20/28 (71%)
18            56   31        55%        16/28 (57%)     15/28 (54%)
30            56   34        61%        16/28 (57%)     18/28 (64%)
Top 20
150           56   38        68%        19/28 (68%)     19/28 (68%)
18            56   31        55%        15/28 (54%)     16/28 (57%)
30            56   35        63%        17/28 (61%)     18/28 (64%)

Table B19
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Internal Medicine Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           247.73   90.27    298.79   104.39    51.06   14.12   98          545
18            7.39     3.66     12.17    4.83      4.78    1.17    2.38        21.02
30            35.50    13.30    50.45    17.43     14.95   4.13    16          82

Table B20
Standard Scoring Hit Rates for Internal Medicine for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         35/56   63%        65%        .96 Very Good    18/28 (64%)    17/28 (61%)
Second        12/56   21%        20%        .95 Very Good    5/28 (18%)     7/28 (25%)
Third         4/56    7%         7%         1.0 Very Good    2/28 (7%)      2/28 (7%)
Fourth        2/56    4%         3%         1.0 Very Good    1/28 (4%)      1/28 (4%)
Fifth         2/56    4%         2%         .66 Good         2/28 (7%)      0/28 (0%)
Total Top 5   55/56   98%        97%        .66 Good         28/28 (100%)   27/28 (96%)

Table B21
Person Matching Hit Rates for Neurological Surgery Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   1         7%         1/7 (14%)       0/7 (0%)
30            14   2         14%        0/7 (0%)        2/7 (29%)
Top 5
150           14   2         14%        1/7 (14%)       1/7 (14%)
18            14   3         21%        3/7 (43%)       0/7 (0%)
30            14   5         36%        2/7 (29%)       3/7 (43%)
Top 10
150           14   2         14%        1/7 (14%)       1/7 (14%)
18            14   4         29%        3/7 (43%)       1/7 (14%)
30            14   8         57%        3/7 (43%)       5/7 (71%)
Top 20
150           14   4         29%        1/7 (14%)       3/7 (43%)
18            14   4         29%        3/7 (43%)       1/7 (14%)
30            14   9         64%        3/7 (43%)       6/7 (86%)

Table B22
Person Matching Hit Rates for Neurological Surgery Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   0         0%         0/7 (0%)        0/7 (0%)
30            14   0         0%         0/7 (0%)        0/7 (0%)
Top 10
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   1         7%         1/7 (14%)       0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)
Top 20
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   0         0%         0/7 (0%)        0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)

Table B23
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Neurological Surgery Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           286.64   87.86    348.57   103.15    61.93   15.29   142         460
18            9.07     3.83     14.55    5.29      5.48    1.46    1.71        14.69
30            36.29    12.91    56.71    20.66     20.42   7.75    13          56

Table B24
Standard Scoring Hit Rates for Neurological Surgery for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Second        0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Third         0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Fourth        0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Fifth         0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Total Top 5   0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)

Table B25
Person Matching Hit Rates for Neurology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           14   2         14%        1/7 (14%)       1/7 (14%)
18            14   1         7%         1/7 (14%)       0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)
Top 5
150           14   2         14%        1/7 (14%)       1/7 (14%)
18            14   2         14%        2/7 (29%)       0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)
Top 10
150           14   3         21%        2/7 (29%)       1/7 (14%)
18            14   4         29%        4/7 (57%)       0/7 (0%)
30            14   4         29%        3/7 (43%)       1/7 (14%)
Top 20
150           14   4         29%        3/7 (43%)       1/7 (14%)
18            14   4         29%        4/7 (57%)       0/7 (0%)
30            14   6         43%        3/7 (43%)       3/7 (43%)

Table B26
Person Matching Hit Rates for Neurology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           14   1         7%         1/7 (14%)       0/7 (0%)
18            14   1         7%         1/7 (14%)       0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)
Top 10
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   1         7%         1/7 (14%)       0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)
Top 20
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   0         0%         0/7 (0%)        0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)

Table B27
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Neurology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           259.14   157.02   313.57   173.57    54.43   16.55   153         775
18            9.68     7.85     14.72    10.69     5.04    2.84    2.41        34.85
30            38.43    16.13    53.36    23.39     14.93   7.26    15          80

Table B28
Standard Scoring Hit Rates for Neurology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         1/14    7%         34%        .24 Fair         1/7 (14%)      0/7 (0%)
Second        1/14    7%         26%        .32 Fair         0/7 (0%)       1/7 (14%)
Third         2/14    14%        6%         .63 Good         1/7 (14%)      1/7 (14%)
Fourth        2/14    14%        3%         0.0 Poor         0/7 (0%)       2/7 (29%)
Fifth         1/14    7%         5%         1.0 Very Good    1/7 (14%)      0/7 (0%)
Total Top 5   7/14    50%        74%        .57 Moderate     3/7 (43%)      4/7 (57%)

Table B29
Person Matching Hit Rates for Obstetrics & Gynecology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           30   8         27%        6/15 (40%)      2/15 (13%)
18            30   8         27%        6/15 (40%)      2/15 (13%)
30            30   10        33%        5/15 (33%)      5/15 (33%)
Top 5
150           30   18        60%        12/15 (80%)     6/15 (40%)
18            30   17        57%        11/15 (73%)     6/15 (40%)
30            30   22        73%        14/15 (93%)     8/15 (53%)
Top 10
150           30   23        77%        15/15 (100%)    8/15 (53%)
18            30   22        73%        13/15 (87%)     9/15 (60%)
30            30   23        77%        14/15 (93%)     9/15 (60%)
Top 20
150           30   26        87%        15/15 (100%)    11/15 (73%)
18            30   27        90%        14/15 (93%)     13/15 (87%)
30            30   28        93%        15/15 (100%)    13/15 (87%)

Table B30
Person Matching Hit Rates for Obstetrics & Gynecology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           30   9         30%        7/15 (47%)      2/15 (13%)
18            30   8         27%        7/15 (47%)      1/15 (7%)
30            30   9         30%        6/15 (40%)      3/15 (20%)
Top 10
150           30   10        33%        7/15 (47%)      3/15 (20%)
18            30   14        47%        9/15 (60%)      5/15 (33%)
30            30   14        47%        9/15 (60%)      5/15 (33%)
Top 20
150           30   10        33%        8/15 (53%)      2/15 (13%)
18            30   13        43%        9/15 (60%)      4/15 (27%)
30            30   14        47%        9/15 (60%)      5/15 (33%)

Table B31
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Obstetrics & Gynecology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           247.37   97.87    299.23   114.21    51.86   16.34   102         564
18            7.78     4.05     12.26    5.74      4.48    1.69    2.70        19.88
30            37.40    17.83    53.03    23.01     15.63   5.18    11          94

Table B32
Standard Scoring Hit Rates for Obstetrics & Gynecology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         12/30   40%        54%        .74 Good         8/15 (53%)     4/15 (27%)
Second        6/30    20%        17%        .89 Very Good    2/15 (13%)     4/15 (27%)
Third         3/30    10%        5%         .78 Good         2/15 (13%)     1/15 (7%)
Fourth        1/30    3%         7%         .65 Good         1/15 (7%)      0/15 (0%)
Fifth         1/30    3%         4%         1.0 Very Good    1/15 (7%)      0/15 (0%)
Total Top 5   23/30   77%        87%        .67 Good         14/15 (93%)    9/15 (60%)

Table B33
Person Matching Hit Rates for Ophthalmology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   1         7%         0/7 (0%)        1/7 (14%)
30            14   1         7%         1/7 (14%)       0/7 (0%)
Top 5
150           14   2         14%        0/7 (0%)        2/7 (29%)
18            14   2         14%        0/7 (0%)        2/7 (29%)
30            14   2         14%        1/7 (14%)       1/7 (14%)
Top 10
150           14   3         21%        1/7 (14%)       2/7 (29%)
18            14   4         29%        0/7 (0%)        4/7 (57%)
30            14   5         36%        1/7 (14%)       4/7 (57%)
Top 20
150           14   4         29%        1/7 (14%)       3/7 (43%)
18            14   6         43%        1/7 (14%)       5/7 (71%)
30            14   6         43%        2/7 (29%)       4/7 (57%)

Table B34
Person Matching Hit Rates for Ophthalmology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   0         0%         0/7 (0%)        0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)
Top 10
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   0         0%         0/7 (0%)        0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)
Top 20
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   0         0%         0/7 (0%)        0/7 (0%)
30            14   1         7%         1/7 (14%)       0/7 (0%)

Table B35
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Ophthalmology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           284.57   120.67   342.79   127.80    58.22   7.13    137         547
18            6.71     3.28     12.00    4.39      5.29    1.11    3.41        15.89
30            38.00    16.50    53.79    20.73     15.79   4.23    16          70

Table B36
Standard Scoring Hit Rates for Ophthalmology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Second        0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Third         0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Fourth        0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Fifth         0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)
Total Top 5   0/14    0%         0%         1.0 Very Good    0/7 (0%)       0/7 (0%)

Table B37
Person Matching Hit Rates for Orthopedic Surgery Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           24   11        46%        6/12 (50%)      5/12 (42%)
18            24   7         29%        4/12 (33%)      3/12 (25%)
30            24   6         25%        4/12 (33%)      2/12 (17%)
Top 5
150           24   17        71%        9/12 (75%)      8/12 (67%)
18            24   12        50%        8/12 (67%)      4/12 (33%)
30            24   14        58%        8/12 (67%)      6/12 (50%)
Top 10
150           24   21        88%        12/12 (100%)    9/12 (75%)
18            24   16        67%        9/12 (75%)      7/12 (58%)
30            24   18        75%        10/12 (83%)     8/12 (67%)
Top 20
150           24   23        96%        12/12 (100%)    11/12 (92%)
18            24   19        79%        9/12 (75%)      10/12 (83%)
30            24   20        83%        10/12 (83%)     10/12 (83%)

Table B38
Person Matching Hit Rates for Orthopedic Surgery Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           24   9         38%        5/12 (42%)      4/12 (33%)
18            24   6         25%        5/12 (42%)      1/12 (8%)
30            24   8         33%        6/12 (50%)      2/12 (17%)
Top 10
150           24   12        50%        7/12 (58%)      5/12 (42%)
18            24   13        54%        9/12 (75%)      4/12 (33%)
30            24   10        42%        6/12 (50%)      4/12 (33%)
Top 20
150           24   13        54%        8/12 (67%)      5/12 (42%)
18            24   9         38%        8/12 (67%)      1/12 (8%)
30            24   9         38%        6/12 (50%)      3/12 (25%)

Table B39
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Orthopedic Surgery Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           226.75   79.10    283.96   95.09     57.21   15.99   118         440
18            6.21     3.10     11.27    4.60      5.06    1.50    2.24        13.62
30            29.17    11.47    45.50    16.56     16.33   5.09    14          58

Table B40
Standard Scoring Hit Rates for Orthopedic Surgery for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         7/24    29%        50%        .58 Moderate     3/12 (25%)     4/12 (33%)
Second        3/24    13%        15%        .83 Good         2/12 (17%)     1/12 (8%)
Third         2/24    8%         9%         1.0 Very Good    1/12 (8%)      1/12 (8%)
Fourth        3/24    13%        7%         .78 Good         3/12 (25%)     0/12 (0%)
Fifth         2/24    8%         4%         .65 Good         1/12 (8%)      1/12 (8%)
Total Top 5   17/24   71%        85%        .65 Good         10/12 (83%)    7/12 (58%)

Table B41
Person Matching Hit Rates for Otolaryngology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           14   3         21%        1/7 (14%)       2/7 (29%)
18            14   0         0%         0/7 (0%)        0/7 (0%)
30            14   2         14%        0/7 (0%)        2/7 (29%)
Top 5
150           14   4         29%        1/7 (14%)       3/7 (43%)
18            14   2         14%        1/7 (14%)       1/7 (14%)
30            14   4         29%        2/7 (29%)       2/7 (29%)
Top 10
150           14   5         36%        2/7 (29%)       3/7 (43%)
18            14   4         29%        2/7 (29%)       2/7 (29%)
30            14   5         36%        3/7 (43%)       2/7 (29%)
Top 20
150           14   7         50%        3/7 (43%)       4/7 (57%)
18            14   8         57%        3/7 (43%)       5/7 (71%)
30            14   7         50%        4/7 (57%)       3/7 (43%)

Table B42
Person Matching Hit Rates for Otolaryngology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   0         0%         0/7 (0%)        0/7 (0%)
30            14   1         7%         0/7 (0%)        1/7 (14%)
Top 10
150           14   1         7%         0/7 (0%)        1/7 (14%)
18            14   1         7%         1/7 (14%)       0/7 (0%)
30            14   1         7%         0/7 (0%)        1/7 (14%)
Top 20
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   1         7%         1/7 (14%)       0/7 (0%)
30            14   1         7%         0/7 (0%)        1/7 (14%)

Table B43
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Otolaryngology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           230.07   97.17    274.79   108.88    44.72   11.71   119         434
18            6.85     3.30     10.59    4.14      3.74    .84     2.41        14.77
30            32.86    14.38    45.14    17.75     12.28   3.37    17          60

Table B44
Standard Scoring Hit Rates for Otolaryngology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         4/14    29%        44%        .70 Good         1/7 (14%)      3/7 (43%)
Second        1/14    7%         9%         1.0 Very Good    1/7 (14%)      0/7 (0%)
Third         0/14    0%         7%         0.0 Poor         0/7 (0%)       0/7 (0%)
Fourth        1/14    7%         6%         1.0 Very Good    1/7 (14%)      0/7 (0%)
Fifth         0/14    0%         5%         0.0 Poor         0/7 (0%)       0/7 (0%)
Total Top 5   6/14    43%        71%        .46 Moderate     3/7 (43%)      3/7 (43%)

Table B45
Person Matching Hit Rates for Pathology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           20   4         20%        2/10 (20%)      2/10 (20%)
18            20   3         15%        2/10 (20%)      1/10 (10%)
30            20   2         10%        2/10 (20%)      0/10 (0%)
Top 5
150           20   7         35%        3/10 (30%)      4/10 (40%)
18            20   6         30%        3/10 (30%)      3/10 (30%)
30            20   7         35%        4/10 (40%)      3/10 (30%)
Top 10
150           20   10        50%        5/10 (50%)      5/10 (50%)
18            20   10        50%        5/10 (50%)      5/10 (50%)
30            20   9         45%        6/10 (60%)      3/10 (30%)
Top 20
150           20   12        60%        6/10 (60%)      6/10 (60%)
18            20   13        65%        7/10 (70%)      6/10 (60%)
30            20   11        55%        6/10 (60%)      5/10 (50%)

Table B46
Person Matching Hit Rates for Pathology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           20   3         15%        2/10 (20%)      1/10 (10%)
18            20   3         15%        2/10 (20%)      1/10 (10%)
30            20   3         15%        2/10 (20%)      1/10 (10%)
Top 10
150           20   5         25%        3/10 (30%)      2/10 (20%)
18            20   3         15%        2/10 (20%)      1/10 (10%)
30            20   3         15%        1/10 (10%)      2/10 (20%)
Top 20
150           20   3         15%        2/10 (20%)      1/10 (10%)
18            20   5         25%        3/10 (30%)      2/10 (20%)
30            20   1         5%         1/10 (10%)      0/10 (0%)

Table B47
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Pathology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           255.35   103.34   314.35   120.77    59.00   17.43   123         501
18            8.31     4.55     13.35    6.56      5.04    2.01    2.83        20.04
30            34.75    15.37    52.35    21.78     17.60   6.41    16          75

Table B48
Standard Scoring Hit Rates for Pathology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         4/20    20%        41%        .55 Moderate     2/10 (20%)     2/10 (20%)
Second        2/20    10%        19%        .62 Good         2/10 (20%)     0/10 (0%)
Third         2/20    10%        10%        1.0 Very Good    0/10 (0%)      2/10 (20%)
Fourth        2/20    10%        7%         .64 Good         1/10 (10%)     1/10 (10%)
Fifth         1/20    5%         1%         0.0 Poor         0/10 (0%)      1/10 (10%)
Total Top 5   11/20   55%        78%        .47 Moderate     5/10 (50%)     6/10 (60%)

Table B49
Person Matching Hit Rates for Pediatrics Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           44   13        30%        7/22 (32%)      6/22 (27%)
18            44   11        25%        5/22 (23%)      6/22 (27%)
30            44   18        41%        9/22 (41%)      9/22 (41%)
Top 5
150           44   29        66%        15/22 (68%)     14/22 (64%)
18            44   24        55%        12/22 (55%)     12/22 (55%)
30            44   34        77%        17/22 (77%)     17/22 (77%)
Top 10
150           44   34        77%        17/22 (77%)     17/22 (77%)
18            44   32        73%        16/22 (73%)     16/22 (73%)
30            44   39        89%        20/22 (91%)     19/22 (86%)
Top 20
150           44   39        89%        19/22 (86%)     20/22 (91%)
18            44   39        89%        21/22 (95%)     18/22 (82%)
30            44   42        95%        22/22 (100%)    20/22 (91%)

Table B50
Person Matching Hit Rates for Pediatrics Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           44   19        43%        10/22 (45%)     9/22 (41%)
18            44   16        36%        10/22 (45%)     6/22 (27%)
30            44   21        48%        10/22 (45%)     11/22 (50%)
Top 10
150           44   20        45%        10/22 (45%)     10/22 (45%)
18            44   15        34%        11/22 (50%)     4/22 (18%)
30            44   25        57%        12/22 (55%)     13/22 (59%)
Top 20
150           44   24        55%        13/22 (59%)     11/22 (50%)
18            44   15        34%        11/22 (50%)     4/22 (18%)
30            44   26        59%        13/22 (59%)     13/22 (59%)

Table B51
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Pediatrics Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           235.84   83.28    289.43   85.52     53.59   2.24    90          527
18            7.09     3.71     12.41    5.80      5.32    2.09    2.99        24.71
30            36.75    13.35    50.70    16.98     13.95   3.63    15          74

Table B52
Standard Scoring Hit Rates for Pediatrics for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         26/44   59%        64%        .90 Very Good    11/22 (50%)    15/22 (68%)
Second        4/44    9%         14%        .78 Good         2/22 (9%)      2/22 (9%)
Third         6/44    14%        11%        .90 Very Good    5/22 (23%)     1/22 (5%)
Fourth        2/44    5%         4%         1.0 Very Good    0/22 (0%)      2/22 (9%)
Fifth         3/44    7%         4%         .79 Good         2/22 (9%)      1/22 (5%)
Total Top 5   41/44   93%        97%        .48 Moderate     20/22 (91%)    21/22 (95%)

Table B53
Person Matching Hit Rates for Physical Medicine Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   1         7%         0/7 (0%)        1/7 (14%)
30            14   3         21%        1/7 (14%)       2/7 (29%)
Top 5
150           14   1         7%         0/7 (0%)        1/7 (14%)
18            14   1         7%         0/7 (0%)        1/7 (14%)
30            14   4         29%        2/7 (29%)       2/7 (29%)
Top 10
150           14   1         7%         0/7 (0%)        1/7 (14%)
18            14   1         7%         0/7 (0%)        1/7 (14%)
30            14   4         29%        2/7 (29%)       2/7 (29%)
Top 20
150           14   5         36%        1/7 (14%)       4/7 (57%)
18            14   3         21%        0/7 (0%)        3/7 (43%)
30            14   7         50%        4/7 (57%)       3/7 (43%)

Table B54
Person Matching Hit Rates for Physical Medicine Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   1         7%         0/7 (0%)        1/7 (14%)
30            14   2         14%        1/7 (14%)       1/7 (14%)
Top 10
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   1         7%         0/7 (0%)        1/7 (14%)
30            14   3         21%        1/7 (14%)       2/7 (29%)
Top 20
150           14   0         0%         0/7 (0%)        0/7 (0%)
18            14   0         0%         0/7 (0%)        0/7 (0%)
30            14   2         14%        0/7 (0%)        2/7 (29%)

Table B55
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Physical Medicine Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           260.14   89.13    322.21   100.70    62.07   11.57   76          438
18            8.85     4.09     14.66    5.45      5.81    1.36    3.05        15.89
30            37.64    11.15    54.71    16.39     17.07   5.24    15          54

Table B56
Standard Scoring Hit Rates for Physical Medicine for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         2/14    14%        36%        .46 Moderate     1/7 (14%)      1/7 (14%)
Second        4/14    29%        24%        .81 Very Good    1/7 (14%)      3/7 (43%)
Third         2/14    14%        12%        1.0 Very Good    2/7 (29%)      0/7 (0%)
Fourth        1/14    7%         4%         1.0 Very Good    0/7 (0%)       1/7 (14%)
Fifth         3/14    21%        4%         .44 Moderate     2/7 (29%)      1/7 (14%)
Total Top 5   12/14   86%        80%        .76 Good         6/7 (86%)      6/7 (86%)

Table B57
Person Matching Hit Rates for Plastic Surgery Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)
Top 5
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)
Top 10
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   1         10%        1/5 (20%)       0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)
Top 20
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   1         10%        1/5 (20%)       0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)

Table B58
Person Matching Hit Rates for Plastic Surgery Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)
Top 10
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)
Top 20
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)

Table B59
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Plastic Surgery Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           271.30   95.89    329.20   99.87     57.90   3.98    136         413
18            7.95     3.82     14.14    5.44      6.19    1.62    2.36        14.33
30            37.10    15.96    54.30    19.68     17.20   3.72    15          74

Table B60
Standard Scoring Hit Rates for Plastic Surgery for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Second        0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Third         0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Fourth        0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Fifth         0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Total Top 5   0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)

Table B61
Person Matching Hit Rates for Psychiatry Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           24   6         25%        1/12 (8%)       5/12 (42%)
18            24   7         29%        1/12 (8%)       6/12 (50%)
30            24   8         33%        2/12 (17%)      6/12 (50%)
Top 5
150           24   12        50%        4/12 (33%)      8/12 (67%)
18            24   13        54%        4/12 (33%)      9/12 (75%)
30            24   12        50%        4/12 (33%)      8/12 (67%)
Top 10
150           24   15        63%        5/12 (42%)      10/12 (83%)
18            24   15        63%        6/12 (50%)      9/12 (75%)
30            24   14        58%        6/12 (50%)      8/12 (67%)
Top 20
150           24   18        75%        7/12 (58%)      11/12 (92%)
18            24   18        75%        8/12 (67%)      10/12 (83%)
30            24   17        71%        8/12 (67%)      9/12 (75%)

Table B62
Person Matching Hit Rates for Psychiatry Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           24   8         33%        2/12 (17%)      6/12 (50%)
18            24   10        42%        2/12 (17%)      8/12 (67%)
30            24   9         38%        3/12 (25%)      6/12 (50%)
Top 10
150           24   9         38%        2/12 (17%)      7/12 (58%)
18            24   9         38%        1/12 (8%)       8/12 (67%)
30            24   10        42%        3/12 (25%)      7/12 (58%)
Top 20
150           24   9         38%        2/12 (17%)      7/12 (58%)
18            24   9         38%        2/12 (17%)      7/12 (58%)
30            24   7         29%        2/12 (17%)      5/12 (42%)

Table B63
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Psychiatry Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           255.29   70.50    331.38   88.66     76.09   18.16   80          388
18            8.58     3.70     15.45    6.31      6.87    2.61    2.37        14.67
30            38.67    10.87    56.88    15.25     18.21   4.38    17          54

Table B64
Standard Scoring Hit Rates for Psychiatry for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         10/24   42%        61%        .60 Moderate     2/12 (17%)     8/12 (67%)
Second        4/24    17%        11%        .83 Very Good    4/12 (33%)     0/12 (0%)
Third         3/24    13%        8%         .78 Good         1/12 (8%)      2/12 (17%)
Fourth        2/24    8%         5%         .65 Good         2/12 (17%)     0/12 (0%)
Fifth         1/24    4%         2%         0.0 Poor         0/12 (0%)      1/12 (8%)
Total Top 5   20/24   83%        87%        .83 Very Good    9/12 (75%)     11/12 (92%)

Table B65
Person Matching Hit Rates for Radiology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           20   3         15%        2/10 (20%)      1/10 (10%)
18            20   0         0%         0/10 (0%)       0/10 (0%)
30            20   2         10%        1/10 (10%)      1/10 (10%)
Top 5
150           20   10        50%        5/10 (50%)      5/10 (50%)
18            20   4         20%        1/10 (10%)      3/10 (30%)
30            20   5         25%        2/10 (20%)      3/10 (30%)
Top 10
150           20   12        60%        5/10 (50%)      7/10 (70%)
18            20   9         45%        3/10 (30%)      6/10 (60%)
30            20   11        55%        5/10 (50%)      6/10 (60%)
Top 20
150           20   18        90%        9/10 (90%)      9/10 (90%)
18            20   11        55%        4/10 (40%)      7/10 (70%)
30            20   15        75%        7/10 (70%)      8/10 (80%)

Table B66
Person Matching Hit Rates for Radiology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           20   5         25%        3/10 (30%)      2/10 (20%)
18            20   1         5%         0/10 (0%)       1/10 (10%)
30            20   3         15%        1/10 (10%)      2/10 (20%)
Top 10
150           20   6         30%        3/10 (30%)      3/10 (30%)
18            20   1         5%         0/10 (0%)       1/10 (10%)
30            20   4         20%        2/10 (20%)      2/10 (20%)
Top 20
150           20   5         25%        3/10 (30%)      2/10 (20%)
18            20   1         5%         0/10 (0%)       1/10 (10%)
30            20   2         10%        1/10 (10%)      1/10 (10%)

Table B67
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Radiology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           247.40   72.55    302.85   82.27     55.45   9.72    132         418
18            7.31     3.86     12.58    5.21      5.27    1.35    4.05        18.63
30            36.55    10.38    52.15    13.07     15.60   2.69    19          55

Table B68
Standard Scoring Hit Rates for Radiology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         3/20    15%        39%        .42 Moderate     1/10 (10%)     2/10 (20%)
Second        4/20    20%        15%        .83 Very Good    2/10 (20%)     2/10 (20%)
Third         3/20    15%        15%        1.0 Very Good    2/10 (20%)     1/10 (10%)
Fourth        2/20    10%        7%         .64 Good         0/10 (0%)      2/10 (20%)
Fifth         3/20    15%        5%         .46 Moderate     2/10 (20%)     1/10 (10%)
Total Top 5   15/20   75%        81%        .86 Very Good    7/10 (70%)     8/10 (80%)

Table B69
Person Matching Hit Rates for Radiation Oncology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   1         10%        0/5 (0%)        1/5 (20%)
Top 5
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   1         10%        0/5 (0%)        1/5 (20%)
Top 10
150           10   1         10%        1/5 (20%)       0/5 (0%)
18            10   1         10%        1/5 (20%)       0/5 (0%)
30            10   1         10%        0/5 (0%)        1/5 (20%)
Top 20
150           10   1         10%        1/5 (20%)       0/5 (0%)
18            10   2         20%        2/5 (40%)       0/5 (0%)
30            10   1         10%        0/5 (0%)        1/5 (20%)

Table B70
Person Matching Hit Rates for Radiation Oncology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)
Top 10
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)
Top 20
150           10   0         0%         0/5 (0%)        0/5 (0%)
18            10   0         0%         0/5 (0%)        0/5 (0%)
30            10   0         0%         0/5 (0%)        0/5 (0%)

Table B71
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Radiation Oncology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           239.50   87.21    300.70   95.79     61.20   8.58    142         414
18            7.92     3.50     14.19    5.35      6.27    1.85    4.37        15.37
30            31.20    10.94    50.10    14.85     18.90   3.91    16          55

Table B72
Standard Scoring Hit Rates for Radiation Oncology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Second        0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Third         0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Fourth        0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Fifth         0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)
Total Top 5   0/10    0%         0%         1.0 Very Good    0/5 (0%)       0/5 (0%)

Table B73
Person Matching Hit Rates for Surgery Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top Match
150           30   8         27%        4/15 (27%)      4/15 (27%)
18            30   6         20%        3/15 (20%)      3/15 (20%)
30            30   8         27%        6/15 (40%)      2/15 (13%)
Top 5
150           30   18        60%        10/15 (67%)     8/15 (53%)
18            30   18        60%        9/15 (60%)      9/15 (60%)
30            30   20        67%        9/15 (60%)      11/15 (73%)
Top 10
150           30   25        83%        14/15 (93%)     11/15 (73%)
18            30   21        70%        10/15 (67%)     11/15 (73%)
30            30   24        80%        10/15 (67%)     14/15 (93%)
Top 20
150           30   28        93%        15/15 (100%)    13/15 (87%)
18            30   26        87%        12/15 (80%)     14/15 (93%)
30            30   28        93%        14/15 (93%)     14/15 (93%)

Table B74
Person Matching Hit Rates for Surgery Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females         Males
Top 5
150           30   8         27%        3/15 (20%)      5/15 (33%)
18            30   5         17%        1/15 (7%)       4/15 (27%)
30            30   7         23%        3/15 (20%)      4/15 (27%)
Top 10
150           30   8         27%        3/15 (20%)      5/15 (33%)
18            30   9         30%        3/15 (20%)      6/15 (40%)
30            30   12        40%        8/15 (53%)      4/15 (27%)
Top 20
150           30   9         30%        4/15 (27%)      5/15 (33%)
18            30   11        37%        4/15 (27%)      7/15 (47%)
30            30   12        40%        5/15 (33%)      7/15 (47%)

Table B75
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Surgery Utilizing 150 Items, 18 Scales, and 30 Items

              First Match       Twentieth Match
Calculation   M        SD       M        SD        M ∆     SD ∆    Low Score   High Score
150           216.67   65.72    266.27   73.98     49.60   8.26    94          381
18            5.68     2.24     10.21    3.41      4.53    1.17    1.86        9.54
30            29.67    9.28     43.63    13.23     13.96   3.95    9           49

Table B76
Standard Scoring Hit Rates for Surgery for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n       Observed   Expected   Kappa            Females        Males
First         14/30   47%        48%        1.0 Very Good    5/15 (33%)     9/15 (60%)
Second        3/30    10%        19%        .62 Good         2/15 (13%)     1/15 (7%)
Third         2/30    7%         14%        .63 Good         2/15 (13%)     0/15 (0%)
Fourth        4/30    13%        7%         .63 Good         3/15 (20%)     1/15 (7%)
Fifth         3/30    10%        6%         .78 Good         1/15 (7%)      2/15 (13%)
Total Top 5   26/30   87%        94%        .63 Good         13/15 (87%)    13/15 (87%)

Table B77
Person Matching Hit Rates for Urology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females      Males
Top Match
  150         14   1         7%         0/7 (0%)     1/7 (14%)
  18          14   0         0%         0/7 (0%)     0/7 (0%)
  30          14   0         0%         0/7 (0%)     0/7 (0%)
Top 5
  150         14   2         14%        1/7 (14%)    1/7 (14%)
  18          14   0         0%         0/7 (0%)     0/7 (0%)
  30          14   1         7%         1/7 (14%)    0/7 (0%)
Top 10
  150         14   2         14%        1/7 (14%)    1/7 (14%)
  18          14   1         7%         1/7 (14%)    0/7 (0%)
  30          14   3         21%        2/7 (29%)    1/7 (14%)
Top 20
  150         14   2         14%        1/7 (14%)    1/7 (14%)
  18          14   1         7%         1/7 (14%)    0/7 (0%)
  30          14   5         36%        2/7 (29%)    3/7 (43%)

Table B78
Person Matching Hit Rates for Urology Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females     Males
Top 5
  150         14   0         0%         0/7 (0%)    0/7 (0%)
  18          14   0         0%         0/7 (0%)    0/7 (0%)
  30          14   0         0%         0/7 (0%)    0/7 (0%)
Top 10
  150         14   0         0%         0/7 (0%)    0/7 (0%)
  18          14   0         0%         0/7 (0%)    0/7 (0%)
  30          14   0         0%         0/7 (0%)    0/7 (0%)
Top 20
  150         14   0         0%         0/7 (0%)    0/7 (0%)
  18          14   0         0%         0/7 (0%)    0/7 (0%)
  30          14   0         0%         0/7 (0%)    0/7 (0%)

Table B79
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Urology Utilizing 150 Items, 18 Scales, and 30 Items

              First Match        Twentieth Match
Calculation   M        SD        M        SD        M ∆     SD ∆    Low Score   High Score
150           253.29   68.25     309.00   71.36     55.71   3.11    161         367
18            6.05     2.43      10.59    3.40      4.54    .97     3.20        9.75
30            35.43    9.03      51.00    10.49     15.57   1.46    18          47

Table B80
Standard Scoring Hit Rates for Urology for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n      Observed   Expected   Kappa            Females      Males
First         1/14   7%         40%        .19 Poor         0/7 (0%)     1/7 (14%)
Second        3/14   21%        16%        .76 Good         1/7 (14%)    2/7 (29%)
Third         1/14   7%         4%         1.0 Very Good    0/7 (0%)     1/7 (14%)
Fourth        1/14   7%         8%         1.0 Very Good    0/7 (0%)     1/7 (14%)
Fifth         1/14   7%         4%         1.0 Very Good    1/7 (14%)    0/7 (0%)
Total Top 5   7/14   50%        72%        .57 Moderate     2/7 (29%)    5/7 (71%)

Table B81
Person Matching Hit Rates for Internal Medicine Pediatrics Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females       Males
Top Match
  150         20   1         5%         1/10 (10%)    0/10 (0%)
  18          20   4         20%        0/10 (0%)     4/10 (40%)
  30          20   3         15%        1/10 (10%)    2/10 (20%)
Top 5
  150         20   10        50%        4/10 (40%)    6/10 (60%)
  18          20   6         30%        1/10 (10%)    5/10 (50%)
  30          20   8         40%        4/10 (40%)    4/10 (40%)
Top 10
  150         20   13        65%        6/10 (60%)    7/10 (70%)
  18          20   10        50%        2/10 (20%)    8/10 (80%)
  30          20   12        60%        6/10 (60%)    6/10 (60%)
Top 20
  150         20   15        75%        7/10 (70%)    8/10 (80%)
  18          20   14        70%        6/10 (60%)    8/10 (80%)
  30          20   15        75%        7/10 (70%)    8/10 (80%)

Table B82
Person Matching Hit Rates for Internal Medicine Pediatrics Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females       Males
Top 5
  150         20   0         0%         0/10 (0%)     0/10 (0%)
  18          20   2         10%        0/10 (0%)     2/10 (20%)
  30          20   3         15%        2/10 (20%)    1/10 (10%)
Top 10
  150         20   2         10%        1/10 (10%)    1/10 (10%)
  18          20   1         5%         0/10 (0%)     1/10 (10%)
  30          20   2         10%        1/10 (10%)    1/10 (10%)
Top 20
  150         20   1         5%         0/10 (0%)     1/10 (10%)
  18          20   1         5%         0/10 (0%)     1/10 (10%)
  30          20   0         0%         0/10 (0%)     0/10 (0%)

Table B83
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Internal Medicine Pediatrics Utilizing 150 Items, 18 Scales, and 30 Items

              First Match        Twentieth Match
Calculation   M        SD        M        SD        M ∆     SD ∆    Low Score   High Score
150           229.20   103.44    277.90   120.41    48.70   16.97   113         512
18            7.57     4.40      12.37    6.30      4.8     1.9     2.04        18.58
30            33.10    20.05     49.00    24.85     15.90   4.8     13          88

Table B84
Standard Scoring Hit Rates for Internal Medicine Pediatrics for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n      Observed   Expected   Kappa            Females      Males
First         0/20   0%         0%         1.0 Very Good    0/10 (0%)    0/10 (0%)
Second        0/20   0%         0%         1.0 Very Good    0/10 (0%)    0/10 (0%)
Third         0/20   0%         0%         1.0 Very Good    0/10 (0%)    0/10 (0%)
Fourth        0/20   0%         0%         1.0 Very Good    0/10 (0%)    0/10 (0%)
Fifth         0/20   0%         0%         1.0 Very Good    0/10 (0%)    0/10 (0%)
Total Top 5   0/20   0%         0%         1.0 Very Good    0/10 (0%)    0/10 (0%)

Table B85
Person Matching Hit Rates for Child and Adolescent Psychiatry Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Found at Least Once in the Top 1, 5, 10, or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females      Males
Top Match
  150         10   0         0%         0/5 (0%)     0/5 (0%)
  18          10   0         0%         0/5 (0%)     0/5 (0%)
  30          10   0         0%         0/5 (0%)     0/5 (0%)
Top 5
  150         10   1         10%        1/5 (20%)    0/5 (0%)
  18          10   1         10%        0/5 (0%)     1/5 (20%)
  30          10   0         0%         0/5 (0%)     0/5 (0%)
Top 10
  150         10   2         20%        1/5 (20%)    1/5 (20%)
  18          10   1         10%        0/5 (0%)     1/5 (20%)
  30          10   0         0%         0/5 (0%)     0/5 (0%)
Top 20
  150         10   2         20%        1/5 (20%)    1/5 (20%)
  18          10   1         10%        0/5 (0%)     1/5 (20%)
  30          10   1         10%        1/5 (20%)    0/5 (0%)

Table B86
Person Matching Hit Rates for Child and Adolescent Psychiatry Utilizing the 150 Items, 18 Scales, and 30 Items Where the Medical Specialty Entered Is Dominant in the Top 5, 10 or 20; Overall and by Gender

Calculation   n    Correct   Hit Rate   Females     Males
Top 5
  150         10   0         0%         0/5 (0%)    0/5 (0%)
  18          10   0         0%         0/5 (0%)    0/5 (0%)
  30          10   0         0%         0/5 (0%)    0/5 (0%)
Top 10
  150         10   0         0%         0/5 (0%)    0/5 (0%)
  18          10   0         0%         0/5 (0%)    0/5 (0%)
  30          10   0         0%         0/5 (0%)    0/5 (0%)
Top 20
  150         10   0         0%         0/5 (0%)    0/5 (0%)
  18          10   0         0%         0/5 (0%)    0/5 (0%)
  30          10   0         0%         0/5 (0%)    0/5 (0%)

Table B87
Person Matching Means, Standard Deviations, and Top Match High and Low Scores for Child and Adolescent Psychiatry Utilizing 150 Items, 18 Scales, and 30 Items

              First Match        Twentieth Match
Calculation   M        SD        M        SD        M ∆     SD ∆    Low Score   High Score
150           236.90   98.27     298.40   109.18    61.50   10.91   125         464
18            7.76     3.71      13.83    5.59      6.07    1.88    2.61        14.33
30            33.10    13.12     49.50    17.90     16.40   4.78    19          66

Table B88
Standard Scoring Hit Rates for Child and Adolescent Psychiatry for the Top 5 Matches Including Kappa Coefficients; Overall and by Gender

Match         n      Observed   Expected   Kappa            Females     Males
First         0/10   0%         0%         1.0 Very Good    0/5 (0%)    0/5 (0%)
Second        0/10   0%         0%         1.0 Very Good    0/5 (0%)    0/5 (0%)
Third         0/10   0%         0%         1.0 Very Good    0/5 (0%)    0/5 (0%)
Fourth        0/10   0%         0%         1.0 Very Good    0/5 (0%)    0/5 (0%)
Fifth         0/10   0%         0%         1.0 Very Good    0/5 (0%)    0/5 (0%)
Total Top 5   0/10   0%         0%         1.0 Very Good    0/5 (0%)    0/5 (0%)

REFERENCES

Betz, N. E., & Rottinghaus, P. J. (2006). Current research on parallel measures of interests and confidence for basic dimensions of vocational activity. Journal of Career Assessment, 14(1), 56-76.

Bhattacharya, J. (2005). Specialty selection and lifetime returns to specialization within medicine. Journal of Human Resources, 40(1), 115-143.

Borges, N. (2007). Behavioral exploration of career and specialty choice in medical students. Career Development Quarterly, 55(4), 351-358.

Borges, N., Gibson, D., & Karnani, R. (2005). Job satisfaction of physicians with congruent versus incongruent specialty choice. Evaluation & the Health Professions, 28(4), 400-413.

Borges, N., & Osmon, W. (2001). Personality and medical specialty choice: Technique orientation versus people orientation. Journal of Vocational Behavior, 58(1), 22-35.

Borges, N., & Savickas, M. (2002). Personality and medical specialty choice: A literature review and integration. Journal of Career Assessment, 10(3), 362-380.

Borges, N., Savickas, M., & Jones, B. (2004). Holland's theory applied to medical specialty choice. Journal of Career Assessment, 12(2), 188-206.

Borges, N., Stratton, T., Wagner, P., & Elam, C. (2009). Emotional intelligence and medical specialty choice: Findings from three empirical studies. Medical Education, 43(6), 565-572.

Brott, P. E. (2001). The storied approach: A postmodern perspective for career counseling. Career Development Quarterly, 49(4), 304-313.

Brott, P. E. (2004). Constructivist assessment in career counseling. Journal of Career Development, 30(3), 189-200.

Bujold, C. (2004). Constructing career through narrative. Journal of Vocational Behavior, 64(3), 470-484.

Campbell, D. (1966). Re-analysis of Strong's interest data from medical specialists (Publication No. BR-5-8404). Minnesota University; Minneapolis, MN: United States Office of Education.

Campbell, D., & Borgen, F. (1999). Holland's theory and the development of interest inventories. Journal of Vocational Behavior, 55(1), 86-101.

Case, J. C., & Blackwell, T. L. (2008). Review of the Strong Interest Inventory. Rehabilitation Counseling Bulletin, 51(2), 122-126.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46.

Cohen, L., Duberley, J., & Mallon, M. (2004). Social constructionism in the study of career: Accessing the parts that other approaches cannot reach. Journal of Vocational Behavior, 64(3), 407-422.

Creed, P., Patton, W., & Prideaux, L. (2006). Causal relationship between career indecision and career decision-making self-efficacy: A longitudinal cross-lagged analysis. Journal of Career Development, 33(1), 47-65.

Crites, J. (1969). Vocational psychology: The study of vocational behavior and development. New York: McGraw-Hill.

Cronbach, L., & Gleser, G. (1953). Assessing similarity between profiles. The Psychological Bulletin, 50(6), 456-473.

Donnay, D. (1997). E.K. Strong's legacy and beyond: 70 years of the Strong Interest Inventory. Career Development Quarterly, 46(1), 2-22.

Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7(3), 286-299.

Furnham, A. (2001). Vocational preference and P-O fit: Reflections on Holland's theory of vocational choice. Applied Psychology, 50(1), 5-29.

Glavin, K., Richard, G., & Porfeli, E. (2009). Predictive validity of the Medical Specialty Preference Inventory. Journal of Vocational Behavior, 74(1), 128-133.

Glavin, K., Richard, G., & Savickas, M. (2007). The predictive validity of the Medical Specialty Preference Inventory. Prepared for presentation at the 4th Careers in Medicine Professional Development Conference: Association of American Medical Colleges. Washington D.C.

Gottfredson, G. D., & Johnstun, M. L. (2009). John Holland's contributions: A theory-ridden approach to career assistance. Career Development Quarterly, 58(2), 99-107.

Gough, H. (1979). Gough Medical Specialty Preference Scales: A report for counselors. Palo Alto, CA: Consulting Psychologists Press.

Guilford, J. P. (1952). When not to factor analyze. Psychological Bulletin, 49(1), 26-37.

Hartung, P. (2005). Toward integrated career assessment: Using story to appraise career dispositions and adaptability. Journal of Career Assessment, 13(4), 439-451.

Hartung, P., & Blustein, D. (2002). Reason, intuition, and social justice: Elaborating on Parson's career decision-making model. Journal of Counseling & Development, 80(1), 41-47.

Hartung, P., Borges, N., & Jones, B. (2005). Using person matching to predict career specialty choice. Journal of Vocational Behavior, 67(1), 102-117.

Holland, J. L. (1958). A personality inventory employing occupational titles. Journal of Applied Psychology, 42(5), 336-342.

Holland, J. L. (1961). Some explorations with occupational titles. Journal of Counseling Psychology, 8(1), 82-87.

Holland, J. L. (1966a). A psychological classification scheme for vocations and major fields. Journal of Counseling Psychology, 13(3), 278-288.

Holland, J. L. (1966b). The psychology of vocational choice: A theory of personality types and model environments. Waltham, MA: Blaisdell Publishing Company.

Holland, J. L. (1967). Predicting a student's vocational choice (ACT Research Report #18). Iowa City, Iowa: American College Testing Program.

Holland, J. L. (1971). A theory ridden, computerless, impersonal vocational guidance system. Journal of Vocational Behavior, 1(2), 167-176.

Holland, J. L., Powell, A. B., & Fritzsche, B. A. (1994). The Self-Directed Search professional user's guide (4th ed.). Odessa, FL: Psychological Assessment Resources.

Holland, J. L., Whitney, D., Cole, N., & Richards, J. J. (1969). An empirical occupational classification derived from a theory of personality and intended for practice and research (Report No. 29). Iowa City, IA: American College Testing Program.

Ihle-Helledy, K., Zytowski, D., & Fouad, N. (2004). Kuder career search: Test-retest reliability and consequential validity. Journal of Career Assessment, 12(3), 285-297.

Jarecky, R. K., Schwartz, R. W., Haley, J., & Donnelly, M. B. (1991). Stability of medical specialty selection at the University of Kentucky. Academic Medicine, 66(12), 756-761.

Kuder, F. (1977a). Activity interests and occupational choice. Chicago, IL: Science Research Associates.

Kuder, F. (1977b). Career matching. Personnel Psychology, 30(1), 1-4.

Lambert, T. W., Davidson, J. M., Evans, J., & Goldacre, M. J. (2003). Doctors' reasons for rejecting initial choices of specialties as long-term careers. Medical Education, 37(4), 312-318.

Lambert, T. W., Goldacre, M. J., Davidson, J. M., & Parkhouse, J. (2001). Graduate status and age at entry to medical school as predictors of doctors' choice of long-term career. Medical Education, 35(5), 450-454.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.

Larson, L. M., Rottinghaus, P. J., & Borgen, F. H. (2002). Meta-analyses of big six interests and big five personality factors. Journal of Vocational Behavior, 61(2), 217-239.

Leong, F. T., Hardin, E. E., & Gaylor, M. (2005). Career specialty choice: A combined research-intervention project. Journal of Vocational Behavior, 67(1), 69-86.

MacDonald, D. (2008). A dynamical successor to modernism and postmodernism. Counseling and Values, 52(2), 145-155.

Markert, R. J. (1983). Stability and change of medical specialty choice in U.S. medical schools. Journal of Medical Education, 58(7), 589-590.

McGuire, F. L. (1961). The Kuder Preference Record -- personal as a measure of personal adjustment. Journal of Clinical Psychology, 17(1), 41-42.

Nauta, M. M. (2010). The development, evolution, and status of Holland's theory of vocational personalities. Journal of Counseling Psychology, 57(1), 11-22.

Newton, D. A., Grayson, M. S., & Whitley, T. W. (1998). What predicts medical student career choice? Journal of General Internal Medicine, 13(3), 200-203.

O'Brien, K. M. (2001). The legacy of Parsons: Career counselors and vocational psychologists as agents of social change. Career Development Quarterly, 50(1), 66-76.

Peavy, R. V. (1995). Constructivist career counseling. Retrieved from ERIC database. (ED401504)

Porfeli, E. J., Richard, G. V., & Savickas, M. L. (2010). Development of specialization scales for the MSPI: A comparison of empirical and inductive strategies. Journal of Vocational Behavior, 77(2), 227-237.

Reed, V. A., Jernstedt, G. C., & Reber, E. S. (2001). Understanding and improving medical student specialty choice: A synthesis of the literature using decision theory as a referent. Teaching and Learning in Medicine, 13(2), 117-129.

Richard, G. (2005). Manual for the Medical Specialty Preference Inventory 2nd edition. Washington D.C.: Careers in Medicine.

Richard, G. (2010). Manual for the Medical Specialty Preference Inventory, Revised edition. Washington D.C.: Association of American Medical Colleges.

Richard, G., Savickas, M., Early, L., Calli, J., Englert, C., & Bono, J. (2007). Manual for the specialty indecision scale (2nd ed.). Washington, DC: Association of American Medical Colleges.

Rogers, M. E., Creed, P. A., & Searle, J. (2009). The development and initial validation of social cognitive career theory instruments to measure choice of medical specialty and practice location. Journal of Career Assessment, 17(3), 324-337.

Rottinghaus, P. J. (2009). The Kuder Skills Assessment-College and Adult Version: Development and initial validation in a college business sample. Journal of Career Assessment, 17(1), 56-68.

Savickas, M. L. (1999). The psychology of interests. In M. L. Savickas, & A. R. Spokane (Eds.), Vocational interests: Meaning, measurement and counseling use (pp. 19-56). Palo Alto, CA: Davies-Black.

Savickas, M. L., Alexander, D., Jonas, A., & Wolf, F. (1986). Difficulties experienced by medical students in choosing a specialty. Journal of Medical Education, 61(6), 467-469.

Savickas, M. L., Alexander, D., Osipow, S., & Wolf, F. (1985). Measuring specialty indecision among career-decided students. Journal of Vocational Behavior, 27(3), 356-367.

Savickas, M. L., Brizzi, J., Brisbin, L., & Pethtel, L. (1988). Predictive validity of two medical specialty preference inventories. Measurement and Evaluation in Counseling and Development, 21(3), 106-112.

Savickas, M. L., Taber, B., & Spokane, A. (2002). Convergent and discriminant validity of five interest inventories. Journal of Vocational Behavior, 61(1), 139-184.

Scott, I., Gowans, M. C., Wright, B., & Brenneis, F. (2007). Why medical students switch careers: Changing course during the preclinical years of medical school. Canadian Family Physician Médecin De Famille Canadien, 53(1), 94-95.

Seling, M. (1979). Syncrisis: Investigations of a new assessment procedure. Unpublished Doctoral Dissertation, Iowa State University, University Microfilms International.

Sodano, S., & Richard, G. (2009). Construct validity of the Medical Specialty Preference Inventory: A critical analysis. Journal of Vocational Behavior, 74(1), 30-37.

Sodano, S., Savickas, M., & Richard, G. (2007). Revision and preliminary validation of the Medical Specialty Preference Inventory. Paper presented at the 8th Biennial Meeting of the Society for Vocational Psychology. Akron, OH.

Stilwell, N. A., & Wallick, M. M. (2000). Myers-Briggs type and medical specialty choice: A new look at an old question. Teaching & Learning in Medicine, 12(1), 14-20.

Stratton, T. D., Witzke, D. B., Elam, C. L., & Cheever, T. R. (2005). Learning and career specialty preferences of medical school applicants. Journal of Vocational Behavior, 67(1), 35-50.

Strong, E. K. (1943). Vocational interests of men and women. Stanford University, CA: Stanford University Press.

Strong, E. K., & Tucker, A. C. (1952). Use of vocational interest scales in planning a medical career. Psychological Monographs, 66(9), 9-341.

Su, R., Rounds, J., & Armstrong, P. I. (2009). Men and things, women and people: A meta-analysis of sex differences in interests. Psychological Bulletin, 135(6), 859-884.

Tardiff, K., Cella, D., Seiferth, C., & Perry, S. (1986). Selection and change of specialties by medical school graduates. Journal of Medical Education, 61(10), 760-796.

Tucker, A. C., & Strong, E. K. (1962). Ten-year follow-up of vocational interest scores of 1950 medical college seniors. Journal of Applied Psychology, 46(2), 81-86.

Walsh, W. B., & Savickas, M. L. (Eds.). (2005). Handbook of vocational psychology: Theory, research, and practice (3rd ed.). Mahwah, N.J.: Lawrence Erlbaum Associates.

Zimny, G. H. (1979). Manual for the Medical Specialty Preference Inventory. Saint Louis, MO: Saint Louis University School of Medicine.

Zimny, G. H. (1980). Predictive validity of the Medical Specialty Preference Inventory. Medical Education, 14(6), 414-418.

Zimny, G. H. (2002). Updating the Medical Specialty Preference Inventory. Unpublished manuscript. Retrieved from the American Association of Medical Colleges' Careers in Medicine website on October 30, 2010 from http://www.aamc.org/programs/cim/mspi2002.pdf

Zimny, G. H., & Senturia, A. (1973). Medical student utilization of the Medical Specialty Preference Inventory. Journal of Medical Education, 48(11), 1019-1020.

Zimny, G. H., & Senturia, A. (1974). A longitudinal study of consistency of medical student specialty choice. Journal of Medical Education, 49(12), 1179-1180.

Zimny, G. H., & Shelton, B. R. (1982). Sex differences in medical specialty preferences. Journal of Medical Education, 57(5), 403-405.

Zytowski, D. G. (1992). Three generations: The continuing evolution of Frederic Kuder's interest inventories. Journal of Counseling & Development, 71(2), 245-248.

Zytowski, D. G. (2001). Frank Parsons and the progressive movement. Career Development Quarterly, 50(1), 57-65.