INTERNATIONAL JOURNAL OF SELECTION AND ASSESSMENT    VOLUME 10    NUMBERS 1/2    MARCH/JUNE 2002

Innovations in Integrity-Based Personnel Selection: Building a Technology-Friendly Assessment

John W. Jones, Eric E. Brasher and Joseph W. Huff
NCS Pearson, Chicago, Illinois

We are currently observing a paradigm shift in personnel selection inventories, from lengthy paper-and-pencil administration and scoring to innovative applications of information technology. These technological advances have been driven primarily by strong demands from human resource professionals for enhancements in speed, effectiveness, and cost containment. In response, an existing pre-employment selection assessment was modified to make it more concise, economical, and 'technology-friendly'. The resulting integrity-based assessment system, the Applicant Potential Inventory or API (London House 1997), has been administered and scored using a wide variety of technologies, including telephone, personal computer, fax, and the Internet. This article describes the initial development of the API and reviews nine field studies conducted to examine its reliability, validity, fairness, and financial impact.

Introduction

Many human resource (HR) systems are presently being modified so they can be administered and scored using various forms of computer technology. The evolution of the virtual HR department is driven by the forces of information technology, process reengineering, high-speed management, networked organizations, knowledge workers, and globalization (Jones 1998). This trend, popularly referred to as 'virtual HR', has already had a dramatic impact on many traditional HR applications (Harris 2000; The Hunter Group 2000). Human resource professionals are demanding increased speed, flexibility, effectiveness, and cost containment from all the systems they utilize. Training programs traditionally presented by live instructors are being converted to multimedia. Organizations are collecting performance appraisals through company intranets and storing results in human resource information system (HRIS) databases. Many pre-employment screening tests and interviews are being created or modified for administration and scoring via telephones, fax, personal computers, and most recently the Internet.

* Address for correspondence: John W. Jones, NCS Pearson, One North Dearborn, Chicago, IL 60602-4431, USA. e-mail: [email protected]


© Blackwell Publishers Ltd 2002, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA.

This article describes the development of a technology-friendly personnel selection system that can be administered to job applicants through a variety of media, most notably a computer-based interactive voice recognition (IVR) system. The Applicant Potential Inventory (API) was developed to assist human resource professionals with their responsibilities for selecting and hiring high performing employees. The API is an integrity-oriented assessment that was derived from the Personnel Selection Inventory™ or PSI (London House 1996) and designed as a concise, flexible, and economical alternative to more traditional assessment systems. The API development process incorporated a reduction of items per scale, a decrease in the reading level, a shortening of response scales, and the provision for telephonic and web-based administration, as well as fax-back scoring and reporting. In addition, a variety of research studies are presented as evidence supporting the API assessment system. The first study examines the relationship between the API and the longer paper-and-pencil instrument from which it was derived. Second, the reliability and validity of the API are examined under different methods of administration. Third, archival API data are presented to compare the assessment results for major demographic groups. The next five studies present research that examines the validity of the API assessment system in


predicting employee behavior, performance, and tenure. In the final study, a program evaluation methodology is presented to demonstrate the return on investment gained by an organization from increasing employee tenure.

Development of the API

The Applicant Potential Inventory (API) was originally developed by modifying a version of the Personnel Selection Inventory (PSI), a well-established and thoroughly validated integrity-oriented assessment system. Research in a variety of retail and service organizations has shown the PSI, whose primary emphasis is on decreasing employee counterproductivity, to be a reliable and valid screening tool (cf. Jones 1991; Martin and Boye 1998; Martin and Godsey 1999; Martin and Lehnen 1992; McDaniel and Jones 1988; Ones, Viswesvaran and Schmidt 1993). The PSI scales help measure a variety of job-related attitudes and other characteristics that can be used to effectively predict counterproductive work behaviors and hire better performing employees. The API is a modified version of the PSI-7ST (London House 1992), a 108-item inventory comprising the following scales: Honesty, Drug Avoidance, Employee Relations, Work Values, Supervision Attitudes, Safety, and Tenure. The PSI-7ST also contains the Employability Index, a composite of the above scales that is typically used as the primary basis for employment decisions. Finally, the PSI-7ST includes two scales (i.e. Candidness and Accuracy) that help determine whether an applicant's assessment results should be interpreted. The Candidness and Accuracy scales do not contribute to the Employability Index.

Interactive Voice Recognition (IVR)

The API was designed so it could be effectively administered using an Interactive Voice Recognition (IVR) system. This technology enables job candidates to complete tests by calling a toll-free 800 number. Candidates respond to assessment questions using the touch-tone keys on their telephone. The responses are recorded and automatically scored by computer. Employers can receive candidates' results by fax or by accessing another automated phone system. The IVR system offers paperless testing, automated administration, 24-hour testing availability, multiple reporting options, and timely feedback.

Item Selection

The initial goal in assessment development was to reduce the number of items for the IVR version. We suspected that the audio administration required by the IVR system


would be more cognitively demanding for applicants than paper-and-pencil administration. These initial analyses used a database of 1,780 job applicants who completed the PSI-7ST in a variety of industries. Six items from each PSI scale were selected for the API based on their item-total correlations. The items were carefully chosen to represent major theories behind each scale. In addition, two counterproductive behavior admission items were included in the API. The API assessment was then supplemented with a Customer Service scale due to its likely use by retail employers. This scale was created by drawing 12 items from the Customer Service Profile™ or CSP (London House 1994). The CSP includes three scales measuring customer service attitudes, customer service aptitudes and sales attitudes. Items were chosen based primarily on item-total correlations and reading level. Care was also taken that the selected items adequately represented all of the major theoretical concepts used to develop the original scales. In total, the API includes 64 items: 56 loading on the Employability Index, 6 items loading on the Candidness scale, and 2 counterproductive behavior admission items.

Simplifying Items and Response Format

To further simplify the instrument for eventual IVR administration, a common response format was created for all items. This process involved collapsing the response categories from six or seven choices to a four-choice format (Strongly Agree, Slightly Agree, Slightly Disagree, Strongly Disagree) for all items except the admissions items. The API was further simplified by lowering the reading grade level of some items. This was done by reducing the number of polysyllabic words in the PSI items. According to the SMOG Index of reading difficulty, the reading grade level of the API items was decreased from a seventh-grade level to a fifth-grade level. Items from each scale were then intermingled throughout the assessment.
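The SMOG grade estimate used above can be sketched with the published SMOG formula. This is an illustrative implementation, not the authors' scoring code, and the word and sentence counts passed in are hypothetical:

```python
import math

def smog_grade(polysyllable_count: int, sentence_count: int) -> float:
    """SMOG reading-grade estimate: 3.1291 + 1.0430 * sqrt(polysyllables
    rescaled to a 30-sentence sample)."""
    return 3.1291 + 1.0430 * math.sqrt(polysyllable_count * 30.0 / sentence_count)

# Hypothetical counts: trimming polysyllabic words lowers the estimated grade,
# which is the lever used to move the API items from ~7th- to ~5th-grade level.
print(round(smog_grade(15, 30), 1))  # ≈ 7.2
print(round(smog_grade(5, 30), 1))   # ≈ 5.5
```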

Final Assessment Scales

The API assessment consists of eight content scales, two validity scales, and one composite. The content scales are:

(a) Honesty – helps measure an applicant's attitudes toward theft and previous theft-related behavior in the workplace.
(b) Drug Avoidance – helps measure the likelihood that the applicant will not sell or use illegal drugs on the job.
(c) Employee Relations – helps measure an applicant's tendencies toward being courteous and cooperative with customers and co-workers.
(d) Safety – helps measure an individual's level of safety consciousness.


(e) Work Values – helps measure an applicant's attitude toward work and positive work habits.
(f) Supervision Attitudes – helps measure the likelihood that the applicant will respond appropriately to supervision.
(g) Tenure – helps measure the likelihood that the applicant will not quit a position after a short period of time.
(h) Customer Service – helps measure an applicant's attitudes toward customers and understanding of how to effectively interact with customers.
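The scoring described in this section reports each scale as a normalized standard score from 1 to 99 (mean 50, SD 20) and combines the eight content scales with equal weights into the Employability Index. A minimal sketch of that scaling follows; it is an illustrative reconstruction, not NCS Pearson's scoring code, and the percentile-to-score mapping through the normal curve is our assumption:

```python
from statistics import NormalDist

def normalized_standard_score(percentile: float) -> int:
    """Map a percentile rank (0 < percentile < 100) to a normalized
    standard score with mean 50 and SD 20, clamped to the 1 to 99 range."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return max(1, min(99, round(50 + 20 * z)))

def employability_index(content_scale_scores: list[float]) -> float:
    """Equal-weight composite of the (already normalized) content scales."""
    return sum(content_scale_scores) / len(content_scale_scores)

print(normalized_standard_score(50))  # 50: the median applicant
print(normalized_standard_score(84))  # 70: about one SD above the mean
```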

The validity scales are:

(a) Candidness – is designed to detect extreme response tendencies to exaggerate positive qualities and minimize negative ones.
(b) Accuracy – is designed to detect unusual response patterns which might result if a candidate failed to understand the questions or complete the instrument carefully.

Finally, an Employability Index is computed from an algebraic combination of the eight content scales. Each content scale score is normalized and the scales are weighted equally in computing the index. The Employability Index is an overall score indicating an applicant's general suitability for hire relative to appropriate normative data. Higher scores are considered preferable on all scales. The API scales are reported as normalized standard scores that range from 1 to 99, with a mean of 50 and a standard deviation of 20.

API Research Studies

Study 1: Evaluating the Shortened Audio Format

The aim of this study was to demonstrate some degree of similarity between key features of the audio-administered API assessment and a longer paper-and-pencil instrument, the PSI-7S (London House 1992). The PSI assessment contained the original unmodified versions of almost all the API scales and items. The audiotape administration format used in this study served as a proxy or analog for the actual IVR administration.

Method. The study was conducted with applicants for employment at a parking garage located in a large Midwestern city. Both the API and the PSI were administered to 106 job applicants for a variety of positions including cashier, porter, hiker, and supervisor. The PSI was administered in paper-and-pencil form. The API items were recorded onto an audiotape which was played to all applicants to elicit their responses. Due to missing data, a total of 100 API and 70 PSI assessments were used in the final analysis.

Results and discussion. Descriptive statistics and reliability coefficients are shown in Table 1. The reliability coefficients for the API were adequate and comparable to those of its parent instrument. The API yielded an alpha coefficient of .88 for the Employability Index which is similar to the reliability estimate for the PSI Employability Index (alpha = .92). Reliability coefficients for the Candidness scales of the two inventories were also comparable, with .71 for the API, and .76 for the PSI. The overall correlation between the Employability Index scores of the API and the PSI was .81 (p < .001). As shown in Table 2, the scale inter-correlations ranged from .57 to .75 and all were significant at the p < .01 level. Results of the item-to-item correlations were also favorable with all but 3 of the 46 overlapping items producing significant correlations (p < .05) that ranged from .22 to .75, with an average correlation of .49. In conclusion, the results of this study suggest that the shorter audio-administered assessment (API) is an acceptable substitute for the longer paper-and-pencil administered assessment (PSI).

Study 2: Comparing Paper-and-Pencil and Audio Administration Formats

This study examined the reliability and validity of the API in three administration conditions: audiotape administration only, paper-and-pencil plus audiotape administration, and paper-and-pencil administration only. In addition to the API, all subjects completed a counterproductivity checklist. This checklist used a 4-point Likert-type response scale (ranging from Seldom to Very Often) to measure the approximate number of

Table 1. Descriptive and reliability statistics for API and PSI (Study 1)

Inventory      Scale                 Mean    SD      Range      Reliability
PSI-7S         Employability Index   52.41   24.56   3 to 99    0.92
PSI-7S         Candidness            48.12   25.39   3 to 99    0.76
API (Audio)    Employability Index   51.32   19.92   9 to 99    0.88
API (Audio)    Candidness            52.65   19.69   12 to 99   0.71

Note: The statistics in this table represent normalized standard scores.


Table 2. Correlations between API scales and PSI scales (Study 1)

Scale                          n     r      95% CI Lower   95% CI Upper
Validity/Candidness            101   .74*   .64            .82
Honesty                        104   .73*   .63            .81
Drug Avoidance                 104   .64*   .51            .74
Customer/Employee Relations    104   .63*   .50            .73
Safety                         102   .75*   .65            .82
Work Values                    105   .57*   .42            .69
Supervision Attitudes          103   .81*   .73            .87
Employability Index            101   .81*   .73            .87

Notes: * p < .01. Key: n = sample size for correlation, r = correlation coefficient, 95% CI = lower and upper bounds of the 95% confidence interval around r.

times the individual exhibited certain undesirable behaviors during the last month (e.g. stealing from the job, taking unauthorized work breaks, arguing with customers and/or coworkers).

Method. Subjects included 123 undergraduates (60 females and 63 males) obtained through the subject pool at a large urban Midwestern university. Most individuals were employed within the past year. Subjects were randomly assigned to one of the three administration groups. The constraints to random assignment were: (1) an approximately equal number of males and females for each administration condition, and (2) all subjects signing up for a particular time were assigned to the same randomly-selected experimental condition. In all conditions, subjects were given a packet containing the research materials along with instructions for how to complete each measure. For Group 1, subjects listened to the API on audiotape and responded using an answer sheet which did not have the survey items listed. For Group 2, the subjects were given a paper-and-pencil version of the API which was accompanied by the

audiotape. Group 3 received only the paper-and-pencil version of the API. After completion of the API, all groups completed the counterproductivity checklist. Results and discussion. Cronbach's alpha was computed as a reliability estimate for the API and counterproductivity checklist within each of the three administration groups. These coefficients are shown in Table 3. To test whether the coefficients were significantly different, Fisher's r-to-z transformations were performed and no significant differences were found for either measure. Thus, for this sample, administering the API by audiotape did not significantly attenuate the reliability of the instrument relative to paper-and-pencil administration. As a rough validity estimate, the API Employability Index score and the composite measure of counterproductivity were correlated for each of the administration groups. Higher scores on the checklist are indicative of less counterproductivity. As shown in Table 3, significant correlations (p < .05) were found between the Employability Index and the counterproductivity measure in all three groups. Fisher's r-to-z transformations were

Table 3. Reliability and validity estimates for three administration conditions (Study 2)

Administration               n    α1    α2    r      95% CI Lower   95% CI Upper
Audiotape only               38   .72   .78   .29*   −.03           .56
Paper/pencil and audiotape   35   .81   .78   .52*   .23            .73
Paper/pencil only            38   .74   .83   .60*   .35            .77

Notes: * p < .05. Key: n = group sample size, α1 = Cronbach's alpha reliability coefficient for the API Employability Index, α2 = Cronbach's alpha reliability coefficient for the counterproductivity checklist, r = correlation between API Employability Index and counterproductivity checklist, 95% CI = lower and upper bounds of the 95% confidence interval around r.
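The 95% confidence intervals reported in Tables 2 and 3 can be reproduced with the standard Fisher z transform. The article does not state its CI method, so treating it as Fisher-based is an assumption, though one that matches the tabled bounds:

```python
import math

def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """95% confidence interval for a correlation coefficient via
    Fisher's r-to-z transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# First row of Table 2: r = .74, n = 101
lo, hi = fisher_ci(0.74, 101)
print(round(lo, 2), round(hi, 2))  # ≈ 0.64 0.82, matching the tabled CI
```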


performed and the differences between correlations were not statistically significant. However, it should be noted that the test for significant differences was not very powerful with such small groups. In conclusion, this study provides some preliminary support for the API as a reliable and valid assessment system regardless of whether it is administered in paper-and-pencil or audio format.
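The r-to-z comparison of independent correlations used above can be sketched as follows. This is an illustrative implementation of the textbook test (function name is ours), fed with the audiotape-only and paper/pencil-only validities from Table 3:

```python
import math

def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> float:
    """z statistic for the difference between two independent correlations
    via Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se_diff

# Audiotape-only (r = .29, n = 38) vs. paper/pencil-only (r = .60, n = 38)
z = compare_correlations(0.29, 38, 0.60, 38)
print(round(z, 2), abs(z) < 1.96)  # -1.65 True: not significant at p < .05
```

With only 38 subjects per group the test has little power, which is exactly the caveat the authors raise.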

Study 3: Adverse Impact and Demographic Group Differences for the API

Studies with the PSI (the parent instrument of the API) have consistently shown that its use in selection should not adversely impact protected groups (e.g. Joy 1991; Ones and Viswesvaran 1998). The aim of this study was to evaluate demographic group differences on scale scores and success rates using API job applicant data.

Method. The analyses were conducted on cases from the NCS Pearson client database. This study includes all scored API data between 1 December 1998 and 30 November 2000 for which race, sex, or age demographics were available. In this date range, the database contained 82,757 cases with sex data, 78,012 cases with race data, and 71,392 cases with age data. None of these demographic data were known for over half of the available cases. Fairness toward demographic groups was examined using two methods. The first method uses the Equal Employment Opportunity Commission's four-fifths rule of thumb (EEOC 1978). This guideline asserts that the success rate for any single demographic group should be at least 80% (four-fifths) of the success rate for the demographic group with the highest rate. Although


clients differ somewhat in their use of selection decision models, a single model was chosen for the reported analyses. This model requires successful candidates to meet or exceed minimum normalized standard scores of 10 on the Accuracy scale, 20 on the Candidness scale and 35 on the Employability Index. In the second approach, subgroup (race, sex, and age) mean differences on API Employability Index scores were examined. As noted by Ones and Viswesvaran (1998), adverse impact is as much a function of personnel selection system characteristics as group differences on any given test. Therefore, it is valuable to go beyond adverse impact analyses that are necessarily dependent on a specific selection model. The comparisons reported in this study were made using standardized mean group differences (i.e. d values or effect sizes). Results and discussion. The results of the adverse impact analyses are shown in Table 4. A higher proportion of females successfully met the API cutoffs, but the resulting fairness ratio easily exceeds the EEOC standard of .80. Individuals identified as under 40 years of age had a lower success rate, but still over 80% of the higher rate. In the analyses by race, White applicants had the highest success rate but the fairness ratios for the other four demographic groups were all above the .80 rule of thumb. It should be noted that the only two groups (Asians and Native Americans) with fairness ratios less than .95 each comprised less than 2% of the sample. Overall, these data strongly support the API assessment system's fairness to demographic groups based on race, sex, and age. The results of the group differences analyses are shown in Table 5. According to Cohen (1977), effect sizes

Table 4. API success rates and fairness ratios by sex, age, and race (Study 3)

Category            Sample size   Proportion of sample   Proportion successful   Fairness ratio
By sex                 82,737
  Male                 60,903     .736                   .829                    .963
  Female               21,834     .264                   .861                    N/A
By age                 71,392
  Under 40             64,562     .904                   .833                    .956
  40 and Over           6,830     .096                   .871                    N/A
By race                78,012
  White                40,603     .520                   .856                    N/A
  Hispanic             19,026     .244                   .819                    .957
  Black                16,563     .212                   .830                    .970
  Asian                 1,488     .019                   .744                    .869
  Native American         332     .004                   .783                    .915
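The four-fifths screen applied in Table 4 reduces to a ratio against the highest-rate group. A minimal sketch (function and variable names are ours) using the by-race success rates:

```python
def fairness_ratios(success_rates: dict[str, float]) -> dict[str, float]:
    """Ratio of each group's selection rate to the highest group's rate.
    Under the EEOC four-fifths rule of thumb, ratios below .80 flag
    potential adverse impact."""
    top = max(success_rates.values())
    return {group: round(rate / top, 3) for group, rate in success_rates.items()}

# Success rates by race from Table 4
ratios = fairness_ratios({'White': .856, 'Hispanic': .819, 'Black': .830,
                          'Asian': .744, 'Native American': .783})
print(ratios)
print(all(v >= 0.80 for v in ratios.values()))  # True: no four-fifths violation
```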


Table 5. Sex, race, and age differences in API scores among job applicants (Study 3)

Group 1             Group 2          n1       n2       x1      x2      SD1     SD2     SDpooled   d
Women               Men              21,834   60,903   59.43   57.43   18.68   19.64   19.39      0.10
40 years or older   Under 40 years    6,830   64,562   59.51   57.99   18.68   19.53   19.45      0.08
Blacks              Whites           16,563   40,603   57.42   58.74   19.44   19.40   19.41     −0.07
Hispanics           Whites           19,026   40,603   57.84   58.74   19.26   19.40   19.36     −0.05
Asians              Whites            1,488   40,603   50.70   58.74   19.41   19.40   19.40     −0.41
Native Americans    Whites              322   40,603   55.27   58.74   19.41   19.40   19.40     −0.18

Key: n = sample sizes for groups 1 and 2, x = means for groups 1 and 2, SD = standard deviations for groups 1 and 2, SDpooled = pooled standard deviation, d = standardized mean difference between the scores of the protected group (e.g. women) and the unprotected group (e.g. men) in pooled standard deviation units: d = (mean for protected group − mean for unprotected group)/SDpooled. A positive effect size indicates that the protected group has a higher mean score.
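The d values in Table 5 follow from the key's formula. Assuming the usual pooled-variance denominator (the article does not spell it out, but this choice reproduces the tabled values):

```python
import math

def cohens_d(m1: float, s1: float, n1: int,
             m2: float, s2: float, n2: int) -> float:
    """Standardized mean difference (group 1 minus group 2) in
    pooled-standard-deviation units."""
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Asian vs. White applicants (Table 5): the largest difference observed
print(round(cohens_d(50.70, 19.41, 1488, 58.74, 19.40, 40603), 2))  # -0.41
```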

of 0.80 are large, those around 0.50 are moderate, and those around 0.20 are small. Female job applicants scored 0.10 standard deviations higher than men on the API Employability Index. Job applicants who were 40 years old or older scored 0.08 standard deviations higher than those under 40. These practically trivial differences are similar to those reported by Ones and Viswesvaran (1998) in their large-scale investigation using several overt integrity tests (including a version of the PSI). Four separate analyses were performed to examine race differences. Black job applicants scored 0.07 standard deviations lower than Whites. Hispanic job applicants scored 0.05 standard deviations lower than Whites. These practically trivial results are similar to those reported by Ones and Viswesvaran (1998). On the other hand, Asian job applicants scored 0.41 standard deviations lower than Whites. This is a fairly substantial difference and much larger than the one found by Ones and Viswesvaran, but it should be noted that less than 2% of this sample were Asian. In addition, the adverse impact analyses using a standard decision model did not show evidence of adverse impact against Asians. Finally, Native American applicants (less than 1% of the sample) scored 0.18 standard deviations lower than Whites. Overall, the observed job applicant group differences suggest that use of the API for selection is unlikely to cause any adverse impact on protected minority groups.

Validation Evidence

Study 4: Concurrent Validation Using IVR Technology

A concurrent validation study was conducted with a fast food restaurant organization. This study examined the relationship between API scores and the job performance of entry-level employees. The assessment was


administered utilizing a computer-based interactive voice recognition (IVR) system. Method. A group of 98 current employees at 22 fast food restaurant locations completed the API by telephone using the IVR system. Four cases were eliminated due to invalid or incomplete responses, leaving a final sample of 94 individuals. Concurrently, data from the employees' most recent performance reviews were gathered and compiled. The performance information included supervisor ratings of ten job behavior factors: personal appearance, sense of urgency, ability to follow directions, acceptance of feedback, communication, teamwork, initiative, cooperation, physical stamina, and timeliness. Individuals were rated on each factor using a 5-point scale (5 = Outstanding, 1 = Unsatisfactory). Finally, an overall performance rating was derived by averaging the ten behavior factor ratings. Results and discussion. The uncorrected and corrected correlations between the API standard scale scores and performance criteria are presented in Table 6. The correlations were corrected for unreliability in the criterion measure using a formula suggested by Ghiselli, Campbell and Zedeck (1981). In this formula, observed correlation coefficients are divided by the square root of the estimated reliability of the criterion. We used a value of .52 for the estimated interrater reliability of supervisory ratings (Viswesvaran, Ones and Schmidt 1996). The Employability Index scores correlated significantly (p < .05) with the composite performance measure (r = .46) and nine out of the ten performance factors. All eight individual API content scales were significantly correlated with the performance composite, with corrected coefficients ranging from .25 to .49. Overall, 79 (80%) of the 99 correlations were statistically significant (p < .05). These results suggest that the API


Table 6. Correlations* between API content scales and supervisory ratings (Study 4)

Performance Dimension          CS          HO          DA           ER          SF          WV          SA          TE          EI
Personal Appearance            .47 (.34)   .28 (.20)   .29 (.21)    .47 (.34)   .44 (.32)   .29 (.21)   .36 (.26)   .54 (.39)   .54 (.39)
Sense of Urgency               .29 (.21)   .10 (.07)   −.01 (−.01)  .25 (.18)   .17 (.12)   .11 (.08)   .17 (.12)   .19 (.14)   .21 (.15)
Ability to Follow Directions   .28 (.20)   .18 (.13)   .11 (.08)    .28 (.20)   .32 (.23)   .19 (.14)   .22 (.16)   .32 (.23)   .36 (.26)
Acceptance of Feedback         .32 (.23)   .19 (.14)   .19 (.14)    .33 (.24)   .51 (.37)   .26 (.19)   .31 (.22)   .43 (.31)   .42 (.30)
Communication                  .22 (.16)   .18 (.13)   .19 (.14)    .22 (.16)   .31 (.22)   .17 (.12)   .14 (.10)   .29 (.21)   .28 (.20)
Teamwork                       .53 (.38)   .40 (.29)   .29 (.21)    .54 (.39)   .44 (.32)   .35 (.25)   .43 (.31)   .50 (.36)   .54 (.39)
Initiative                     .36 (.26)   .28 (.20)   .06 (.04)    .33 (.24)   .31 (.22)   .29 (.21)   .24 (.17)   .39 (.28)   .37 (.27)
Cooperation                    .50 (.36)   .37 (.27)   .24 (.17)    .37 (.27)   .28 (.20)   .28 (.20)   .32 (.23)   .32 (.23)   .40 (.29)
Physical Stamina               .39 (.28)   .12 (.09)   .07 (.05)    .29 (.21)   .29 (.21)   .19 (.14)   .24 (.17)   .25 (.18)   .28 (.20)
Timeliness                     .46 (.33)   .36 (.26)   .33 (.24)    .49 (.35)   .39 (.28)   .40 (.29)   .42 (.30)   .47 (.34)   .53 (.38)
Composite                      .49 (.35)   .31 (.22)   .25 (.18)    .42 (.30)   .35 (.25)   .29 (.21)   .36 (.26)   .42 (.30)   .46 (.33)

Notes: * Correlations were corrected for unreliability in the criterion; uncorrected correlations are in parentheses. Correlations of .21 or higher are significant at p < .05; correlations of .27 or higher are significant at p < .01. N = 94.
Key: CS = Customer Service, HO = Honesty, DA = Drug Avoidance, ER = Employee Relations, SF = Safety, WV = Work Values, SA = Supervision Attitudes, TE = Tenure, EI = Employability Index.


assessment system with IVR technology can be a useful tool for selecting better entry-level employees.
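The Ghiselli, Campbell and Zedeck correction applied in Table 6 is a one-line computation: divide the observed coefficient by the square root of the criterion reliability (.52 for supervisory ratings here). A sketch, with the function name ours:

```python
import math

def disattenuate(r_obs: float, criterion_reliability: float = 0.52) -> float:
    """Correct an observed validity coefficient for unreliability in the
    criterion by dividing by the square root of the criterion reliability."""
    return r_obs / math.sqrt(criterion_reliability)

# Observed Employability Index-composite correlation from Table 6
print(round(disattenuate(0.33), 2))  # 0.46, the corrected value reported
```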

Study 5: Concurrent Validation with a Cleaning Service Organization

A concurrent validation study was conducted with an organization that provides industrial and residential cleaning services. This study tested the relationship between API assessment results and supervisory ratings of performance.

Method. Employees (N = 374) from seven different locations completed the API assessment in paper-and-pencil format. These individuals were employed in a variety of cleaning positions such as housekeeping, custodian/janitor, and laundry worker. Some 52 (14%) of these individuals did not sufficiently complete the instrument or had invalid scores below standard cutoffs on the Candidness and Accuracy scales. Therefore, a total of 322 individuals were included in the final study analyses. Managers rated these individuals on several performance dimensions by using the Performance Evaluation Checklist or PEC (London House 1995). Managers rated the occurrence of 21 behaviors related to safety, customer service, honesty, drug avoidance, workplace pride, interpersonal cooperation, productivity, attendance, and tenure. In addition, the PEC contains one item requiring the manager to assess how the individual compares with other employees.

Results and discussion. The correlation coefficients were corrected using the .52 estimate for the interrater reliability of supervisory ratings (Viswesvaran et al. 1996). Corrected and uncorrected correlation coefficients are shown in Table 7. The Employability Index and each of the seven API content scales were significantly

correlated (p < .001) with a performance rating composite. The corrected correlation coefficients ranged from a low of .29 for Drug Avoidance to a high of .46 for the Employability Index.

Study 6: Predictive Validation with a Cleaning Service Organization

The cleaning service organization described in Study 5 also participated in a predictive validation study with a sample of 202 job applicants from another seven locations.

Method. The API results were used to select all employees included in the study. Six months to one year after the employees were hired, supervisors completed a Performance Evaluation Checklist (the PEC described in Study 5) for each employee included in the study. Data were provided from a range of locations including hospitals, universities, and development centers. Since the API was used for selection, only individuals who passed the standards set for the API were available for the study. Such serious restriction of range in the sample tends to reduce the magnitude of validity coefficients. Therefore, the correlation coefficients were corrected for range restriction in the predictor and unreliability in the criterion measures (Ghiselli et al. 1981; Viswesvaran et al. 1996).

Results and discussion. Employee scores on the API Employability Index (EI) were correlated with supervisory ratings of productive and counterproductive behavior. As shown in Table 8, a significant relationship was found between the EI and a composite performance rating (r = .47). EI scores were significantly correlated with several ratings of individual behaviors. These behaviors included theft of company property, coming in late for work, drinking alcohol on the job, failing to use required safety

Table 7. Correlations between API scales and composite supervisory rating (Study 5)

API scale                      r1*    r2*    95% CI Lower   95% CI Upper
Honesty                        .25    .35    .25            .44
Drug Avoidance                 .21    .29    .19            .39
Employee/Customer Relations    .29    .40    .30            .49
Safety                         .27    .37    .27            .46
Work Values                    .26    .36    .26            .45
Supervision Attitudes          .21    .29    .19            .39
Tenure                         .29    .40    .30            .49
Employability Index            .33    .46    .37            .54

Notes: N = 322, * p < .001. Key: r1 = uncorrected validity coefficient, r2 = validity coefficient corrected for unreliability in the criterion (supervisory rating), 95% CI = lower and upper bounds of the 95% confidence interval around r2.

International Journal of Selection and Assessment

© Blackwell Publishers Ltd 2002


Table 8. Correlations of Supervisory Ratings with API Employability Index Scores (Study 6)

Performance measure                         r1       r2       r3       95% CI (Lower, Upper)
Performance composite                       .27**    .34**    .47**    .35, .57
Theft of company property                  −.18**   −.23**   −.32**   −.44, −.19
Coming in late to work                     −.16*    −.20**   −.27**   −.39, −.14
Going out of the way to help customers      .13      .16*     .22**    .08, .35
Failing to use required safety equipment   −.18**   −.23**   −.32**   −.44, −.19
`Cutting corners' at work                  −.13     −.16*    −.22**   −.35, −.08
Drinking alcohol on the job                −.12     −.15*    −.21**   −.34, −.07

Notes: N = 202. * p < .05. ** p < .01. Key: r1 = uncorrected correlation; r2 = validity coefficient corrected for range restriction in the predictor (API Employability Index); r3 = validity coefficient corrected for range restriction in the predictor and unreliability in the criterion (supervisory ratings); 95% CI = lower and upper bounds of the 95% confidence interval around r3.

equipment, `cutting corners' at work, and going out of the way to help customers. Finally, the Employability Index was significantly correlated with self-admissions of company property theft (r = −.36) and admitted drug use (r = −.31). Overall, these results support the predictive validity of the API assessment system.
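The two-stage correction reported in Table 8 (range restriction, then criterion unreliability) can be sketched as follows. This is a minimal illustration assuming Thorndike's Case 2 formula with a hypothetical restriction ratio u (the study's actual ratio is not reported) and an assumed criterion reliability of .52:

```python
import math

def correct_range_restriction(r, u):
    """Thorndike Case 2 correction for direct range restriction.
    u = SD of the predictor in the restricted (selected) sample
        divided by its SD in the unrestricted applicant pool."""
    r_over_u = r / u
    return r_over_u / math.sqrt(1 - r**2 + r_over_u**2)

def correct_unreliability(r, r_yy):
    """Disattenuate a coefficient for unreliability in the criterion."""
    return r / math.sqrt(r_yy)

# Illustrative values only: u and r_yy are assumptions, not
# figures reported in Study 6.
r1 = 0.27      # observed correlation
u = 0.77       # hypothetical restriction ratio
r_yy = 0.52    # assumed supervisory-rating reliability

r2 = correct_range_restriction(r1, u)
r3 = correct_unreliability(r2, r_yy)
print(f'r1 = {r1:.2f}, r2 = {r2:.2f}, r3 = {r3:.2f}')
```

With u = 1 (no restriction) the first correction returns r unchanged, and smaller u pushes the corrected coefficient upward, which is why the corrected columns in Table 8 exceed the observed ones.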

Study 7: Predicting Employee Performance and Counterproductivity

A predictive study was conducted with an automotive supply retailer to assess the validity of the API in predicting employee performance and counterproductive behavior. Job applicants completed the assessment over the telephone using IVR technology.

Method. The API results were used to select all employees included in the study, restricting the range of assessment scores. After these employees had been working for at least six months, performance evaluation forms were completed for 139 employees by their supervisors. Data provided by the company consisted of demographic data (date of hire, date of termination, and termination reason) and the performance evaluation form for each employee. The evaluation form consisted of 9 positive or productivity items and 14 negative or counterproductivity items. For each item, supervisors rated the frequency of a specific behavior using a 5-point response scale (never, rarely, occasionally, often, always). Examples of productivity items include `Shows a willingness to perform beyond the customer's expectations' and `Displayed enthusiasm for the job and pride in the workplace'. Sample counterproductivity items include `Arrived late to work without a legitimate excuse' and `Lost his/her temper with customers'. Three composites were created from these ratings: a mean productivity rating, a mean counterproductivity rating, and an overall performance rating.

Results and discussion. The API Employability Index scores were correlated with the three performance composite scores. The correlation coefficients were corrected for unreliability in the criterion measure (Ghiselli et al. 1981; Viswesvaran et al. 1996). As shown in Table 9, significant relationships (p < .01) were found for the overall performance composite (r = .32) and the productivity composite (r = .25), but not for the counterproductivity composite (r = .10). The latter

Table 9. Correlations between API Employability Index and Supervisory Ratings (Study 7)

Performance measure                          α      r1      r2      95% CI (Lower, Upper)
Counterproductivity Composite (14 items)    .84    .07     .10     −.07, .26
Productivity Composite (9 items)            .94    .23**   .32**   .16, .46
Overall Mean Composite (23 items)           .93    .18*    .25**   .09, .40

Notes: N = 139. * p < .05. ** p < .01. Key: α = Cronbach's alpha coefficient; r1 = uncorrected validity coefficient; r2 = validity coefficient corrected for unreliability in the criterion (supervisory ratings); 95% CI = lower and upper bounds of the 95% confidence interval around r2.


finding may be due to the fact that detected counterproductive behaviors typically have low base rates (Wimbush and Dalton 1997). The other indicator of counterproductive behavior included in this study was involuntary termination (e.g. termination for theft, insubordination, etc.). The results indicate that 27% (13 out of 49) of the employees who had relatively low Employability Index scores (standard scores of 30–55) were terminated, while only 11% (5 out of 45) of the employees who had higher Employability Index scores (standard scores above 74) were terminated. This suggests that employees who scored higher on the API Employability Index were less likely to be terminated involuntarily. However, the associated Pearson chi-square statistic, χ²(1, N = 94) = 3.60, only reached a marginal level of statistical significance (p < .10) with the small sample. Overall, this study provides further support for the API as a valid predictor of employee performance.
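The reported chi-square can be reproduced from the 2 × 2 termination table. A minimal sketch of the Pearson statistic for the counts given above (no continuity correction, matching the reported value; the function name is illustrative):

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 contingency table
    [[a, b], [c, d]], without Yates' continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Low-EI group: 13 of 49 terminated; high-EI group: 5 of 45 terminated.
chi2 = pearson_chi2_2x2(13, 49 - 13, 5, 45 - 5)
print(f'chi-square(1, N = 94) = {chi2:.2f}')  # 3.60
```

The marginal p-value (p < .10) follows from comparing 3.60 with the chi-square distribution on one degree of freedom, whose .05 critical value is 3.84.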

Study 8: Predicting Employee Tenure

A follow-up study was conducted with the automotive supply retailer described in Study 7. Its primary focus was the validity of API results in predicting employee tenure after the company had been using the API assessment program for almost two years.

Method. The study included 15,975 applicants who took the API and were hired during a 12-month period. The applicants were classified into two groups based on whether they received high or low scores on the Employability Index of the API. Archival tenure data (through 1 October 1999) were collected for hourly retail and service employees hired between 1 January 1998 and 31 December 1998. Therefore, the maximum possible tenure for individuals still employed at the time of the study ranged from 273 to 638 days.

Results and discussion. The results indicated that applicants scoring below 40 on the API Employability Index remained with the corporation for an average of 168 days, while those scoring 40 or above remained for an average of 195 days (F(1, 15974) = 22.85, p < .001). To further explore this result, individuals with EI scores below 25 were split from those with EI scores between 25 and 39. Employees in the lowest scoring group had an average tenure of only 98 days, considerably less than the 176 days for the middle group and 195 days for the highest group. These findings suggest that using the API in selection can be valuable in hiring employees who will stay longer with the organization.


Study 9: Financial Impact of Longer Employee Tenure

The findings from Study 8 were then used to estimate the financial impact of the API assessment system on the automotive supply retailer's bottom line. This study presents quantifiable dollar estimates of the cost savings associated with a reduced need to hire new employees, a direct result of the observed relationship between API scores and employee tenure.

Method. The average tenure values for the low- and high-scoring groups reported in Study 8 (168 and 195 days) were translated into the number of applicants who would need to be hired per position per year. The resulting estimates were 2.17 for the group with lower EI scores and 1.87 for the group with higher EI scores. We then queried the client for the approximate number of employees hired per year (25,000) and the minimal estimated cost to hire a new employee ($500). Finally, archival data from the NCS data storage and retrieval system indicated that approximately 24% of applicants tested had EI scores below 40 and the remaining 76% scored 40 or above. All of these data were then used to generate financial impact estimates.

Results and discussion. Taken together, the data listed above indicated that the client organization would need to hire 900 additional employees per year if it did not use the API for selection. Using a conservative estimate of $500 to hire a single employee (cf. Grossman 2000), the cost savings associated with increasing employee tenure was estimated to be at least $450,000 per year. This value does not include any other potential benefits of using the API for selection, such as increased performance, decreased counterproductivity, and decreased accident rates. This conservative estimate of tenure-based cost savings far outweighed the direct costs associated with using the API program.
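The arithmetic behind these estimates can be sketched as follows: hires per position per year is 365 divided by average tenure in days, and the bottom-line figure combines the additional-hires estimate with the per-hire cost. The 900 additional hires and $500 per-hire cost are the study's figures; the function name is illustrative:

```python
def hires_per_position_per_year(avg_tenure_days):
    """Number of hires needed to keep one position filled for a year."""
    return 365 / avg_tenure_days

low_rate = hires_per_position_per_year(168)   # low-EI group
high_rate = hires_per_position_per_year(195)  # high-EI group
print(f'low-EI group:  {low_rate:.2f} hires/position/year')   # 2.17
print(f'high-EI group: {high_rate:.2f} hires/position/year')  # 1.87

# Cost savings from avoiding the additional hires (Study 9 figures).
additional_hires = 900
cost_per_hire = 500
print(f'estimated annual savings: ${additional_hires * cost_per_hire:,}')  # $450,000
```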

Conclusion

As part of the growing trend toward virtual HR, many pre-employment assessment systems have been modified to better utilize advances in computer technology and telephony systems. Consideration of client demands for increased speed, flexibility, and practicality led NCS Pearson to develop the `technology-friendly' API. Compared with its parent instrument (the PSI), the API featured fewer items per scale, a lower reading level, fewer response options, and a provision for telephonic administration and fax-back scoring and reporting. The research studies described herein provide support for using the API as an employee selection tool. It appears that this instrument retains many of the positive


features previously identified with the PSI, such as reliability and fairness to demographic groups. Collectively, the five validation studies suggest that API results are predictive of work performance, employee counterproductive behavior, and job tenure. In three of these five studies, the API was administered through Interactive Voice Response (IVR) technology. The observed findings related to tenure were used to estimate the financial benefits and bottom-line savings of using the API assessment system. In closing, this article summarized a series of studies predominantly carried out in the field. We freely acknowledge that any single study inevitably exhibits some blemishes in terms of research confounds and/or design challenges to internal validity. However, the consistent pattern of favorable results across the various studies represents a promising start. In addition, because the majority of studies were conducted with `real-world' corporations, the external validity of these findings would be difficult to challenge. Validation efforts continue with the API, and the accumulated results will eventually support a strong meta-analytic investigation. We hope that test developers will gain some valuable insights from our experiences as they endeavor to transform other paper-based assessments to be more technology-friendly without sacrificing reliability and validity.

References

Bureau of Labor Statistics (1998, December) Workplace Injuries and Illnesses in 1997. Washington, DC: Author.

Cohen, J. (1977) Statistical Power Analysis for the Behavioral Sciences. San Diego, CA: Academic Press.

Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor and Department of the Treasury. (1978) Adoption by four agencies of uniform guidelines on employee selection procedures. Federal Register, 43, 38290–38315.

Ghiselli, E.E., Campbell, J.P. and Zedeck, S. (1981) Measurement Theory for the Behavioral Sciences. San Francisco: W.H. Freeman and Co.

Grossman, R.J. (2000) Measuring up: Appropriate metrics help HR prove its worth. HR Magazine, 1–7 (reprinted version).

Harris, M.H. (2000) The Internet and industrial/organizational psychology: Practice and research perspectives. Journal of e.Commerce and Psychology, 1, 8–24.

The Hunter Group. (2000) The Hunter Group 2000 Human Resources Self Service Survey. Baltimore, MD: Author.


Jones, J.W. (ed.) (1991) Preemployment Honesty Testing: Current Research and Future Directions. New York: Quorum Books.

Jones, J.W. (1998) Virtual HR. Menlo Park, CA: Crisp Publications.

Jones, J.W., Joy, D.S., Martin, S.L., Orban, J.A., Peirce, W.G., IV and Rospenda, K. (1997) Development and validation of the Applicant Potential Inventory (API) (unpublished technical report). Rosemont, IL: London House.

Joy, D.S. (1991) Basic psychometric properties of a preemployment honesty test: Reliability, validity, and fairness. In J.W. Jones (ed.), Preemployment Honesty Testing: Current Research and Future Directions. New York: Quorum Books.

London House. (1992) Personnel Selection Inventory. Rosemont, IL: Author.

London House. (1994) Customer Service Profile. Rosemont, IL: Author.

London House. (1995) Performance Evaluation Checklist. Rosemont, IL: Author.

London House. (1996) Personnel Selection Inventory Information Guide. Rosemont, IL: Author.

London House. (1997) Applicant Potential Inventory. Rosemont, IL: Author.

Martin, S.L. and Boye, M.W. (1998) Using a conceptually-based predictor of tenure to select employees. Journal of Business and Psychology, 13, 233–244.

Martin, S.L. and Godsey, C. (1999) Assessing the validity of a theoretically-based substance abuse scale for personnel selection. Journal of Business and Psychology, 13, 323–338.

Martin, S.L. and Lehnen, L.P. (1992) Select the right employees through testing. Personnel Journal, 71, 47–49.

McDaniel, M.A. and Jones, J.W. (1988) Predicting employee theft: A quantitative review of the validity of a standardized measure of dishonesty. Journal of Business and Psychology, 2, 327–345.

Ones, D.S. and Viswesvaran, C. (1998) Gender, age, and race differences on overt integrity tests: Results across four large-scale job applicant data sets. Journal of Applied Psychology, 83, 35–42.

Ones, D.S., Viswesvaran, C. and Schmidt, F.L. (1993) Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology, 78, 679–703.

Peirce, W.G., IV and Martin, S.L. (1997) A concurrent validation of the API-7CS at a fast food restaurant company. Paper presented at the 13th Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, TX.

Viswesvaran, C., Ones, D.S. and Schmidt, F.L. (1996) Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557–574.

Wimbush, J.C. and Dalton, D.R. (1997) Base rate for employee theft: Convergence of multiple methods. Journal of Applied Psychology, 82, 756–763.

Volume 10 Numbers 1/2 March/June 2002