Improving confidence and accuracy in performance appraisals

Journal of the Australian and New Zealand Academy of Management http://journals.cambridge.org/JMO

Improving Condence and Accuracy in Performance Appraisals Paul L. Nesbit and Robert E. Wood Journal of the Australian and New Zealand Academy of Management / Volume 8 / Issue 02 / January 2002, pp 40 - 51 DOI: 10.1017/S1833367200005010, Published online: 18 September 2015

Link to this article: http://journals.cambridge.org/abstract_S1833367200005010

How to cite this article: Paul L. Nesbit and Robert E. Wood (2002). Improving Confidence and Accuracy in Performance Appraisals. Journal of the Australian and New Zealand Academy of Management, 8, pp 40-51 doi:10.1017/S1833367200005010


IMPROVING CONFIDENCE AND ACCURACY IN PERFORMANCE APPRAISALS

Paul L. Nesbit
Macquarie Graduate School of Management
Macquarie University, NSW 2109
Phone: +61 2 98599908
Email: p [email protected]

Robert E. Wood
Australian Graduate School of Management
University of New South Wales
Sydney NSW 2052

ABSTRACT

The paper presents findings of a study evaluating the impact of performance appraisal training on rating accuracy and perceived rating ability. Forty-one supervisors from a telecommunications firm took part in the training, evaluated video vignettes and completed a questionnaire measuring self-efficacy beliefs about rating, goal intentions, and feelings about future rating behaviour. Supervisors in a control group (n = 12) also rated the video and completed the questionnaire. Trained supervisors showed increased accuracy on video ratings of work behaviour over the course of the training and an increase in self-efficacy measures. Control group supervisors, by comparison, decreased their accuracy of rating over the same time period while self-efficacy ratings remained constant. The impact of training on the satisfaction and goals of participants is also presented.

Key Words: Frame-of-Reference, Performance Appraisal, Self-Efficacy


Considerable research on performance management has highlighted the important role played by performance appraisals for the motivation of employees (Fletcher 2001; Landy & Farr 1980; Smith 1986). Performance appraisal (PA) refers to the set of activities through which organisations seek to assess employees, develop competence, enhance performance and distribute rewards (Fletcher 2001). Feedback, through the formal appraisal process, plays a central role in employee motivation since it allows a person to assess whether the level of effort is sufficient to reach desired goals, thus alerting one to the need to increase effort or modify action plans if required (Locke & Latham 1990). PA also provides information for decisions about distribution of rewards that motivate behaviour through the establishment of a belief that effort will be recognised and rewarded (Mitchell 1982). While informal feedback by a supervisor to a subordinate may have substantial motivational importance, most organisations recognise the importance of formal appraisal mechanisms. However, as pivotal as PA is to motivation, organisations face substantial problems in effectively implementing PA systems and there is widespread dissatisfaction with their use.

Problems with Performance Appraisals

Many managers do not think that the information provided by appraisal justifies the effort and stress associated with the ratings (Napier & Latham 1986). This attitude stems from the fact that in many organisations PA often relies on subjective judgement measures of performance (Lefkowitz 2000; Smith 1986). Not only does this tend to introduce distortion into the measurement process, it also hampers the acceptance of negative evaluations. One cannot act on feedback until one accepts its conclusions. Consequently, if subordinates are rated lower than their self-assessments, they will be unsatisfied with the low rating, justifiably claiming subjectiveness on the part of the supervisor; they may also feel demotivated by the failure to be recognised or appreciated for their perceived efforts.

Supervisors recognise the problems associated with subjectiveness in PA and often seek to limit motivational problems and conflicts with subordinates by routinely giving consistently lenient ratings irrespective of performance. For example, it is a common finding that performance appraisal ratings suffer from extreme leniency effects, resulting in a large majority of those being assessed being rated in the top two categories of rating scales (Landy & Farr 1980; Longenecker, Sims & Gioia 1987).

A number of negative consequences arise from this lack of discrimination. First, poor performers do not receive the feedback necessary to aid adjustment of effort or to take other required remedial actions such as skill development. Thus, a valuable source of insight into performance is not effectively utilised, to the detriment of the individual and the organisation. Second, lenient ratings reduce the perceived equity of reward systems, thus rendering them less effectual. Reward systems that distribute rewards equally for differing performance levels cease to motivate, as they become perceived as a condition of employment rather than contingent on performance, while those who perform at a high level are dissatisfied due to their perceptions of the inequity in lower performers receiving similar rewards (Greenberg 1990). Thus poorly operating PA systems can become a potential catalyst for motivational decline, especially when rewards are linked to evaluation of performance (Greenberg 1990).

Increasing the Effectiveness of Performance Appraisals

Two strategies have traditionally been advocated to address the problems with subjective performance judgements: rating scale development and rater training. Researchers who have focused on designing rating scales to deal with subjectiveness seek to identify objective standards of desired behaviours. Thus, rather than use global criteria that require categorisation by raters, rating forms focus on specific behaviours that are to be displayed on the job (Borman 1979; Smith & Kendall 1963). Despite the improvement of matching job activities with the rating criteria, the general finding on the design of rating scales is that they do not seem to demonstrate much improvement in performance ratings (Borman 1991; Gomez-Mejia 1988; Landy & Farr 1980). Rating scales do not address biases of raters, who must still make judgements about the link between observations of performance on the job and evaluations on the rating scale.

The second approach to dealing with subjectivity focuses on the training for raters about the nature of biases. This training seeks to reduce the incidence of biases within the rating process and has shown potential for improving the effectiveness of performance ratings (Hedge & Kavanagh 1988; Pulakos 1984). However, subjectivity in ratings still arises from different rater perspectives of performance. Lenient raters will give higher scores whereas strict raters will give lower assessments to similar ratee performances. To overcome this problem, there has been a shift away from rater error training towards training managers how to actually rate tasks and to agree on the forms and standards used within an organisation (McIntyre, Smith & Hassett 1984). This shift in emphasis from rater error training to focus on the process of rating is of considerable importance when we consider that most ratings data are used for purposes of comparison between different staff. Therefore, individual biases may be less relevant in determining the overall validity of rating data than is a common agreement on the standards to be used.

One approach that has received considerable attention is Frame-of-Reference (FOR) training (Bernardin & Buckley 1981). This approach extends the rationale of behaviourally anchored scales (Smith & Kendall 1963) and the later behavioural summary scales (Borman 1979), which utilise referent statements of behaviour within scales to facilitate agreements in evaluating behaviour. The FOR strategy also uses a common set of standards for the behaviours that define effective performance for each of the criteria on the rating scale, but actually promotes managerial agreement through training.

Research on Frame-of-Reference Training

As described by Bernardin and Buckley (1981), FOR training involves a number of steps. First, participants are given job descriptions and instructed to discuss duties and qualifications they believe are necessary for the job. Second, participants are given case examples comprising critical incidents that represent outstanding, average, and poor job performance for different performance areas. Participants rate these case examples using behaviourally based rating scales. Third, the ratings are discussed and participants receive feedback on the accuracy of their ratings, relative to the evaluations of experts. Through this process, managers who participate in the training programme develop a common set of standards as to what constitutes different levels of performance for the criteria used in appraisals and a common schema for assessing performance.

To date, research focusing on the effectiveness of FOR training indicates that it is extremely effective with respect to rating accuracy (Pulakos 1984; Schleicher & Day 1998; Sulsky & Day 1992, 1994; Woehr 1994). In a meta-analytic review of performance appraisal rater training, Woehr and Huffcutt (1994) found an average effect size of .83 for studies comparing FOR training with control or no training. Although the findings for the overall efficacy of FOR training are robust, the underlying reasons for this accuracy are unclear. It is suggested that FOR training leads to the development of prototypes or cognitive schema that aid in the creation of impressions about ratee behaviour (Sulsky & Day 1992) and recall of specific behaviour information about performance (Woehr 1994). However, performance ratings are embedded in a process that has substantial consequences and personal relevance to raters and ratees (Rice 1985), which may negatively impact the use of cognitive schema (Feldman 1994). Consequently, while FOR training may enhance cognitive ability, the motivation to employ skills may be adversely impacted by the organisational environment (McIntyre, Smith & Hassett 1984). One cognitive aspect that has been associated with enhanced motivation to employ skills is raters' self-efficacy, that is, their beliefs about their ability to successfully carry out performance appraisals.

Self-efficacy, or a trainee's perceived capability to perform a specific task (Gist 1987), has received a great deal of attention in the training literature. Self-efficacy has been shown to predict performance in a range of training situations including training of computer skills (Compeau & Higgins 1995; Venkatesh & Davis 1996), complex decision-making tasks (Ford et al. 1998), multimedia training acceptance (Christopher, Schoenfeld & Tansky 1998), speed reading (Karl, O'Leary-Kelly & Martocchio 1993) and career counselling training (Heppner et al. 1998). In short, previous research has substantiated the important role that self-efficacy plays in behaviour change in a variety of work settings. As yet, the impact of FOR training on participants' self-efficacy beliefs has not been examined.

The present study reports on an experimental evaluation of FOR training that examined the impact of training on self-efficacy. Unlike many studies that use student samples in non-work settings, the study evaluates a FOR training programme for supervisors conducted in a large telecommunications firm. Training was carried out over three days, the first two days being conducted on consecutive days and the third day occurring approximately two weeks later. Thus the length of training provided an opportunity for examining the psychological and skill changes of participants over an extended period of time.

Previous research (Pulakos 1984; Sulsky & Day 1992, 1994; Woehr 1994a, 1994b) has demonstrated that FOR trained raters provide substantially more accurate ratings than do raters who receive other types of (or no) training. It is, therefore, expected that FOR trained raters in this study would be more accurate than controls.

Hypothesis 1: Participants undergoing FOR training will demonstrate superior rating accuracy in comparison with control raters.

Self-Efficacy and FOR Training

Self-efficacy is "...defined as people's judgement of their capabilities to organize and execute courses of action required to attain designated types of performances. It is concerned not with the skills one has but with judgements of what one can do with whatever skills one possesses" (Bandura 1986, p.391). Self-efficacy expectations are task specific and derived from four cues within a situation: enactive mastery, vicarious learning, verbal persuasion and physiological arousal. The FOR training procedure provides a rich environment for forming and reinforcing self-efficacy beliefs around performance appraisal rating.

There are three ways that FOR training may encourage self-efficacy. First, FOR trainees engage in the experience of rating a number of case examples and gaining feedback on performance through discussion in a non-threatening and supportive environment, which is an ideal situation for development of self-efficacy through enactive mastery. Second, FOR training encourages vicarious learning since it is carried out among peer groups of raters with similar skill levels and experiences who share the need to develop skills to carry out ratings in the near future. Observing the development in performance rating skills of one's peers influences the observers' perceptions of their own performance rating ability. Third, trainers not only facilitate discussion of viewpoints but act as coaches encouraging participants in their learning process. Thus it is expected that an important outcome of FOR training will be the development of participants' self-efficacy.

Hypothesis 2: FOR training will increase self-efficacy beliefs of participants.

Self-efficacy beliefs regulate performance by determining task choices, effort, and persistence and are also linked with self-aiding or self-hindering thought patterns that accompany performance (Wood & Bandura 1989). Bandura (1989) suggests that because high-efficacy individuals believe that they have some control over the task, they are less likely to fear the task or be unhappy with the task. Other researchers have also shown that competence in a task is associated with more task enjoyment and satisfaction with task demands (Kanfer 1990). Thus self-efficacious individuals are likely to associate positive emotions with the task. Since it is expected that FOR training will increase self-efficacy, it is expected that FOR training participants will also develop positive attitudes towards future performance appraisals.

Hypothesis 3: FOR trained participants will express higher satisfaction with the performance appraisal process in comparison with control raters.

Research has demonstrated that individuals with high self-efficacy are more likely to set themselves challenging goals, and are confident of their ability to reach goals (Wood & Bandura 1989). When confronted with obstacles to the achievement of tasks, self-efficacious individuals display determination and persistence in their efforts to reach their goals. In order to carry out performance appraisals, raters must set themselves goals in terms of expected rating accuracy, clarity of communications, and level of justifications for ratings. Thus it is expected that FOR training participants will develop and adopt challenging goals related to their performance appraisals.

Hypothesis 4: FOR trained participants will be more goal focused for the performance appraisal process in comparison with control raters.

METHOD

Subjects

Three groups of supervisors (n = 41) undergoing FOR training in a telecommunications firm constituted the experimental group. A separate group of 12 supervisors acted as the control group. The experimental group supervisors had an average age of 44 years (SD = 5.7 years) and had worked with the organisation for an average of 24 years (SD = 7). Supervisors in the control group came from similar positions in the organisation (same business unit and hierarchical level) and had an average age of 42 years (SD = 9 years) and organisational tenure of 23 years (SD = 9.5 years). Thus the two groups were well matched in terms of age and experience.

The experimental group supervisors took part in the FOR training as a precursor to actual performance appraisal interviews they were to carry out in the near future. The control group were supervisors in a different geographical area, who duplicated the evaluation procedures but did not receive any training. They were not scheduled to carry out performance appraisal interviews in the near future.

Rater Training

Subjects went through the FOR training in three groups. Content of the programme followed closely the outline of FOR training discussed above. Days one and two were taken up mainly by discussion of behavioural standards and training on use of behavioural ratings. Participants had numerous opportunities for evaluation of written cases of behaviour and discussion of ratings. Other issues such as communication and interview skills were also discussed. Day three was mainly used to reiterate the content of the first two days and elaborate on interview techniques and organisational issues surrounding the performance appraisal system.

Stimulus Materials

The stimulus set for measures of rating accuracy consisted of videotaped vignettes. While many studies use written case examples, Kinicki, Hom, Trost and Wade (1995) have suggested that the use of written cases may unintentionally prime ratee categorisation because they impose lower demands on selective attention and encoding than do videotaped cases. Thus videotaped incidents are possibly a more difficult and valid assessment of training accuracy. The videotape was made in consultation with the organisation and was specifically targeted to the supervisor group being trained. The organisation intended to use the videotape as a training resource for future FOR training.

The video showed six vignettes consisting of critical incidents associated with customer service and teamwork, two dimensions on which supervisors in the study would have to evaluate their subordinates. The first three scenes focused on customer service behaviour representing outstanding, average, and unsatisfactory performance. The second set of scenes showed an office-based scenario focusing on teamwork behaviour, again representing three levels of performance. Each scene lasts approximately two minutes.

Accuracy Measure

Participants were asked to rate the behaviour of the main character in each of the six videotaped vignettes using the following scale:

5 = Exceeds all job requirements
4 = Mostly meets job requirements
3 = Consistently meets all job requirements
2 = Mostly meets job requirements
1 = Limited match between job requirements and staff member's skills/experience.

Accuracy was measured as the difference between the supervisors' ratings of the videos and the correct ratings for the behaviours shown in each video scene. The difference was squared to obtain non-negative difference scores. These were summed and averaged to find an overall index score of accuracy. Correct ratings for the video scenes were established by the training department of the organisation in consultation with experienced personnel. An improvement in the accuracy of ratings of the videos (fewer errors) was evidence that the training programme improved the rating skill levels of the participant supervisors.
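As a rough illustration of this accuracy index, the Python sketch below computes squared differences between ratings and correct ratings. It assumes one plausible reading of the description: squared differences are summed over the six scenes for each supervisor and then averaged across supervisors. The function names, the keyed ratings and the example ratings are hypothetical and are not taken from the study.

```python
def rater_error(ratings, correct):
    """Sum of squared differences between one rater's scores and the keyed scores."""
    if len(ratings) != len(correct):
        raise ValueError("ratings and correct must cover the same scenes")
    return sum((r - c) ** 2 for r, c in zip(ratings, correct))


def group_error(all_ratings, correct):
    """Mean error score across raters; lower values mean more accurate ratings."""
    errors = [rater_error(r, correct) for r in all_ratings]
    return sum(errors) / len(errors)


# Hypothetical keyed ratings for six scenes (assumed values, not from the paper).
correct = [5, 3, 1, 5, 3, 1]
# Hypothetical ratings from two supervisors.
supervisors = [
    [4, 3, 2, 5, 4, 1],
    [5, 3, 1, 5, 3, 1],
]
print(group_error(supervisors, correct))
```

Because the differences are squared, a rating that is two points away from the correct rating counts four times as heavily as a rating that is one point away, and over-rating and under-rating are penalised equally.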

Psychological Measures

Participants completed a 16-item questionnaire that used a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree). The questionnaire contained ten questions examining participants' beliefs about their skills in performing a successful performance appraisal. Questions relating to self-efficacy assessed trainees' beliefs about their ability to collect information, to make assessments, to justify ratings to ratees, etc. An example of a typical question is "I believe that I am able to assess and measure a staff member's performance against skills and behaviours identified for each job". There were three questions assessing trainees' feelings about carrying out performance assessments, asking them to report on satisfaction, stress and anxiety. There were also three questions about trainees' intentions and effort regarding goals for appraisals in the future.

Self-efficacy: Ten questions from the questionnaire concerning beliefs about skills in performing a successful performance appraisal were combined and averaged into a single index of self-efficacy (Cronbach alpha = .92).

Satisfaction: Three questions from the questionnaire concerning satisfaction in carrying out future performance appraisals were combined and averaged into a measure of satisfaction (internal reliability: Cronbach alpha = .90).
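The composite scores and Cronbach alpha coefficients reported for these measures can be illustrated with a short Python sketch. It uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function names and the response data are hypothetical; the study's raw questionnaire data are not available here.

```python
from statistics import mean, variance


def scale_score(item_responses):
    """Average one respondent's item scores into a single index."""
    return mean(item_responses)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical responses: four respondents, three items on a 7-point scale.
responses = [
    [5, 6, 5],
    [3, 3, 4],
    [6, 7, 6],
    [2, 3, 2],
]
print(scale_score(responses[0]))
print(cronbach_alpha(responses))
```

Values of alpha close to 1 indicate that the items vary together across respondents, which is the usual justification for averaging them into one index.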

Challenging Goals: Three questions from the questionnaire concerning goals of future performance appraisals were combined and averaged into a measure of goal setting (Cronbach alpha = .84).

Evaluation Procedure

Prior to the commencement of training, participants were asked to watch and rate the video scenes and to complete the questionnaire. Participants were informed that their ratings of the videos were being used for an evaluation of the training study and that no individual results would be made available to the company. Data was collected by one of the researchers and sealed in front of participants.

The questionnaire and videotape viewing and evaluations were also carried out at the end of the first day of training and at the end of the second day of training. Only the questionnaire was used at the day three training and was carried out at the start and the end of the day's training.

The control group duplicated the evaluation of the training group. However, at the time of writing, the control group had not undertaken the third day questionnaire evaluation.

RESULTS

Rating accuracy

Table 1 shows the mean scores (standard deviations in brackets) for measures of accuracy, self-efficacy, satisfaction and goal focus for each occasion of measurement for the FOR training group and the control group.

The FOR training significantly increased accuracy on video ratings over the course of the two days of training. Figure 1 shows the accuracy for the training and control groups. Before training, supervisor ratings of the videos in both the training and control groups varied widely from the correct ratings. Both groups scored similar levels at the pre-training evaluation of the video, ruling out any argument that the improvements observed in the training group were due to inherent rating skills before training.

The average error score for the training group dropped significantly by the end of the first day of the FOR training and further declined by the end of day two. The average error in the ratings of supervisors in the training group reduced from 4.6 to 1.9 as a result of the training.
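The comparison between the training and control groups in this section is reported as a t-test. A minimal Python sketch of an independent-samples t statistic follows. It assumes the pooled-variance form, which is an assumption because the paper does not state which variant was used; the function name and the error scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance (df = n_a + n_b - 2)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical error scores (lower is more accurate), not data from the study.
trained = [1, 2, 3]
control = [4, 5, 6]
print(pooled_t(trained, control))
```

With the trained group entered first, a negative t value indicates that the trained group's mean error score is lower than the control group's.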

Statistical comparison (t-tests; t-value = -7.16) of the two groups showed that supervisors who received training were significantly (p