
Improving the Quality of Undergraduate Peer Assessment: A Case for Student and Staff Development

Holly Smith, University College London, UK
Ali Cooper, Lancaster University, UK
Les Lancaster, Liverpool John Moores University, UK

SUMMARY

This paper reports an action research project to evaluate an intervention designed to increase students' confidence in an undergraduate peer assessment of posters in Psychology. The intervention set out to maximize the benefits of peer assessment to student learning by explicitly developing and working with marking criteria, and improving the fairness and consistency of students' marking through a trial marking exercise. Evidence from qualitative evaluation questionnaires suggested that students' initial resistance to the peer assessment was transformed by their participation in these processes. After the intervention the range of marks used by students increased at the same time as variability significantly decreased. Quantitative and qualitative data from module appraisal forms completed by students also demonstrated increased transparency and greater confidence in the peer marking process compared with the year before. The study raises issues for student support and staff development in using peer assessment.

INTRODUCTION

Peer assessment has been increasingly utilized in higher education since studies such as Falchikov (1986, 1988) and Boud (1988) reported benefits in terms of enhancing student learning. The last decade has seen a number of texts aimed at new lecturers extolling the benefits of peer assessment, such as Habeshaw et al. (1993), who suggest that participation in this process can develop students' autonomy, maturity and critical abilities. Brown (1998) recalled the enthusiasm with which Brown and Dove (1991) advocated peer assessment as almost evangelical. However, Topping (1998) recently set out a typology and comprehensive review of peer assessment in higher education, which concluded that while the practice has now been adopted in a wide variety of contexts, evidence for such enhancements remains limited. The present paper does not aim to review and replicate this extensive literature on peer assessment, but to engage with the issue of student attitudes towards the process and draw out the implications for the development of students and staff involved in assessment.

Falchikov (1995) made a distinction between peer assessment of performance and peer assessment of product. The vast majority of publications on peer assessment to date have focused on peer marking of performance by students, typically of their peers' contribution to a group project or a discussion group. Among the relatively few studies reporting peer assessment of a product of university students' work is a small number reporting the peer assessment of posters. These include Orsmond et al. (1996), Billington (1997) and Berry and Nyman (1998). The poster may be a particularly appropriate form of product for peer assessment because of its brevity and its aim to communicate clearly to a wide audience.

Orsmond et al. (1996) described a Level 1 Comparative Animal Physiology module in which 78 students produced a poster in pairs, then marked 19 or 20 of their peers' posters anonymously over a 45-minute period using five predetermined criteria rated 0–4 to produce a total mark of 0–20. Orsmond et al. found a 0.73 correlation between tutor marks and mean peer marks for the posters. Student feedback was also gathered through a forced-choice questionnaire, adapted from Falchikov (1986), in which students could agree, disagree or abstain on a number of statements. A majority of students endorsed the statements that peer assessment made them think more, learn more, be critical and work in a structured way, but also that it was time consuming, enjoyable, challenging, helpful and beneficial.



Orsmond et al. focused on the benefits to students of participation but acknowledged problems with a minority of students treating the exercise in a cavalier fashion, remaining sceptical about how meaningful other students' marks could be, and feeling unqualified and reluctant to mark peers' work. Billington (1997) described a Level 3 Ecosystem Ecology module in which 58 students peer assessed 29 posters using five predetermined criteria to produce a mark from 1–5. Billington found a 0.80 correlation between staff and student marks but also found a significant difference, with a staff mean mark of 59% and a student mean of 70%. Student attitudes were reported as being positive about the poster, but 'the peer evaluation was not quite so well received' (1997, p. 219), and a deputation of students actually visited the author to ask that posters be assessed anonymously, which in this instance meant that students received no verbal feedback from their peer assessors. Berry and Nyman (1998) reported the use of peer assessed posters on a mathematical modelling course taken by 11 mostly Level 1 students. Small groups of students prepared posters, which were marked by all students and at least two staff using a marking scheme assessing content and presentation subdivided into 13 criteria rated on a 5-point Likert scale. It was reported that 'Student assessment of the posters was remarkably consistent with the instructors marks' (1998, p. 108), although data are not provided.

The validity of peer assessment has been defined mainly in terms of comparisons between grades awarded by students and those given by staff. Evidence of this kind has been conflicting, sometimes within the same study; for example, Oldfield and MacAlpine (1995) reported correlations varying between r = 0.16 and r = 0.91 for engineering students. In one of the earliest studies of validity, Orpen (1982) found that marks given by students can be as accurate and reliable as those given by lecturers as long as the marking criteria are clearly explained, although the data were produced by very small numbers of staff and students. However, others such as Swanson et al. (1991) have argued that individual peer marks are too unreliable to be used in summative assessments. The general pattern appears to be a reasonably high correlation, of the order of r = 0.8, but with students using a narrower range of marks and tending to overmark in some circumstances. Even this consensus is open to differing interpretations; for example, Freeman (1995) argues that while the range of marks can be addressed by statistical transformation, such correlations are not high enough to be acceptable for summative assessment.

It could be argued that this problematizing of students' peer marking rather naively assumes the infallibility of staff marking. While it is certainly true that most lecturers are considerably more experienced than students at marking, it does not necessarily follow that the marks they award are consistently fair and unbiased, especially when marking is often conducted in very pressured circumstances. For example, the issue of gender bias in marking has produced a great deal of conflicting evidence (see Newstead and Dennis, 1990 and Bradley, 1993 for opposing interpretations of the same evidence). This clearly implies that it is by no means safe to assume that either staff or students are free from common social prejudices in terms of gender, ethnicity or disability. However, most discussions of the validity of peer assessment fail to explore the similarities in the difficulties faced by both staff and students in seeking to make assessment judgements fairly and consistently. A clear lead is provided here by the groundbreaking study of gender bias in peer marking by Falchikov and Magin (1997), who describe a mechanism for identifying such biases. For university staff, Van Dyke (1999) has described best practice in monitoring processes. Fairness in assessment outcomes can only be assured by introducing equal opportunities monitoring of marking for students and staff. Such processes could become emancipatory, as the feedback they provide to markers would enable them to identify and address discrimination which may have been hidden or unconscious.

In previous studies of peer assessment, there has been some attempt to address students' attitudes towards the process, but as Falchikov noted, 'few student evaluations of peer assessment are reported' (1995, p. 177). Of those that have considered students' attitudes, most have been limited to forced-choice closed questions. Peters highlighted that 'there has been little attempt to investigate how students themselves view non-traditional strategies for assessment' (1996, p. 48) and reported hostility and suspicion towards peer assessment in particular, with 76% of students surveyed disagreeing with it. This resistance to peer assessment was in contrast to attitudes towards other non-traditional assessment methods, with students preferring continuous assessment to examinations, supporting the submission of draft assignments and showing considerable support for ensuring that the criteria by which assessments were judged are made public. Peters concluded that students, quite rationally, are most open to alternative forms of assessment when they enhance the learning potential of the assignment without shaking confidence in traditionally maintained academic standards.


Sambell et al. (1997) unusually took a qualitative approach to investigating students' attitudes to 13 alternative assessments, including one case of peer assessment. They found that students contrasted alternative and traditional assessments in terms of the integration of learning with assessment, and the authenticity and accuracy of the alternative assessments against the contamination of learning, artificiality and inaccuracy of traditional assessments. However, they noted that peer assessment presented particular difficulties for students, as 'Some felt threatened or unnerved by their insights into the apparent subjectivity of assessment, or failed to develop confidence in their ability to act fairly as an assessor themselves' (1997, p. 361). The comprehensive review of peer assessment by Topping (1998) concluded that 'levels of acceptability to students are varied and do not seem to be a function of actual reliability' (p. 268). The evidence on students' attitudes to peer assessment certainly remains confused and inconclusive. There is some indication that while students' attitudes to non-traditional assessments are generally positive, peer assessment may prove an exception. However, there is clearly a need to investigate more fully, and in a qualitative way, the thoughts and feelings of students participating in peer assessment.

The present paper describes a case study of the peer assessment of posters in a Level 2 Psychology module that formed part of the summative assessment of the module. In contrast to previously reported studies, students contributed towards the writing of the marking criteria. This assessment was introduced in 1996, and the intervention reported here was conducted in 1998 to improve both the validity of peer assessment and the responses of students towards the process. Comparisons of 1997 pre-intervention data and 1998 post-intervention data are supplemented by qualitative questionnaires completed by students at three time stages, evaluating their experience of peer assessment. This was intended to provide valuable new qualitative information about students' attitudes to peer assessment. The implications of the results in terms of training for students in peer marking are explored and linked to the issues in assessment by staff that have been previously reported in the literature.

THE CONTEXT

The module HUMAP2008 Independent Investigation is a compulsory Level 2 module on the Psychology BSc programme at Liverpool John Moores University.

The module is based on an extremely open conception of teaching according to the definition of Van Rossum and Taylor (1987). The student can choose to conduct their project on any psychological topic using an appropriate methodology of their choice. The student is assigned a supervisor whose role is to advise the student on whether their plans are realistic, achievable and congruent with ethical guidelines. The approach is intended to encourage a deep learning approach in students, as originally defined by Marton and Saljo (1976a, b).

In 1997, the module was assessed by a practical report of 3000 words, which contributed 60% of the overall module mark, and a peer assessed poster, contributing 30%. The final 10% of the overall module mark was awarded for participation in the peer marking process. There were 103 students registered for this module and a total of 11 staff were responsible for supervising students' projects. At the outset of the module, the module leader briefed the students in a single whole-group lecture and explained the rationale for the peer marking of posters. The module handbook, which gave extended guidance on preparing and marking the posters, was distributed to students and each student was then allocated a member of staff as a supervisor. Following submission of posters at the end of the semester, each student was allocated a code to ensure anonymity during the marking procedure. The students were then subdivided into eight groups by surname, and each group was required to display their own posters, assess another group's posters, and at a later date take down their own posters. In 1997, the students received no feedback on their posters or their participation in the peer marking process; they did receive feedback on their practical report from their supervisor.

The rationale for using posters as an assessment is that it is a format that the British Psychological Society has encouraged students to use in competitions and conferences. It is a particularly useful form of assessment because the poster format places very tight constraints on the students, should make them more aware of their audience, and helps focus their decisions about content and form. It makes students think harder about the communication of their research findings. The rationale for peer assessment is that by requiring students to apply marking criteria to their peers' work they will become more aware of how their own work undergoes the same process. The Centre for Psychology gives students very specific marking criteria for all their coursework, but students do not always actively use these during their preparation.
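As a concrete illustration of the weighting described above, the sketch below combines the three components into an overall module mark, with the poster mark taken as the mean of the peer marks received. Only the 60/30/10 weights and the use of a mean of peer marks come from the paper; the function name, the example figures and the treatment of participation as a percentage are hypothetical.

```python
from statistics import mean

def module_mark(report_mark, peer_marks, participation_mark):
    """Illustrative combination of the HUMAP2008 components:
    60% practical report, 30% peer assessed poster, 10% participation.
    The poster mark is the mean of the peer marks received (typically 10-14)."""
    poster_mark = mean(peer_marks)
    return 0.6 * report_mark + 0.3 * poster_mark + 0.1 * participation_mark

# Hypothetical example: report mark of 62, twelve peer marks, full participation credit.
overall = module_mark(62, [58, 60, 55, 63, 61, 59, 57, 64, 60, 62, 58, 61], 100)
print(overall)
```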


The rationale for introducing peer assessment of posters specifically was that, unlike practical reports or essays, posters are designed to be easily and quickly comprehended, which makes them particularly suitable for peer assessment.

The validity of the peer marking should have been increased because the students were given a simple marking scheme in the module handbook describing the criteria of presentation and content, which they were to use to judge the posters. However, validity may have been threatened because the students were very inexperienced markers and did not have a chance to practise using the marking criteria. The reliability of peer marking should be relatively high, as the mark for each poster is the mean of the marks of 10–14 other students. Magin (1993) demonstrated that taking the average of 8–10 individual peer marks increases reliability dramatically: from 0.38 for the individual peer rater reliability to 0.84 for the averaged peer mark reliability in one case, and from 0.29 to 0.79 in another. However, this averaging of multiple assessments was compromised in 1997, where some students had not put their posters up in time to be marked by all the peers assigned to do so. Reliability could also be threatened in cases where students were aware of the identity of the author of a poster despite the fact that posters were coded anonymously.
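Magin's figures behave like the familiar Spearman–Brown relation for the reliability of an average of k independent ratings. The formula is not named in the source, so the following is offered only as an illustrative reconstruction, taking k = 9 (within the 8–10 raters Magin describes):

```latex
R_k = \frac{k\,r}{1 + (k-1)\,r}, \qquad
\frac{9 \times 0.38}{1 + 8 \times 0.38} \approx 0.85, \qquad
\frac{9 \times 0.29}{1 + 8 \times 0.29} \approx 0.79
```

These values sit close to the reported 0.84 and 0.79, and the same relation suggests why averaging the 10–14 peer marks used here should give acceptable reliability even when individual raters are noisy.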

MODULE FEEDBACK FROM 1997

Discussion of the appropriateness of this assessment procedure must take into consideration the feedback from students who completed the module appraisal questionnaires. Returns on these are not always good and it is actually rare to get many additional comments. However, 90 of the 103 students registered for HUMAP2008 returned the forms and 38 wrote comments; most of these were extremely negative. To summarize, they were coded into the following categories:

• 20 were critical of the peer assessment process;
• 12 mentioned logistical problems;
• seven had problems with supervisors (mostly contact);
• five had problems specific to part-time students;
• three were critical of the poster as an assessment;
• two made some positive comment;
• one thought this module required too much work.

(Each point was coded separately, so the total exceeds 38.)

The greatest single category was criticism of peer assessment. Students seemed to be concerned that they were not trained to do marking and might therefore be unreliable, and that other students did not take the exercise seriously and may have been cheating by favouring friends (anonymity clearly did not overcome the perception of this problem).

ACTION PLAN FOR INTERVENTION

After meeting to discuss the module, the authors decided that while the assessment adds diversity to the current variety of assessment methods in psychology, it appeared to be unacceptable to students in its current form. Therefore, an action plan was devised:

• provide a very clear rationale for why peer assessment of posters is being used;
• introduce workshops at the beginning of the module in which students examine a selection of the previous year's posters and develop their own marking scheme in small groups, then share their marking schemes and practise using agreed common criteria, which would subsequently be used in the real exercise;
• resolve logistical problems by getting psychology support staff to put up and take down all the posters, so that each student will only have to attend once to assess the posters.

IMPLEMENTING THE INTERVENTION IN 1998

Introduction to the module (1 hour)

The students were explicitly told the rationale for both peer marking and posters as forms of assessment, including research findings on the benefits to students of such forms of assessment. As in previous years, they also received the module handbook and other administrative information.

Workshop 1: devising assessment criteria (2 hours)

The students were split into 16 groups of about four or five, with a member of staff facilitating in each of four rooms where four posters produced by students in the previous year were displayed. The students' task was to examine these posters carefully and, by discussing among themselves what they valued or criticized about them, to develop criteria for a poster marking scheme, written on an OHT.


Reassembling as a whole group, the authors displayed all the poster marking schemes, commenting on similarities and differences. Between the two workshops, the authors met to write a common set of criteria using the students' marking schemes, which were sufficiently similar for there to be no concern about loss of ownership. The marking scheme consisted of 10 weighted criteria, and multiple copies were produced in time for workshop 2.

Workshop 2: applying assessment criteria to achieve reliability and validity

The students were first required individually to use the marking scheme to mark at least three posters blind, without consulting their peers. Once they had all recorded marks for the posters in writing, they got into the same 16 small groups as in the previous workshop to carry out a moderation exercise. They had been instructed that every group must reach a decision on an agreed mark for each poster, recording both individual and collective marks on an OHT. The authors displayed these anonymously to the whole group, commenting on the process of moderation and the distribution of marks for posters.

EVALUATION OF THE INTERVENTION

The evaluation had three strands: the qualitative questionnaires completed by students, the analysis of the marking of posters, and the module feedback questionnaires.

Qualitative questionnaires

These asked a number of open-ended questions about perceptions and responses to the peer assessment of posters at three time points:

Time 1. At the end of the introduction to the module (72 questionnaires returned).
Time 2. At the end of workshop 2 (65 returned).
Time 3. After the module had been completed, marks had been received and coursework returned (45 returned).

The questionnaires were designed not only to monitor student responses but also to encourage students to reflect explicitly on their reactions to what was, for most of them, a new assessment approach. Innovative teaching and assessment methods are sometimes thwarted by student conservatism or confusion; allowing students a chance to express their feelings and to have their anxieties addressed can encourage a more positive or open-minded response.

At Time 1, the responses revealed general open-mindedness, coupled with some ambivalence and negativity. Most students believed that peer assessment would help to bridge the gap between learners and assessors:

'It will help us to understand how our own work is marked, hopefully help us to increase our own standards'.

'It integrates a large group into thinking about other students and tutors' assessment roles'.

There was frequent reference to the significance of empathy:

'Because we have also done the piece of work we are aware of limitations/problems involved and can hopefully judge fairly'.

Using criteria to mark others' work would help to clarify the requirements of the assignment and cause deeper thinking about their own work. Peer assessment:

'would help us to recognize what is important in a report and to remember what we are being marked on'.

The responses revealed a seriousness and sense of responsibility towards each other's and their own assessment. Most students were concerned about how the process would work, what they would have to do, and what protections would be in place to guard against abuse, error, friendship bias, personal preferences and lack of commitment:

'How are you going to ensure that students take the marking task seriously?'

'I'd like to think I'll be entirely fair, but what should I be looking for?'

There was some anxiety about the relevance and the demands of making posters in Psychology. Provided that it was reliable, many welcomed the poster as a novel method of assessment, which they saw as a creative opportunity to develop new skills and perspectives. Some were concerned about the need for artistic talent and computer graphics expertise to present posters attractively, and about the fairness of this in an assessment in Psychology.


At Time 2, there was a definite increase in confidence about how the peer assessment process would operate. The workshops had clearly allayed most fears about fairness and consistency:

'It has helped to reduce any fears that I may have had before the assessment in that after the discussions the marks were the same or thereabout'.

A few expressions of concern remained but now focused on the concrete experience of the difficulties of assessment. Overall, the students appeared to be dramatically more positive at this stage, and were overwhelmingly certain that they would actively use the criteria and think more about their audience in the preparation of their own assignments. Marking others' work had clearly also raised their confidence about what was expected in a poster assessment; the indications were that their preparation would be more focused and strategic as they knew who their audience was.

At Time 3, the predominant positive outcome was the students' ownership and active use of the marking criteria when preparing their own work, and their awareness of the purpose and audience of the poster:

'It certainly affected my approach to preparing my poster, and was particularly effective as collectively we produced the criteria'.

They were much more confident about the requirements of the assessment task. Confidence and positive responses were tempered by procedural concerns over the marking, such as the onset of weariness and repetitiveness, and reflected greater understanding of the inherent challenges in marking:

'Too many posters to mark – lost interest after five, therefore 6–10 might not have gotten a fair mark'.

The unease about fairness and consistency remained, with anxieties about friendship bias and competitiveness, and a lack of anonymity. A lack of objectivity was mainly attributed to inexperience, and a need for more training was identified:

'Difficult to see fairness and consistency without adequate guidance and training'.

There were also numerous references to the need for feedback and a breakdown of the marks given, and some reference to the need for tutor moderation, although many made reference to the greater fairness of taking the average of a group of marks.

Taken together, this analysis reveals that anxieties similar to those of 1997 were expressed at the outset but were not sustained in terms of competence and purpose. Consistency and bias remained a predominant concern, but were expressed in terms of awareness of legitimate difficulties in the standardization of marking. The responses provide clear evidence that the marking criteria were explicitly used in the students' preparation of their poster assessments, with constructive and positive outcomes, particularly in terms of confidence and awareness. Students also accepted peer assessment as a legitimate assessment procedure, with reservations about time and protocols. They also accepted posters as constituting a legitimate assessment task, with reservations about the time and weighting. Cautious or negative comments were mainly about logistics and the management of reliable moderation and standardization. It is not conclusive from the data whether it was the devising of the criteria, their use in the workshops or the prospect of summative peer assessment which encouraged the active use of the criteria by students, but it is clear that students' confidence, clarity of objectives and a more strategic approach were encouraged.

This study identifies that peer assessment can function as a learning tool; there were three stages of learning in the process:

1. The training and trial marking;
2. Preparing their own posters;
3. Doing the peer assessment.

It can only be speculated here which stage was most important in driving the learning; further research is required to identify whether participation in any stage alone could produce the intended learning outcomes or whether only the whole cycle is effective. This has implications for preparing students in the use of peer assessment.

Marking of posters

The marks students gave to the 14 posters they were allocated to mark during workshop 2 were recorded and, as the posters had been produced and marked in the previous year, this allowed a direct comparison of marks and standard deviations. The marks given by students in 1997 and 1998 are compared in Table 1.


Table 1  Means and standard deviations from individual blind marking of posters in 1997 and 1998

Poster code  Mean 1997  SD 1997  min 1997  max 1997  N 1997  Mean 1998  SD 1998  min 1998  max 1998  N 1998
D1           58.82      5.10     50        68        11      45.95      5.19     38        59        21
C13          57.91      6.17     48        68        11      50.47      6.35     41        63        17
E11          55.30      10.85    33        68        10      50.94      6.54     39        59        17
H3           66.50      10.07    54        83        12      53.10      7.18     40        67        21
H14          60.33      5.03     55        72        12      53.19      5.81     40        68        21
A10          59.08      9.17     45        68        12      53.53      7.06     44        71        19
C2           66.18      10.30    48        85        11      58.18      4.72     50        65        17
D5           60.09      6.22     48        68        11      60.05      5.48     48        70        21
C6           65.45      7.92     52        78        11      60.53      4.21     53        70        19
F1           68.14      5.95     60        78        14      61.37      5.26     50        70        19
F7           68.86      6.75     60        80        14      61.94      6.98     49        70        17
G16          58.83      12.72    40        72        6       62.79      5.86     49        74        19
D8           68.36      8.14     48        78        11      68.95      3.56     65        78        21
H11          63.83      4.17     58        72        12      69.38      8.18     54        89        21

One of the most striking features of this comparison is the greater range in mean marks from the 1998 cohort, who produced a range of 23 percentage points compared to a range of 14 percentage points from the 1997 cohort. It could be argued that this remains a very small range on a 100% scale, but it is not untypical of undergraduate marks awarded by lecturers. The mean marks for the 14 posters across the two years were significantly correlated, r = 0.597, p = 0.024. However, the mean of the mean marks was 62.69% in 1997 and 57.88% in 1998, and the difference between these means was significant, t = 3.22, p = 0.007. The 1998 cohort, like the 1997 cohort, were marking alone, but the 1998 cohort were of course aware that their marks would not contribute towards another student's assessment, and this may possibly explain the significantly lower marks given.

The standard deviations for the 14 posters were not significantly correlated across the two years. However, the mean standard deviation was 7.75 in 1997 and 5.88 in 1998, and the difference between these means was significant, t = 2.37, p = 0.034. This is the most important finding in terms of predictions made by educational theory. Theory would predict that the 1998 cohort, using the marking scheme and with training, should produce more consistent marks, and indeed they do produce significantly less variation in their marks for the same posters. This finding is even more impressive considering that in the 1998 cohort there was a greatly increased range and so greater scope for inconsistency.
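The comparisons above can be checked directly against Table 1. The sketch below, using SciPy, reproduces the reported correlation and the mean and standard-deviation comparisons, under the assumption (not stated explicitly in the paper) that the t-tests were paired across the 14 posters:

```python
from scipy import stats

# Poster means and standard deviations transcribed from Table 1 (order D1 ... H11).
means_1997 = [58.82, 57.91, 55.30, 66.50, 60.33, 59.08, 66.18,
              60.09, 65.45, 68.14, 68.86, 58.83, 68.36, 63.83]
means_1998 = [45.95, 50.47, 50.94, 53.10, 53.19, 53.53, 58.18,
              60.05, 60.53, 61.37, 61.94, 62.79, 68.95, 69.38]
sds_1997 = [5.10, 6.17, 10.85, 10.07, 5.03, 9.17, 10.30,
            6.22, 7.92, 5.95, 6.75, 12.72, 8.14, 4.17]
sds_1998 = [5.19, 6.35, 6.54, 7.18, 5.81, 7.06, 4.72,
            5.48, 4.21, 5.26, 6.98, 5.86, 3.56, 8.18]

# Correlation of the poster means across years (reported: r = 0.597, p = 0.024).
print(stats.pearsonr(means_1997, means_1998))

# Paired comparison of the mean marks (reported: t = 3.22, p = 0.007)
# and of the standard deviations (reported: t = 2.37, p = 0.034).
print(stats.ttest_rel(means_1997, means_1998))
print(stats.ttest_rel(sds_1997, sds_1998))
```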

Module feedback

The module feedback questionnaires that students complete routinely for all modules at the Centre for Psychology provide a rich source of qualitative and quantitative data. As the same module appraisal questionnaires had been completed in the previous year, this permitted a direct comparison of responses before and after the intervention. In both years, the questionnaires were distributed at the end of the module, following the peer assessment process. Returns of the module appraisal questionnaires are not always good, but 47 of the 90 students registered for HUMAP2008 returned the forms in 1998 and 21 wrote additional comments. These comments were coded using the categories developed for the 1997 module feedback:

• three were critical of the peer assessment process (20 in 1997);
• three mentioned logistical problems (12);
• four had problems with supervisors (mostly contact) (7);


• three were critical of the poster as an assessment (3);
• four made some positive comment (2);
• one thought this module required too much work (1);
• two declined to comment (0);
• three were critical of library resources (0);
• one expressed confusion (0).

(Each point was coded separately, so the total exceeds 21.)

This represents a diversification from the 1997 module feedback, which focused on criticism of the peer assessment process. The only category to increase was positive comments, although comments declining to comment, expressing confusion and criticizing library resources were not represented in 1997.

The module appraisal questionnaire also consists of 10 questions rated on a Likert-type scale from 1 = strongly disagree to 5 = strongly agree, and the ratings in 1997 and 1998 are compared in Table 2. The table clearly shows that student ratings in 1998, after the intervention, were higher for nine of the 10 questions, and significantly higher for four questions. It is interesting to note that only one of these questions relates directly to assessment; the intervention appears to have had a global impact on student satisfaction with the module.

DISCUSSION

The case reported here raises a number of issues about student perceptions of innovative assessments, staff development, and departmental and institutional assessment policies and strategies.

The evaluation shows that peer assessment and the use of posters appear to be acceptable assessment methods, with additional benefits from the student perspective, if students are clear about the purpose and the process. Students welcome a variety of assessment methods, and can accommodate innovation if they are confident of its value. This case also demonstrates that the quality of students' peer assessment can be significantly enhanced through early intervention.

Whilst there is clearly value in peer assessment as a learning tool, this may not be immediately apparent to all students and may demotivate some. The evaluation revealed that a minority of students remained resistant to the principles and process of peer marking despite the intervention, due mainly to a lack of confidence in the ability of their peers to award fair and unbiased marks. However, it is a particular feature of the system reported here that the mark received by any individual student for their poster is generated by taking the mean of 10–12 marks from their peers. There is no situation in which blind marking is done by a comparable number of staff. Assuming that the students' marks are normally distributed, taking the mean of so many should be an effective way of ensuring fairness, although it may contribute to a reduced range of marks. Magin (1993) has already convincingly demonstrated the increased reliability of this procedure. Another concern of students was the lack of anonymity of the peer assessment process; although every poster is anonymous on display, students have argued that close friends may identify posters by their topic. This bias in marking is not confined to students, as Dennis and Newstead (1994) have argued that knowledge of the student is the most significant bias in lecturers' marking.

Table 2  Mean ratings from 1997 and 1998 and independent t-test results

                                                                1997 (n=88)   1998 (n=47)   t-test p value
1.  The amount of course content was about right                   3.06          3.51          0.01
2.  The module was intellectually stimulating                      3.66          3.72          ns
3.  The aims of the module were achieved                           3.59          3.68          ns
4.  The module was well organized                                  2.70          3.09          0.05
5.  The assessment was appropriate                                 2.80          3.17          ns
6.  The module booklet was adequate                                3.23          3.34          ns
7.  Lectures were well presented                                   2.68          3.09          0.02
8.  There was adequate opportunity for student participation       3.17          3.68          0.02
9.  Student views received sympathetic responses                   2.99          3.17          ns
10. The library facilities matched your needs                      3.27          2.97          ns


Clearly, the effect of familiarity is a key topic for research, which must be addressed in both staff and student marking.

A number of strategies might be considered to increase student confidence in the process of peer assessment. Dissemination of the value of the process, as indicated by previous students' evaluations, might prove reassuring for some. Another strategy could be to require students to write brief feedback against the criteria in addition to giving marks; peer assessment needs feedback, for both learner morale and formative development. In addition to giving the rationale behind peer and poster assessment at an early stage, some invitation to discuss the issues with staff could be offered to students who remain anxious. It is important not to undermine students' confidence in the system, but raising their awareness of the process and of the inherent challenges in marking, by actively working with criteria, is a useful strategy. In this study, involving students in constructing the assessment criteria gave an important degree of ownership that appeared to lead to their feeling more confident and autonomous during the preparation of their assignments. In addition, the intervention contributed towards students being more effective and appropriate markers, which is vital if peer assessment is to have credibility and validity, both formatively and particularly summatively.

What emerges from the qualitative questionnaire evaluation is that students' anxieties about assessment procedures and management were legitimate concerns that could be equally applicable to many staff: lack of experience, consistency and weariness with large amounts of marking, bias and subjective preferences, multiple interpretations of criteria, standardization between markers and the need for explicit feedback at an appropriate time to aid progression. From an educational development perspective, this study sets an agenda for institutions to work on improvements in:

• procedures for better management of assessment;
• moderation and standardization exercises between staff;
• construction of commonly interpreted assessment criteria between staff;
• meaningful interpretation and active use of assessment criteria by students at an early enough stage;
• involving students more in the assessment of their own and others' work as a means to understand the nature and demands of tasks, and to improve standards of achievement;
• consideration of assessment strategies which focus students' attention on the important relationship between audience, purpose, form and content in their work;
• front-loading the support for students' assessment preparation;
• monitoring assessment to detect bias as a routine equal opportunities practice.

These issues are emerging as problems in need of urgent attention in higher education; see Stephenson and Yorke (1998), Yorke et al. (2000) and Holroyd (2000).

CONCLUSION

The following conclusions can be drawn from the evaluation of this intervention, which may be useful to colleagues introducing peer assessment:

• Peer marking produces a good deal of anxiety in students that needs to be articulated and addressed if it is to be seen as valid. The intervention clearly increased the transparency of the peer marking process for students and, as a consequence, student confidence in the process increased;
• the intervention increased the range of marks used by students at the same time as reducing the variability in the marks;
• the intervention increased students' awareness and use of marking criteria, generally found to be one of the main benefits of participating in peer marking;
• posters are seen by students as an innovation that is worthwhile but problematic both to produce and to mark;
• the peer marking highlighted issues about assessment that are equally valid concerns for academic staff marking.

A question remaining, to which future research might be directed, concerns the extent to which students' increased awareness of marking criteria in the context examined here might transfer to other assessment events. In addition, this work has clearly highlighted the difficulties inherent in the assessment process and the importance of systematic training and moderation for both staff and students undertaking marking.


ACKNOWLEDGEMENTS

Thanks to Dr Martin Oliver and Dr Colleen McKenna for comments on an earlier draft.

REFERENCES

Berry, J and Nyman, M (1998) Introducing mathematical modelling skills to students and the use of posters in assessment, Primus, 8, 103–15.
Billington, H L (1997) Poster presentations and peer assessment: novel forms of evaluation and assessment, Journal of Biological Education, 31, 218–20.
Boud, D (ed.) (1988) Developing Student Autonomy in Learning, 2nd edn, Kogan Page, London.
Bradley, C (1993) Sex bias in student assessment overlooked? Assessment and Evaluation in Higher Education, 18, 3–8.
Brown, S (ed.) (1998) Peer Assessment in Practice, SEDA Paper 102, SEDA, Birmingham.
Brown, S and Dove, P (1991) Self and Peer Assessment, SCED Paper 63, SCED (now SEDA), Birmingham.
Dennis, I and Newstead, S E (1994) The strange case of the disappearing sex bias, Assessment and Evaluation in Higher Education, 19, 49–56.
Falchikov, N (1986) Product comparisons and process benefits of collaborative group and self-assessment, Assessment and Evaluation in Higher Education, 11, 146–66.
Falchikov, N (1988) Self and peer assessment of a group project designed to promote the skills of capability, Programmed Learning and Educational Technology, 25, 327–39.
Falchikov, N (1995) Peer feedback marking: developing peer assessment, Innovations in Education and Training International, 32, 175–87.
Falchikov, N and Magin, D (1997) Detecting gender bias in peer marking of students' group process work, Assessment and Evaluation in Higher Education, 22, 385–96.
Freeman, M (1995) Peer assessment by groups of group work, Assessment and Evaluation in Higher Education, 20, 289–300.
Habeshaw, S, Gibbs, G and Habeshaw, T (1993) 53 Interesting Ways to Assess your Students, revised edn, Technical and Educational Services Ltd, Bristol.
Holroyd, C (2000) Are assessors professional? Student assessment and the professionalism of academics, Active Learning in Higher Education, 1, 28–44.
Magin, D (1993) Should student peer ratings be used as part of summative assessment? Research and Development in Higher Education, 16, 537–42.
Marton, F and Saljo, R (1976a) On qualitative differences in learning – I: outcome and process, British Journal of Educational Psychology, 46, 4–11.
Marton, F and Saljo, R (1976b) On qualitative differences in learning – II: outcome as a function of the learner's conception of the task, British Journal of Educational Psychology, 46, 115–27.
Newstead, S E and Dennis, I (1990) Blind marking and sex bias in student assessment, Assessment and Evaluation in Higher Education, 15, 132–9.
Oldfield, K A and MacAlpine, J M K (1995) Peer and self-assessment at tertiary level – an experiential report, Assessment and Evaluation in Higher Education, 20, 125–32.
Orpen, C (1982) Student versus lecturer assessment of learning: a research note, Higher Education, 11, 567–72.
Orsmond, P, Merry, S and Reiling, K (1996) The importance of marking criteria in the use of peer assessment, Assessment and Evaluation in Higher Education, 21, 239–50.
Peters, M (1996) Student attitudes to alternative forms of assessment and to openness, Open Learning, 11, 48–50.
Sambell, K, McDowell, L and Brown, S (1997) 'But is it fair?': an exploratory study of student perceptions of the consequential validity of assessment, Studies in Educational Evaluation, 23, 349–71.
Stephenson, J and Yorke, M (eds) (1998) Capability and Quality in Higher Education, 2nd edn, Teaching and Learning in Higher Education Series, Kogan Page, London.
Swanson, D, Case, S and van der Vleuten, C (1991) Strategies for student assessment. In Boud, D and Feletti, G (eds) The Challenge of Problem Based Learning, Kogan Page, London.
Topping, K (1998) Peer assessment between students in colleges and universities, Review of Educational Research, 68, 249–76.
Van Dyke, R (1999) The use of monitoring data on student progress and achievement as a means of identifying equal opportunities issues in course provision and developing appropriate remedial action. In Pearl, M and Singh, P (eds) Equal Opportunities in the Curriculum, Oxford Centre for Staff and Learning Development, Oxford.
Van Rossum, E J and Taylor, I P (1987) The relationship between conception of learning and good teaching: a scheme for cognitive development. Paper presented to the American Educational Research Association Annual Meeting, Washington.
Yorke, M, Bridges, P and Woolf, H (2000) Mark distributions and marking practices in UK higher education, Active Learning in Higher Education, 1, 7–27.

BIOGRAPHICAL NOTES

Holly Smith completed her doctoral research on teacher thinking at the University of Leicester School of Education. At the time of this study, she was a Lecturer in Psychology at Liverpool John Moores University, but has become so intrigued by teaching and learning issues that she has now moved into educational development at University College London.


Ali Cooper began her professional life as an English teacher before going on to Liverpool John Moores University School of Education. From there, she was seconded first to teach on and then to lead the Post Graduate Certificate in Teaching and Learning at LJMU. Her main interests are in assessment and staff development, and she is currently Teaching and Learning Development Co-ordinator at Lancaster University and Course Director of the Certificate in Learning and Teaching in Higher Education.

Les Lancaster is deputy head of the Centre for Applied Psychology at Liverpool John Moores University and a Research Fellow in the Department of Religions and Theology at the University of Manchester. When not introducing innovative assessments he pursues his interest in the psychology of religion and consciousness, on which he has written extensively.

Address for correspondence: Dr Holly J Smith, University College London, Education and Professional Development, 1–19 Torrington Place, London, WC1E 6BT, UK.