Presented on Tuesday, March 8, 2011, and published in the proceedings of the INTED 2011 Conference, Valencia, Spain, March 7-9, 2011

A TRULY GENERIC PERFORMANCE ASSESSMENT SCORING SYSTEM (PASS)

Paul Hubert Vossen
SQUIRE Institute for System Quality and Innovation Research (GERMANY)
[email protected]

Abstract

PASS is a generic scheme for learning assessment in higher education. Basic to the model are so-called primary assessment profiles for an arbitrary number of criteria per assignment. Each assessment criterion itself differentiates between just three product or performance quality levels, called "below average", "average" and "above average". An assessment profile yields a primary quality score between zero and one, which may also depend upon an index of assessor strictness. Additionally, the assessor may take any number of secondary quality factors of the assignment into account, e.g. workload, volume or other product or performance criteria. These so-called productivity factors are positive real numbers with default unit one. By aggregating all factors into a scoring formula which also allows for the specification of assignment difficulty, a single standardized assignment score results, which can be the starting point for feedback to the student about the assignment. If the course consists of several distinct assignments, all corresponding assignment scores will in turn be merged, by means of an aggregation rule using appropriate assignment weights, into a single overall course score. The last step of the assessment procedure is to transform this final course score into a numeric, alphabetic or symbolic grade according to institution-specific transformation rules.

The PASS model has emerged over several years from the needs of actual teaching practice. The author has developed and tested it in his own courses at several institutions for higher education, and has shared his experiences and tools with colleagues struggling with the same kind of assessment problems. Students have accepted the model favorably too, because it makes the assessment procedure itself transparent, traceable and accountable. Furthermore, the model guarantees fair play during assessment, because it offers ample opportunity for systematically rewarding "above average" work and penalizing obvious "below average" work. Generally, students play an important role in the model, as they have to provide and account for most secondary performance data.

Keywords: Educational Measurement, Learning Outcomes, Learning Performance, Qualitative Assessment, Quantitative Assessment, Scoring, Marking, Grading

1 INTRODUCTION

Assessment is a challenge, for both learners and teachers. It always has been, and it continues to be. Its importance for the educational process is indisputable, during instruction (assessment for learning: formative evaluation) as well as after it (assessment of learning: summative evaluation). However, the precise form it takes differs greatly, from school system to school system, from discipline to discipline, and from one country and culture to another. Thus, there continues to be a lot of philosophical, didactical and political discussion about the pros and cons of one form of assessment as compared to another. Some would like to abandon assessment altogether; for others it cannot be formal, strict and normative enough. Nonetheless, on closer scrutiny, assessment is always there, however implicit or subjective it may be. Moreover, it always boils down to essentially the same kind of structure and process, heavily adapted though it will be to the particular learning environment and learning goals. It is all the more astonishing, then, that one will not find a rigorously worked-out general mathematical model of assessment which captures this essence of the entire assessment process and can be taken as the starting point for the specification of any concrete assessment scheme for most if not all learning processes.


It is true that over the past decades educational and instructional experts have suggested several more or less formalized approaches to the business of learning and training assessment, often supported by appropriate guidelines and tools. However, none of them is at the high level of generality and formality – without sacrificing suitability and practicality – that the author will be proposing here. For instance, test psychologists developed the family of so-called item response models, of which the so-called Rasch models¹ form the most popular subclass, as an alternative to classical test theory, which has a number of drawbacks [1, 2, 3]. This is indeed a very clever and elegant approach to measuring all kinds of cognitive and other abilities, but it is ill suited for many de facto learning environments, in which it is not feasible or defensible for teachers to invest so much time and money beforehand just to obtain the once-only items for (formative) assessment purposes. As another example, consider the so-called scoring rubrics [4, 5], based on a rather pragmatic approach, relatively easy to set up but sometimes hard to apply correctly and consistently by those who did not participate in their creation. This method assumes that little narratives will help the assessor to find the one of a small number of product or performance quality levels, the scoring rubrics, which best describes the observed learning outcome or performance. Unfortunately, the correspondence between qualitative rubrics and quantitative scores is, at best, approximate and, at worst, highly speculative [6].

It is not this author's intention here to deny the two preceding approaches any merits in their respective niches and with regard to their specific goals, on the contrary. However, neither one answered the sort of questions, or offered a solution to the kind of problems, that the author was confronted with when he started his search for a general, suitable and practical assessment framework in the context of vocational education at university level. In fact, except for a couple of ad hoc solutions, there was no viable and scalable approach in the literature that the author scanned over the last five years that could be taken as the basis of a methodologically and mathematically sound and complete model of the assessment process. Therefore, he was literally forced to start his own in-depth research, the result of which is published here for the first time.

The rest of the paper is organized as follows. In section 2, the author will describe in detail the goals and requirements for a generic scoring model for use in higher education. In section 3, the core concepts and components of the PASS model will be introduced: assignments and their quality, difficulty, productivity, and weight. The next two sections will show how to include multiple productivity factors (section 4) and multiple assignments (section 5) into the model by means of fuzzy-algebraic operations. The final section shows an example assessment form by way of a one-page tutorial implemented in Excel, as part of the PASS forms used by the author in his own courses.

2 GOALS AND REQUIREMENTS FOR A GENERIC SCORING MODEL

Would it not be helpful if there were a judgment, assessment or evaluation model for learning processes and products which could be used irrespective of the kind of learning environment and learning goals? A model that is flexible enough to take local guidelines for assessment into account. A model with a few well-chosen 'handles', so that you may customize it according to your own needs and preferences. Last but not least, a model that despite all these properties is easy to apply, that is completely transparent and hence compelling, and that does not counteract any legal rules or institutional practices. PASS is such a model.

Basic to the model are so-called primary assessment profiles for an arbitrary number of context-dependent assessment criteria with no more than three assessment levels per criterion, called "below average", "average" and "above average", respectively. Experience over many years has shown that these three levels are sufficient to get an adequate and reliable differentiation of learning quality in a reasonable period of time: using such assessment profiles and a special counting method, you will get the equivalent of a percentage scoring scale with no more than 13 criteria (presumably because a profile is characterized by its bonus and malus counts: 13 criteria yield up to 14·15/2 = 105 distinguishable profiles, more than the 101 points of a percentage scale). This qualitative judgment results in a primary quality statement for a given assignment. In addition, a teacher may wish to take a number of secondary assessment factors of the same assignment (test) into account. For instance, this might be an estimate of the workload to produce the learning outcome, e.g., the number of hours to write an essay. Or it might be the size of the resulting outcome, e.g., the number of pages of a term paper, or the length of a presentation. Or she may be interested in the student's adherence to formal assessment criteria, e.g., for paper layout or presentation style, and his proficiency in soft skills, e.g., collaboration and communication during group work.

¹ After its inventor Georg Rasch (1901-1980), a Danish mathematician, statistician, and psychometrician.


All of these factors are aggregated by means of a scoring formula that has provably correct behavior, no matter how many factors the teacher wants to use. The result of the procedure is a standardized quality score. These quality scores – one for each assignment – form the basis for formative evaluation, i.e. iterative feedback on learning progress during a particular course, as well as for summative evaluation, i.e. the final outcome for a course over a sequence of assignments, in the form of a single final score. In the latter case, one has to specify assignment weights, taking appropriate differentiating criteria between the assignments into account. In addition, one may apply overall outcome or performance factors at course level in order to calculate the final score.

Once one has such an aggregated mixed qualitative-quantitative score, the last step of the procedure is to transform it into a (formal) grade. Here it is that country-specific or even institution-specific transformation rules (as part of national or local examination regulations) have to be followed. For the time being, the author has exclusively focused on numeric grading systems, but there is no inherent difficulty in applying an alphabetic or symbolic grading scheme.

In hindsight, it appears that the author implicitly followed the seven requirements for such an assessment scheme which he found – concisely formulated – at the beginning of an interesting article recently published by Johanyák [7]:

1. The method should not increase the time needed for the assessment compared to traditional evaluation techniques.
2. The method should help the grader to express the vagueness in his/her opinion.
3. The method should be transparent and easy to understand for both parties involved in the assessment process, i.e. the students and the graders.
4. The method should ensure fair grading, i.e. it should be beneficial for all students.
5. The method should allow the teacher to express the final result in the form of a total score or percentage as well as in the form of grades, using a mapping between them that is prescribed by the university.
6. The method should be easily implementable in software development terms.
7. The method should be compatible with the traditional scoring system, i.e. when the grader provides crisp scores for each response, the total score and the final grade should be identical to the ones calculated in the traditional way.

Indeed, PASS obeys all of these requirements. Moreover, the author has created a special fuzzy algebra (see appendix) to reflect the required behavior of fuzzy quality scores (numbers between zero and one). Furthermore, conversion from fuzzy scores to numeric grades (on an arbitrary scale) is handled by a simple formula (a Lamé curve, a.k.a. generalized ellipse). All this guarantees the highest generality and correctness as far as the basic operations of the model are concerned. Moreover, it is now possible to write down the complete formal model on a single sheet of paper, which shows in another way its coherence, consistency and conciseness. For the time being, concrete assessment schemes created according to the PASS model are implemented on simple Excel sheets with the help of a small number of macros, but plans are underway to develop a standalone application with an easy-to-learn, easy-to-use user interface.

3 CONCEPTS AND COMPONENTS OF THE PASS MODEL

The basic PASS model may best be explained through a visual example, see Fig. 1. We will start with the blue circles, which represent distinct assignments, here A1, A2 and A3. For concreteness' sake, let's assume that we are dealing with a course in software engineering and that A1 represents the team assignment "writing a software requirements specification", A2 represents the team assignment "reviewing the software requirements specification written by a peer team", and A3 represents an individual test on, let's say, data flow analysis. See section 3.1 for more on types of assessment. Each assignment will be scored by the assessor on a number of primary quality criteria, the quality profile (for details see 3.2). The composite result will be a primary quality score between zero and one for each assignment, here Q1, Q2 and Q3. Often, this is all that an assessor wants, so that the final assignment scores S1, S2 and S3 will be equal to the primary quality scores Q1, Q2 and Q3, respectively.


However, if he wants or needs to take secondary quality factors into account, e.g. in order to be able to differentiate between more productive and less productive learners in a team, he may do so by determining these so-called productivity factors, here P1 and P2. For the sake of simplicity we have assumed two productivity factors for each of A1, A2 and A3, but of course the number can be smaller or larger for an assignment, independently of the other assignments. Furthermore, each productivity factor gets its own so-called impact parameter δ1 or δ2 which, as we will explain in section 3.3, is formally identical to assignment difficulty (hence the symbol δ), and they are therefore most often set equal to each other (i.e., δ1 = δ2). Once the productivity factors together with their parameters p and δ are determined, the final assignment scores are indeed products of the form Si = P2 · P1 · Qi for i = 1…3.

Fig. 1: Overview of the PASS model

As assignments may be more or less important (e.g. relevant) for the course in question, we will endow each assignment Ai with a weight wi such that the sum of all wi's equals one. This enables us to calculate a weighted quality score S (in the middle of the red circle) using some aggregation scheme (there are several options here; for details see section 5 below). By way of a final touch, it is possible to multiply the weighted quality score S by one or even more productivity factors P at course level, yielding an overall final quality score S*. For instance, such a course-level productivity factor might result from some optional learning activity or a reasoned overall judgment of student learning progress by the assessor. The last step in the PASS procedure, transforming the final overall quality score S* into a grade G of some sort, is of course important, but the least problematic, as usually the rules for such a transformation are largely if not completely determined by the institution for higher education where the course is taught (but see [8, 9] for complicating issues in practice). In the case of numeric grades, for example, a minimum grade Gmin as well as a maximum grade Gmax will be given (see green circle). Further details depend upon whether the transformation is supposed to be linear (then we are done) or non-linear, in which case parameters should be specified for the degree of non-linearity (r) and/or the strictness of conversion (a bias towards the positive or negative end of the scale). In the case of alphabetic or symbolic grades, it usually suffices to set up a simple table in which a range of scores (between zero and one) is mapped to the appropriate character or symbol.
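The paper only names a Lamé curve for this score-to-grade step; the sketch below shows one plausible parameterization of such a curve. The function name and the exact formula are this sketch's assumptions, not the author's published ones:

```python
# Illustrative sketch: converting a final score S* in [0, 1] to a numeric
# grade on a scale between g_min and g_max via a Lamé curve. The exact
# parameterization is an assumption; the paper does not print it.

def score_to_grade(s: float, g_min: float, g_max: float,
                   r: float = 1.0) -> float:
    """Map score s in [0, 1] to a grade between g_min and g_max.

    r = 1 gives a linear conversion; r > 1 bends the Lamé curve in the
    student's favor; r < 1 makes the conversion stricter.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # Point on the Lamé curve |x|^r + |y|^r = 1 with x = 1 - s:
    y = (1.0 - (1.0 - s) ** r) ** (1.0 / r)
    return g_min + (g_max - g_min) * y

# Example: a German-style scale where 1.0 is the best grade, so the roles
# of g_min and g_max are swapped relative to the score direction.
print(round(score_to_grade(0.75, 6.0, 1.0, r=2.0), 2))  # ~1.16
```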

3.1 Assignments

Assignments can be any kind of tasks, or tests, given to students during or at the end of a course with the purpose of revealing the acquired knowledge or competence in the course's domain. As above, let's again take the course in software engineering and assume (1) that A1 represents the team assignment "writing a software requirements specification", (2) that A2 represents the team assignment "reviewing the software requirements specification written by a peer team" and (3) that A3 represents an individual test on, let's say, data flow analysis. Thus we see that assignments may be individual assignments (A3) or group assignments (A1, A2), in which case we need a method to differentiate between the individual contributions of students to the common end product. Furthermore, assignments may be classical (binary or multiple-choice) tests (A3) with answers that are just correct or false (one extreme), or they may be semi-realistic complex tasks in the tradition of problem-based learning (A1, A2), assessed on a number of different quality criteria, not just correctness (the other extreme). Also, assignments may ask for original outcomes (A1) or they may require a reflective act in the tradition of self, peer or group assessment, as in A2 [10]. Finally, assignments may be aimed at constructive competences (A1), analytical competences (A2) or just reproductive competences (probably A3). A truly generic assessment approach like PASS should cater for all of these distinctions [11], and it does.

A course may consist of just a single short assignment, e.g. a classical multiple-choice test at the end of the course (summative evaluation). Or it may require a larger task, e.g. a term paper to be delivered at the end of the course, with plenty of opportunity for intermediate coaching and feedback (formative evaluation). Or it may consist of a sequence of smaller assignments spread over the entire course, each covering part of the course contents, again enabling students and teachers to closely interact and reflect upon the learning process and progress. Depending upon such strategic differences, one or more additional productivity factors may be specified and recorded by the assessor, each modifying his initial qualitative judgment about the learning outcome or learning process.

3.2 Quality of Assignments

How is this initial qualitative judgment of the assessor to be made? As already explained above, the starting point for qualitative assessments are profiles of three-valued quality criteria. The most obvious candidate for such a criterion is of course: correctness. However, for more complex assignments like A1 and A2 (see above) a single criterion will not be enough, and correctness may not even be the most important one. For instance, for semi-realistic assignments of type A1 or A2 this author uses at least the following four so-called C-criteria: Coherence, Correctness, Completeness, and Comprehensibility. The three values of each criterion are defined to be "below average", "average" and "above average" (see Fig. 2). The choice of these three quality levels has been motivated, on the one hand, by a well-known observation from experimental and educational psychology that most people are uncomfortable with, and even unreliable in, the use of rating scales with many points. I.e., inter-rater reliability continues to be a critical aspect of the kind of assessments we are dealing with here. Even worse, this author has had similar experiences with intra-rater reliability (sic!), and was thus forced to make a radical decision – after several unsuccessful attempts to find another solution – to stick to these three levels, which can be readily defined, explained and distinguished in most cases.

Fig. 2: The basic three-valued scoring model


On the other hand, it turned out that it was relatively easy to define a lexicographic ordering over all assessment profiles, i.e. over the 3-tuples of the numbers of occurrences of "below average", "average" and "above average" in those profiles, which represents the overall relative quality of the profiles. Turning such a rank order into a score between zero and one didn't pose a problem, either. However, two different lexicographic orderings are possible, one more lenient and one more strict in how bonus is traded off against malus. The solution to this dilemma was to define a mixture of both lexicographic orderings, and to leave it to the assessor to set the level of strictness λ as he or she feels adequate. Often it will be wise to just set λ = ½. Combining all foregoing considerations yields a formula Qλ(n₋, n₊) for the primary quality score, in which n is the number of quality criteria, n₋ the number of quality criteria assessed as "below average", called malus, n₊ the number of quality criteria assessed as "above average", called bonus, and λ the level of strictness between zero and one². The behavior of Qλ(n₋, n₊) = s for given s and λ³, i.e. the different combinations of bonus and malus that compensate to the same score, the so-called isoscore curves, is shown in Fig. 3 (n = 5). It turns out that this score formula is a direct generalization of the well-known rule "score = number of items answered correctly / total number of items administered" for a test with n items which can only be answered correctly or falsely: this is the special case for n₋ + n₊ = n and λ = ½.

Fig. 3: Contour plots of primary quality scores for compensating bonus and malus (the isoscore curves look linear, but are in fact quadratic curves)
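The paper does not print the closed form of Qλ. As a loudly-labeled stand-in, the sketch below mixes a lenient counting rule with a strict one; it reproduces the documented boundary behavior (the binary special case, Q½(0,0) = ½, and the De Morgan-style law of footnote 2), but its isoscore curves are linear rather than quadratic, so it is not the author's actual formula:

```python
# Stand-in for the primary quality score Q_lambda(n_minus, n_plus).
# NOT the paper's formula (whose isoscore curves are quadratic); this
# linear mixture of a lenient and a strict counting rule only matches the
# documented boundary behavior described in the text and footnotes.

def primary_quality(n: int, n_minus: int, n_plus: int,
                    strictness: float = 0.5) -> float:
    """Score in [0, 1] from a profile of n three-valued criteria.

    n_minus: criteria judged "below average" (malus).
    n_plus:  criteria judged "above average" (bonus).
    strictness: lambda in [0, 1]; higher values weight the strict count.
    """
    assert 0 <= n_minus + n_plus <= n and 0.0 <= strictness <= 1.0
    lenient = (n - n_minus) / n   # everything not below average counts
    strict = n_plus / n           # only above average counts
    return (1.0 - strictness) * lenient + strictness * strict

# Binary special case (n_minus + n_plus = n, strictness = 1/2) reduces to
# "items answered correctly / items administered":
print(primary_quality(10, n_minus=3, n_plus=7))  # -> 0.7
```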

3.3 Difficulty of Assignments

It is clear that assignments are not necessarily equally difficult. In order to take this fact into account, it appeared necessary, and surprisingly helpful, to introduce a difficulty parameter δ between zero and one. Assignments which are so easy that every student should be able to master them perfectly get a difficulty parameter of δ = 0; assignments which are so difficult that no student would be able to master them at all get a difficulty parameter of δ = 1. Assignments with δ = 0 or δ = 1 should be the exception, as they provide no information at all. Obviously, useful assignments should have a level of difficulty somewhere in the middle between zero and one.

² The function Qλ(n₋, n₊) is clearly not symmetric in its arguments, i.e. reversing the roles of bonus and malus will change its value (unless they happen to be equal), but it obeys a De Morgan-style law: the score for (n₋, n₊) at strictness λ and the score for (n₊, n₋) at strictness 1 − λ sum to 1.
³ The equation Qλ(n₋, n₊) = s doesn't always have the origin (0,0) as a solution. In order that (0,0) is a solution, s and λ have to satisfy a relation involving n. Thus Fig. 3 only shows such curves for which this relation holds.


How can the assessor determine the level of difficulty of a given assignment? First, note that the difficulty of an assignment should not be made dependent upon, e.g., the number of students in the course who are able to pass the assignment. On the contrary, difficulty is conceived here as an inherent property of an assignment, given the course contents and goals. Thus it makes sense to define a number of criteria of difficulty in much the same way as we did for primary quality. The result will be a difficulty profile, and we can apply the same 'machinery' as for primary quality to arrive at a level of difficulty δ between zero and one. In any case, an estimate of difficulty should be given before the students embark on the corresponding assignment.

How will the degree of difficulty be combined with the primary quality as defined in the previous section? Without going into the details of the derivation, it turned out that a very sensible approach is to use the formula q* = q / (δ → q), where the fuzzy implication is defined by δ → q = 1 − δ(1 − q) (see appendix). It is easy to check that δ → q varies between the old q and 1 as δ varies between one and zero, i.e. in reverse order; by putting it in the denominator of q*, the order is made compatible. In other words, very easy assignments keep the primary quality score, moderately difficult assignments will get some 'bonus' on top of the primary quality score, and extremely difficult assignments will be awarded a one (except when the student didn't try at all, in which case q* = q = 0). This is all very useful, because now the assessor can concentrate on the pure quality of the learning outcome or performance, without implicitly taking assignment difficulty into account, which would only confuse and confound quality and difficulty.
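A minimal sketch of this difficulty adjustment, under the reconstruction just given (the formula q* = q / (δ → q) is inferred from the paper's verbal description of its behavior):

```python
# Difficulty adjustment q* = q / (delta -> q), with the fuzzy implication
# delta -> q = 1 - delta * (1 - q) from the appendix. Reconstructed from
# the paper's description; treat as a sketch, not the published code.

def implies(a: float, b: float) -> float:
    """Fuzzy implication a -> b = 1 - a * (1 - b)."""
    return 1.0 - a * (1.0 - b)

def adjust_for_difficulty(q: float, delta: float) -> float:
    """Raise a primary quality score q according to difficulty delta.

    delta = 0 leaves q unchanged; delta = 1 awards full score, unless
    q = 0, i.e. the student did not attempt the assignment at all.
    """
    if q == 0.0:
        return 0.0
    return q / implies(delta, q)

print(adjust_for_difficulty(0.6, 0.0))  # 0.6  (easy: keep score)
print(adjust_for_difficulty(0.6, 0.5))  # 0.75 (moderate: bonus)
print(adjust_for_difficulty(0.6, 1.0))  # 1.0  (extreme: full score)
```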

3.4 Productivity of Assignments

The final step in the adjustment of a primary quality score concerns the specification and application of one or more secondary quality criteria, called productivity factors. As explained in section 3, there may be various reasons why an assessor would like to take such factors into account. The basic idea behind the introduction of productivity factors is that the quality of an assignment outcome or performance always depends upon a certain amount of effort put into the assignment by the learner. In fact, assessors will always, perhaps implicitly, assume that a certain amount of effort will be invested by the students (especially in the case of semi-realistic complex learning assignments). Wouldn't it be sensible, then, to make this explicit, to define this default standard effort beforehand, to communicate it to the students, and to use it "for better or worse" in a further elaboration of the assignment quality formula? This is exactly what the author did. The result is a somewhat complicated formula when fully written out; however, using fuzzy-algebraic notation [12], it turns out to be just slightly more complicated than the formula above for the incorporation of assignment difficulty, and still relatively easy to interpret. Besides, we can visualize the global effect of different degrees of productivity on a given score ranging between zero and one in the following diagram (Fig. 4):

Fig. 4: Diagram showing the effect of multiplication of a quality score with a productivity factor


In this formula, assignment difficulty δ is combined with a new parameter: assignment productivity, the scalar p. A productivity factor P(p, δ) is then defined by P(p, δ) = (p ⊙ (δ → q)) / (δ → q). Here p ⊙ (δ → q) is the scalar product of the productivity parameter p with the fuzzy implication δ → q (see section 3.3 above and the appendix). If p = 1, i.e. the assumed default, the productivity factor is just 1, implying that the quality score will not be changed. In the case of p = 0, i.e. complete lack of effort, the productivity factor will be 0 too, yielding an assignment quality of 0, as expected. If p lies somewhere between 0 and 1, we have underachievement yielding a malus; if p lies above 1, we have overachievement yielding a bonus. In the case of extremely high p, multiplying by the productivity factor reduces to the formula above for q*, the highest possible bonus on top of the original quality score. Thus these productivity factors P(p, δ) behave in a very sensible way, reflecting our intuitions: keeping the quality score as it is in case of standard (default) learning effort, decreasing it in case of underachievement (malus) and increasing it in case of overachievement (bonus).

A most interesting application of productivity factors is as a well-founded solution for one of the original problems this author was confronted with: how to assign differential quality scores to individual students who have collaborated in teams performing an assignment? Using productivity factors, this problem can now be easily solved by collecting all sorts of evidence (as time permits, of course) about the individual contributions of the team members to the end product of the team. For example, students can record in a document history the way they have collaborated on a common document and who takes responsibility for how many (and which) pages or parts of the document. Or, when presenting the results of a common R & D project, it is feasible to record how many minutes a team member is actively involved in the presentation and to judge his presentation style. All this information can be used to individually adjust the primary quality score based on the team product. If, moreover, data about the work process of the team is available, it can also be used for further differentiation and fine-tuning.

An obvious objection against this approach could be to assume or expect that collecting this kind of evidence is harmful for collaboration and team spirit. However, even after several years of experience with this assessment method, the author has not found any clear evidence for such negative effects. What he did observe now and then were teams that probably tried to avoid any such discussions and effects in the first place by reporting equal shares and contributions to their common end product for all team members: the ultimate mode of cooperativeness!
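Under the same reconstruction caveat, a sketch of the productivity factor and its effect on a quality score, with the scalar product r ⊙ s = 1 − (1 − s)^r taken from the appendix:

```python
# Productivity factor P(p, delta) = (p ⊙ (delta -> q)) / (delta -> q),
# where r ⊙ s = 1 - (1 - s)**r is the fuzzy scalar product and
# delta -> q = 1 - delta * (1 - q) the fuzzy implication (see appendix).
# Reconstructed from the paper's description; a sketch, not the exact code.

def implies(a: float, b: float) -> float:
    return 1.0 - a * (1.0 - b)

def scalar(r: float, s: float) -> float:
    """Fuzzy scalar product r ⊙ s = 1 - (1 - s)**r, defined for r >= 0."""
    return 1.0 - (1.0 - s) ** r

def productivity_factor(p: float, delta: float, q: float) -> float:
    """Multiplier applied to the quality score q.

    p = 1 (default effort) yields factor 1; p = 0 yields factor 0;
    p -> infinity pushes q up to q* = q / (delta -> q).
    """
    base = implies(delta, q)
    return scalar(p, base) / base

q = 0.6
for p in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(p, round(q * productivity_factor(p, delta=0.5, q=q), 3))
```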

3.5 Weight of Assignments

Last but not least, assignments may differ from each other with respect to their relevance or importance to the learning goals of the course module. Again, like assignment difficulty, this is something that can and should be judged and determined beforehand by the assessor himself, on the basis of the module description and perhaps other formal or informal information from faculty. However, these assignment weights should not be confused with assignment difficulty (see section 3.3). Moreover, an important condition for the later aggregation of the final assignment scores into a single overall score (see section 5) will be that the sum of all weights wi of assignments Ai equals one, because that will enable us to calculate a weighted quality score S over all assignments using some well-behaved aggregation scheme. There are several options here, the most important ones being an additive/disjunctive aggregation versus a multiplicative/conjunctive scheme.

4 INCLUDING MULTIPLE PRODUCTIVITY FACTORS

In previous sections, notably section 3.4, we have already alluded to the fact that each assignment may have zero, one or any number of productivity factors attached to it. If the assessor didn't specify any productivity factor, the original primary quality score determined on the basis of the basic quality criteria (see section 3.2) will also be the final assignment score, except when he is prepared to take assignment difficulty into account, in which case he should use the simple formula of section 3.3. If, however, the assessor has good reasons to introduce one or more productivity factors, he has to specify, for each one, an impact parameter (usually set equal to the assignment difficulty δ) and a productivity parameter p (based on an idea about the default effort for the assignment in question, see section 3.4). In section 3.3 the author has already suggested that the impact (difficulty) parameter may be determined in much the same way as primary quality, i.e. by setting up a difficulty profile and combining the constituent difficulty criteria through the formula in section 3.2. A combined sketch is given below.
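Putting the reconstructed pieces together: the factor count, parameter values, and the successive application of factors to the running score are all this sketch's assumptions, not the paper's specification:

```python
# Chaining several productivity factors onto a primary quality score, as
# in S = P2 * P1 * Q of Fig. 1. Applying each factor to the running score
# (which keeps the result within [0, 1]) is this sketch's assumption.

def implies(a: float, b: float) -> float:
    return 1.0 - a * (1.0 - b)          # fuzzy implication (appendix)

def factor(p: float, delta: float, s: float) -> float:
    base = implies(delta, s)
    return (1.0 - (1.0 - base) ** p) / base   # P(p, delta) at score s

def assignment_score(q: float, deltas: list[float],
                     ps: list[float]) -> float:
    """Final score from primary quality q and per-factor (delta, p) pairs."""
    s = q
    for delta, p in zip(deltas, ps):
        s *= factor(p, delta, s)
    return s

# Primary quality 0.6; one underachieving (p = 0.8) and one overachieving
# (p = 1.5) productivity dimension, both with impact 0.5:
print(round(assignment_score(0.6, [0.5, 0.5], [0.8, 1.5]), 3))
```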


An astonishing fact is that the productivity parameter, too, may be determined in such a way. Suppose that we have identified a productivity factor, i.e. a secondary quality factor, which consists of a number of clearly distinct quality criteria. For instance, if document layout is such a productivity factor, it is easily possible to distinguish more than ten distinct criteria having to do with the presence and layout of title page, table of contents, headers, footers, page numbers, section numbers, tables, figures, etc. So let's assume that we have found and settled on n = 13 such criteria, and that we want to treat "above average" equally strictly as "below average", so that λ = ½. Then we may declare Q½(0, 0) = ½ to be our standard or default achievement, which would not have any effect on the original score. After all, Q½(0, 0) tells us that this (fictitious) student has scored "average" on all document layout criteria. Taking the quotient Q½(n₋, n₊) / Q½(0, 0) thus gives us a well-founded productivity parameter for document layout for a student scoring (n₋, n₊). By adjusting λ = ½ up or down, one will get any desired effect reflecting the teacher's strictness bias. Usually, however, productivity parameters will be based on variables which can easily be counted or measured, e.g. number of pages, amount of time, etc. The assessor has to prescribe, based upon his deep knowledge of the assignment, the amount of work (effort) on one or more dimensions that should be considered the default or standard productivity. Taking the quotient of observed effort and standard effort then gives the productivity parameter.
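For example (all numbers hypothetical), a measured productivity parameter is just observed effort over prescribed standard effort; the profile-based variant uses the stand-in quality formula sketched in section 3.2:

```python
# Productivity parameter from countable effort: observed vs. standard.
# All numbers hypothetical.
standard_pages = 20        # prescribed default effort for a term paper
observed_pages = 25        # what the student actually delivered
p_volume = observed_pages / standard_pages   # 1.25: mild overachievement

# Profile-based variant: quotient of a 13-criteria layout profile score
# and the all-average default Q_1/2(0,0) = 1/2, using the stand-in
# formula from the section 3.2 sketch (not the paper's exact formula).
def q(n, n_minus, n_plus, lam=0.5):
    return (1 - lam) * (n - n_minus) / n + lam * (n_plus / n)

p_layout = q(13, 2, 5) / q(13, 0, 0)
print(round(p_volume, 2), round(p_layout, 2))   # 1.25 1.23
```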

5 INCLUDING MULTIPLE WEIGHTED ASSIGNMENTS

Including multiple assignments into a single overall quality score presupposes two things: (1) all assignments have been given weights summing up to one (see section 3.5); (2) some aggregation rule has been fixed according to which the final assignment scores will be 'merged' into a single overall quality score. Different kinds of aggregation rules are possible. A most useful and simple one is the additive/disjunctive aggregation rule, which states that the final overall score is the weighted sum of the final assignment scores: S = Σ wi·Si. An objection against this aggregation scheme might be that simple (weighted) addition is not appropriate for scores which are not guaranteed to lie on an interval scale (one of the reasons why Rasch models have been proposed). Aside from this objection, it is not quite suitable in the context of a fuzzy algebra in the first place, where addition is defined as x ⊕ y = x + y − xy = 1 − (1 − x)(1 − y) and scalar multiplication is rather exponentiation. Instead, a multiplicative/conjunctive aggregation rule like S = Π Si^wi suggests itself (see appendix), but it has some disadvantages too. We are still working on this problem.
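A sketch contrasting the two aggregation options; the conjunctive weighted product is one natural reading of the formula garbled in the source, so treat it as an assumption:

```python
import math

# Two candidate aggregation rules for weighted assignment scores.
# Weights must sum to one. The conjunctive form is an assumed reading.

def additive(scores: list[float], weights: list[float]) -> float:
    """Disjunctive/additive rule: weighted arithmetic mean."""
    return sum(w * s for w, s in zip(weights, scores))

def conjunctive(scores: list[float], weights: list[float]) -> float:
    """Conjunctive/multiplicative rule: weighted geometric mean,
    i.e. the product t-norm with weights as exponents."""
    return math.prod(s ** w for w, s in zip(weights, scores))

scores, weights = [0.9, 0.6, 0.75], [0.5, 0.3, 0.2]
print(round(additive(scores, weights), 3))     # 0.78
print(round(conjunctive(scores, weights), 3))  # ~0.768, always <= additive
```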

6 EXAMPLE ASSESSMENT SHEET

Fig. 5: Tutorial explaining the core concepts, components and formulas of the PASS model


The above copy of a tutorial sheet (Fig. 5), part of the PASS assessment forms used by the author in his own courses in software engineering, may clarify how the above ideas can be realized in Excel. The upper-left grey rectangle is what would appear on a real assessment form; the rest are tutorial explanations: the bottom-left rectangle replicates the calculation of the final assignment score, starting with the primary quality score and successively applying three productivity factors, while the rectangles to the right explain the parameters and formulas used in the assessment form to the left.

REFERENCES

[1] Bond T.G. & Fox C.M., Applying the Rasch Model: Fundamental Measurement in the Human Sciences, Lawrence Erlbaum, 2001
[2] Borsboom D., Measuring the Mind: Conceptual Issues in Modern Psychometrics, Cambridge University Press, 2009
[3] Masters G.N., Advances in Measurement in Educational Research and Assessment, Pergamon Press, 1999
[4] Arter J.A. & McTighe J., Scoring Rubrics in the Classroom: Using Performance Criteria for Assessing and Improving Student Performance, Corwin Press, 2000
[5] Stevens D.D. & Levi A.J., Introduction to Rubrics: An Assessment Tool to Save Grading Time, Convey Effective Feedback and Promote Student Learning, Stylus Publishing, 2004
[6] Jonsson A. & Svingby G., The use of scoring rubrics: Reliability, validity and educational consequences, Educational Research Review 2, pp. 130-144, 2007
[7] Johanyák Z.C., Survey on Four Fuzzy Set Theory Based Student Evaluation Methods, Proceedings of Kecskemét College, Faculty of Technology (GAMF), Kecskemét, XXIII (2008), pp. 121-130
[8] Guskey T.R. & Bailey J.M., Developing Grading and Reporting Systems for Student Learning, Corwin Press, 2000
[9] Yorke M., Grading Student Achievement in Higher Education: Signals and Shortcomings, Routledge, 2007
[10] Roberts T.S., Self, Peer and Group Assessment in E-Learning, Information Science Publishing, 2006
[11] Johnson R.L., Assessing Performance: Designing, Scoring, and Validating Performance Tasks, Guilford Press, 2009
[12] Novák V., Perfilieva I. & Močkoř J., Mathematical Principles of Fuzzy Logic, Kluwer Academic Publishers, 1999

Appendix: A fuzzy scoring algebra

The PASS model can be formalized as a special adjointness algebra ⟨[0,1], 0, 1, ¬, ⊗, ⊕, →⟩, where [0,1] is the unit interval (representing scores), 0 is the null score (additive unit), 1 is the full score (multiplicative unit), ¬s is the complement score of s defined as 1 − s, ⊗ is score multiplication defined as the usual product, ⊕ is score addition defined as the probabilistic sum x ⊕ y = x + y − xy = 1 − (1 − x)(1 − y), and → is score implication defined as x → y = ¬x ⊕ y = 1 − x(1 − y). Furthermore, we first introduce scalar multiplication as an abbreviation ([12], p. 27):

r ⊙ s = 1 − (1 − s)^r

i.e. the r-fold ⊕-addition of s to itself. Obviously, this can be turned into real scalar multiplication for any r ≥ 0 and any score s. Finally, productivity factors P(r, δ, s) are defined as P(r, δ, s) = (r ⊙ (δ → s)) / (δ → s), which satisfy 0 ≤ P(r, δ, s) ⊗ s ≤ 1 for all r, δ and s, are monotonically increasing in r and δ, and monotonically decreasing in s.