AMERICAN JOURNAL OF INDUSTRIAL MEDICINE 37:221–228 (2000)

Testing as a Measure of Worker Health and Safety Training: Perspectives from a Hazardous Materials Program

B. Louise Weidner, PhD*



Background: Health and safety training for hazardous materials workers is among OSHA's major policies. A large and growing workforce in this area, and the resulting risks for these workers and the public, make quality training critical. Measuring trainees' individual knowledge following training is a common but controversial practice.

Methods: Technical issues and benefits in testing, strategies for mitigating the limitations of testing, and the relevance of testing at a broader policy level were examined from the perspective of a large and diverse program.

Results: Knowledge data from individuals greatly aided in evaluating program effectiveness at the time of training and in assessing workplace impact later. Use of sound testing principles and creative examination methods and materials, and collaboration across programs, all helped to address concerns for individual programs and the field generally.

Conclusion: Programs would benefit from fully considering the benefits and options related to knowledge assessment in training. Those who choose to assess individual knowledge could move the process forward through added rigor, collaboration, and documentation of efforts. Am. J. Ind. Med. 37:221–228, 2000. © 2000 Wiley-Liss, Inc.

KEY WORDS: worker training; training effectiveness; proficiency assessment; health and safety training

Director of Evaluation and Assistant Co-Adjutant Professor, Division of Public Education and Risk Communication, Department of Environmental and Community Medicine, Environmental and Occupational Health Sciences Institute (EOHSI), Robert Wood Johnson Medical School – University of Medicine and Dentistry of New Jersey, Piscataway, New Jersey.

Contract grant sponsors: National Institute of Environmental Health Sciences, NIH and EPA/DOE; Contract grant number: U45E506179.

*Correspondence to: B. Louise Weidner, 19305 218th Place NE, Woodinville, WA 98072. Email: [email protected]

Accepted 7 June 1999

INTRODUCTION

Health and safety training for the two million workers who handle hazardous materials is critical to their safety and to the safety of the public. It is addressed by the Superfund Amendments and Reauthorization Act (SARA) of 1986 [US Congress, 1986]. OSHA's Final Rule for such training, 29 CFR 1910.120, covers workers at operations at uncontrolled hazardous waste sites; at Resource Conservation and Recovery Act sites; at hazardous substance operations in treatment, storage, and disposal facilities; and those who respond to emergency hazardous materials incidents [OSHA, 1989]. The rule was promulgated on March 6, 1989 and became effective on March 6, 1990. Given that virtually every community in America is potentially affected by on-road incidents alone, this training clearly has far-reaching implications.

Many organizations thrust anew into the world of regulated worker health and safety training by 29 CFR 1910.120 are familiar with the technical aspects of hazardous materials but lack the expertise to provide, or the resources with which to obtain, quality worker training. Similarly, training regulations, while based on known scientific principles related to hazards, often lack the underpinnings of the principles of adult learning and assessment.


In response to these limitations, as well as the enormous proportion of industry affected by the Final Rule, SARA also called for the establishment of a worker training program. Accordingly, in 1987 the National Institute of Environmental Health Sciences established the NIEHS Hazardous Materials Worker Training Program. Its awardees are representative of the diverse organizations and workers affected by the Final Rule, and have provided guidelines for both training and assessment relative to the rule. These include guidelines for minimum criteria for training [Moran and Dobbin, 1991] as well as elaboration of those criteria with particular reference to educational objectives. Both of these documents now comprise Appendix E of the rule [OSHA, 1989]. More recently, a comprehensive evaluation guide was compiled which details the very wide range of evaluation objectives and instruments that have been generated by the NIEHS training programs [George Meany Center et al., 1997].

Much agreement has been forged among the NIEHS grantees regarding sound training principles in a field where both knowledge and performed skills are required. It is agreed, for example, that preparing trainees for the application of equipment or devices during work duties should include hands-on instruction, and that assessment of such instruction should entail demonstration of skills. However, other issues remain markedly controversial, not the least of which is the use of examinations to measure knowledge, and the application of a criterion score as a standard for test outcomes. Disagreement on these issues is rooted in fundamental differences in the philosophical and practical aspects of training, which certainly predate the NIEHS program for hazardous materials. Indeed, the question of testing is constantly addressed by programs as they attempt to reconcile training and assessment methods with the very basic differences between education and adult training. This is complicated by the seemingly conflicting goals of providing quality instruction compliant with the Final Rule, and providing instruction that is responsive to the realities of the workplace.

The New Jersey/New York Hazardous Materials Worker Training Center (NJ/NYHMWTC) is an NIEHS-funded program serving all four of the worker audiences targeted by OSHA. It offers at least thirty courses at any given time and has provided over one million training hours to over 150,000 workers. The volume of training and the nature of the program require assessment procedures which are both efficient and effective, which utilize computing and other technology, and which often include testing [Gotsch and Weidner, 1994]. Over the course of a decade of evaluation at NJ/NYHMWTC, the benefits of testing have appeared significant and the drawbacks have appeared workable. That being so, the following discussion focuses specifically on testing as an assessment choice and the issues commonly associated with it. While a comparative discussion regarding the benefits of testing and alternative methods is not the focus here, training programs are encouraged to give both testing and its alternatives full consideration as they develop procedures consistent with the needs of their own training audience.

TESTING AND PROGRAM EVALUATION

Issues and Concerns

Programs that collect and use examination data for program evaluation purposes are often criticized for preferring to evaluate trainees rather than to submit themselves to evaluation, and for having trainees assume all of the risk and none of the benefits in the process. This is in large part because trainees are unlikely to benefit directly from program changes which exit examinations may precipitate and, in the absence of detailed feedback, are also unlikely to derive any of the potential learning benefits which testing may offer. Even when it is expressly for internal use, the exam process can create anxiety and has also been criticized as disrespectful to working adults who have acquired knowledge from other sources, including experience. These concerns are particularly relevant where training content now required under the Final Rule has long been germane to job performance. Unless pre-tests are used, which are considered similarly questionable, trainees' prior knowledge can inflate test outcomes, masking program deficiencies rather than providing guidance for change. It is also at issue whether structured exams can assess experiential knowledge at all, since it is acquired, processed, internalized, and adapted in an informal and unstructured manner.

Another source of concern is the presumably limited value of test data to the ultimate measure of program effectiveness, namely workplace impact. Since knowledge is a poor predictor of behavior, it is argued that programs would do better by measuring change in more influential traits, such as attitudes, risk perception, and motivation.

Given these constraints, many programs employ both qualitative and quantitative alternatives to determine training effectiveness and the likely program impact. Various group-based or anonymous methods are used which address issues of anxiety and dignity while providing a forum for content reiteration and clarification. Some programs omit knowledge assessment altogether in favor of pre- and/or post-training measurement of opinions, perceptions, attitudes, or other subjective traits. These data are not only useful in evaluation of training effectiveness, but provide baseline information and direction for subsequent impact assessment. Satisfaction surveys requesting trainee feedback on course features are by far the most popular, and are a favorite where assessment of trainees is considered inappropriate. Like examinations, these surveys are typically administered at the end of training and may not directly benefit trainees. Unlike exams, however, they tacitly acknowledge trainees' opinions and experience as well as their knowledge regarding the workplace hazards they face.

The Unique Role of Knowledge Data in Program Evaluation

Of course, alternatives to testing, indeed all evaluation methods, contribute to program evaluation objectives in unique and significant ways, producing outcomes which are not necessarily interchangeable in purpose or meaning. Likewise, direct measurement of knowledge is also unique and, just as it alone is an insufficient indicator of program effectiveness, so, too, are other data insufficient without it. Indeed, evaluation at NJ/NYHMWTC has required investigative and explorative analysis of the same diverse data which are critical to the evaluation of large-scale educational policies and programs, including variables regarding program features, objective student traits (demographics, etc.), subjective student traits (attitudes, values, etc.), and student knowledge. Such outcomes are known in education to aid interpretability, validity, comparability, reliability, and generalizability of data; to move programs from mere documentation of learning toward understanding the factors which influence it; and to have relevance to larger program or policy efforts [Messick, 1987].

The importance of varied data became clear early in our program. Satisfaction surveys indicated that trainees consistently assigned the highest ratings to the most engaging and animated instructor. His personal influence on the attitudes and motivation of trainees was well known, and was still detectable in follow-up surveys twelve months after training. When it was noted that mean ratings for his classes were also higher on variables unrelated to his instruction (quality of materials, facilities, etc.), additional analyses were performed on test, survey, and background data. Results revealed that the instructor to whom trainees consistently assigned the lowest ratings was consistently just as effective at communicating core technical information, and that this was so across all demographic groups. The relative merits of the attitudes and enthusiasm provided by one trainer vs. the knowledge and information provided by both are clearly debatable for some. Just as clear, however, is that the factors which influence, and subsequently indicate, learning may in fact be subtle, and that obvious conclusions are not necessarily accurate.

Regarding impact assessment, the behavioral influence of attitudes, perceptions, and myriad workplace factors external to workers' control is well documented in programs of highly varied pedagogy and technique [Morgan, 1983; Hopkins et al., 1986; Komaki, 1978, 1986, 1987; Noe, 1986; Mergler, 1987; Pidgeon, 1991; Wallerstein and Weinger, 1992; LaMontagne et al., 1992; Ford and Fisher, 1994]. Such cases either imply or demonstrate that the inherently complex workplace renders knowledge an unlikely sole predictor of behavior. But this same complex workplace also precludes attributing behavior directly, much less solely, to changes in subjective traits that are documented during training. Impact assessment of any kind is further complicated by its delayed nature and by the unavailability of some worker groups.

Within such constraints, it is arguable that workplace impact is a phenomenon more worthy of being assembled than assessed. This has been particularly apparent at NJ/NYHMWTC with regard to the emergency responders, whose duties are unscheduled by definition and take place under conditions that preclude monitoring or observation, either by peers or outsiders. Piecing together the impact which training has probably had among these individuals has required multiple and varied sources and methods [Weidner et al., 1998].

The Role of Knowledge Assessment in Program Accountability

Perhaps even more significant than the evaluative benefits of knowledge assessment is that such assessment recognizes the individual as the intended beneficiary of training, and recognizes knowledge as the fundamental objective for which programs are accountable. With regard to the first, it is clearly not regulatory intent that training occur among "most" of a vaguely defined group who "might be" at risk for minor or obscure hazards. Rather, the intent is to reach all individuals whose jobs present potential hazards of a serious and specific nature. Just so, effectiveness is not demonstrated with ambiguous claims that "most" trainees seem to "get it" at the conclusion of training. Rather, it is demonstrated through specific outcomes among individuals, which are considered individually and in aggregate.

Regarding knowledge as the fundamental objective for which programs are accountable, it is clearly presumed that health and safety training, first of all, encompasses an essential body of knowledge. This is so regardless of topic (hazard assessment, personal protection, etc.), learning objectives (hands-on skills, knowledge, etc.), impact objectives (behavior control, empowerment, etc.), or pedagogy (constructivism, cognitive learning, etc.). Thus, while knowledge alone is a poor predictor of behavior, zeal without knowledge will neither fully produce, nor sustain for long, the changes which reflect the intent of training. Knowledge is simply the most direct measure of training and of its potential for lasting impact; conclusions from all other data are extrapolative concerning it, regardless of their importance or measurability.

Finally, technical and ethical concerns over testing and its effects on trainees are legitimate only to an extent, and can be readily addressed through procedures. Detection of baseline knowledge with pre-tests is an obvious and simple way to help control interpretation of outcome scores vis-à-vis training effectiveness. Testing can also accommodate very creative solutions to concerns with privacy, dignity, and technical limitations in testing. Such options simply require procedures which do not alter knowledge while assessing it, which distinguish one learner from another, which provide an unclouded context for observing or assessing skills and/or knowledge, and which allow for assessment of interactions across variables.
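As a minimal sketch of how pre-test data can support that interpretation, the following compares raw post-test scores with gains normalized against baseline knowledge. The records, field names, and the use of a normalized-gain formula are illustrative assumptions, not procedures documented by the program.

```python
# Hypothetical sketch: separating training effect from prior knowledge.
def normalized_gain(pre_pct: float, post_pct: float) -> float:
    """Fraction of the knowledge NOT held at baseline that training added:
    gain = (post - pre) / (100 - pre). Undefined for a perfect pre-test."""
    if pre_pct >= 100.0:
        raise ValueError("perfect pre-test score: no headroom to measure gain")
    return (post_pct - pre_pct) / (100.0 - pre_pct)

trainees = [
    {"id": "A", "pre": 80.0, "post": 88.0},  # experienced worker
    {"id": "B", "pre": 30.0, "post": 72.0},  # newcomer to the content
]

for t in trainees:
    g = normalized_gain(t["pre"], t["post"])
    print(f"{t['id']}: post={t['post']:.0f}%  gain={g:.2f}")

# Both post-test scores clear a 70% standard, but the gains differ sharply
# (0.40 vs. 0.60): high prior knowledge, not the training, largely explains
# trainee A's passing score -- the masking effect noted earlier.
```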

TESTING TO EVALUATE TRAINEES

Issues and Concerns

Examinations which are used to directly evaluate trainees are typically more comprehensive than examinations used to evaluate basic program objectives, and also have greater personal implications for the dignity, privacy, and employability of trainees. Not surprisingly, flaws that are considered inherent to testing are therefore more troubling in that context. As noted above, among these concerns is the question whether examinations are compatible at all with assessing experiential knowledge, which is highly influential on behavior but which is acquired, processed, applied, and internalized in an informal and unstructured manner. Also questionable is the value of both training and examination where content is irrelevant to workers' regular duties, and where a familiar or an applied frame of reference is therefore absent. Training and evaluation are also believed by some to further shift the onus of safety from the employer to the employee, holding the latter responsible for understanding and responding to hazards which employers are legally or morally bound to mitigate to negligibility, if not eliminate altogether. Finally, since health and safety training derives from a vast technical field that cannot be fully addressed in the limited context of training, training is criticized as engendering in workers a false sense of security regarding their ability to perceive, respond to, or function around hazards. As a result, testing is believed to provide what trainees may consider documentation of their qualifications or expertise to work with hazardous materials.

The Implications of Testing for the Individual

While each of these concerns has clear importance to programs that seek to acknowledge trainees as individuals rather than targets, opposing viewpoints are relevant on the same merit. First, the workplace is comprised of individuals whose behaviors will either imperil or protect their health and safety and that of their coworkers and, while knowledge certainly is not a predictor or guarantor of certain behaviors, ignorance is. That being so, it is expedient to be conservative rather than speculative where presumed information is concerned. Specifically, while it may seem a logical conclusion that experiential knowledge cannot be assessed by formal or structured methods, such a conclusion is not supported in the literature. Conversely, both logic and experience indicate that knowledge that is too vague to be retrieved and conveyed may also be inadequately rooted in the concepts or cognition needed for consistent application, or for correct application in inconsistent conditions. It is difficult to imagine a worse scenario than the pairing of a precipitous and extreme hazard, whose redress requires specific information, with vague or general understanding which has never been formally processed or rehearsed. A need for fluency in and familiarity with concrete information is also why training and assessment are not only critical, but are most critical, where content is not directly relevant to workers' regular duties. Initial training and assessment, and annual refresher training, all require relatively little time, yet they provide what is often the only context wherein information is formally processed.

Regarding the underlying implications of the testing and training processes, concerns that employers are implicitly less responsible if workers receive health and safety training seem justified by the incidents which occur daily due to refusal on the part of some employers to accept legal and moral responsibility for work conditions. Again, however, the workforce is comprised of individuals who, as workers, are ultimately at risk. Their own knowledge is the critical first step toward their ability to exact appropriate accountability from others. Training and testing recognize the right of individuals to receive information directly applicable to them, their right and ability to receive and deal with feedback regarding their understanding of that information, and their ability and desire to exercise sound judgment in applying their knowledge in their own work environment. Under this model workers are active and accountable partners in their health and safety rather than passive, reluctant recipients of imposed information.

This same rationale applies to concerns over the vast technical field from which training is derived. Concluding that a learning process with limitations should be eliminated altogether not only calls into question all forums of learning but, more importantly, raises ethical questions regarding whose right it is to make and enforce such a determination. While a little knowledge is a dangerous thing, a complete lack of knowledge is certainly worse. Training and evaluation assume that knowledge is power, and that more of both is better where workers are concerned. A false sense of security among workers, or any other misinterpretation, can be prevented if programs set realistic goals, fully inform workers of the objectives and limitations of training and assessment, and explain orally and in writing the significance and limits of all documentation procedures.


Convergence of these issues became clear to us through a worker who was contacted for a follow-up interview several months after his training in our program. At the time of his training he was a long-term employee in an industrial setting, and safety procedures and precautions had been integral to his regular duties. Upon returning to work he was able to recognize the multiple and serious hazards which had long been adjacent to his work area, including a large cache of drums containing illegally stored hazardous materials. He was successful in having himself relocated, but his repeated complaints regarding the risks to others went unheeded, and a reactive incident eventually ensued. He eventually became aware of so many egregious safety violations that he sought other employment. While this clearly should not have been his only recourse, he noted that his training was basic to his continued safety and would have been basic to any solution he might have sought. He also concluded that workers should take a willing part in increasing their ability to identify hazards for themselves, and should be less willing to relinquish control of their own safety to others, including those who sincerely act in the workers' best interests.

A STANDARD FOR TESTING

Issues and Concerns

As with some of the issues addressed above, concerns over application of a test standard in training are directly linked to concerns with the training process itself. Application of a test standard across programs is considered by many to be a contributing factor in the false sense of security among workers which was discussed above, lending validation to trainees' perception that a passing score qualifies them to handle hazardous substances. Given the broad range in programs and tests, it is also argued that such a standard would be applied arbitrarily and would thus be meaningless. The OSHA standard, which specifies that a passing score be set no lower than 70%, is also believed to reinforce mediocrity as a training objective.

Practical and Technical Aspects of a Standard

Again, it has been our experience that a false sense of security among workers is more likely to occur when programs fail to disclose the limitations and purpose of their training, testing, and documentation procedures. Regarding a standard as a reinforcement for mediocrity, it is important to bear in mind that 70% is a minimum standard, signifying that programs may set higher, but should not set lower, standards. Indeed, the NIEHS Worker Training Program recommends a standard of 80% [OSHA, 1994]. Further, mediocre training and mediocre assessment are products of a mediocre program, not the result of a standard having been selected and applied. (Otherwise the converse would also be true, and a standard of 100% would vastly improve quality across programs.) Instead, trainees receiving a difficult examination may score lower yet know more than trainees taking a less rigorous examination, and, depending on the standard, neither or both may receive a passing score.

The most pertinent issue regarding a standard is not mediocrity, but arbitrariness. Contrary to common assumption, arbitrariness is a concern not because the application of a standard to a broad and varied range of programs is baseless or capricious, or because a standard is arbitrarily or subjectively selected for "sounding about right" as an achievement goal. Indeed, a 70% score is widely accepted in testing as an indicator of moderate knowledge, 80% of moderately high knowledge, and so forth [Angoff, 1984]. Rather, the arbitrary nature of a general standard concerns its application at the program level, where the critical issue is whether training is appropriate to the intended audience as regards content and difficulty, and whether testing is appropriate to the training on the same terms. Thus, applying a standard is less a matter of agreement on a score and more a matter of applying appropriate procedures and analyses to ensure reasonable consistency between a standard, the training, the trainees, and the test. If such consistency were achieved across the board, 70% would represent the same moderate achievement for similar programs.

The extent to which this is possible depends on the extent to which programs internally apply accepted, standard principles in test development, test administration, and outcome analysis. Achievement of more consistent and reliable testing for health and safety training also depends on the extent to which programs collaborate and communicate regarding learning and test objectives. Indeed, progress in the area of testing is largely stymied not by variation in training and testing, but because efforts across programs are frequently unknown, and the referential value of reported data to larger trends therefore remains unknown as well. Collaborative work across similar programs, or across similar segments of diverse programs, would give a broader context for data interpretation and would facilitate explorative analysis within and across groups, which was noted earlier as critical to program evaluation and to larger policy efforts. Statistical manipulations, such as correlation matrix analysis, can assess a diversity of interactions across data sets, or maximize data from individual programs. These and other techniques can be particularly powerful in larger sampling frames, where matrices form quickly and where factors that influence training and learning can be more readily detected.
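A minimal sketch of such a correlation-matrix exploration follows, using invented pooled records; the variables (exam score, prior experience, satisfaction rating, instruction hours) are illustrative assumptions, not the variables actually collected across NIEHS programs.

```python
# Illustrative cross-program exploratory analysis: pairwise correlations
# across evaluation variables pooled from (hypothetical) programs.
from statistics import correlation  # Python 3.10+

# Each record: exit-exam score (%), years of relevant experience,
# instructor satisfaction rating (1-5), hours of instruction.
records = [
    (82, 10, 4.5, 24), (74, 2, 3.8, 24), (91, 15, 4.9, 40),
    (68, 1, 4.1, 16), (77, 6, 3.2, 24), (88, 12, 2.9, 40),
    (71, 3, 4.7, 16), (85, 9, 3.5, 40),
]
variables = ["exam", "experience", "rating", "hours"]
columns = list(zip(*records))  # one column of values per variable

# A matrix like this can suggest, for example, whether satisfaction
# ratings actually track knowledge outcomes across programs.
print(" " * 11 + "".join(f"{v:>11}" for v in variables))
for i, vi in enumerate(variables):
    row = [correlation(columns[i], columns[j]) for j in range(len(variables))]
    print(f"{vi:>11}" + "".join(f"{r:>11.2f}" for r in row))
```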

Lastly, although mediocre training is a function of program quality rather than of a training standard, the fact that worker training and testing objectify proficiency rather than mastery seems to further evidence the inherent limitation of training and testing, and therefore requires explanation. First, because the range of potential knowledge is continuous rather than discrete, the difference between subject mastery and subject proficiency is a matter of degree rather than nature. Mastery as a learning objective is therefore particularly unreasonable in the training context, where a practical balance is sought between health and safety imperatives and the economic considerations of workers and employers. Because assessment can never repeat the fullness of learning, mastery is even more illusory as an assessment goal. Still, within these constraints, it can be argued that even the brevity of training allows for a definable transfer of knowledge. It also allows assessment activities to comprise a comparatively high proportion of program time, thus providing a reasonably thorough indication of proficiency in the material covered.

OBJECTIVE EXAMINATIONS

Issues and Concerns

The limitations of training, and the consequential limitations of testing, are considered by many to be particularly exacerbated by the use of objective examinations. While fixed-format examinations (multiple choice, true/false, etc.) are by far the most popular because they are efficient and easily modifiable, they are also the most open to criticism. One concern is the potential for instructors to "teach to the test." This is often mistaken as being a result of instructors having access to or control over examination content, then tailoring instruction accordingly. Instead, teaching to the test concerns the possibility that the conceptual level of training will be influenced by the test. Examinations which require rote factual response rather than higher cognitive skills (evaluation, analogizing, problem solving, etc.) can drive training to a lower cognitive level: instructors teach to the (lower) test, and students in turn prepare for it. A related concern is that concepts which are integrated and connected in learning and application are often artificially isolated and fragmented for examinations, and are therefore not assessed authentically in that context [Frederiksen, 1990a]. The fixed-format examination is also seen by some to further dilute any potential value of a training standard. Scoring usually entails a single final outcome, which means trainees can receive an acceptable test score overall despite complete ignorance in one or more critical topic domains.
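A small worked illustration of that last concern, with invented domain names and numbers: a composite score can clear a 70% standard while one critical domain goes almost entirely unanswered.

```python
# Invented per-domain results: items correct / items asked in each domain.
domain_scores = {
    "hazard recognition":   (9, 10),
    "personal protection":  (8, 10),
    "decontamination":      (10, 10),
    "emergency procedures": (1, 10),  # near-total ignorance hidden below
}

correct = sum(c for c, _ in domain_scores.values())
asked = sum(n for _, n in domain_scores.values())
print(f"composite: {100 * correct / asked:.0f}%")  # 70% -- a "passing" score
for domain, (c, n) in domain_scores.items():
    print(f"  {domain}: {100 * c / n:.0f}%")
# Reporting and scoring by topic domain, as discussed below, would expose
# the 10% emergency-procedures result that the composite conceals.
```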

Practical and Technical Aspects of Examinations

While the constraints just described are real, they are not inevitable. The "teaching to the test" effect of examinations on training quality is again a matter of control over training itself. Given the distinctions between training and testing, programs would benefit from obtaining expertise specific to training, obtaining additional expertise specific to testing, and keeping the two functions harmonious but distinct in their operations. Minimizing other problems is a matter of applying basic principles of test development. For example, the convenience of the fixed-format examination lies with the fixed response options, which greatly facilitate data input, management, and analysis (and, therefore, test research and development). The format does not, however, require that concepts be isolated or disconnected; does not preclude items which draw upon skills other than rote memory; does not preclude use of item sub-pools for topic domains; and does not restrict scoring within or across topic domains. In addition, interview techniques and other alternatives are available which, if refined for the training experience, have the potential to preserve the goal of assessing content specifics while improving assessment of concept fluency, applied skills, and experiential knowledge [Campion et al., 1988; Harris, 1989; Hedge and Teachout, 1992; Ford and Fisher, 1994; Thissen et al., 1994]. These meet the ultimate objectives of individual knowledge assessment, yet do so without many of the concerns related to structured examining.

Computers are increasingly useful in generating, administering, and analyzing tests, and the anxiety specific to computer testing appears to be a manageable phenomenon [Legg and Buhr, 1992; Powell, 1994]. A computerized item bank can facilitate and systematize development of examinations where topic-based criteria are preferred, and computerized scoring methods are easy to develop and apply. Networks of item sub-pools can generate an endless number of examinations and then invoke topical, weighted, and composite scoring methods using pre-specified criteria. Computer programs can track patterns in response choices and alterations, provide item- and response-specific data for analysis of reliability and validity, or help investigate the effects of examinee traits on outcomes. Computerized adaptive tests (CAT) perform all of these functions and also self-adjust to examinee performance, which allows programs to assess knowledge with fewer and more efficient items [Wainer, 1990a]. Computers are particularly useful to training because their versatility and multimedia features allow for a more authentic representation of concepts and scenarios. Testing as an embedded function of such training clearly represents a greater consistency between learning and assessment, addressing concerns regarding the artificial fragmentation of concepts that testing can impose. Such advances are constantly evolving and improving, and signify that many drawbacks previously associated with training and testing can be addressed through technology-based data collection, storage, and analysis.
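The following is a minimal sketch of the self-adjusting behavior that defines CAT. Real adaptive tests estimate ability with item response theory models rather than the crude step rule used here; the item bank, difficulty scale, and stopping rule are all invented for illustration.

```python
# Toy computerized adaptive test: item selection follows examinee
# performance. A simplification for illustration, not a production model.

# Item bank: (difficulty on a 1-5 scale, label). A real bank would hold
# calibrated items drawn from topic sub-pools.
BANK = [(d, f"item-{i}") for i, d in
        enumerate([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5])]

def administer(answer, n_items=6, start_level=3):
    """Give n_items questions, moving up after a correct answer and down
    after an incorrect one; `answer(item) -> bool` supplies responses."""
    level, pool, record = start_level, list(BANK), []
    for _ in range(n_items):
        # choose the unused item whose difficulty is closest to the estimate
        item = min(pool, key=lambda it: abs(it[0] - level))
        pool.remove(item)
        correct = answer(item)
        record.append((item, correct))
        level = min(5, level + 1) if correct else max(1, level - 1)
    return level, record

# Simulated examinee who reliably handles items up to difficulty 3:
final_level, record = administer(lambda item: item[0] <= 3)
print("estimated level:", final_level)  # converges near 3
for (difficulty, label), correct in record:
    print(f"{label} (difficulty {difficulty}): {'right' if correct else 'wrong'}")
```

Because the test homes in on the examinee's level, fewer items are spent on questions that are far too easy or too hard, which is the efficiency gain the passage above attributes to CAT.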

For those so inclined, sophisticated programs and test models are also available that draw more fully upon the psychometric and cognitive aspects of testing. However, programs that apply such methods should not automatically conclude that their training is effective, particularly if they ignore the more blatant indicators of effectiveness, or if efforts go toward the intricacies and subtleties of diagnostic testing at the expense of assessing knowledge of core content. Further, while instruction which helps trainees apply cognitive skills to health and safety is useful, baseline assessment and development of such skills is often beyond the scope of training, and undue attention to them may dilute efforts in more fundamental or applied principles. Cognition and psychometrics have also been investigated primarily with computerized testing and/or within the context of education, a context which uses real-world representation, accommodates diverse learning styles, is problem/resolution centered, and instructs toward mastery with gradual and integrated additions [Frederiksen, 1990]. Some programs, such as those conducted in-house, can integrate instruction with job tasks and can provide ongoing feedback over days, weeks, or longer. However, most trainees will attend open-enrollment courses at unfamiliar facilities that have fixed schedules and limited resources. It is thus advisable that programs be optimistic in what they hope to achieve in training, and modest in what they hope to demonstrate through assessment.

SUMMARY AND RECOMMENDATIONS

Testing as an assessment measure has a place in worker health and safety training. As a rule, while training specifics vary widely across programs, mandated training has specific and nonsubjective instructional requirements which can be directly assessed through testing. Because complete subject mastery is usually outside the scope of worker health and safety training, it is advisable to seek agreement on an acceptable rather than maximum proficiency goal; the latter would be prohibitive to trainees and programs without being more appropriate or demonstrable. Since effective training is defined differently by each program, and assessment methods and materials vary greatly, training quality is difficult to measure across programs. Still, prudent application of basic principles in training and testing, as well as explorative data analysis and collaboration across programs, can help reduce some common pitfalls of testing. Use of creative alternatives to traditional written examinations can also alleviate technical concerns regarding the relevance of such exams to the work experience.

In any event, programs would benefit from systematic methods to verify that learning objectives are met. To that end, they should define a reasonably rigorous standard of expectation, refine their training and testing accordingly and, finally, consider test data to be as much a measure of their own performance as of the trainees'. It is also advisable that programs be as thorough as possible in training, as specific as possible in giving feedback to trainees regarding knowledge gaps and gains, and as forthcoming as possible regarding constraints of both training and assessment. By developing examination items and analyses that assure topic diversity, require a reasonable level of cognitive skills, interface with other programs, and make use of technology, programs can make examinations more versatile in assessing the nature and extent of change among trainees. Data will also have broader referential value from which conclusions can be drawn across the field.

Still, many related issues remain largely unexplored. For example, given some methodological realities of testing and practical realities of training, should testing become more, or less, sophisticated? Of what relevance to training are complicated models for item or examination analysis? How might that relevance be influenced by foreseeable progress in computerized or virtual training? What can be learned from the models or data sets of other large-scale training or education efforts, either in terms of the factors that influence learning and training at the time of instruction or those that influence impact at the workplace? How can test procedures be improved to meet the learning and assessment objectives of a specific program genre? What creative alternatives have yet to be developed, or refined, for trainees with special needs or concerns? What alternatives to traditional written tests combine innovation with rigor, and meet the responsibility to assess knowledge among individual workers? What place does technology in general have in testing or training, and at what cost/benefit in terms of resources and effectiveness? In what way can, or should, test outcomes help shape training policies nationally?

Moving past the misconceptions regarding testing, and more fully addressing the legitimate concerns and creative potential, will help improve a practice which is unlikely to cease and which can be of great potential benefit to trainees, programs, and policy efforts.

ACKNOWLEDGMENTS

Contents of this manuscript are solely the responsibility of the author and do not necessarily represent the official views of the NIEHS and EPA/DOE.

REFERENCES

Angoff WH. 1984. Scales, norms and equivalent scores. Princeton, New Jersey: Educational Testing Service.

Campion MA, Pursell ED, Brown B. 1988. Structured interviewing: raising the psychometric properties of the employment interview. Personnel Psychol 41:25–42.

Ford JK, Fisher S. 1994. The transfer of safety training in work organizations: a systems perspective to continuous learning. In: Colligan M, editor. Occupational medicine: state of the art reviews. Philadelphia: Hanley and Belfus, 9(2):241–259.

Frederiksen N. 1990a. Introduction. In: Frederiksen N, Glaser R, Lesgold A, Shafto MG, editors. Diagnostic monitoring of skill and knowledge acquisition. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Frederiksen N. 1990b. Intelligent tutors as intelligent testers. In: Frederiksen N, Glaser R, Lesgold A, Shafto MG, editors. Diagnostic monitoring of skill and knowledge acquisition. Hillsdale, New Jersey: Lawrence Erlbaum Associates, p 1–24.

Frick TW. 1992. Computerized adaptive master tests as expert systems. J Edu Comput Research 8(2):187–213.

George Meany Center for Labor Studies, National Institute of Environmental Health Sciences Worker Training Program and Awardees, NIEHS Clearinghouse for Worker Safety and Health Training. 1997. Resource guide for evaluating worker training.

Gotsch AR, Weidner BL. 1994. Strategies for evaluating the effectiveness of training programs. In: Colligan M, editor. Occup Med: State Art Rev 9(2):171–188.

Green KL. 1988. Issues of control and responsibility in workers. Health Edu Quart 15:473–486.

Harris MH. 1989. Reconsidering the employment interview: a review of recent literature and suggestions for future research. Personnel Psychol 42:691–726.

Hedge JW, Teachout MS. 1992. An interview approach to work sample criterion measurements. J Appl Psychol 77(4):453–461.

Hopkins BL, Conrad RJ, Smith MJ. 1986. Effective and reliable behavioral control technology. J Am Ind Hyg Assoc 47:785–791.

Komaki J. 1986. Promoting job safety and accident prevention. In: Cataldo MF, Coates TJ, editors. Health and industry: a behavioral medicine perspective. New York: John Wiley & Sons, p 301–320.

Komaki J, Barwick KD, Scott LR. 1978. A behavioral approach to occupational safety: pinpointing and reinforcing safe performance in a food manufacturing plant. J Appl Psychol 63:434–445.

LaMontagne AD, Kelsey KT, Ryan CM, Christiani DC. 1992. A participatory workplace health and safety training program for ethylene oxide. Am J Ind Med 22:651–664.

Legg SM, Buhr DC. 1992. Computerized adaptive testing with different groups. Edu Measurement: Issues and Practice, Summer.

Lunz ME, Bergstrom B. 1995. Computerized adaptive testing: tracking candidate response patterns. J Edu Comput Research 13(2):151–162.

Mergler D. 1987. Worker participation in occupational health research: theory and practice. Int J Health Serv 17:151–167.

Messick S. 1987. Large-scale educational assessment as policy research: aspirations and limitations. Eur J Psychol Edu 2(2):157–165.

Moran JB, Dobbin D. 1991. Quality assurance for worker health and safety training programs: hazardous waste operations and emergency response. Appl Occup Environ Hyg 6(2):107–113.

Morgan WP. 1983. Psychological problems associated with the wearing of industrial respirators: a review. J Am Ind Hyg Assoc 44:671–676.

Noe RA. 1986. Trainees' attributes and attitudes: neglected influences on training effectiveness. Acad Manage Rev 11:736–749.

Occupational Safety and Health Administration, US Department of Labor. 1989. 29 CFR 1910.120, Hazardous waste operations and emergency response; final rule. Federal Register 54:9294–9336, March 6.

Pidgeon NF. 1991. Safety culture and risk management in organizations. J Cross-Cultural Psychol 22:129–140.

Powell Z-HE. 1994. The psychological impacts of computerized adaptive testing methods. Edu Technol Oct:41–47.

Thissen D, Wainer H, Wang X-B. 1994. Are tests comprising both multiple choice and free response items necessarily less unidimensional than multiple choice tests? An analysis of two tests. J Edu Measurement 31(2):113–123.

US Congress. 1986. Public Law 99-499, Section 126(g)(2), Superfund Amendments and Reauthorization Act of 1986. Washington, DC: US Government Printing Office, October.

Verma DK, Shannon HS, Muir DCF, Nieboer E, Haines AT. 1988. Multi-disciplinary, problem-based, self-directed training in occupational health. Occup Med 38:101–104.

Wainer H. 1990a. Introduction and history. In: Wainer H, editor. Computerized adaptive testing: a primer. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Wallerstein N, Weinger M. 1992. Health and safety education for worker empowerment. Am J Ind Med 22:619–635.

Weidner BL, Gotsch AG, Delnevo CD, Newman JB, McDonald W. 1998. Worker health and safety training: assessing impact among responders. Am J Ind Med 33:241–246.