Ergonomics

ISSN: 0014-0139 (Print) 1366-5847 (Online) Journal homepage: http://www.tandfonline.com/loi/terg20

Assessment of pilot performance and mental workload in rotary wing aircraft

ERIK J. SIREVAAG, ARTHUR F. KRAMER, CHRISTOPHER D. WICKENS, MARK REISWEBER, DAVID L. STRAYER & JAMES F. GRENELL

To cite this article: ERIK J. SIREVAAG, ARTHUR F. KRAMER, CHRISTOPHER D. WICKENS, MARK REISWEBER, DAVID L. STRAYER & JAMES F. GRENELL (1993) Assessment of pilot performance and mental workload in rotary wing aircraft, Ergonomics, 36:9, 1121-1140, DOI: 10.1080/00140139308967983

To link to this article: http://dx.doi.org/10.1080/00140139308967983

Published online: 06 Jul 2010.


ERGONOMICS, 1993, VOL. 36, NO. 9, 1121-1140

Assessment of pilot performance and mental workload in rotary wing aircraft

ERIK J. SIREVAAG, ARTHUR F. KRAMER, CHRISTOPHER D. WICKENS, MARK REISWEBER, DAVID L. STRAYER and JAMES F. GRENELL*


Aviation Research Laboratory, Institute of Aviation, University of Illinois, Urbana, IL 61801, USA; *Boeing Helicopters, Philadelphia, PA

Keywords: Mental workload; Psychophysiology; Event-related brain potentials (ERPs); Heart rate variability (HRV); Helicopter flight.

This research examined the processing demands imposed upon experienced pilots by two different communication formats, digital and verbal, in a high-fidelity simulation of an advanced multi-function helicopter. The mental workload imposed by the type and magnitude of communications was assessed by a battery of subjective, performance, secondary, and physiological measures. The performance data indicated that the pilots had difficulty adhering to the Nap of the Earth altitude criterion with high communication demands, particularly with the digital communication system. This was presumably due to the requirement to spend more time scanning the multi-function displays with the digital than with the verbal communication system. On the other hand, the pilots were less prone to task shedding when they used the digital communication system, possibly due to the provision of a permanent list of queries that was unavailable with the verbal system. Measures of heart rate variability and blink rate were larger with the verbal than with the digital system, presumably reflecting increased respiratory demands in the verbal condition as well as increased visual processing demands with the digital format. Finally, the probe-evoked P300 component decreased in amplitude as a function of increases in the magnitude of communications. The results are discussed in terms of the structural and capacity demands of the communications systems that were proposed for the advanced multi-function helicopter.

0014-0139/93 $10.00 © 1993 Taylor & Francis Ltd.

1. Introduction

In the last decade researchers and practitioners have increasingly acknowledged that no single metric can provide a valid and reliable measure of all of the important components of the mental workload experienced by operators of complex systems (Kramer 1991, Lysaght et al. 1989, O'Donnell and Eggemeier 1986, Wickens and Kramer 1985). In fact, unlike the modal view in the 1960s and early 1970s that mental workload was a unitary construct (Kahneman 1973, Moray 1967), most modern-day models suggest that several different dimensions of mental workload should be considered in the evaluation of system effectiveness (Kinsbourne and Hicks 1978, North 1985, Laughery et al. 1986, McCracken and Aldrich 1984, Navon and Gopher 1979, Polson and Friedman 1988, Sanders 1979, Wickens 1984). Within these models, mental workload can be described as the cost of performing one task in terms of a reduction in the capacity to perform additional tasks, given that the two tasks overlap in their resource demands. In addition to accounting for processing costs that can be attributed to the demand for similar resources or structures, these models also suggest that the magnitude of the demands is an important determinant of the mental workload experienced by the operator.

As an example of one such conception of mental workload, consider Wickens's (1980, 1984) multiple resource model. In this model, information processing activities are divided into three dichotomous dimensions, with each level of a dimension representing a separate resource. These dimensions include: stages of processing (perceptual/central and response), codes of processing (verbal and spatial), and modalities of input and output (input: visual and auditory; output: speech and manual). Within the multiple resource model, workload increases to the extent that two tasks require the same type of processing resources. Thus, a task such as navigation through unknown terrain should interfere with a vehicle control task to a greater extent than a verbal communication task would, due to the greater degree of resource overlap between the navigation and control tasks.

The measurement techniques employed in the assessment of mental workload have kept pace with the theoretical developments in the field of multi-task processing. Thus, while the initial goal in the workload assessment field was the discovery of the 'best' measure of capacity allocation (Knowles 1963), more recent workload measurement reviews and taxonomies have emphasized the importance of designing a battery of measures that would tap different dimensions of mental workload (Gopher and Donchin 1986, Leplat 1978, Moray 1989, O'Donnell and Eggemeier 1986, Ogden et al. 1979, Wickens 1979). In the present study we have applied such a battery of measures to the assessment of the mental workload imposed upon high-time helicopter pilots by different types of communication devices. More specifically, we examined the relative costs of digital and verbal communication technologies as experienced pilots flew a series of realistic reconnaissance missions in a motion-based helicopter simulator.
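To make the overlap argument concrete, the toy sketch below counts how many resource levels two tasks share. It is not Wickens's computational model: the task profiles (which levels each task is assumed to load) are hypothetical assignments chosen only to mirror the navigation versus communication example, and a plain count of shared levels stands in for a proper weighting of demand and conflict.

```python
from typing import FrozenSet, Tuple

# A resource is one level of one dimension, e.g. ("code", "spatial").
Resource = Tuple[str, str]

# Hypothetical task profiles, assumed for illustration only.
VEHICLE_CONTROL: FrozenSet[Resource] = frozenset({
    ("stage", "perceptual/central"), ("stage", "response"),
    ("code", "spatial"), ("input", "visual"), ("output", "manual"),
})
NAVIGATION: FrozenSet[Resource] = frozenset({
    ("stage", "perceptual/central"), ("code", "spatial"), ("input", "visual"),
})
VERBAL_COMMUNICATION: FrozenSet[Resource] = frozenset({
    ("stage", "perceptual/central"), ("stage", "response"),
    ("code", "verbal"), ("input", "auditory"), ("output", "speech"),
})


def overlap(task_a: FrozenSet[Resource], task_b: FrozenSet[Resource]) -> int:
    """Count the resource levels demanded by both tasks (a crude interference index)."""
    return len(task_a & task_b)


if __name__ == "__main__":
    print("navigation + control:", overlap(NAVIGATION, VEHICLE_CONTROL))               # 3
    print("communication + control:", overlap(VERBAL_COMMUNICATION, VEHICLE_CONTROL))  # 2
```

With these assumed profiles, navigation shares three resource levels with vehicle control and verbal communication shares two, reproducing the ordering predicted in the text. A fuller treatment would also weight each task's demand magnitude, which the text identifies as a separate determinant of workload.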
The study had two distinct but related goals. The first was to evaluate the efficacy of different measures in a relatively 'noisy' and realistic flight scenario. The second goal was to determine, using our workload battery, the relative advantages and disadvantages of digital and verbal communication formats during low-level, high-speed helicopter flight. The results of this assessment were then incorporated into the design of communication devices for a new-generation helicopter.

Four different categories of workload measurement techniques were included in our assessment battery: primary task measures, secondary task measures, subjective ratings of workload and effort, and physiological measures of operator state. Each of these categories of measurement techniques has advantages as well as limitations. Primary task measures are ideal in that they provide an indication of both operator and system performance. However, primary task or performance measures do not necessarily provide an indication of the spare capacity possessed by operators. For example, while two operators may exhibit equivalent performance, one operator may be incapable of performing additional tasks while the other may possess sufficient resources to perform them. Another disadvantage of performance measures is that they are not easily transferred from one task scenario to another. Thus, while root mean square tracking error may prove to be a reliable metric of performance in a vehicular control task, response time and accuracy may be the measures of choice for a communication task. In the present study we employed measures of aircraft control (e.g., number of object and ground collisions, aircraft attitude, and speed measures) and communication efficiency (response speed and accuracy) in an effort to quantify important aspects of operator performance.


Secondary task measures have the advantage of providing a measure of 'residual resources' that is not easily obtained from primary task or performance measures. Thus, changes in the performance of a secondary task are assumed to reflect changes in the processing demands of the primary task. Given that these measures are not part of the primary tasks, they are easily transferred among different task scenarios. However, given that secondary tasks occasionally intrude into the performance of the primary task, they are often unacceptable from a safety standpoint. In an effort to resolve this problem with intrusiveness we have employed measures of task-shedding in our study. Thus, rather than impose additional demands on the operator by the requirement to perform an extraneous task, we evaluated the degree to which the pilots delayed or failed to perform low-priority task components during different segments of their flight mission.

Subjective measures have the longest history of any of the classes of workload measures, possibly due to their high face validity. In fact, several researchers have suggested that subjective methods provide the most appropriate measure of mental workload because they tap the operators' experience of cognitive effort (Johannsen et al. 1979, Sheridan 1980). Other advantages include ease of implementation, low cost, lack of intrusion into the tasks of interest, and high operator acceptability. As with other workload assessment techniques, subjective measures also have a number of limitations. For example, while subjective measures usually correspond to measures of operator performance, there have been a number of reports of dissociations between these two classes of measures (Eggemeier et al. 1984, Vidulich and Wickens 1986, Yeh and Wickens 1988).
Subjective measures appear to be more sensitive to global than to specific processing demands, are more sensitive to intermediate than to high levels of workload, and are most reliable when collected soon after task performance. With these caveats in mind we employed a well-validated subjective workload assessment technique, the NASA TLX scale, to assess the demands imposed upon the pilots by different flight segments and communication formats (Hart and Staveland 1988).

Physiological methods comprise the last class of workload assessment measures that we incorporated into our assessment battery. In the last decade a large variety of physiological measures have been used to assess operator state and mental workload (see Kramer 1991 for an in-depth review of this class of measures). Physiological measures possess several advantages relative to the other classes of workload assessment techniques that we have described. First, physiological measures do not require overt responses, a clear plus when the task of interest is primarily cognitive in nature. Second, physiological measures can be recorded throughout the performance of a task, thereby providing a continuous record of operator state. Finally, physiological measures are inherently multidimensional and therefore can be expected to provide a number of views of operator mental workload. Of course, there are also a number of limitations associated with this class of measurement techniques, most notably the cost, the expertise required for interpretation, and the difficulty of excluding artefacts. However, in recent years many of these technical limitations have been successfully dealt with in both simulated and operational contexts (see Kramer 1991, Wilson and Eggemeier 1991). The physiological measures that we have incorporated into our assessment battery include event-related brain potentials (ERPs), blink measures, measures of heart rate and heart rate variability, and measures of respiration.
A particular component of the ERP, the P300, has been shown to vary in amplitude with the perceptual/central processing demands of a task (Kramer et al. 1983, 1985, 1987, Sirevaag et al. 1989).


Thus, in a multi-task paradigm, P300s elicited by events in the primary task increase in amplitude with increases in the demands of the task, while P300s elicited by secondary task events decline in amplitude with increases in primary task difficulty. The resource reciprocity demonstrated by the P300 measure is consistent with the resource tradeoffs predicted by multi-task models of information processing (Navon and Gopher 1979, Wickens 1980, 1984). In the present study we employed an irrelevant probe technique to elicit P300s (Papanicolaou 1984). Pilots performed their flight tasks and every so often heard either a high- or a low-pitched tone over their headphones. The pilots were instructed that they did not need to respond to the tones. The logic of this technique is that the P300s elicited by the irrelevant tones will vary with the processing demands of the primary task, in this case the flight mission, much in the same way that a secondary task would be expected to reflect the processing demands of a primary task. However, the advantage of this technique over a secondary task methodology is that subjects are not required to respond to the probes.

Measures of heart rate and heart rate variability have also been successfully used to evaluate workload in laboratory and operational environments. It is interesting to note that while heart rate seems to reflect global aspects of mental workload (Harris et al. 1989, Lindholm and Cheatum 1983, Roscoe 1984), measures of heart rate variability, particularly those obtained in the frequency domain, appear to be sensitive to different aspects of mental workload and operator effort (Aasman et al. 1987, 1988, Mulder 1979, Van Dellen et al. 1985, Vicente et al. 1987). Three major frequency bands have been investigated. The lowest, which ranges from 0·02 to 0·06 Hz, is associated with vasomotor activity involved in the regulation of body temperature.
The intermediate band, which includes frequencies from 0·07 to 0·14 Hz, is related to mechanisms involved in the short-term regulation of arterial pressure. Finally, the highest band, which ranges from 0·15 to 0·50 Hz, mainly reflects the effects of respiratory activity on heart rate variability. Although each of these bands has been investigated with respect to mental workload, the intermediate band appears to be the most promising. The centre point of this band, referred to as the 0·10 Hz component, has been found to decrease with increases in the amount of effort invested in a task (Mulder 1979). More interestingly, however, this component appears to be sensitive to resource-limited but not data-limited processes (Van Dellen et al. 1985). Aasman et al. (1987) found that while changes in the amount of visual noise in a task influenced reaction time, the 0·10 Hz component was insensitive to this manipulation. On the other hand, the 0·10 Hz component reliably reflected changes in the memory load in the task.

Measures of blink activity have been employed for over sixty years in the investigation of mental activities (Ponder and Kennedy 1927). The rate of blinking has, in general, been found to decrease with increases in the difficulty of processing in visual tasks (Sirevaag et al. 1988, Stern and Skelly 1984) and when operators transition from auditory to visual tasks (Goldstein et al. 1985). Measures of blink duration have also been shown to be sensitive to changes in visual workload. Closure duration decreased when co-pilots took over control of the aircraft from pilots (Stern and Skelly 1984), decreased in actual versus simulated flight (Wilson et al. 1987), decreased in multi-task situations relative to single-task environments (Sirevaag et al. 1988), and increased with time on task (Bauer et al. 1985, Oster and Stern 1980), presumably due to increases in operator fatigue. In the present study measures of blink rate and blink duration were used to assess changes in operator workload as a function of communication format and mission segment.

In the present study we employed our multi-measure workload assessment battery in an effort to decompose the magnitude and type of processing demands imposed upon six high-time military pilots by a high-fidelity simulation of a new multi-function helicopter. More specifically, we were interested in evaluating the effects of communication format and magnitude of communications on the pilots' workload and performance. The pilots were briefed that they were members of an Air Cavalry Troop (ACT) that was assigned to perform a reconnaissance mission. To accomplish the missions it was necessary for the pilots to co-ordinate the reconnaissance with a wingman and report sightings of threats and their locations to both the wingman and the Squadron Tactical Operation Center (TOC). The pilots were also required to report time of contact over each of their predesignated waypoints. Each of the pilots performed four missions, two of the missions with a digital communication system and two with a verbal communication system. Two of the missions required a modest amount of communication, while in the other missions communication requirements were quite heavy.

2. Methods

2.1. Subjects

Six US Army helicopter pilots participated in this study. Each of the pilots flew a practice mission prior to the experimental sessions. One of the pilots developed severe simulator sickness and was replaced by a Boeing test pilot. The pilots' previous helicopter flight experience ranged from 1900 to 5000 h.

2.2. Apparatus

All data were collected at the Boeing Helicopters facilities. A motion-based flight simulator was positioned in a vision dome. High-fidelity graphics were displayed on the dome during all experimental conditions. All pilots wore a helmet-mounted display (HMD), which provided them access to various sources of information (speed, altitude, etc.) without having to look down at their instrument panel. During each mission, auditory background chatter recorded during an Army training mission was presented to the pilots through headphones. At preplanned points during the missions, experimenters playing the roles of ground control units and wingmen requested pilots to provide information such as current position and fuel status. These queries were mixed into the background chatter that was presented over the headphones. Auditory probes generated by an 80386 microprocessor-based personal computer were presented in a similar fashion.

The simulator stored information concerning various attributes of flight performance on magnetic tapes. These tapes were re-sampled off-line in order to extract information concerning aircraft attitude and control. Other performance measures were obtained through an analysis of videotapes recorded inside the cockpit during each mission. These tapes displayed four panels which appeared on the video screen simultaneously. One panel presented a view of the pilot's head and face. Another panel displayed a heads-up, outside-the-cockpit view, while the other two panels contained the Multi-Function Displays (MFDs), which provided the pilot with menu-driven access to a variety of information sources and control functions (such as the tuning of radio frequencies, display of contour maps, incoming and outgoing digital message buffers, etc.).


The motion capabilities of the simulator required the development of a radio telemetry system for the acquisition and recording of all physiological signals. Beckman silver/silver chloride electrodes were used to acquire vertical and horizontal electrooculographic (EOG) data; electroencephalographic (EEG) activity from frontal, central and parietal electrode sites; and electrocardiographic (ECG) data. Two channels of respiratory activity were also recorded: the first via a thermistor attached to the microphone on the pilot's headset, the second by means of a strain gauge positioned around the pilot's chest. The signals acquired by these transducers were input to a transmitter, developed at the University of Illinois, which was placed in the simulator cab directly behind the pilot. Each signal was then transmitted via a separate radio frequency to a receiver connected to an 80386 microprocessor-based personal computer, which continuously digitized the signals at a rate of 200 Hz and stored the data to disk. The signals were amplified with a 10 s time constant and an upper half-amplitude cutoff of 100 Hz (3 dB/octave roll-off).
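For readers more used to corner frequencies, the amplifier settings can be translated as follows. This worked conversion assumes that the 10 s time constant belongs to a single first-order (RC) high-pass stage, which the text does not state; under that assumption its half-power (-3 dB) point is given by the first expression, and the second simply notes that the 200 Hz digitization rate places the Nyquist frequency at the same 100 Hz as the stated upper cutoff.

```latex
f_c = \frac{1}{2\pi\tau} = \frac{1}{2\pi \times 10\ \mathrm{s}} \approx 0.016\ \mathrm{Hz},
\qquad
f_{\mathrm{Nyquist}} = \frac{f_s}{2} = \frac{200\ \mathrm{Hz}}{2} = 100\ \mathrm{Hz}
```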

2.3. Experimental task

This experiment was designed to assess the impact of varying both the type (digital vs. verbal) and the frequency (low vs. high load) of communications required of pilots performing a reconnaissance task. All of the pilots flew both a digital and a verbal mission. Furthermore, each mission contained both a low and a high communication load segment. The order of the verbal and digital segments was counterbalanced across subjects.

Pilots were briefed that each was a member of an Air Cavalry Troop (ACT) assigned to conduct a zoned reconnaissance mission. To accomplish this mission, the ACT was divided into Scout Weapons Teams (SWTs), each consisting of two aircraft. Each SWT was assigned a specific area which was divided into two sectors, one sector to be the responsibility of the pilot, the other to be reconnoitred by his wingman. The wingman operated at a distance beyond the pilot's visual range and reported independently to complete his segment of the mission. Hence, adherence to preplanned control measures (waypoints) and communications procedures was stressed during the briefing. To avoid detection by enemy radar installations, the pilots were to obey Nap of the Earth (NOE) flight criteria (meaning that at no time was their altitude to exceed 30 ft). During each mission, the pilots were to locate and report on known threats and possible enemy elements at specific co-ordinates, as well as their arrival at specific locations (waypoints). The positions of threats and waypoints were indicated on paper maps provided during their briefings as well as on the moving map located in the crew station. If any threats were sighted in the vicinity of a waypoint, flight paths minimizing exposure to those threats were to be chosen. These reports represented the bulk of the communications load in each scenario. The following reporting instructions were given to the pilots prior to the digital and verbal missions:

1. report all threat sightings to the next higher command and your wingman, utilizing the spot reporting capabilities of the aircraft;
2. report any obstructions of possible avenues of approach (roads and bridges) to the next higher command, in this case to the Squadron Tactical Operating Center (SQDN TOC);
3. report arrivals at designated waypoints and position reports to the SQDN TOC and wingman.
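The 30 ft NOE ceiling is the criterion behind the 'time above NOE altitude per min' measure reported in Table 2. A minimal sketch of such a per-minute summary follows. It assumes a uniformly sampled altitude trace in feet; the sampling rate is a hypothetical parameter because the simulator's logging rate is not reported, and the function name is illustrative only.

```python
from typing import List, Sequence

NOE_CEILING_FT = 30.0  # briefed NOE criterion: altitude never to exceed 30 ft


def seconds_above_noe_per_minute(altitude_ft: Sequence[float],
                                 sample_rate_hz: float) -> List[float]:
    """Seconds spent above the NOE ceiling in each complete minute of flight.

    Assumes a uniformly sampled altitude trace (hypothetical sample_rate_hz);
    any trailing partial minute is ignored.
    """
    samples_per_min = int(round(60 * sample_rate_hz))
    n_minutes = len(altitude_ft) // samples_per_min
    result = []
    for m in range(n_minutes):
        window = altitude_ft[m * samples_per_min:(m + 1) * samples_per_min]
        n_above = sum(1 for a in window if a > NOE_CEILING_FT)
        result.append(n_above / sample_rate_hz)
    return result


if __name__ == "__main__":
    # Synthetic trace at 1 Hz: 15 s at 40 ft, then 105 s at or below 25 ft.
    trace = [40.0] * 15 + [20.0] * 45 + [25.0] * 60
    print(seconds_above_noe_per_minute(trace, 1.0))  # [15.0, 0.0]
```

Averaging the returned list over the minutes of a mission segment would give one value per pilot and condition.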


2.4. Performance measure analyses

Performance measures related to aircraft attitude (roll, pitch, sideslip, speed, altitude and position) and control (collective, stick and pedal movements) were abstracted from magnetic tapes containing data output directly by the simulator. Other performance measures were derived from the videotapes recorded in the cockpit. The analysis of the videotape data was conducted on a minute-by-minute basis. Three separate passes were made through each videotape. On the first viewing the frequency and timing of eye and head movements were recorded. The second and third viewings were devoted to recording other mission-related activities. A detailed description of the analysis categories employed is provided below.

Pilots could transmit messages either verbally over their radio or digitally via keypad. Outgoing digital and verbal messages were further subdivided in terms of the kind of information transmitted. These sub-categories included spot reports to headquarters of the size, location, disposition, orientation, and intentions of enemy or unknown forces; position reports indicating the location of the sender; situation reports concerning the disposition of the sender; authentication procedures prior to the acknowledgement or continuance of a series of communications between two or more forces; the act of physically changing the electronic frequencies of certain radios to commence communications with other elements; and waypoint passages. Any format change of the menu-driven MFDs was also coded. Running sums of the total number of queries from the ground controllers for information from the pilots were obtained for each minute of the mission. The total number of responses to such queries was also obtained.
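The query bookkeeping in this section (running per-minute counts of queries, plus the latency categories and the 60 s task-shedding limit described in the paragraph that follows) can be sketched as below. The function names are hypothetical, and treating a latency of exactly 60 s as task-shedding is an assumption about a boundary the text leaves open.

```python
from collections import Counter
from typing import Iterable, List, Optional

NORMAL_LIMIT_S = 10.0  # responses within 10 s were scored as normal
LONG_LIMIT_S = 60.0    # 10-60 s responses were long; no response within 60 s = shed


def classify_response(latency_s: Optional[float]) -> str:
    """Classify one query response; None means the pilot never answered."""
    if latency_s is None or latency_s >= LONG_LIMIT_S:  # boundary at 60 s is assumed
        return "shed"
    if latency_s <= NORMAL_LIMIT_S:
        return "normal"
    return "long"


def running_query_counts(query_times_s: Iterable[float], mission_minutes: int) -> List[int]:
    """Cumulative number of queries received by the end of each mission minute."""
    times = list(query_times_s)
    return [sum(1 for t in times if t < 60.0 * (m + 1)) for m in range(mission_minutes)]


def score_responses(latencies: Iterable[Optional[float]]) -> Counter:
    """Tally normal, long and shed responses for one mission segment."""
    return Counter(classify_response(lat) for lat in latencies)


if __name__ == "__main__":
    print(running_query_counts([5, 70, 80, 190], 4))  # [1, 3, 3, 4]
    print(score_responses([4.0, 25.0, None, 8.0]))
```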
In addition, timely responses to queries produced within 10 s following a request for information were scored as normal responses, while responses occurring 10 to 60 s following receipt of the query were categorized as long responses. Information concerning visual scan patterns and strategies was obtained by recording, on a minute-by-minute basis, the duration and frequency of head movements devoted to scanning cockpit instrumentation and MFDs. Measures of pilot performance breakdowns were also obtained from the videotapes. These measures included occurrences of ground contact and tree strikes as well as an indication of when the pilots did not perform a required task. This latter category, termed task-shedding, included the failure to provide a timely spot report, situation report, response to a query with a latency of less than 60 s, or report of waypoint passage. Finally, responses to irrelevant background chatter were also coded.

2.5. Subjective ratings

The NASA TLX scales (Hart and Staveland 1988) were used to collect subjective ratings from the pilots. Ratings were collected following the low and high communication load segments of both the digital and verbal missions while the pilot was still in the simulator. The pilots rated the mission segments on the following bi-polar scales: mental demand, physical demand, temporal demand, effort, performance, and frustration level. The ratings on each of the scales for each mission and mission segment were further aggregated into a global measure of subjective workload.

2.6. Psychophysiological analyses

Most of the psychophysiological measures employed in this study required an analysis period in which the task demands remained constant for at least 5 min. Relatively artefact-free 5 min intervals during which pilots were performing the tasks as instructed were selected during the quantification of the videotape data. All of the psychophysiological analyses were performed on the same 5 min samples.
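In the study the 5 min samples were chosen by inspection during the videotape coding. For readers who want to automate a comparable first pass, the sketch below tiles a 200 Hz recording into non-overlapping 5 min windows and keeps those in which the proportion of artefact-flagged samples stays below a tolerance. The per-sample artefact mask, the 5 per cent tolerance and the non-overlapping tiling are all assumptions, and checking that task demands stay constant within a window is left to the analyst.

```python
from typing import List, Sequence

FS_HZ = 200                      # physiological signals were digitized at 200 Hz
WINDOW_SAMPLES = FS_HZ * 5 * 60  # one 5 min analysis window = 60 000 samples


def clean_window_starts(artefact_mask: Sequence[bool],
                        max_artefact_fraction: float = 0.05) -> List[int]:
    """Start indices of non-overlapping 5 min windows that are mostly artefact-free.

    artefact_mask[i] is True when sample i was flagged as artefact (hypothetical
    input); the default tolerance of 5 per cent is an assumption.
    """
    starts = []
    last_start = len(artefact_mask) - WINDOW_SAMPLES
    for start in range(0, last_start + 1, WINDOW_SAMPLES):
        flagged = sum(artefact_mask[start:start + WINDOW_SAMPLES])
        if flagged / WINDOW_SAMPLES <= max_artefact_fraction:
            starts.append(start)
    return starts
```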

2.6.1. Electrocardiographic (ECG) analyses: ECG activity was recorded from electrodes placed just below and to either side of the heart. Occurrences of R-waves (the major positive deflection in the ECG signal) were calculated off-line. A software Schmitt-trigger level was set separately for each subject at a value which was intermediate between the average peak amplitudes of the R- and T-waves (the T-wave, though considerably smaller than the R-wave, is the next largest positive deflection in the ECG signal). The latency of the R-wave was determined to be the midpoint between the times when the positive and negative slopes of the R-wave crossed the Schmitt-trigger level. All R-wave detections were inspected visually, and incorrect detections and omissions were corrected manually. Data samples consisting of the time between consecutive R-waves were interpolated in order to create equidistant time series in the manner prescribed by Mulder (1980). Following the removal of DC artefact, the data were smoothed using a modified cosine function, zero-padded, and entered into a program for Fast Fourier Transform analysis. This procedure estimated the power in spectral bands 0·01 Hz wide ranging from 0 to 0·50 Hz. Aasman et al. (1987) have identified three major frequency bands within the spectrum of cardiac variability. The first band (low frequencies from 0·02 to 0·06 Hz) is related to vasomotor activity involved in the regulation of body temperature. Energy in the second, mid-frequency band (0·07 to 0·14 Hz) is related to the short-term regulation of arterial blood pressure. Finally, the high-frequency band (0·15 to 0·50 Hz) reflects the influence of respiration upon the distribution of cardiac inter-beat intervals. Separate estimates of spectral energy were computed for the temperature, arterial blood pressure and respiration bands.
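The ECG pipeline described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the authors' software: R-waves are timed from a single software trigger level as described; inter-beat intervals are linearly interpolated onto a 4 Hz grid, a stand-in because Mulder's (1980) interpolation procedure is not detailed here; a Hann taper stands in for the unspecified 'modified cosine' window; and a direct DFT restricted to 0 to 0·50 Hz replaces the FFT so that only the standard library is needed. Power is in arbitrary units and the log is natural; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# HRV band edges in Hz, as given in the text.
BANDS = {
    "temperature": (0.02, 0.06),
    "blood_pressure": (0.07, 0.14),
    "respiration": (0.15, 0.50),
}


def detect_r_waves(ecg: Sequence[float], fs: float, trigger: float) -> List[float]:
    """R-wave times (s) at the midpoint of the upward and downward trigger crossings.

    `trigger` is the per-subject level between the mean R- and T-wave peaks.
    """
    times: List[float] = []
    rise = None
    for i in range(1, len(ecg)):
        if rise is None and ecg[i - 1] < trigger <= ecg[i]:
            rise = i                               # first sample at or above the level
        elif rise is not None and ecg[i - 1] >= trigger > ecg[i]:
            times.append((rise + i - 1) / 2 / fs)  # i - 1: last sample at or above it
            rise = None
    return times


def resample_ibi(r_times: Sequence[float], fs_out: float = 4.0) -> List[float]:
    """Linearly interpolate inter-beat intervals (s) onto an equidistant grid.

    Each interval is stamped at the R-wave that ends it; needs at least 3 R-waves.
    """
    beat_t = list(r_times[1:])
    ibi = [b - a for a, b in zip(r_times, r_times[1:])]
    n_out = int((beat_t[-1] - beat_t[0]) * fs_out) + 1
    out: List[float] = []
    j = 0
    for k in range(n_out):
        t = beat_t[0] + k / fs_out
        while j < len(beat_t) - 2 and beat_t[j + 1] < t:
            j += 1
        w = (t - beat_t[j]) / (beat_t[j + 1] - beat_t[j])
        out.append(ibi[j] + w * (ibi[j + 1] - ibi[j]))
    return out


def band_log_power(series: Sequence[float], fs: float, n_fft: int = 2048) -> Dict[str, float]:
    """Natural-log power per HRV band, accumulated from 0.01 Hz-wide bins (0-0.50 Hz)."""
    n = len(series)
    if not 2 <= n <= n_fft:
        raise ValueError("need 2 <= len(series) <= n_fft")
    mean = sum(series) / n
    # Remove the DC level, then apply a Hann (raised-cosine) taper.
    x = [(v - mean) * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
         for i, v in enumerate(series)]
    # Zero-padding to n_fft is implicit: padded zeros add nothing to the DFT sums.
    bins = [0.0] * 51                              # centred on 0.00, 0.01, ..., 0.50 Hz
    for k in range(int(0.5 * n_fft / fs) + 1):
        f = k * fs / n_fft
        re = sum(v * math.cos(2 * math.pi * k * i / n_fft) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n_fft) for i, v in enumerate(x))
        bins[round(f * 100)] += re * re + im * im
    return {name: math.log(sum(bins[round(lo * 100):round(hi * 100) + 1]))
            for name, (lo, hi) in BANDS.items()}
```

For one 5 min ECG sample, `band_log_power(resample_ibi(detect_r_waves(ecg, 200.0, level)), 4.0)` would return the three log band powers, where `ecg` and `level` are placeholders for the recorded signal and the per-subject trigger value. Since the text says respiration was processed with the same algorithms, the same `band_log_power` call could be applied to a resampled respiration trace.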
The absolute power in these three bands was submitted to a log transform in order to diminish the impact of individual differences present in the absolute power scores.

2.6.2. Respiratory activity: Two different techniques for measuring respiratory activity were employed. The first involved attaching a thermistor sensitive to airflow to the microphone on the pilot's headset. When properly positioned, the thermistor transduced respiratory activity through both the nose and the mouth. The second method employed a strain gauge which encircled the pilot's chest. This technique gauged expansion and contraction of the diaphragm as a function of inhalation and expiration. The data collected via the thermistor were subsequently found to provide the more stable estimate of respiratory activity because the strain gauge was extremely sensitive to movement artefacts. The respiratory data were submitted to the same algorithms which computed power in the cardiac inter-beat interval samples.

2.6.3. Electrooculographic (EOG) analyses: Vertical EOG activity was recorded by two electrodes placed above and below the right eye. Blinks were identified off-line by first filtering the data (-3 dB at 6·27 Hz; 0 dB at 14·29 Hz) and then identifying voltage deflections which met specified criteria of polarity, amplitude, duration and velocity. The specific criteria used were tailored to conform to the morphology of the blinks produced by each individual subject. All blinks were inspected visually, and incorrect detections and omissions were corrected manually.
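A minimal blink detector and scorer in the spirit of this section is sketched below. The velocity, amplitude and duration thresholds are hypothetical per-subject parameters (the study tailored its criteria to each subject and then checked every detection by eye), a positive-going blink deflection in the vertical EOG is assumed, and the 20 per cent closure level is taken relative to the onset voltage, following the scoring rules given later in this section. Names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Blink:
    onset_s: float             # time of blink onset
    amplitude: float           # peak voltage minus onset voltage
    closure_duration_s: float  # onset to first post-peak sample <= 20 % of amplitude


def score_blink(veog: Sequence[float], fs: float, onset: int,
                search_samples: int) -> Optional[Tuple[Blink, int]]:
    """Score a candidate blink starting at sample `onset`; also return its closure end index."""
    end = min(len(veog), onset + search_samples)
    peak = max(range(onset, end), key=lambda i: veog[i])
    amplitude = veog[peak] - veog[onset]
    closure_level = veog[onset] + 0.2 * amplitude  # 20 % level, relative to onset voltage
    for i in range(peak + 1, len(veog)):
        if veog[i] <= closure_level:
            return Blink(onset / fs, amplitude, (i - onset) / fs), i
    return None                                    # signal never fell back within the record


def detect_blinks(veog: Sequence[float], fs: float, min_velocity: float,
                  min_amplitude: float, max_duration_s: float = 0.5) -> List[Blink]:
    """Positive-going vEOG deflections meeting velocity, amplitude and duration criteria."""
    blinks: List[Blink] = []
    search = max(1, int(max_duration_s * fs))
    i = 1
    while i < len(veog):
        velocity = (veog[i] - veog[i - 1]) * fs   # signal units per second
        if velocity >= min_velocity:
            scored = score_blink(veog, fs, i - 1, search)
            if scored is not None:
                blink, closure_end = scored
                if blink.amplitude >= min_amplitude and blink.closure_duration_s <= max_duration_s:
                    blinks.append(blink)
                    i = closure_end + 1            # resume scanning after this blink
                    continue
        i += 1
    return blinks
```

Blink rate per condition then follows from `len(detect_blinks(...))` divided by the window length in minutes, and mean closure duration from the `closure_duration_s` fields.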

Table 1. Performance measures (frequency of events) which corroborate the manipulations of communication load and verbal/digital communication formats.

                               Verbal communication     Digital communication
Measure                        Low load    High load    Low load    High load
Verbal messages received         10·5        23·3          6·3        13·2
Verbal messages transmitted      10·0        21·3          5·0        11·8
Digital messages received         0·7         0·2          5·8        14·5
MFD reformats                     2·5         1·3         29·0        58·5

Following identification, the amplitude of the blink was scored as the voltage difference between the voltage at the time point identified as blink onset and the peak amplitude of the blink. Closure duration was calculated as the time between blink onset and the first point after the peak at which the voltage fell to 20 per cent or less of the blink's amplitude.

2.6.4. Event-related potential (ERP) analyses: Electroencephalographic activity was recorded from electrodes located according to the International 10-20 System at Fz, Cz and Pz, referenced to linked mastoids (Jasper 1958). ERP epochs beginning 100 ms prior to the presentation of an irrelevant auditory probe and continuing for 1000 ms were extracted off-line. The epochs were digitally filtered (-3 dB at 6·27 Hz; 0 dB at 14·29 Hz) prior to further analysis. The latency of the P300 component was calculated as the time point associated with the maximum positive voltage in a time window extending from 300 to 800 ms post-stimulus. P300 amplitude was determined by computing the mean voltage in a 50 ms window centered on the peak. Because the helmet-mounted displays worn by the pilots introduced artefacts at the Fz and Pz electrode sites during head movements, subsequent analyses were restricted to the Cz placement, which was relatively artefact free. P300 amplitudes and latencies were estimated for each single trial in all conditions. To correct for large individual differences in P300 amplitude, each subject's average amplitudes were range corrected prior to further analysis. For each subject, the minimum and maximum of the condition-average amplitudes were computed; then, for each condition, the minimum was subtracted from that condition's average amplitude and the result was divided by the range (maximum minus minimum). As a result, the condition associated with the smallest P300 for a given subject was transformed to a score of 0 and the condition associated with the largest P300 amplitude received a value of 1.
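The blink and P300 scoring rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' software: signals are plain lists sampled at a constant rate `fs`; each EOG segment passed in contains a single positive-going blink whose onset index is already known; the 20 per cent rule is read as 20 per cent of the amplitude above the onset voltage; and ERP epochs start 100 ms before the probe. All function names are hypothetical.

```python
# Hypothetical helpers illustrating the scoring rules; not the study's code.


def score_blink(eog, onset, fs):
    """Return (amplitude, closure_ms) for one blink in an EOG segment.

    Assumes `eog` holds a single positive-going blink whose onset sample
    index is `onset`, sampled at `fs` Hz.
    """
    peak = max(range(onset, len(eog)), key=lambda i: eog[i])
    amplitude = eog[peak] - eog[onset]
    # Closure ends at the first post-peak sample that has fallen to
    # 20 per cent (or less) of the amplitude above the onset voltage.
    threshold = eog[onset] + 0.2 * amplitude
    for i in range(peak + 1, len(eog)):
        if eog[i] <= threshold:
            return amplitude, (i - onset) * 1000.0 / fs
    return amplitude, None  # the eye did not reopen within the segment


def score_p300(epoch, fs, prestim_ms=100, window_ms=(300, 800), mean_ms=50):
    """Return (latency_ms, amplitude) for one single-trial epoch.

    Latency is the most positive point 300-800 ms post-stimulus; amplitude
    is the mean voltage in a window of roughly `mean_ms` centred on it.
    """
    def to_index(t_ms):  # post-stimulus time -> sample index in the epoch
        return int(round((t_ms + prestim_ms) * fs / 1000.0))

    lo, hi = to_index(window_ms[0]), to_index(window_ms[1])
    peak = max(range(lo, hi + 1), key=lambda i: epoch[i])
    half = int(round(mean_ms / 2 * fs / 1000.0))
    segment = epoch[max(peak - half, 0):peak + half + 1]
    latency_ms = peak * 1000.0 / fs - prestim_ms
    return latency_ms, sum(segment) / len(segment)


def range_correct(condition_means):
    """Rescale one subject's condition means so min -> 0 and max -> 1."""
    lo, hi = min(condition_means.values()), max(condition_means.values())
    if hi == lo:
        raise ValueError("range correction needs two distinct means")
    return {cond: (m - lo) / (hi - lo) for cond, m in condition_means.items()}
```

Range correction is applied per subject to that subject's condition averages, so each pilot contributes scores on the same 0 to 1 scale regardless of absolute P300 size.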


3. Results

3.1. Performance measures
Table 1 illustrates the efficacy of the manipulations of communication modality and load. The number of verbal requests for information was higher during the verbal mission than during the digital mission (F[1, 5] = 59·65; p < 0·01), and the average number of verbal responses to such queries was also greater during the verbal mission (F[1, 5] = 59·65; p < 0·01). Furthermore, more verbal requests for information were received during the high load segments of both missions than during the low load



Table 2. Performance measures concerning aircraft control. All measures except time above NOE altitude and heads-down time are in simulator units.

Measure                                  Verbal communication     Digital communication
                                         Low load    High load    Low load    High load
Pitch variability                        3·5         3·1          3·5         3·8
Roll variability                         4·2         3·5          3·6         3·3
Sideslip variability                     0·3         0·2          0·3         0·3
Mean velocity of collective movements    0·02        0·02         0·02        0·02
Mean velocity of cyclic movements        0·10        0·08         0·07        0·07
Mean rotor torque                        155·8       157·3        159·5       161·2
Mean speed                               22·2        20·2         19·8        19·3
Speed variability                        10·9        9·8          10·6        10·7
Time above NOE altitude per min (s)      19·1        19·8         21·4        29·5
Heads-down time per min (s)              12·9        19·5         19·1        19·6

segments (F[1, 5] = 112·29; p < 0·01), with a corresponding pattern of results holding true for the number of responses to such queries (F[1, 5] = 132·31; p < 0·01). In addition, the load effect was larger during verbal missions, producing a significant modality by load interaction for the verbal request for information measure (F[1, 5] = 7·71; p < 0·05) and a marginal interaction for the response measure (F[1, 5] = 4·18; p = 0·09). Conversely, the number of digital messages received was greater during the digital mission than during the verbal mission (F[1, 5] = 651·86; p < 0·01). The requirement to respond digitally to these information requests entailed reformatting the multi-function displays (MFDs) in order to obtain and then send the information requested. Thus, the number of display format changes is proportional to the frequency of digital information transmissions made by the pilots. As expected, display reformatting activity was considerably higher during the digital than during the verbal missions (F[1, 5] = 164·41; p < 0·01). The number of digital messages received was also sensitive to the manipulation of load: during the low load segments subjects received an average of 3·25 digital messages, while during the high load segments they received 7·33 such requests (F[1, 5] = 74·56; p < 0·01). Finally, the data concerning the number of display reformats also indicate that the low and high load segments placed different demands upon the pilots (F[1, 5] = 35·35; p < 0·01). The effects of the load manipulation were largest in the digital mission for both the number of digital messages received (F[1, 5] = 37·72; p < 0·01) and the number of MFD reformats (F[1, 5] = 73·60; p < 0·01). In summary, these measures of the structural aspects of the mission support our claim that the mission segments were differentiated in terms of the type (digital versus verbal) and amount of communications.
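The F[1, 5] statistics reported here are consistent with a 2 × 2 fully within-subjects design (format × load) with six pilots, although the analysis software is not described in this section. For that design each effect has one degree of freedom, and its F equals the squared one-sample t statistic of a per-subject contrast score. The sketch below uses that identity; it assumes complete data, the function names are hypothetical, and the example numbers are invented.

```python
from statistics import mean, variance


def f_1df(contrast_scores):
    """F for a single-df within-subject effect, on (1, n - 1) df.

    Equals the squared one-sample t of the per-subject contrast scores.
    """
    n = len(contrast_scores)
    if n < 2:
        raise ValueError("need at least two subjects")
    var = variance(contrast_scores)  # sample variance, n - 1 denominator
    if var == 0:
        raise ValueError("contrast scores have zero variance")
    return n * mean(contrast_scores) ** 2 / var


def rm_anova_2x2(cells):
    """2 x 2 within-subject ANOVA computed from per-subject contrasts.

    `cells` holds one tuple per subject:
    (verbal_low, verbal_high, digital_low, digital_high).
    """
    modality = [(vl + vh) / 2 - (dl + dh) / 2 for vl, vh, dl, dh in cells]
    load = [(vl + dl) / 2 - (vh + dh) / 2 for vl, vh, dl, dh in cells]
    # Difference of differences: does the load effect depend on format?
    interaction = [(vl - vh) - (dl - dh) for vl, vh, dl, dh in cells]
    return {
        "modality": f_1df(modality),
        "load": f_1df(load),
        "modality x load": f_1df(interaction),
        "df": (1, len(cells) - 1),
    }


# Hypothetical per-pilot cell means (not data from this study).
example = [(3, 5, 1, 2), (4, 6, 2, 2), (5, 5, 2, 4),
           (2, 6, 1, 1), (6, 7, 3, 2), (4, 4, 2, 3)]
print(rm_anova_2x2(example))
```

Because each F is invariant to rescaling its contrast, the halving in the main-effect contrasts does not change the result; it only keeps the contrast in the units of the original measure.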
Performance measures related to the aircraft's attitude and pilot control movements are presented in table 2. Many of these variables (pitch, roll and sideslip variability; velocity of collective movements; engine torque; speed and speed variability) were not significantly affected by either the digital/verbal or the low/high load task

Table 3. Performance measures adjusted for mission duration. The measures represent the average number of events per minute.

Measure                  Verbal communication     Digital communication
                         Low load    High load    Low load    High load
Task-shedding            0·12        0·10         0·01        0·04
Incorrect responses      0·05        0·11         0·06        0·07
Ground/tree strikes      0·05        0·03         0·00        0·01

manipulations (p > 0·10), indicating that during all mission segments pilots maintained comparable levels of control over the aircraft. However, the time spent above the maximum NOE altitude criterion was significantly longer for the digital than for the verbal missions (F[1, 5] = 6·45; p < 0·05), with a marginally significant interaction between modality and communication load (F[1, 5] = 4·22; p < 0·09). This suggests that the pilots were more attentive to the NOE flight requirement during the verbal mission, particularly under conditions of low load, than during the digital mission. This conclusion is supported by the fact that faster control stick movements occurred during the low load segment of the verbal mission, producing a significant modality × load interaction for the cyclic control (F[1, 5] = 9·32; p < 0·05). Faster control movements are typically associated with NOE flight. Furthermore, subjects also spent more time in a heads-down attitude (which is incompatible with NOE flight) during the digital missions than during the verbal missions (F[1, 5] = 9·44; p < 0·05), and this effect interacted with communication load (F[1, 5] = 12·60; p < 0·01).

The average durations of the digital and verbal missions were equivalent. However, because the high load segments covered more distance and included more waypoints, they took longer to fly. For this reason, a number of frequency measures needed to be adjusted to take these differing segment durations into account. Table 3 presents the variables that were adjusted for mission duration in order to separate the influences of task duration and communication load; these frequency measures represent the average number of observations per minute. The task-shedding index is a measure of the extent to which the pilots failed to perform an instructed task. Task shedding was more likely during the verbal mission than during the digital mission (F[1, 5] = 7·27; p < 0·05).
Furthermore, during the verbal mission subjects were more likely to make incorrect responses to information queries during the high load flight segments, resulting in a significant modality by load interaction (F[1, 5] = 8·64; p < 0·05). The number of ground and tree collisions did not differ as a function of modality or load (p > 0·10). In summary, the performance data indicate that the pilots achieved the greatest success at meeting the NOE criterion during the segment of the verbal mission requiring relatively few communications by the pilot. Both segments of the digital mission and the high load segment of the verbal mission required the pilot to spend more time in a heads-down attitude than in the verbal low load condition. Had the pilots chosen to hover while scanning their instruments, NOE performance would no doubt have improved; however, this improvement would have come at the cost of an increase in the duration of the mission.
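The duration adjustment behind Table 3, and the per-minute NOE measure in Table 2, amount to dividing by flight time. A minimal sketch follows; the function names are hypothetical, and `ceiling` is a placeholder parameter because the NOE altitude criterion itself is not given in this section.

```python
def per_minute_rate(event_count, duration_s):
    """Events per minute for one mission segment (the Table 3 adjustment)."""
    if duration_s <= 0:
        raise ValueError("segment duration must be positive")
    return event_count * 60.0 / duration_s


def seconds_above_per_minute(altitudes, ceiling):
    """Seconds above an altitude ceiling per minute of flight.

    `altitudes` are evenly spaced samples, so the fraction of samples above
    `ceiling` equals the fraction of time spent above it. `ceiling` is a
    hypothetical stand-in for the NOE altitude criterion.
    """
    if not altitudes:
        raise ValueError("no altitude samples")
    above = sum(1 for a in altitudes if a > ceiling)
    return above / len(altitudes) * 60.0


# Hypothetical segment: 4 events in 20 minutes (1200 s) of flight.
print(per_minute_rate(4, 1200))  # 0.2
```

Normalizing this way lets a long high load segment and a short low load segment be compared on the same per-minute scale.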


Table 4. Average subjective ratings for each scale of the NASA TLX.

Measure             Verbal communication     Digital communication
                    Low load    High load    Low load    High load
Mental demand       19·2        18·8         17·7        19·8
Physical demand     2·7         3·0          2·8         2·6
Temporal demand     14·1        12·4         12·5        13·6
Effort              16·0        14·6         15·1        16·2
Performance         9·9         9·4          9·8         9·7
Frustration         7·6         6·6          7·7         7·3
Total               69·1        64·3         65·2        68·9

While the verbal communication format was superior with regard to adherence to the NOE flight criteria and the amount of time available for terrain scanning, the digital format resulted in fewer cases of task shedding and fewer incorrect responses to queries than the verbal format. These latter effects of communication format can most likely be attributed to the fact that messages could be recalled in the digital but not in the verbal format. Thus, the pilots did not need to rely on memory to the same extent with the digital as they did with the verbal communication system.

3.2. Subjective ratings
Table 4 summarizes the NASA TLX ratings assigned by the pilots to the different conditions. While pilots tended to view the mental demand scale as the most important component of workload involved in flying the missions and the physical demand scale as the least important, none of the ratings differed as a function of the digital/verbal or low/high load task manipulations. The failure to find a difference in subjective ratings as a function of mode of communication is consistent with the suggestion by Yeh and Wickens (1988) that subjective measures are not, in general, sensitive to distinctions between different varieties of workload. However, it is surprising that there was no difference between the low and high communication load conditions. One possible explanation, suggested by the pilots, was that although they realized that the missions differed in the amount of communications, they did not believe that the demands exceeded their capacity to perform the missions successfully, and therefore rated the missions as equivalent in workload. It is interesting to note, however, that for some flight parameters performance was better in the low than in the high communication load condition. Thus, as has been previously suggested, subjective measures do not always correspond to overt measures of operator performance.
In the present case it appears that the pilots' criterion for unacceptable workload was sufficiently high that none of the conditions were distinguishable, at least in terms of the subjective experience of workload.

3.3. Psychophysiological measures
3.3.1. ECG and respiratory activity: Table 5 summarizes the results obtained from the analysis of the ECG and respiratory data. The mean inter-beat interval (IBI; a reflection of heart rate) did not differ as a function of either the verbal/digital or low/high


Table 5. Average measures of cardiac and respiratory activity.

Measure                                           Verbal communication     Digital communication
                                                  Low load    High load    Low load      High load
Mean IBI (ms)                                     801         802          810           803
Overall variability                               1579        1798         1549          1512
ECG power (0·02 to 0·06 Hz), log units            6·4         6·5          6·2           6·2
ECG power (0·07 to 0·14 Hz), log units            6·5         6·9          6·4           6·2
ECG power (0·15 to 0·50 Hz), log units            5·9         6·2          5·9           5·8
Respiratory power (0·02 to 0·06 Hz), log units    13·4        14·4         [illegible]   11·6
Respiratory power (0·07 to 0·14 Hz), log units    13·9        14·5         11·2          11·5
Respiratory power (0·15 to 0·50 Hz), log units    13·6        14·2         12·0          12·1

communication load manipulations. An overall measure of heart rate variability (the sum of the absolute values of the differences between successive IBIs) showed increased variability during the verbal mission relative to the digital mission segments (F[1, 5] = 6·29; p < 0·05), with the greatest variability associated with the high communication load segment of the verbal mission, resulting in a significant mission by load interaction (F[1, 5] = 24·63; p < 0·05). With regard to the spectral estimates of cardiac pulse variability, power in the temperature band (0·02 to 0·06 Hz) was not affected by either the modality or load manipulations (p > 0·10). Power in the arterial blood pressure band (0·07 to 0·14 Hz) was significantly larger during the verbal than during the digital missions (F[1, 5] = 16·19; p < 0·01), with no effect due to the load manipulation. Finally, power in the respiratory band (0·15 to 0·50 Hz) was elevated during the high load segment of the verbal mission, producing a significant modality by load interaction (F[1, 5] = 8·92; p < 0·05). Since increased workload is generally associated with decreased heart rate variability, these results suggest that the verbal missions (particularly during the high load segment) were associated with lower workload than the digital missions. However, an analysis of the respiration data presented in table 5 indicates that the power in the respiration signal within the same three frequency ranges as analysed for the ECG data was in each case greater during the verbal than the digital mission. Thus, respiratory power in the band from 0·02 to 0·06 Hz was significantly greater during the verbal than the digital missions (F[1, 5] = 24·69; p < 0·01) and during the high as opposed to the low communication load segments (F[1, 5] = 30·57; p < 0·01).
Respiratory power in the band from 0·07 to 0·14 Hz was also larger during the verbal than during the digital mission (F[1, 5] = 22·06; p < 0·01), with a marginally significant effect of communication load (F[1, 5] = 4·47; p < 0·08). Finally, in the 0·15 to 0·50 Hz range (where the dominant respiratory frequency is generally expected) the verbal mission once again was associated with more power than the digital mission (F[1, 5] = 6·35; p < 0·05), with a marginally significant main effect of load (F[1, 5] = 4·47; p < 0·09).
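Both kinds of cardiac measure in Table 5 can be sketched in a few lines of Python. The overall variability index follows the definition given in the text. This section does not describe how beats were converted into an evenly sampled series or which spectral estimator was used, so the spectral part assumes an evenly sampled input and a plain discrete Fourier transform periodogram in natural-log units; it is an illustration, not the authors' pipeline, and the same function applies to a respiration trace. Function names are hypothetical, and the direct DFT is used only so the sketch needs nothing beyond the standard library (a real analysis would use an FFT).

```python
import cmath
import math


def successive_difference_sum(ibis_ms):
    """Overall variability: sum of |IBI[i+1] - IBI[i]| over a record (ms)."""
    return sum(abs(b - a) for a, b in zip(ibis_ms, ibis_ms[1:]))


# Frequency bands of Table 5, labelled as in the text.
BANDS_HZ = {
    "temperature": (0.02, 0.06),
    "blood_pressure": (0.07, 0.14),
    "respiration": (0.15, 0.50),
}


def log_band_powers(signal, fs):
    """Natural-log periodogram power summed within each band.

    `signal` must be evenly sampled at `fs` Hz; for cardiac data this is
    assumed to be an IBI series interpolated onto a regular time grid.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC offset
    powers = {}
    for name, (lo, hi) in BANDS_HZ.items():
        if hi > fs / 2:
            raise ValueError(f"{name} band lies above the Nyquist frequency")
        # DFT bin k corresponds to k * fs / n Hz; keep bins inside the band.
        k_lo = math.ceil(lo * n / fs - 1e-9)
        k_hi = math.floor(hi * n / fs + 1e-9)
        if k_lo > k_hi:
            raise ValueError(f"record too short to resolve the {name} band")
        total = 0.0
        for k in range(k_lo, k_hi + 1):
            coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            total += abs(coef) ** 2 / n
        powers[name] = math.log(total) if total > 0 else float("-inf")
    return powers
```

Resolving the 0·02 to 0·06 Hz band requires a record of at least several tens of seconds, which is why the function refuses records too short to place any DFT bin inside a band.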


Table 6. Measures of blink activity.

Verbal communication

Measure


Number of blinks per 5 min period Average closure duration (ms) Average blink amplitude

Digital communication

Low load

High load

Low load

High load

73 132 85

71 130 80

59 135

128

93

72

48

This pattern of increased respiratory power during the verbal and high communication load mission segments is consistent with the increased levels of verbalization present in these conditions. Furthermore, because there was a significant correlation between the respiration and ECG power scores (r = 0·35; p < 0·01), the power in the ECG signal was most likely influenced by the differential patterns of vocalization occurring during the various mission segments. For this reason, it is impossible to determine whether modulations of ECG power were due to differences in cognitive load or merely to differences in respiration due to vocalization.

3.3.2. Electrooculographic data: Table 6 presents the blink data collected during the study. There was a marginally significant main effect of communication format (F[1, 5] = 4·11; p < 0·09), indicating that subjects blinked less during the digital mission (when incoming and outgoing messages were presented visually) than during the verbal mission (when the report modality was auditory). The median closure duration of blinks was shorter during flight segments associated with increased communication load, regardless of the modality of the report (F[1, 5] = 6·78; p < 0·05). Blink amplitude was unaffected by either the modality or load manipulations. The dissociation between the two measures of blink activity is interesting in that the blink rate measure distinguished between the verbal and digital conditions, while the blink duration measure was sensitive to the amount of communication regardless of the format in which the communication took place. These findings are consistent with previous studies that have found blink rate to be a reliable measure of visual workload but not particularly sensitive to verbal or cognitive demands, whereas closure duration seems to provide a measure of workload that is sensitive to both visual and auditory demands (Kramer 1991, Wilson and Eggemeier 1991).

3.3.3. Event-related potential (ERP) data: Grand average waveforms associated with the experimental conditions are presented in figures 1 and 2 for the low and high probability tones, respectively. A positive deflection in the latency range between 300 and 500 ms post-stimulus (identified as the P300 component) is visible in most conditions. Table 7 presents the mean range corrected P300 amplitudes associated with the rare and frequent irrelevant auditory probes in all of the experimental conditions. Larger P300 amplitudes were associated with the low communication load segments during both the verbal and digital missions (F[1, 5] = 21·46; p < 0·05).
Larger P300s were also associated with the rare tones, but only during the low communication load segments, yielding a significant load by probability interaction (F[1, 5] = 7·73; p < 0·05).
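The load effect and the load by probability pattern can be checked directly against the cell means in Table 7. The sketch below only re-expresses those published means; it does not reproduce the statistical tests, which were computed on per-subject scores. Function names are hypothetical.

```python
# Range-corrected P300 scores at Cz, copied from Table 7.
P300 = {
    ("verbal", "low"): {"rare": 0.76, "frequent": 0.54},
    ("verbal", "high"): {"rare": 0.42, "frequent": 0.40},
    ("digital", "low"): {"rare": 0.73, "frequent": 0.63},
    ("digital", "high"): {"rare": 0.34, "frequent": 0.39},
}


def load_means(table):
    """Mean score per load level, averaged over format and probe type."""
    means = {}
    for load in ("low", "high"):
        scores = [s for (_fmt, ld), probes in table.items() if ld == load
                  for s in probes.values()]
        means[load] = sum(scores) / len(scores)
    return means


def probability_effects(table):
    """Rare-minus-frequent score for each (format, load) cell."""
    return {cell: p["rare"] - p["frequent"] for cell, p in table.items()}


print(load_means(P300))
print(probability_effects(P300))
```

On these means the low load average is 0·665 against about 0·39 under high load, and the rare-minus-frequent difference is clearly positive in both low load cells (0·22 verbal, 0·10 digital) but close to zero or negative in the high load cells (0·02 verbal, -0·05 digital).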


[Figure 1 plot: grand average ERP waveforms at Cz; amplitude axis from -20 to 20, time axis 0 to 1000 ms; traces for verbal low load, verbal high load, digital low load and digital high load.]

Figure 1. Grand average ERPs recorded at the Cz site, superimposed for the four experimental conditions. These waveforms were elicited by the low probability tones.

These results are consistent with laboratory studies which have found that the amplitude of the P300 provides a reliable index of the processing demands of a task irrespective of the modality of input (Kramer et al. 1983, 1985, 1987, Sirevaag et al. 1989). The decreased probability effect in the high load condition provides further evidence for differential resource demands in the two load conditions. Previous studies have found an effect of probability, with larger amplitude P300s elicited by low probability events, as long as subjects attend to and actively process the eliciting events (see Kramer and Spinks 1991, Pritchard 1981). In the present study it would appear that while the pilots may have had sufficient spare capacity to process the irrelevant probes in the low load condition, the high communication load conditions were sufficiently demanding to preclude active processing of the probes.

4. Discussion
The present study had two major goals. The first was to determine, using our workload battery, the relative advantages and disadvantages of digital and verbal communication systems during low-level, high-speed helicopter flight in a high fidelity motion-based simulator. The second goal was to evaluate the efficacy of different workload and performance measures in an extra-laboratory environment. With regard to the first goal, each of the communication systems had both advantages and limitations. The digital system was associated with more frequent violations of the NOE altitude criterion than the verbal communication system, most likely due


[Figure 2 plot: grand average ERP waveforms at Cz; time axis 0 to 1000 ms; traces for verbal low load, verbal high load, digital low load and digital high load.]

Figure 2. Grand average ERPs recorded at the Cz site, superimposed for the four experimental conditions. These waveforms were elicited by the high probability tones.

to the requirement for more instrument scanning time in the digital condition. However, given that the pilots were relatively inexperienced with the digital system, the amount of time required to configure the system would be expected to decrease with additional training. Task shedding occurred more frequently when the pilots used the verbal than when they used the digital communication system. A related finding was that incorrect responses also occurred more often when the verbal system was used, particularly in the high load mission segments. A reasonable interpretation of these performance failures is that the digital format provided the pilots with a historical overview of the messages received and transmitted, whereas this information was not available with the verbal communication system. Thus, it might be expected that

Table 7. Range corrected P300 amplitude scores (at Cz).

Probe type        Verbal communication     Digital communication
                  Low load    High load    Low load    High load
Rare tones        0·76        0·42         0·73        0·34
Frequent tones    0·54        0·40         0·63        0·39


this differential effect of communication format could be reduced by including a recall option for the verbal messages, in either visual or auditory format. In general, the performance and derived secondary task measures (task shedding) suggest that the differences between the verbal and digital systems were structural in nature. Thus, there appears to be little evidence that either of the systems is more resource demanding. Instead, the design characteristics of the systems appear to impose data limits on each of the systems that may be modifiable through increased operator training (i.e., reducing the time needed to configure the digital system) and system redesign (i.e., providing a historical account of the messages in the verbal system). The subjective measures, in the form of the NASA TLX scales, failed to distinguish between either the communication format or the communication load manipulations. While it is not particularly surprising that the subjective ratings did not distinguish between the verbal and digital formats (see Yeh and Wickens 1988), the failure to find a difference in subjective ratings as a function of communication load would appear to be at odds with some of the physiological and performance metrics. However, when the subjective data are viewed within the context of the pilots' views of workload, this dilemma can be resolved. According to information obtained from the pilots during the debriefing, they did not view any of the conditions as requiring unacceptably high levels of workload and therefore rated each of the conditions as equivalent. While our simulated scenarios may not have achieved the levels of workload experienced during actual NOE flight, a subset of the performance and physiological measures did discriminate between levels of communication load.
Therefore, we must conclude that the subjective ratings should be interpreted with caution, since subjective experience did not appear to correspond to other assessments of performance and workload. Two of the physiological measures, closure duration for blinks and the P300 component of the ERP, discriminated between levels of communication load. More specifically, the irrelevant probe technique used to elicit P300s provided results consistent with laboratory studies that have found decreased secondary task P300s with increases in the difficulty of a primary task (Kramer et al. 1983, 1985, 1987, Sirevaag et al. 1989). In the present case, the P300s elicited by the probes decreased in amplitude from the low to the high communication load conditions. These results are important in at least two respects. First, unlike those in most previous studies, the probes used in the present study did not require an overt response. Thus, it appears that the disadvantages usually associated with secondary task methodologies can be surmounted by the use of the irrelevant probe technique (but see Kramer 1991 for other concerns about the irrelevant probe technique). Second, the fact that the low probability tone provided the most reliable discrimination between levels of communication load suggests that large amounts of ERP data are not required for reliable discriminations between conditions (see also Humphrey et al. 1990). Although the absolute difference in closure duration for eyeblinks was small, it was quite reliable across pilots. Therefore, this measure also appears to be promising for the evaluation of workload in complex settings. The spectral decomposition of the ECG signal suggested that variability was higher, in the intermediate and high frequency bands, for the verbal than for the digital conditions. Based on previous results, this pattern of data would be interpreted as evidence for higher demands for the digital than for the verbal communication format.
Unfortunately, however, it appears that the differences in the ECG variability measures were mirrored by differences in the respiratory spectrum such that the


ECG differences could be due either to increased processing demands in the digital conditions or to increased verbalizations in the verbal conditions. Thus, these results suggest that differences in the amount of verbalization across conditions should be taken into account when heart rate variability measures are used to assess cognitive demands in applied settings.

Acknowledgements
The research was supported by a grant from Boeing Helicopters to the second and third authors. The research was made possible through heroic efforts by the simulation facility staff at Boeing Helicopters and the University of Illinois equipment design team of Brian Foote and Michael Anderson. We would also like to thank Bob Beggs and Nancy Holup for their support throughout the project. Requests for reprints should be addressed to Arthur F. Kramer, Department of Psychology, University of Illinois at Urbana-Champaign, 603 East Daniel Street, Champaign, IL 61820, USA.

References
AASMAN, J., MULDER, G. and MULDER, L. 1987, Operator effort and the measures of heart rate variability, Human Factors, 29, 161-170.
AASMAN, J., WIJERS, A., MULDER, G. and MULDER, L. 1988, Measuring mental fatigue in normal daily working routines, in P. Hancock and N. Meshkati (eds), Human Mental Workload (Elsevier, Amsterdam), 117-138.
BAUER, L., STROCK, B., GOLDSTEIN, R., STERN, J. and WALRATH, L. 1985, Auditory discrimination and the eyeblink, Psychophysiology, 22, 636-641.
EGGEMEIER, T., MELVILLE, B. and CRABTREE, M. 1984, The effect of intervening task performance on subjective workload ratings, Proceedings of the Human Factors Society 28th Annual Meeting (Human Factors Society, Santa Monica, California), 954-958.
GOLDSTEIN, R., WALRATH, L., STERN, J. and STROCK, B. 1985, Blink activity in a discrimination task as a function of stimulus modality and schedule of presentation, Psychophysiology, 22, 629-635.
GOPHER, D. and DONCHIN, E. 1986, Workload: an examination of the concept, in K. Boff, L. Kaufman and J. Thomas (eds), Handbook of Perception and Human Performance (Wiley, New York).
HARRIS, R., BONADIES, G. and COMSTOCK, J. 1989, Usefulness of heart measures in flight simulation, Proceedings of the Third Annual Workshop on Space Operations, Automation and Robotics (NASA Johnson Space Center, Houston, Texas).
HART, S. and STAVELAND, L. 1988, Development of the NASA-TLX (Task Load Index): results of empirical and theoretical research, in P. Hancock and N. Meshkati (eds), Human Mental Workload (Elsevier, Amsterdam).
JASPER, H. 1958, The ten-twenty electrode system of the International Federation, Electroencephalography and Clinical Neurophysiology, 10, 371-375.
JOHANNSEN, G., MORAY, N., PEW, R., RASMUSSEN, J., SANDERS, A. and WICKENS, C. 1979, Final report of the experimental psychology group, in N. Moray (ed.), Mental Workload: Its Theory and Measurement (Plenum Press, New York), 101-114.
KAHNEMAN, D. 1973, Attention and Effort (Prentice-Hall, New Jersey).
KINSBOURNE, M. and HICKS, R. 1978, Functional cerebral space, in J. Requin (ed.), Attention and Performance VII (Erlbaum, New Jersey), 345-362.
KNOWLES, W. 1963, Operator loading tasks, Human Factors, 5, 155-161.
KRAMER, A. F. 1991, Physiological metrics of mental workload: a review of recent progress, in D. Damos (ed.), Multiple Task Performance (Taylor & Francis, London).
KRAMER, A. F. and SPINKS, J. 1991, Capacity views of information processing, in R. Jennings and M. G. H. Coles (eds), Psychophysiology of Human Information Processing: An Integration of Central and Autonomic System Approaches (Wiley, New York).
KRAMER, A. F., WICKENS, C. D. and DONCHIN, E. 1983, An analysis of the processing demands of a complex perceptual-motor task, Human Factors, 25, 597-622.
KRAMER, A. F., WICKENS, C. D. and DONCHIN, E. 1985, The processing of stimulus attributes: evidence for dual-task integrality, Journal of Experimental Psychology: Human Perception and Performance, 11, 393-408.
KRAMER, A. F., SIREVAAG, E. J. and BRAUNE, R. 1987, A psychophysiological assessment of operator workload during simulated flight missions, Human Factors, 29, 145-160.
LAUGHERY, R., DREWS, C. and ARCHER, R. 1986, A micro SAINT simulation analyzing operator workload in a future helicopter, Proceedings of NAECON (IEEE Press, New York), 896-903.
LEPLAT, J. 1978, Factors determining workload, Ergonomics, 21, 143-149.
LINDHOLM, E. and CHEATHAM, C. 1983, Autonomic activity and workload during learning of a simulated aircraft landing task, Aviation, Space, and Environmental Medicine, 54, 435-439.
LYSAGHT, R., HILL, S., DICK, A., PLAMONDON, B., LINTON, P., WIERWILLE, W., ZAKLAD, A., BITTNER, A. and WHERRY, R. 1989, Operator workload: comprehensive review and evaluation of operator workload methodologies, Analytics technical report 2075-3 (Analytics, PA).
McCRACKEN, J. and ALDRICH, T. 1984, Analysis of selected LHX mission functions, Technical Note ASI 479-024084, Anacapa Sciences.
MORAY, N. 1967, Where is capacity limited? A survey and a model, Acta Psychologica, 27, 84-92.
MORAY, N. 1989, Mental workload since 1979, International Reviews of Ergonomics, 2, 123-150.
MULDER, G. 1979, Mental load, mental effort and attention, in N. Moray (ed.), Mental Workload: Its Theory and Measurement (Plenum Press, New York).
NAVON, D. and GOPHER, D. 1979, On the economy of the human processing system, Psychological Review, 86, 214-255.
NORTH, R. 1985, WINDEX: a workload index for interactive crew station evaluation, Proceedings of NAECON (IEEE Press, New York).
O'DONNELL, R. and EGGEMEIER, F. T. 1986, Workload assessment methodology, in K. Boff, L. Kaufman and J. Thomas (eds), Handbook of Perception and Human Performance (Wiley, New York).
OGDEN, G., LEVINE, J. and EISNER, E. 1979, Measurement of workload by secondary tasks, Human Factors, 21, 529-548.
OSTER, P. and STERN, J. 1980, Measurement of eye movement, in I. Martin and P. Venables (eds), Techniques in Psychophysiology (Wiley, New York).
PAPANICOLAOU, A. and JOHNSTONE, J. 1984, Probe evoked potentials: theory, method and applications, International Journal of Neuroscience, 24, 107-131.
POLSON, M. and FRIEDMAN, A. 1988, Task sharing within and between hemispheres: a multiple resource approach, Human Factors, 30, 633-643.
PONDER, E. and KENNEDY, W. 1927, On the act of blinking, Quarterly Journal of Experimental Physiology, 18, 89-110.
SANDERS, A. 1979, Some remarks on mental load, in N. Moray (ed.), Mental Workload: Its Theory and Measurement (Plenum Press, New York), 41-78.
SIREVAAG, E., KRAMER, A. F., COLES, M. and DONCHIN, E. 1989, Resource reciprocity: an event-related brain potentials analysis, Acta Psychologica, 70, 77-97.
SIREVAAG, E., KRAMER, A. F., DEJONG, R. and MECKLINGER, A. 1988, A psychophysiological analysis of multi-task processing demands, Psychophysiology, 25, 482.
SHERIDAN, T. 1980, Mental workload: what is it? Why bother with it?, Bulletin of the Human Factors Society, 23, 1-2.
STERN, J. and SKELLY, J. 1984, The eyeblink and workload considerations, Proceedings of the Human Factors Society 28th Annual Meeting (Human Factors Society, Santa Monica, California).
VAN DELLEN, H., AASMAN, J., MULDER, L. and MULDER, G. 1985, Time domain versus frequency domain measures of heart rate variability, in J. Orlebeke, G. Mulder and L. Van Doornen (eds), Psychophysiology of Cardiovascular Control: Models, Methods and Data (Plenum Press, New York).
VICENTE, K., THORNTON, D. and MORAY, N. 1987, Spectral analysis of sinus arrhythmia: a measure of mental effort, Human Factors, 29, 171-182.
VIDULICH, M. and WICKENS, C. D. 1986, Causes of dissociation between subjective workload measures and performance: caveats of the use of subjective assessments, Applied Ergonomics, 17, 291-296.
WICKENS, C. D. 1979, Measures of workload, stress, and secondary tasks, in N. Moray (ed.), Mental Workload: Its Theory and Measurement (Plenum Press, New York), 79-99.
WICKENS, C. D. 1980, The structure of attentional resources, in R. Nickerson and R. Pew (eds), Attention and Performance VIII (Erlbaum, New Jersey), 239-257.
WICKENS, C. D. 1984, Processing resources in attention, in R. Parasuraman and D. Davies (eds), Varieties of Attention (Academic Press, New York), 63-102.
WICKENS, C. D. and KRAMER, A. F. 1985, Engineering psychology, Annual Review of Psychology (Annual Reviews Inc., New York).
WILSON, G. and EGGEMEIER, T. 1991, Psychophysiological assessment of workload in multi-task environments, in D. Damos (ed.), Multiple Task Performance (Taylor & Francis, London), 329-360.
WILSON, G., PURVIS, B., SKELLY, J., FULLENKAMP, P. and DAVIS, I. 1987, Physiological data used to measure pilot workload in actual and simulated conditions, Proceedings of the Human Factors Society 31st Annual Meeting (Human Factors Society, Santa Monica, California), 779-783.
YEH, Y. Y. and WICKENS, C. D. 1988, Dissociation of performance and subjective measures of workload, Human Factors, 30, 111-120.
