ARTICLE IN PRESS

International Journal of Industrial Ergonomics 36 (2006) 447–462 www.elsevier.com/locate/ergon

Situation awareness implications of adaptive automation for information processing in an air traffic control-related task

David B. Kaber a,*, Carlene M. Perry b, Noa Segall b, Christopher K. McClernon c, Lawrence J. Prinzel III d

a Institute for Automation, College of Informatics and Electrical Engineering, University of Rostock, Rostock-Warnemünde, Germany
b Edward P. Fitts Department of Industrial Engineering, North Carolina State University, Raleigh, NC, USA
c Department of Behavioral Sciences and Leadership, United States Air Force Academy, USA
d NASA Langley Research Center, Hampton Roads, VA, USA

Available online 13 March 2006

Abstract

The objective of this research was to assess the effectiveness of adaptive automation (AA) for supporting information processing (IP) in a complex, dynamic control task by defining a measure of situation awareness (SA) sensitive to differences in the forms of automation. The task was an air traffic control (ATC)-related simulation and was developed to present four different modes of automation of IP functions, including information acquisition, information analysis, decision making and action implementation automation, as well as a completely manual control mode. A total of 16 participants were recruited for a pilot study and primary experiment. The pilot assessed the sensitivity and reliability of the Situation Awareness Global Assessment Technique (SAGAT) for describing AA support of the IP functions. Half of the participants were used in the primary experiment, which refined the SA measure and described the implications of AA for IP on SA using the ATC-like simulation. Participants were exposed to all forms of automation and manual control. AA conditions matched operator workload states to dynamic control allocations in the primary task. The pilot did not reveal significant differences in SA among the various AA conditions. In the primary experiment, participant recall of aircraft was cued and relevance weights were assigned to aircraft at the time of simulation freezes. The modified measure of SA revealed operator perception and Total SA to improve when automation was applied to the information acquisition function. In both experiments, performance in the ATC-related task simulation was significantly superior when automation was applied to information acquisition and action implementation (sensory and motor processing), as compared to automation of cognitive functions, specifically information analysis. The primary experiment revealed information analysis and decision-making automation to cause higher workload, attributable to visual demands of displays.

Industry relevance

The results of this research may serve as a general guide for the design of adaptive automation functionality in the aviation industry, particularly for information processing support in air traffic control tasks.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Adaptive automation; Situation awareness; Air traffic control; Workload

* Corresponding author. Tel.: +1 919 515 3086; fax: +1 919 515 5281. E-mail address: [email protected] (D.B. Kaber).
0169-8141/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.ergon.2006.01.008

1. Introduction

Automation in air traffic control (ATC) can have significant effects on controller performance, including reduced workload (Laois and Giannacourou, 1995), increased capability to perform complex computations and data management (Wickens and Hollands, 2000), and increased performance reliability (National Research Council, 1998). However, high-level, static automation in ATC can also present many disadvantages (Dillingham, 1998), including a loss of controller situation awareness (SA), as a result of substituting the human in system control loops (Endsley and Jones, 1995). Endsley (1995a)


defined SA as, “the perception of elements in an environment, within a volume of space and time, comprehension of their meaning and projection of their status in the near future.” As machines perform more and more ATC functions and operations, controllers have less interaction with the system and may become less aware of system operations. “Out-of-the-loop” (OOTL) performance may reduce a controller’s ability to detect problems (e.g., conflicting aircraft), determine the current state of the system, understand what has happened and what courses of actions are needed, and react to the situation (Endsley, 1996).

1.1. Adaptive automation for ATC

New, advanced forms of automation, including adaptive automation (AA), are being considered for ATC to moderate operator workload and preserve SA by facilitating a better match between task demands and operator cognitive resources (Bailey et al., in press). In identifying forms of ATC automation, Hopkin (1998) suggested that automation may not only be applied statically to different types of system functions at different levels, but it can also be dynamically applied to functions to switch control between the human and machine. AA refers to complex systems in which the level of automation or the number of system functions being automated can be modified in real time based on task/contextual information or operator states for the purpose of optimizing system performance and managing operator workload (Scerbo, 1996; Kaber and Riley, 1999). By level of automation, we mean the extent to which machines replace human operators in control of systems.

Prior research (Hilburn et al., 1997; Parasuraman et al., 1993b) has demonstrated AA to improve performance in both simple monitoring tasks and tasks requiring cognitive aspects of information processing (IP) when control dynamically shifts from human manual performance to some form of automation. The study by Hilburn et al.
(1997) involved actual controllers and a high-fidelity simulation of a European ATC system to determine the effects of AA on decision making in descent advising. They developed three different automation schemes including constant manual control, constant automation and AA, which introduced automation during high air traffic periods. They found the AA condition resulted in the smallest increase in workload compared to fully manual and automated control. This is one example of how AA may benefit ATC, as compared to static automation and manual control.

More recent work has been conducted to compare the differential effects of AA applied to a range of IP functions, including psychomotor and cognitive functions, in the context of an ATC simulation (Clamann et al., 2002; Clamann and Kaber, 2003). Clamann et al. (2002) assessed the performance and workload effects of AA applied to information acquisition, information analysis, decision making, and action implementation functions in a low-fidelity simulation of ATC. They found that operators were better able to adapt to AA when applied to lower-order IP functions, such as information acquisition (sensing) and action implementation (motor performance), as compared to AA applied to higher-order, cognitive functions. By higher-order, cognitive functions, we mean those aspects of ATC requiring planning and decision making and use of complex, long-term memory structures. Results indicated that operator performance was greatest when automation was applied to the action implementation aspect of the task. Workload results revealed AA of the information acquisition function as part of the ATC task simulation to relieve some temporal demands (time pressure) on subjects. The results of this research indicate the effectiveness of AA in ATC may be dependent upon the type of automation (form of IP assistance) provided to a controller.

In general, historical work has evaluated AA for supporting psychomotor and cognitive functions in terms of performance and workload and not measures of operator cognition. If potential benefits of AA include reductions in OOTL performance problems (e.g., automation induced complacency (Parasuraman et al., 1993a)) and support for specific IP functions in dynamic control tasks, then this needs to be assessed in terms of measures of operator cognitive states, such as SA, in order to develop more complete rationales for future systems design. Outcome measures, like performance, may not provide direct insight into whether AA for IP is facilitating a better match of task demands and cognitive resources than would otherwise be possible with static automation.

1.2. AA and SA

It has been observed by review and theoretical studies of AA (e.g., Kaber et al., 2001; Parasuraman, 2000) that facilitating dynamic allocations of system control functions to a human operator or computer over time can moderate operator workload and, at the same time, may facilitate SA, or operator preparedness for unexpected system states, by maintaining some level of operator involvement in control loops. Parasuraman et al. (1993a) observed that AA could be beneficial for reducing decrements in SA during sustained monitoring performance under static automation. Other research has suggested that SA might be improved in aviation system operations through the use of AA (Prinzel, 2002). Unfortunately, little work has empirically assessed the SA implications of AA. Two recent studies (Kaber and Endsley, 2004; Kaber et al., 2002) measured SA under AA in dynamic control tasks, including radar monitoring and remote control of a telerobot. These studies used direct, objective measures of SA, following the Situation Awareness Global Assessment Technique (SAGAT; Endsley, 1995b), to assess the effects of dynamic changes in automation states (Kaber and Endsley, 2004) or changes in adaptive interface content (Kaber et al., 2002). The SAGAT is an “on-line SA query


technique” (Endsley, 2000) in which a simulation of a complex control task is frozen (stopped) at random points in time in order to administer questionnaires to operators on the current state of the system, the importance of system states to their task goals, and their projections of system states in the near future. The responses to queries are graded based on “ground truth” information recorded by the simulation system at the time of each stop. Kaber and Endsley (2004) found that the form of task automation adaptively applied to simulated radar monitoring played a significant role in operator SA measured using SAGAT. They observed that automation involving computer assistance in monitoring (sensory processing) and action implementation functions of the task resulted in worse operator SA, as compared to automation primarily applying computer assistance to decision-making aspects of the task and fully autonomous system operation. However, the sensory and action forms of automation had significant positive effects on operator performance in terms of dealing with dynamic changes in control modes, as part of AA. This research suggests that the SA impact of AA for supporting IP may be dependent upon the IP functions to which automation is applied. Although Kaber and Endsley’s (2004) performance results agree with those of Clamann et al. (2002), their results on SA should be of interest to designers because they may mean that applying AA to psychomotor functions to increase performance in the near term may undermine SA over extended periods. (Clamann et al.’s (2002) trials lasted 20 min; Kaber and Endsley’s (2004) trials were 60 min in duration.)
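The grading step of a SAGAT freeze, as described above, can be illustrated with a short sketch. It is not the scoring procedure used in the studies cited here: the `Query` record, the three category names, exact-match grading, and percent-correct scoring are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """One questionnaire item administered at a simulation freeze (hypothetical record)."""
    category: str      # "current_state", "importance", or "projection"
    response: str      # operator's answer given during the freeze
    ground_truth: str  # value logged by the simulation at the time of the stop


def score_freeze(queries):
    """Grade responses against ground truth; return percent correct per category and overall."""
    asked = {}
    correct = {}
    for q in queries:
        asked[q.category] = asked.get(q.category, 0) + 1
        # Assumption: a response is correct only if it matches the logged value exactly
        # (ignoring case and surrounding whitespace).
        hit = q.response.strip().lower() == q.ground_truth.strip().lower()
        correct[q.category] = correct.get(q.category, 0) + int(hit)
    scores = {cat: 100.0 * correct[cat] / asked[cat] for cat in asked}
    total_asked = sum(asked.values())
    scores["total"] = 100.0 * sum(correct.values()) / total_asked if total_asked else 0.0
    return scores


example = [
    Query("current_state", "AA12", "aa12"),
    Query("current_state", "UA5", "UA7"),
    Query("projection", "runway 2", "runway 2"),
]
scores = score_freeze(example)
```

In practice, graded queries often allow tolerance bands (for example, a position within some distance of the true location), which would replace the exact-match comparison in this sketch.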
One drawback of Kaber and Endsley’s (2004) study, in terms of providing insight into AA design for actual systems, is that they employed a “model-based” approach; that is, the control mode changes during task performance were pre-programmed based on anticipated changes in operator workload due to scheduled task events (see Scerbo (1996) for further definition of various approaches to AA). Although model-based approaches have been demonstrated to be effective for supporting monitoring task performance (e.g., Parasuraman et al., 1993b; Hilburn et al., 1993), pre-programmed system control allocations cannot be considered to represent truly adaptive systems that change states based on, for example, real-time task information or operator workload states.

1.3. Current research needs

Research is needed to evaluate the effectiveness of actual adaptive systems for supporting complex task IP functions, in terms of SA. Other approaches to investigating AA in complex systems have included monitoring operator workload states and triggering control allocations based on workload fluctuations (e.g., Kaber and Riley, 1999), or triggering control mode changes on the basis of critical events (e.g., system failures), but this research has not assessed operator SA. It is possible that workload-matched


AA of certain IP functions in complex systems control may promote operator SA and yield higher levels of SA than observed under manual control conditions by providing a better match of operator cognitive resources for performance to workload demands (Bailey et al., in press). In order to evaluate this thesis, there is a need for sensitive SA measurement techniques in the context of complex, dynamic control tasks, like ATC. Table 1 presents a categorization of the empirical studies of AA reviewed here in terms of the types of tasks investigated, the types and levels of system automation explored, and measures used to evaluate the impact on human operators. The majority of these studies have been conducted in the context of aviation-related simulations primarily because of the breadth and type of cognitive requirements imposed on operators (ranging from monitoring to tracking to decision making and real-time planning). In addition, the complex tasks as part of these simulations often pose multiple competing goals for operators, complex decision requirements, and simultaneous attention demands under time pressure. Consequently, performance may be critically dependent upon operators achieving good SA for recognition-primed decision making (Endsley and Jones, 1997; also see Klein, 1998). In general, this is why SA is important in the ATC context and other aviation-related tasks. Table 1 supports the need for further research to describe the SA implications of AA for supporting various IP functions in complex, dynamic control tasks, like ATC. The intent of this study was not to demonstrate AA specifically in the ATC context, but instead to show how AA may support various aspects of IP in a dynamic control task, in general, and the implications on SA. 
We believe that the ATC paradigm is particularly conducive for this type of research; however, such work could also be conducted using, for example, a process control simulation (e.g., Moray, 1986; Vicente, 1991) or manufacturing systems simulation (e.g., Sharit and Salvendy, 1987) also posing operator performance requirements across stages of IP. Our focus here was to use the ATC paradigm as an application of AA for studying human IP and SA effects. We addressed this focus by defining a measure of SA sensitive to differences in forms of automated assistance in an ATC-related task. We observed the measure when AA was applied to the IP functions of information acquisition, information analysis, decision making, and action implementation (as described in Parasuraman et al., 2000).

1.4. Link of SA to forms of AA and hypotheses

The historical approach to AA has been a binary one, involving periodic allocations of completely manual (human) control of a system during full automation in order to reintegrate operators in the control loop and to maintain vigilance and support system state awareness. In this way, AA has been applied to entire tasks, such as tracking performance (e.g., Scallen et al., 1995). Only a few prior


Table 1
Classification of empirical research on AA covered in review

| Studies (in chronological order) | Type of task | Type/function of automation | Level of automation | Response measures used to evaluate AA |
|---|---|---|---|---|
| Parasuraman et al. (1993b) | Piloting tasks | Monitoring | Full for function or manual control | Performance and workload |
| Hilburn et al. (1993) | Piloting tasks | Monitoring | Full for function or manual control | Performance and workload |
| Hilburn et al. (1997) | ATC simulation | Decision making | Full for function or manual control | Performance and workload |
| Kaber and Riley (1999) | Radar monitoring and target elimination | Decision making | Full for function or manual control | Performance and workload |
| Kaber et al. (2002) | Telerobot control | System supervision | Full for function or manual control | Performance, workload and SA |
| Clamann et al. (2002) | ATC task simulation | Information acquisition; information analysis; decision making; action implementation | Partial automation of various functions or manual control | Performance and workload |
| Clamann and Kaber (2003) | ATC task simulation | Information acquisition; information analysis; decision making; action implementation | Partial automation of various functions or manual control | Performance and workload |
| Kaber and Endsley (2004) | Radar monitoring and target elimination | Monitoring; planning; decision making; and action implementation | Full for function or manual control | Performance, workload and SA |

studies have investigated AA applied to specific IP functions in the context of a task. In the present study, AA was designed to provide assistance with IP functions of the ATC-like simulation and, consequently, we expected SA to be affected in different ways. As in real ATC, SA in the context of our simulation can be defined as an up-to-date assessment of the changing locations of aircraft in a designated airspace, projection of their future locations relative to each other, and knowledge of pertinent aircraft parameters (destination, speed, etc.). Endsley and Rodgers (1994) said that controllers have historically called this “the picture”; that is, their mental model of the traffic situation at any moment in time upon which all decisions are based.

According to Endsley’s (1995a) general definition of SA, operators achieve system/state awareness on three hierarchical levels, including perception (Level 1 SA), comprehension (Level 2 SA), and projection (Level 3 SA). These levels are reflected in the contextual definition of SA for the ATC task. She also stated that these levels may build on each other in a hierarchical manner; that is, good Level 1 SA may lead to good comprehension and so on.

According to Parasuraman et al. (2000), information acquisition automation includes methods for sensing the task environment, delivering information to human operators, as well as providing for some organization/categorization of information. This form of automation allocates aspects of the perceptual stage of IP to machine control and may support operator achievement of Level 1 SA. Information analysis automation involves filtering of information and highlighting particularly relevant items for task performance, as well as providing integrated and

predictive displays of information. Consequently, this form of automation may support operator achievement of states of task comprehension (Level 2 SA) and facilitate more accurate projection of system states (Level 3 SA). One potential problem, however, is that integrated displays may draw operator attention away from primary-task displays, like the radarscope in an ATC task, and cause operators to focus on automation projections of data (e.g., aircraft trajectories) versus data on the current state of the system (degraded Level 1 SA). Decision-making automation is intended to assist or replace operators in response selection by providing prompts/alerts to critical system states and recommendations for resolving problems. This type of automation may also support accurate operator comprehension and projection of system states (Level 2 and 3 SA). However, it often supplants the human operator from system control loops to a great degree and can lead to passive decision-making behavior (cf., Endsley and Kiris, 1995). Operators may come to rely on computer recommendations on, for example, resolving aircraft conflicts in traffic management, with little suspicion of the automation. Such automation may also lead to operators spending little time on basic control functions, including directly querying aircraft for information, which may be important to achieving Level 1 SA. Finally, action implementation automation is intended to assist or replace operators in response execution once a decision has been made. This form of automation may primarily impact operator psychomotor workload and free-up additional attentional resources for perceiving states of the environment, or achieving fundamental Level 1 SA.
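The hypothesized links between the four forms of automation and the three SA levels discussed above can be summarized as a small lookup table. The sketch below is only a restatement of those expectations in code; the `ModeProfile` type, the key names, and the helper function are hypothetical and do not come from the study.

```python
from typing import NamedTuple, Tuple


class ModeProfile(NamedTuple):
    """Expected SA effects of one form of automation (illustrative structure)."""
    ip_stage: str                 # stage of information processing being assisted
    sa_supported: Tuple[int, ...]  # SA levels the text suggests may be supported
    sa_at_risk: Tuple[int, ...]    # SA levels the text suggests may be degraded


HYPOTHESIZED = {
    "information_acquisition": ModeProfile("perception", (1,), ()),
    # Integrated displays may pull attention away from the radarscope (Level 1 risk).
    "information_analysis": ModeProfile("analysis", (2, 3), (1,)),
    # Reliance on recommendations may reduce direct querying of aircraft (Level 1 risk).
    "decision_making": ModeProfile("response selection", (2, 3), (1,)),
    # Reduced psychomotor load may free attention for perceiving the environment.
    "action_implementation": ModeProfile("response execution", (1,), ()),
}


def modes_supporting(level):
    """Return the automation modes expected to support the given SA level, sorted by name."""
    return sorted(m for m, p in HYPOTHESIZED.items() if level in p.sa_supported)


level1_modes = modes_supporting(1)
```

Laid out this way, the two lower-order modes are the only ones expected to support Level 1 SA without an accompanying risk to it.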


In general, because of the potential for OOTL performance problems under automation (Endsley and Kiris, 1995; Wickens, 2000), we expected that operator SA would be superior when AA was applied to lower-order IP functions, including information acquisition and action implementation, than automation applied to higher-order IP functions, providing computer assistance in decision making. This speculation is supported by prior empirical work (Endsley and Kiris, 1995; Kaber et al., 2000). Following the same logic, we also expected that a completely manual control condition, allowing operators to play an active role in system functions (e.g., aircraft clearances) would support greater SA, as compared to decision automation supplanting operators from formulation of response alternatives. Based on Clamann et al.’s (2002) research, we expected performance in the ATC-related simulation to be worse during completely manual control as compared to any mode of AA. Related to this, Leroux (1993), and Laois and Giannacourou (1995) found that humans performed better in ATC simulations using automation providing assistance with sensory/response functions. We hypothesized that performance in our low-fidelity ATC simulation would be superior during trials in which AA was applied to lower-order sensory/response functions, including information acquisition and action implementation. We suspected that decreases in performance under AA of the decision function might be attributable to the higher cognitive complexity of the automation (i.e., the relative complexity of an operator’s mental model for comprehending the impact of the automation function) versus the transparency of automation for the lower-order IP functions and the need to smoothly transition between the use of mental models for automated and manual control during AA (Kaber et al., in press).
Based on Hilburn et al.’s (1997) research, we expected trials involving manual control to produce greater workload than trials involving automated control. However, on the basis of observations made by Parasuraman et al. (2000), it was expected that automation of higher-order IP functions, including information analysis and decision making, presenting complex displays for operator interpretation might demand high levels of visual attention and increase workload.

2. General methodology

A pilot study and primary experiment were conducted as part of this research. The objective of the pilot was to define and validate a direct objective measure of SA for assessment of AA conditions for supporting IP in the ATC-related task simulation. The pilot was to demonstrate whether the measure was sensitive to differences in the various levels of operator SA during completely manual versus automated control of the simulation. We started with an existing SA measurement methodology and assessed its use in the low-fidelity ATC simulation.


The objective of the primary experiment was to use the validated measure of SA to describe the differential effects of AA applied to the various IP functions, including information acquisition, information analysis, decision making and action implementation, in the ATC-related task simulation. In the primary experiment, the performance and workload implications of the various forms of AA were also assessed to provide further evidence on the potential dependence of the effectiveness of AA on the IP function to which automation is applied. Since our simulation was of a limited fidelity, the results presented here may not be directly applicable to actual ATC, but they do shed light on how AA supports IP in terms of SA in complex, dynamic control systems, in general.

2.1. Participants and experimental design

We used samples of eight participants for both the pilot and primary experiments. The participants ranged in age from 21 to 29 years and were naïve (i.e., none of the participants had prior flying or ATC experience). In order to control for intra-individual variability in cognitive and psychomotor behavior during the experiments, we elected to use within-subjects designs with limited sample sizes versus between-subjects designs with many participants serving as repeated observations on each of our automation conditions. We blocked on subject through the experimental designs (presented the manual control and automation trials in random order) in order to limit the potential for individual differences to load on any one experimental condition. Repeated measures of individual participant SA, performance and workload were collected during AA of the low-fidelity ATC simulation in multiple trials (2) under each control mode.

2.2. Independent variable

The independent variable manipulated in both experiments was the mode of one of the IP functions: information acquisition, information analysis, decision making, or action implementation.
(The modes of automation are described below in detail in the context of our task.) The automation conditions were compared with the manual control condition in both studies and the full experiment focused on the differential effects of the various forms of AA for supporting IP. Each AA trial consisted of both manual and automated control periods. The number of manual and automated control allocations and minutes during a trial depended on participant workload fluctuations during the ATC-related task simulation.

2.3. Tasks

Both the pilot study and primary experiment used a dual-task scenario involving participant performance of the ATC-related simulation (Multitask©) and a secondary gauge-monitoring task. The Multitask© simulation


presented functionality that could be resident in AA to support ATC IP. Since we used naïve participants, the fidelity of the Multitask© was intentionally limited. Many other historical studies have taken a similar approach (e.g., Itoh and Inagaki, 2004; Moray et al., 2000) to demonstrate the effects of forms of automation on IP in low-fidelity simulations of complex processes with naïve subjects. The secondary task served as a measure of primary-task workload, which was used as a basis for computer mandated control allocations (manual or automated control of primary-task functions) during AA trials. There was no advance warning of the control mode changes for operators, but a salient automation-status display was updated when mode shifts occurred. Participants were instructed to allocate their attention to the ATC-related task and to the secondary task to the extent they could.

2.3.1. Primary ATC-related simulation

Multitask© is a PC-based simulation with a radarscope (see Fig. 1) presenting approximately 15,400 square nautical miles (nm) of airspace and revealing the position of various types of aircraft to operators, including military, commercial, and private vehicles. Near the center of the radarscope are two airports with approximately 20 nm between them (e.g., Washington Dulles International and

Ronald Reagan Washington International). Each airport has two runways for landing aircraft. There are also eight approach holding fixes represented on the display by small circles located approximately 30 nm from the center of the radarscope. When an aircraft first appears on the scope, it is represented by a “white” flashing triangular icon and data tag, including the vehicle’s call sign. In the pilot study, all aircraft were initially positioned on the periphery of the scope and traveled toward one of the two airports, destined for one of two runways. During the primary experiment, aircraft initially appeared at the edge of a predefined “control sector” (see the hexagon overlay (dashed line) on the radarscope near the 50 and 60 nm bands), which reduced the total area of control for participants and accelerated the action of the scenario. Once an aircraft is contacted by an operator, the icon becomes solid white and the aircraft’s predetermined clearance (including speed, destination airport, and destination runway) is displayed in a data box on the left side of the Multitask© display (see upper-left corner of Fig. 1). If an aircraft is placed in a holding pattern, the icon becomes “yellow” until the (virtual) pilot is advised to resume the approach to the destination airport, at which time the icon becomes white again.
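The icon color changes described above amount to a small state machine, sketched below. The enum members, event names, and the rule that unrecognized events leave the icon unchanged are assumptions made for illustration; the simulation's actual implementation is not described in this paper.

```python
from enum import Enum


class Icon(Enum):
    """Display states of an aircraft icon on the radarscope."""
    FLASHING_WHITE = "flashing white"  # aircraft has just appeared, not yet contacted
    SOLID_WHITE = "solid white"        # aircraft has been contacted by the operator
    YELLOW = "yellow"                  # aircraft is in a holding pattern


# Hypothetical event names for the operator actions mentioned in the text.
TRANSITIONS = {
    (Icon.FLASHING_WHITE, "contact"): Icon.SOLID_WHITE,
    (Icon.SOLID_WHITE, "hold"): Icon.YELLOW,
    (Icon.YELLOW, "resume"): Icon.SOLID_WHITE,
}


def next_icon(current, event):
    """Return the icon state after an event; events that do not apply leave it unchanged."""
    return TRANSITIONS.get((current, event), current)


icon = Icon.FLASHING_WHITE
for event in ["contact", "hold", "resume"]:
    icon = next_icon(icon, event)
```

After the contact, hold, and resume sequence the icon is back to solid white, matching the description of an aircraft resuming its approach.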

Fig. 1. Multitask© display screen under information acquisition automation.
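As a rough consistency check on the simulation geometry, the stated airspace area can be converted to a scope radius and combined with the nominal speeds of the three aircraft classes (200, 170, and 100 knots) to estimate transit times from the periphery. The sketch assumes a circular scope and travel to the center of the display rather than to a specific airport; both are simplifications, and the function names are hypothetical.

```python
import math

AIRSPACE_AREA_NM2 = 15_400  # approximate airspace shown on the radarscope
SPEEDS_KNOTS = {"military": 200, "commercial": 170, "private": 100}


def scope_radius_nm(area_nm2):
    """Radius of a circular scope with the given area (assumes a circular display)."""
    return math.sqrt(area_nm2 / math.pi)


def transit_minutes(distance_nm, speed_knots):
    """Minutes needed to cover a distance at constant speed (knots = nm per hour)."""
    return 60.0 * distance_nm / speed_knots


radius = scope_radius_nm(AIRSPACE_AREA_NM2)
times = {kind: transit_minutes(radius, kts) for kind, kts in SPEEDS_KNOTS.items()}
```

Under these assumptions the radius comes out close to 70 nm, and the fastest and slowest classes take about 21 and 42 min, which agrees with the transit times reported for the pilot study.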


Military aircraft were generated 10% of the time and traveled at approximately 200 knots. Commercial aircraft were generated the majority of the time (70%) and traveled at approximately 170 knots. Private aircraft were generated 20% of the time and traveled at approximately 100 knots. In real time, all aircraft required between 21 and 42 min to reach an airport during the pilot study and about half this time in the primary experiment. The simulation was programmed to present seven aircraft on the display at any given time and this traffic load only varied by one vehicle during each minute of the task. The goal of the operator in the task was to contact aircraft, make any necessary changes to pre-existing clearances based on their potential to cause conflicts, and safely land aircraft. Clearing an aircraft required two general steps, establishing a communication link and issuing a specific clearance. Participants used a mouse controller to point to, and click on, an aircraft icon. They then selected a “query” command button from a “control box,” as part of the Multitask© interface (upper-left corner of Fig. 1), in order to request that a pilot provide aircraft flight information. Subsequently, an operator could decide whether to issue a revised clearance to the aircraft. The control box included eight control command buttons (in total) to facilitate five types of clearances, including


“reduce speed,” “hold,” “resume,” “change airport,” and “change runway,” as well as two action commands, “submit” and “cancel.” A “history box,” located below the control box, displayed all clearance communications with an aircraft (in text form), simulating ATC-like communications, and pilot responses to clearances. It should be noted here that there were specific clearance constraints built into the simulation that would result in a pilot not accepting a clearance and maintaining course (e.g., pilots would not execute “hold” clearances if an aircraft was within less than 35 nm of its destination airport; “change airport” clearances could only be issued to vehicles in holding patterns; “change runway” clearances could only be issued to aircraft inside the band of holding fixes (less than 30 nm from airports)). The steps as part of communicating with and clearing aircraft did not have to be completed in sequence and participants could, for example, query multiple aircraft and then provide multiple clearances. During experimental trials, participants completed the task under one of five different control modes: a manual mode, and four AA modes (one for each IP level as described in Table 2). We used the general descriptions of each form of automation presented by Parasuraman et al. (2000) as a guide for defining the modes of

Table 2. Modes of Multitaskr control (including manual)

Mode

Description

Manual

Manual control—no automated assistance was provided. (Operators performed the task, as described in the simulation overview.)

Information acquisition

A scan line rotated clockwise around the radar display and briefly revealed a trajectory projection aid (TPA) as it passed over aircraft icons (see Fig. 1 and TPA for Aircraft ‘‘GA2’’). A line connecting the vehicle and the airport (or holding fix) revealed the aircraft destination and route. The aircraft speed (in knots) and destination airport and runway were affixed to the center of the TPA (see lower-right quadrant of radarscope). This form of implementation allocated aspects of the perceptual processing stage of IP to machine control

Information analysis

Flight information for each aircraft appearing on the radarscope was displayed in a table in an ‘‘automation aid’’ box directly beneath the history box, as part of the Multitaskr interface. The data included aircraft call signs, destination airports and runways, speeds, and distances (nm) from destinations. A final column in the table showed the call signs of any aircraft currently in conflict (traveling within 3 nm of one another, or two aircraft within 20 nm of the center of the radarscope, destined for the same runway at the same airport). This form of automation assisted operators with storage of aircraft data and integration of perceived information for projecting potential conflicts

Decision making

This mode of automation alerted operators to conflicts (visually during the pilot study and with the addition of audio during the primary experiment), and presented recommendations for conflict resolutions through a table in the automation aid box. (It was observed during the pilot that some aircraft conflicts were not salient to subjects based on their understanding of the conflict criteria. The use of auditory cues on conflicts was implemented for the information analysis and decision making modes in the primary experiment and was in-line with the theoretical descriptions of these forms of automation provided by Parasuraman et al., 2000.) Call signs for conflicting aircraft, recommended clearance changes (reduce speed, hold, resume, change airport, or ‘‘no option’’), and the identifier for a specific aircraft to advise of a necessary change, were all displayed in the table. (The automation was not capable of recommending ‘‘change runway’’ clearances.) Up to three automated clearance recommendations could be displayed at any time and they were listed in order of priority based on the separation distance and speeds of aircraft. This form of implementation assisted operators with decision and response selection aspects of the task

Action automation

This form of automation simulated the ‘‘hand-off’’ of aircraft control from ‘‘approach control’’ to ‘‘local-tower control’’, and the tower automatically maintained full control and responsibility for aircraft within 20 nm of the center of the radarscope. The automation prevented any conflicts of aircraft after ‘‘hand-off’’ to tower control. A data table was presented in the automation aid box and summarized the classification and number of aircraft on the display. The automation assisted users with the response execution function as part of the simulation


Multitaskr automation. In using ATC as an example system for application of their model of types and levels of automation, Parasuraman et al. (2000) identified specific ATC technologies that they considered to be representative of each form of automation. We modeled the modes of automation in the Multitaskr simulation on the basis of descriptions of the various ATC technologies we found in the literature (e.g., Laois and Giannacourou, 1995; Leroux, 1993). Only one aspect of Multitaskr performance was automated in each mode of automation.

2.3.2. Secondary (gauge) monitoring task

This task presented a vertical, fixed-scale display with a moving pointer. Participants were required to monitor pointer movements and to detect when a deviation occurred from a central ‘‘acceptable’’ range on the scale (colored in ‘‘green’’) into an ‘‘unacceptable’’ region (colored in ‘‘red’’). The gauge display was presented on a computer monitor separate from that used to present the Multitaskr simulation and positioned directly to the right of the Multitaskr display. Participants were required to correct pointer deviations (return the pointer to the acceptable range) by pressing keys on a keyboard (the ‘‘ctrl’’ and ‘‘shift’’ keys if the pointer moved above and below the acceptable region, respectively). There were, on average, four unacceptable pointer deviations (signals) generated per minute in order for the task to constantly test an operator’s ability to allocate residual attentional resources from the Multitaskr simulation. The number of deviations varied by ±1 per min. Once an unacceptable deviation occurred, the pointer remained in the ‘‘red’’ for no more than 3 s (if not corrected), at which time a miss was recorded and the computer system automatically returned the pointer to the center of the display. Participants were informed in advance of test trials that this would happen.
Monitoring performance was measured in terms of the number of unacceptable pointer deviations detected, divided by the total number of deviations (the hit-to-signal ratio).

2.4. Approach to AA

As previously mentioned, a workload-based approach to AA was used in this study (cf., Kaber and Riley, 1999). Under AA conditions, each mode of automation switched ‘‘on’’ and ‘‘off’’ depending upon operator workload states. To identify the point at which the automated mode would switch on or off during experimental trials, performance data were collected in one of the final training sessions. During this training trial, a cyclic automation pattern was applied to the Multitaskr, including 2-min periods of manual control followed by 2-min periods of automated control in one of the four AA conditions. Two participants were randomly assigned to each specific mode of Multitaskr automation for this training trial, and performed the gauge task simultaneously. The average participant gauge-monitoring performance level and the standard deviation (SD) for the hit-to-signal ratio on pointer deviations were recorded. The performance for the two participants in each mode was used to establish Multitaskr ‘‘overload’’ and ‘‘underload’’ conditions for all participants exposed to the same mode of automation during test trials. When a participant performed an experimental trial, if gauge-task performance dropped below the average training trial performance minus 1 SD for 1 min of the trial (implying an increase in workload), the computer system would mandate a switch from manual control of the Multitaskr to the mode of automation selected for the trial. Once the automation was initiated, if participant performance in the gauge task increased above the mean training trial performance plus 1 SD, or a hit-to-signal ratio of 1.0 was achieved, for 1 min of a trial, then manual control was restored. These criteria were developed by Clamann et al. (2002) for the closed-loop AA system, with operator overload (inability to address the majority of aircraft) and underload (no aircraft conflict situations) suggested at ±1 SD about mean gauge-monitoring performance.

2.5. Dependent variables

2.5.1. Situation awareness

We decided to use the SAGAT for SA measurement because of prior research demonstrating validity of the method in various contexts (e.g., aircraft piloting; Endsley, 1995b). To define the measure for use in evaluating the AA and manual control conditions in our experiment, we reviewed cognitive task analyses of ATC operations. Endsley and Rodgers (1994) used goal-directed task analysis to develop a list of SA requirements for en route ATC. Endsley and Jones (1995) performed a similar analysis on Terminal Radar Approach Control center operators. The characteristics of our simulation were similar enough to these domains that we could use the SA requirements lists formulated by the researchers to develop a SAGAT-based approach to measurement.
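For illustration, the workload-based switching rule of Section 2.4 can be sketched in a few lines of Python. This is a minimal sketch and not the code used in the study: the names `Thresholds`, `hit_to_signal`, `next_mode` and `run_trial` are hypothetical, the rule is assumed to be evaluated once per minute on the hit-to-signal ratio for that minute, and the training-trial values in the usage line are invented.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Training-trial statistics for one mode of automation (hypothetical container)."""
    mean: float  # mean hit-to-signal ratio in the training trial
    sd: float    # standard deviation of that ratio

    @property
    def overload(self) -> float:
        return self.mean - self.sd  # below this: workload assumed to be high

    @property
    def underload(self) -> float:
        return self.mean + self.sd  # above this: workload assumed to be low


def hit_to_signal(hits: int, signals: int) -> float:
    """Detected pointer deviations divided by all deviations (assumes signals > 0)."""
    return hits / signals


def next_mode(automated: bool, ratio: float, th: Thresholds) -> bool:
    """Return True if the next minute should run under automation."""
    if not automated:
        # Manual control: switch automation on when performance drops below mean - 1 SD.
        return ratio < th.overload
    # Automated control: restore manual control when performance exceeds
    # mean + 1 SD or the ratio reaches 1.0.
    return not (ratio > th.underload or ratio >= 1.0)


def run_trial(ratios, th):
    """Apply the rule minute by minute, starting from manual control."""
    automated, history = False, []
    for ratio in ratios:
        automated = next_mode(automated, ratio, th)
        history.append(automated)
    return history


# Invented per-minute ratios and training statistics, for demonstration only.
print(run_trial([0.75, 0.5, 0.65, 0.85, 0.7], Thresholds(mean=0.7, sd=0.1)))
```

In this invented example, the drop to 0.5 engages the automation and the later rise to 0.85 restores manual control.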
We followed Endsley’s (1995b) methodology and conducted multiple simulation freezes (3) during each experiment trial. Trials were 30 min in duration and freezes were designed to occur within one-of-three 8-min windows of trial time to ensure a sampling of SA throughout testing. The first 6 min of a trial were not used for simulation freezes in order to allow participants time to acquire SA (Endsley, 1995b), and to allow aircraft time to move from the periphery of the radarscope towards the airports. Furthermore, no two freezes were scheduled within 2 min of each other in order to allow participants time to reacquire SA after a freeze (Endsley, 1995b). When a freeze occurred, the Multitaskr displays were blanked, participants moved to an adjacent workstation and were required to respond to a memory test and six queries (selected at random from a pool of 18) targeting the three levels of SA defined by Endsley (1995a). The memory test during the pilot experiment required participants to recall the current locations of all (7) aircraft on the display by marking up a graphic of the radarscope with a pencil. In addressing the subsequent queries, participants used a database application to complete tables of responses for each question for all aircraft. There was no time limit on participant responding and once they completed all queries, they immediately resumed the ATC simulation. SA was quantified as the percentage of correct responses to Level 1, 2 and 3 SA queries and a Total SA score at each freeze. Participant answers were compared with ‘‘ground truth’’ information recorded by the simulation computer system at the time of each stop (or by comparison with expert experimenter responses to the same SA queries).

2.5.2. Primary-task performance

With respect to measurement of participant performance in the Multitaskr simulation, the number of aircraft cleared (and safely arriving at an airport), conflicts and collisions (aircraft that come in contact with each other or that simultaneously arrive at the same airport destined for the same runway) were recorded. All of these measures were captured during each minute of a test trial.

2.5.3. Workload

Operator workload was assessed in terms of the secondary-task performance. The hit-to-signal ratio on pointer deviations in gauge monitoring was considered as an objective indicator of primary-task (Multitaskr) workload.

3. Pilot study

For the pilot study, participants were provided with a rigorous training protocol for both tasks, including experience under each mode of Multitaskr automation. They were also trained on the SAGAT measurement method. We tested the participants over a period of 2 days. The SA response measure was determined for both automated and manual control periods as part of AA trials. Comparisons were made of the SA results under manual versus automated control.
We found the general type of control to have an effect on higher (Level 3) SA, but the measure was not sensitive to differences in the fundamental aspects of SA, including perceptual processing (Level 1 SA) or comprehension (Level 2 SA), due to the automation characteristics. Consequently, the Total SA score was also not sensitive. We concluded that the standard SAGAT metric might not be satisfactory for demonstrating AA support of IP in our ATC-related task. Since our SA queries as part of the SAGAT were based on actual ATC operations analyses, we were convinced of the construct validity of the measure for our simulation. However, we suspected there might be methodological issues relevant to the domain, in general.
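The freeze-scheduling constraints stated in Section 2.5.1 (three freezes per 30-min trial, none in the first 6 min, one per 8-min window, and at least 2 min between freezes) can be illustrated with a short Python sketch. It is not the scheduling code used in the study. The function name `schedule_freezes` is hypothetical, and the window boundaries (6 to 14, 14 to 22 and 22 to 30 min) and the uniform sampling within each window are assumptions that the text does not state explicitly.

```python
import random
from typing import List

TRIAL_MIN = 30    # trial duration (min)
WARMUP_MIN = 6    # no freezes while participants acquire SA
WINDOW_MIN = 8    # length of each freeze window (min)
MIN_GAP_MIN = 2   # minimum separation between consecutive freezes (min)


def schedule_freezes(rng: random.Random) -> List[float]:
    """Draw one freeze time per window, redrawing until the gap rule holds."""
    # Assumed windows: [6, 14], [14, 22], [22, 30] minutes.
    windows = [(start, start + WINDOW_MIN)
               for start in range(WARMUP_MIN, TRIAL_MIN, WINDOW_MIN)]
    while True:
        times = [rng.uniform(lo, hi) for lo, hi in windows]
        gaps_ok = all(later - earlier >= MIN_GAP_MIN
                      for earlier, later in zip(times, times[1:]))
        if gaps_ok:
            return times


print(schedule_freezes(random.Random(1)))
```

Redrawing the whole schedule when two freezes fall too close together keeps the sketch simple; other ways of enforcing the 2-min separation are equally compatible with the description.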


4. Primary experiment

4.1. Revision to SA measure design

Some recent work by Hauss and Eyferth (2003) suggested that SAGAT may not be a sensitive measure for SA in the ATC environment due to different aircraft having different relevance to controllers at different times in traffic management. They said that controllers may use ‘‘event-based mental representations’’ of air traffic situations in order to determine what information is currently relevant, what information will be relevant in the future, and what information can be neglected in an attempt to make the task manageable from a working memory perspective. With respect to the terminology ‘‘event-based mental representations,’’ Hauss and Eyferth (2003) said that aircraft that have recently been contacted by a controller and that have required recent control actions, or aircraft that are currently (or will soon be) in conflict, may demand more attentional resources than other displayed aircraft and this may drive states of controller awareness. Consequently, controllers may focus on certain aircraft to the exclusion of others at various times during control activities and they may more accurately recall the flight parameters of aircraft most relevant to current task performance in responding to SAGAT queries versus the parameters for aircraft that are not critical to ATC at the time of a SAGAT freeze. It is possible that in our pilot study, participants simply responded accurately to SAGAT queries for the aircraft they had recently dealt with and poorly for those not in the focus of their attention. Since each query was posed for all aircraft on the display at a freeze, the average percent correct responses to a single query could have remained relatively constant across conditions due to the aircraft relevance issue.
Consequently, we attributed our lack of SA findings, and sensitivity of the measurement technique, to the approach of assessing participant recall of aircraft and the administration of the SA queries on all aircraft. We sought to modify our SA measure to promote sensitivity to differences in IP between the manual and automated ATC-related task conditions and to describe the differential effects of the various forms of AA on operator perception, comprehension and projection. We returned to the literature to find other studies for direction on refining the SA measurement approach. Aside from the Hauss and Eyferth (2003) study mentioned above, we found that Nunes (2003) assessed the performance and SA of ATC trainees in an experimental task by using the SAGAT. In his study, controllers were either aided or unaided when evaluating pilot requests for flight plan deviations. Trainees in the ‘‘aided’’ condition were given a data link trajectory of the proposed deviation, while those in the ‘‘unaided’’ condition had to extrapolate the trajectories. Nunes (2003) found that the SAGAT did not reveal differences between display conditions, while the performance measures of response time and response accuracy revealed a significant effect of display condition.


Nunes (2003) argued that SA might not have varied between groups because both aided and unaided controllers essentially processed the same information, but the processing occurred at ‘‘different levels’’ (greater for unaided controllers). However, Nunes provided no explanation on how ‘‘IP’’ could be affected by the aiding condition without an effect on SA, as operationally defined through the SAGAT. We suspected that the SAGAT results Nunes (2003) obtained, like ours, could be attributed to the implementation approach to the measurement technique. We presumed that his method, like that employed in the pilot study, placed equal emphasis on aircraft that controllers may have considered irrelevant at certain points in task performance, which may have limited sensitivity of Nunes’ measure to the condition manipulations. Hauss and Eyferth’s (2003) concerns with the SAGAT for assessing controller SA led them to develop a new measure of SA called SALSA, which weighted aircraft based on their relevance to the current control scenario. The SALSA involves expert rating of replay of an air traffic management (ATM) simulation to determine the relevance of each task element (aircraft) to controllers. Only elements that are judged as relevant in the replay are considered for SA queries during subsequent test trials. In addition, rather than having participants recall aircraft positions on a blank radarscope, SALSA involves cued recall, in which participants are given the positions for the aircraft on which they are to be queried. Hauss and Eyferth (2003) contended that these changes in the SA measurement approach reduce the possibility of participants confusing two aircraft positions during free recall and take into account the air traffic controller’s use of an event-based mental representation of their task. Hauss and Eyferth (2003) compared SALSA with SAGAT using a high-fidelity ATM simulation. 
Their empirical results confirmed that controllers used event-based mental representations, since significantly more relevant parameters than irrelevant parameters were reproduced using the SALSA measure. Trends on the results of the SALSA measure also corresponded with changes in workload in intuitive ways. On this basis, we decided to investigate a modified approach to implementation of our SAGAT measure. We implemented cued recall of aircraft positions (in comparison to free recall), and we objectively weighted the relevance of aircraft as a basis for queries during simulation freezes. While Hauss and Eyferth (2003) used expert ratings on videos of their simulation scenarios to determine aircraft relevance at particular points in time, the approach we took involved real-time identification of aircraft in conflict, as well as those that had recently been issued clearances (e.g., hold, reduce speed, etc.) or queried during the simulation, as predictors of aircraft relevance to operators and selection as candidates for queries. It was expected that these modifications would lead to a more sensitive measure of the impact of AA of various IP functions on operator SA. It is important to note that although cueing operators on aircraft location during SAGAT freezes may promote sensitivity of the measure to the display condition, it is possible that cued recall of the presence of aircraft in a particular airspace and parameters of specific aircraft may impact operator SA and task performance following SAGAT freezes, as well as future SA. This was one potential drawback of the modified approach.
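To make the idea of relevance weighting concrete, the following Python sketch ranks aircraft using the three predictors named above (current conflict, recent clearance, recent query) and returns the highest-ranked candidates for SA queries. It is a hypothetical illustration only. The `Aircraft` fields, the 2-min recency window, the numeric weights and the function names are all assumptions introduced here, since the text does not specify how the predictors were combined.

```python
from dataclasses import dataclass
from typing import List, Optional

RECENT_MIN = 2.0  # assumed recency window (min); not taken from the study


@dataclass
class Aircraft:
    call_sign: str
    in_conflict: bool
    min_since_clearance: Optional[float]  # None if never cleared
    min_since_query: Optional[float]      # None if never queried


def is_recent(minutes: Optional[float]) -> bool:
    return minutes is not None and minutes <= RECENT_MIN


def relevance(ac: Aircraft) -> int:
    """Assumed scoring: conflicts weigh more than recent operator contact."""
    score = 2 if ac.in_conflict else 0
    score += 1 if is_recent(ac.min_since_clearance) else 0
    score += 1 if is_recent(ac.min_since_query) else 0
    return score


def select_for_queries(aircraft: List[Aircraft], k: int = 3) -> List[str]:
    """Return call signs of the k most relevant aircraft (ties keep display order)."""
    ranked = sorted(aircraft, key=relevance, reverse=True)
    return [ac.call_sign for ac in ranked[:k]]


# Invented traffic picture for demonstration.
traffic = [
    Aircraft("AA1", False, None, None),
    Aircraft("GA2", True, 1.0, None),
    Aircraft("MI3", False, 0.5, 0.5),
    Aircraft("AA4", True, None, None),
    Aircraft("GA5", False, None, 10.0),
]
print(select_for_queries(traffic))
```

Any monotone scoring of the same predictors would serve equally well here; the point of the sketch is only that relevance can be derived from simulation events in real time rather than from expert ratings of replays.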

4.2. Procedures and variables

The steps in the training procedure for the full experiment included: (1) 20 min of practice on the Multitaskr and gauge monitor simulations; (2) 1 h of practice on the various modes of Multitaskr automation; (3) 40 min of training on AA of the simulation, including a dual-task practice session in which participants performed gauge monitoring and the Multitaskr under a randomly assigned mode of automation; and (4) familiarization with the SAGAT method followed by a 50-min dual-task practice session involving SAGAT freezes and SA queries (randomly selected from the pool of test queries). Testing was conducted on 2 days (separate from the training) during a 3-week period. Participants reviewed the simulation procedures on each day and were allowed to practice (15 min) the modes of automation to be tested. They completed five test trials on each day (5 modes of control × 2 trials/mode) with each trial lasting 50 min and including SAGAT freezes. Test trial order was randomized across the 2 days. In total, each participant was trained and tested for 13 h.

With respect to the SA measure, three SAGAT freezes were again conducted at random times during each experimental trial. When a freeze occurred, the Multitaskr display was blanked and participants moved to the adjacent SAGAT workstation. An experimenter collected information on any aircraft in conflict from the Multitaskr software by accessing a conflict detection aid, which was hidden from participants during testing. The aid also presented any recommended clearances. Based on these data, and position and speed information for each vehicle, the experimenter identified the three aircraft with the highest priority for clearances, or greatest ‘‘relevance,’’ at that point in time in the simulation. To facilitate the cued recall of aircraft as a basis for SA queries, the experimenter quickly sketched the locations of the ‘‘high priority’’ aircraft on a graphic of the Multitaskr radarscope.
The participants were given the drawing and required to respond to nine SA queries on each ‘‘high priority’’ aircraft, including three queries targeting each level of SA (perception, comprehension and projection). In total, participants answered 81 SAGAT queries (3 freezes × 9 queries × 3 ‘‘high-priority’’ aircraft) during each trial. The Multitaskr and gauge monitoring performance measures were recorded during each trial of the study.
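The scoring of responses at a freeze can be sketched as follows. This Python fragment is a minimal illustration and not the scoring software used in the study. The function name `score_freeze` and the input format are hypothetical, and Total SA is assumed here to be the pooled percentage of correct responses across all three levels, which the text does not define explicitly.

```python
from collections import defaultdict

FREEZES_PER_TRIAL = 3
QUERIES_PER_AIRCRAFT = 9   # three queries for each of the three SA levels
AIRCRAFT_PER_FREEZE = 3    # the "high-priority" aircraft

# Query count per trial as stated in the text: 3 x 9 x 3 = 81.
QUERIES_PER_TRIAL = FREEZES_PER_TRIAL * QUERIES_PER_AIRCRAFT * AIRCRAFT_PER_FREEZE


def score_freeze(responses):
    """Percent correct per SA level and overall.

    `responses` is an iterable of (sa_level, is_correct) pairs, one pair per
    query answered at a single freeze.
    """
    asked = defaultdict(int)
    correct = defaultdict(int)
    for level, is_correct in responses:
        asked[level] += 1
        correct[level] += int(is_correct)
    scores = {f"Level {lvl}": 100.0 * correct[lvl] / asked[lvl]
              for lvl in sorted(asked)}
    # Assumption: Total SA pools all queries rather than averaging the levels.
    scores["Total"] = 100.0 * sum(correct.values()) / sum(asked.values())
    return scores


# Invented responses for one freeze: 9 queries per level (3 aircraft x 3 queries).
example = ([(1, True)] * 6 + [(1, False)] * 3 +
           [(2, True)] * 3 + [(2, False)] * 6 +
           [(3, True)] * 9)
print(QUERIES_PER_TRIAL, score_freeze(example))
```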


4.3. Data analysis

There were 240 observations on average Level 1, 2 and 3 SA, and the Total SA score, for the entire experiment (8 participants × 5 control modes × 2 trials × 3 freezes). With respect to Multitaskr and gauge monitoring performance, observations for each participant were averaged across the manual minutes of a trial in order to obtain a single score for each task in each trial. Eighty manual performance scores were gathered for all participants through the control condition trials and AA conditions (8 participants × 2 trials × 5 control modes). Similarly, performance observations under automated minutes were also averaged to obtain a single score for each task in each trial. Consequently, 64 automated performance scores were available from the AA conditions alone (8 participants × 2 trials × 4 automation modes).

We conducted analyses of variance (ANOVAs) on all response measures (Level 1, 2, 3 and Total SA, aircraft cleared, conflicts, collisions, and secondary task performance), with the mode of control of the ATC simulation included as the independent variable in the statistical model. For the four SA measures, we also conducted ANOVAs to assess the effect of general mode of simulation (automated versus manual) control. Furthermore, for the primary and secondary task performance measures, we analyzed the two data subsets containing all manual periods of control (including completely manual trials and manual periods as part of AA trials) and all automated periods of control. These ANOVAs were performed for aircraft cleared, conflicts, collisions and secondary task performance, with the mode of control of the ATC simulation as the independent variable. We elected to use Duncan’s Multiple Range (MR) test for post-hoc comparison of means. Finally, correlation analyses were conducted on the performance and SA response measures. If any of the Multitaskr performance measures revealed differences among the various automation conditions, and there were corresponding changes in operator SA, this would be further evidence of sensitivity of the modified SAGAT.

4.4. Results and discussion

4.4.1. Situation awareness

An ANOVA on the SA response measures revealed a significant effect of control mode on percent correct responses to Level 1 SA queries (F(4,227) = 3.78, p = 0.0054) and the Total SA score (F(4,227) = 2.7, p = 0.0317). This finding supported our expectation that the modified version of the SAGAT measurement technique would be sensitive to the AA manipulations as part of the experiment. Fig. 2 shows the average Level 1 SA scores under each mode of automation. The pattern of results on Total SA was very similar. We hypothesized that participants would be better at responding to SA queries (on perceptual knowledge) under information acquisition automation and manual control as compared to decision automation. The trajectory projection aid presented during information acquisition automation drew operator visual attention to the display and promoted concentration on aircraft status. Duncan’s MR test revealed Level 1 SA to be significantly superior under information acquisition automation, compared to information analysis, decision making, action implementation, and manual trials (p < 0.05). These results also support our hypothesis that operator (Level 1) SA would be lower under decision making and information analysis automation, but are contrary to the expectation that action implementation automation would free-up attentional resources for achieving better perceptual knowledge. Duncan’s MR tests also revealed information acquisition to be superior, in terms of Total SA, to action implementation automation (p < 0.05). Related to our general hypothesis on degradations in SA due to OOTL performance, it is possible that the automated ‘‘hand-off’’ of aircraft to tower control under the action implementation mode removed

Fig. 2. Mean Level 1 SA scores for the different modes of automation. (Vertical axis: average SAGAT score, %; horizontal axis: automation mode, including Manual, Information Acquisition, Information Analysis, Decision Making and Action Implementation.)

Fig. 3. Mean SAGAT scores for manual and automated control periods. (Vertical axis: average SAGAT score, %; horizontal axis: SA level, including Level 1, Level 2, Level 3 and Total; series: simulation control type, Automation and Manual.)

Fig. 4. Multitaskr performance for automated control periods. (Panel title: Automated Control; vertical axis: average performance, number of aircraft; horizontal axis: automation mode, including Information Acquisition, Information Analysis, Decision Making and Action Implementation; series: Cleared, Conflicts and Collisions.)

operators from the control loop at a critical time (aircraft landing) and negatively affected SA on aircraft within 20 nm of an airport. (These vehicles may have been deemed relevant to operators as a result of just having left a holding fix, etc.) In addition, Total SA during information analysis and decision-making trials was not inferior to SA during other automation trials. These results suggest that perception of system states (Level 1 SA) may be most critically affected by demanding automation displays.

Finally, as in the pilot study, we also found that the general mode of simulation control substantially impacted operator SA. Fig. 3 presents the mean Level 1, 2, 3 and Total SA scores for both automated and manual control periods (including the completely manual trial data). The general mode of automation had a marginally significant effect on percent correct responses to Level 2 SA queries (F(1,227) = 3.51, p = 0.0623), indicating that participant comprehension was, on average, higher during manual control periods. These results support our general notion that particular types of IP automation may remove the operator from the control loop more so than others and lead to decrements in SA (Endsley and Jones, 1995).

By comparison with the SA measurement approach taken in the pilot study, cueing participant recall of aircraft and using relevance weighting of aircraft, based on simulation events and recent operator actions, as part of the administration of SA queries, led to observation of meaningful, significant differences in percent correct operator responses to Level 1 and 2 SA queries and in Total SA due to the various automated conditions. This finding on the measurement approach lends support to our general definition of SA in the context of the ATC-related task simulation and the operational definition we developed through the SAGAT methodology.

4.4.2. Primary-task performance

An ANOVA on Multitaskr performance revealed no significant differences across the five control modes in terms of cleared aircraft, and aircraft conflicts and collisions. It is possible that large numbers of manual control minutes as part of the AA conditions may have caused any differences between the modes of automation and the control condition to be indiscernible. In fact, results of ANOVAs on data collected during only the automated control periods of the four AA mode trials revealed a significant effect of mode of automation on the number of cleared aircraft (F(3,41) = 3.62, p = 0.0208) and the number of aircraft conflicts (F(3,41) = 3.97, p = 0.0143), but not on the number of aircraft collisions. Fig. 4 presents the mean number of aircraft cleared, conflicting, and colliding under each mode of AA during the automation control periods. Duncan’s MR test indicated that the information acquisition, decision making, and action implementation modes were significantly superior (p < 0.05) in terms of cleared aircraft, as compared to the information analysis mode of automation. These results support our hypothesis that superior Multitaskr performance would occur during trials in which AA was applied to lower-order sensory/response functions. The high number of cleared aircraft under decision-making automation may be attributed to longer automated control periods for this condition, as compared to the other modes. However, Duncan’s MR tests also revealed decision-making automation to be significantly worse than information analysis automation for preventing aircraft conflicts (p < 0.05). This finding was not surprising because the decision aid only recommended clearances to participants to address conflicts once the automation detected them. It is possible that participants developed a strategy of waiting for the automation to warn them of conflicts and then to consider how to appropriately clear aircraft. ANOVA results for data collected during manual control periods during all five control mode trials revealed a significant effect of mode of automation (F(4,68) = 7.58, p < 0.0001) on the number of cleared aircraft, only.
Duncan’s test indicated that the number of cleared aircraft was significantly higher for the information acquisition and

information analysis modes of automation than for the decision making and action implementation modes (p < 0.05). These results are in partial agreement with our original hypothesis that AA applied to lower-order IP functions would yield better performance because of the limited complexity of the automation function and ease with which operators might transition back to manual control during AA trials (Clamann et al., 2002). It is possible that participants needed more time to shift from using a complex mental model for interaction with the decision aid back to their manual control mental model after the decision aid disappeared from the display and they had to identify conflicts themselves. This complex mental model changeover may have subtracted from the time participants allocated to actually clearing aircraft and, consequently, degraded performance.

4.4.3. Correlation analyses

In general, Pearson product-moment coefficients provided evidence of associations between participant perception and projection of the ATC-related task simulation states and task performance. Marginally significant positive correlations were observed between Level 1 SA and cleared aircraft (r = 0.19466, p = 0.0836), Level 3 SA and cleared aircraft (r = 0.21388, p = 0.0568), and Total SA and cleared aircraft (r = 0.19754, p = 0.079). Operator performance in the Multitask© simulation may have been dependent upon the levels of SA they were able to achieve.

4.4.4. Workload (secondary-task performance)

An ANOVA on the secondary task performance measure did not reveal significant differences across the five control modes. It is possible that averaging of low- and high-gauge scores across automated and manual control periods when AA was applied to, for example, decision automation washed out any significant differences among the conditions in terms of workload in this analysis.

An ANOVA comparing secondary task performance data collected during the automated periods of the AA mode trials revealed a significant effect of the mode of automation (F(3,41) = 4.01, p = 0.0137). Fig. 5 presents the mean hit-to-signal ratio under each mode of AA of the Multitask© simulation. It was expected that higher levels of automation, including information analysis and decision making, which present complex displays for operator interpretation, might demand high levels of visual attention and actually increase workload, indicated by decreased secondary task performance. Duncan’s MR tests confirmed that secondary-task performance was significantly lower (workload was higher) under the information analysis and decision-making modes than in action implementation automation.

Fig. 5. Secondary-task performance during automated control periods (vertical axis: average hit-to-signal ratio under automated control, 0 to 1; horizontal axis: automation mode: information acquisition, information analysis, decision making, action implementation).

An ANOVA on secondary task performance data collected during Multitask© manual control periods revealed significant effects due to the mode of automation (F(4,68) = 2.66, p = 0.0399). The pattern of workload results was almost exactly opposite to that observed during the automated control periods. Duncan’s test indicated that secondary task performance was significantly higher (average workload was lower) (p < 0.05) during trials where there was decision-making automation, compared to trials where information acquisition and action implementation automation were implemented. It is possible that when decision-making AA was invoked and participants followed recommended clearances for conflict avoidance, the result was lower workload when the simulation returned to manual control. This may have freed up attention for secondary task performance.
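The workload measure discussed above is a hit-to-signal ratio on the secondary task, averaged separately over automated and manual control periods. The following is a minimal sketch of that bookkeeping, not the authors' analysis code: the `MonitoringPeriod` record, its field names, and the mode labels are hypothetical, and the counts in the usage note are illustrative values rather than data from the study.

```python
from dataclasses import dataclass


@dataclass
class MonitoringPeriod:
    """One control period within a trial (hypothetical record layout)."""
    mode: str        # automation mode label, e.g. "action_implementation"
    automated: bool  # True for an automated period, False for a manual period
    hits: int        # secondary-task signals the participant detected
    signals: int     # secondary-task signals presented in the period


def hit_to_signal_ratio(hits: int, signals: int) -> float:
    """Return the proportion of presented signals that were detected."""
    if signals <= 0:
        raise ValueError("at least one signal must have been presented")
    if not 0 <= hits <= signals:
        raise ValueError("hits must lie between 0 and the number of signals")
    return hits / signals


def mean_ratio_by_mode(periods, automated: bool) -> dict:
    """Average the ratio per automation mode over automated OR manual periods."""
    grouped = {}
    for p in periods:
        if p.automated == automated:
            ratio = hit_to_signal_ratio(p.hits, p.signals)
            grouped.setdefault(p.mode, []).append(ratio)
    return {mode: sum(vals) / len(vals) for mode, vals in grouped.items()}
```

For example, calling `mean_ratio_by_mode(periods, automated=True)` on a list of such records gives one mean per mode for the automated periods only; a higher ratio is read as lower workload. Keeping the two period types separate avoids the averaging across automated and manual periods that the text suggests may have masked differences in the overall analysis.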

4.4.5. Summary

Table 3 presents a summary of the results of the primary experiment. In the overall comparison of manual versus automated performance, the trend was for better SA under manual control, but this did not prove to be significant at the accepted alpha level (0.05). Among the AA conditions, automation of information acquisition (lower-order, sensory processing) appears to strongly support operator perception of aircraft states (Level 1 SA). Automation performance data indicated that AA applied to sensory functions was useful for supporting clearances and critical event detection. Manual performance data also revealed that AA of information acquisition was superior to decision (higher-order, cognitive) automation for supporting operator clearances. Secondary task performance was higher (workload was lower) during automated control periods in which assistance was provided with psychomotor functions (action automation). Automation of decision functions in advance of manual control allocations appeared to reduce workload for operators during these periods. In general, it appears that because of task workload and the complexity of automation displays, operators may be better able to exploit the capabilities of simple forms of AA supporting sensory processing functions and the development of SA.
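The associations between SA and cleared aircraft reported in Section 4.4.3 are Pearson product-moment correlations. As a reminder of how such a coefficient and its test statistic are obtained, the sketch below computes r and the corresponding t value with n - 2 degrees of freedom. It is an illustration under stated assumptions: the function names are hypothetical, the number of paired observations behind the reported coefficients is not given in this section, and the values in the usage note are made up.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With illustrative SA scores `[1, 2, 3, 4, 5]` and clearance counts `[2, 1, 4, 3, 5]`, `pearson_r` gives 0.8 and `t_statistic(0.8, 5)` is about 2.31. The p-value then comes from comparing t with a Student t distribution, for which a statistics package would normally be used.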


Table 3. Summary of findings based on full experiment

Response measure

Level 1 SA Level 2 SA Level 3 SA Total SA Aircraft cleared

Conflicts Collisions Secondary task performance (workload)

ANOVA comparison made All 5 modes

All 5 modes (manual periods only)

All 4 AA modes (automated periods only)

Automated periods vs. manual periods

Info. Acq. > All other levels; n.s.; n.s.; Info. Acq. > Action Imp.; n.s.

NR

NR

n.s.

NR; NR; NR; Info. Acq. and Info. Anal. > DM and Action Imp.; n.s.; n.s.; DM > Info. Acq. and Action Imp.

NR; NR; NR; Info. Acq., DM, and Action Imp. > Info. Anal.

Man > Auto; n.s.; n.s.; NR

Info. Anal. > DM; n.s.; Action Imp. > Info. Anal. and DM

NR NR NR

n.s. n.s. n.s.

DM—decision making; NR—not reported; n.s.—not significant.

5. Conclusions

5.1. Caveats

The results of this study should be interpreted with caution relative to the design of automation for ATC. This is due to the use of a low-fidelity simulation of an ATC task and university students as participants. Although the functions of Multitask© simulate the cognitive requirements of a limited subset of ATC IP functions, it is a PC-based task and the version used in this study did not require participants to keep track of aircraft flight level. Consequently, the working memory requirements may have been lower, but there was also less flexibility in the simulation for clearing aircraft of conflicts. The simulation also did not require users to verbally communicate with pilots, maintain flight strips, deal with inadvertent aircraft deviations from prescribed routes, etc. These functions (in real ATC operations) add to task workload. Lastly, the Multitask© simulation does not evoke in users the normal stress associated with clearing aircraft that are carrying people. It is possible that this impacted participant motivation and concentration during simulation trials and, consequently, performance. Novice operators were used in both experiments. Although they completed an extensive training program and may have been experts at performing the simulation, they still were not actual air traffic controllers. Future work needs to compare the results presented here with the SA implications of AA in ATC tasks demonstrated through comparable experiments with, for example, certified Federal Aviation Administration controllers.

5.2. SA and AA

The main objective of the pilot study of this research was to define an approach to measuring SA sensitive to manual and automated control manipulations in the context of the ATC-related task. The objective of the primary experiment was to use this measure to describe in detail the differential effects of AA for supporting various IP functions, as part of the ATC task simulation, on operator perception, comprehension and projection of traffic situations. The general ATC-related IP functions to which AA was applied included information acquisition, information analysis, decision making and action implementation. Both the pilot and primary experiments revealed some effect of the general mode of control (i.e., manual versus automated) on operator SA. However, this occurred at different levels of SA, including operator projection and perception abilities. On the basis of findings of other recent research on SA measurement for ATC, we designed a modified-SAGAT approach for measuring SA in the primary experiment and it proved to be effective in terms of assessing the impact of specific forms of AA for supporting IP. Using cued recall of aircraft, and establishing relevance weights for various aircraft at the time of SAGAT freezes, appeared to cause the SA response measures to be sensitive to the AA manipulations. Specifically, our findings on Level 1 SA revealed participant perception to be significantly superior under AA applied to lower-order IP functions, including information acquisition, which provided participants with additional information on aircraft flight parameters. One important point to make with respect to the modified SAGAT approach is that operator SA at the time of a SAGAT freeze may have extended beyond knowledge of those aircraft deemed to be most relevant, given current task circumstances. Since it is the job of an air traffic controller to attempt to maintain awareness of all aircraft in an assigned sector in order to ensure separation, participants may have retained some knowledge of less relevant aircraft prior to a freeze.
In this case, the modified SAGAT measure, which constrained the task information that was used as a basis for evaluating operator SA, may not have represented a global measure of SA (see Endsley, 1995b). That is, the measure may not have captured participant ability to maintain awareness of all elements of the environment.

5.3. ATC performance with AA

In agreement with prior research (Clamann et al., 2002), the primary experiment demonstrated differential effects of AA applied to various IP functions of the dynamic control task on performance. We also observed that performance varied among the automated and manual control periods as part of AA trials. With respect to automated control, our results indicated that automation applied to higher-order IP functions, involving information analysis assistance (identification of potential aircraft conflicts), may not support successful controller clearing of aircraft. However, as one might expect, the same form of automation appears to improve conflict detection. Although the automation made conflicts salient to operators in our task, the fact that they were removed from the control loop in terms of closely monitoring aircraft flight parameters and projecting the conflicts themselves seems to have had a negative effect on the execution of appropriate clearances to land aircraft. Beyond this, we observed that automation applied to lower-order IP functions, including action implementation and information acquisition, which support operators in terms of clearing aircraft and gathering information on flight parameters, promotes performance, specifically the number of successful clearances. We also observed that in trials with information analysis automation, operators did better at successfully clearing aircraft during manual control periods than in trials in which action implementation automation provided automatic clearances.

5.4. Future research

Hauss and Eyferth (2003) attributed the need for a measure of SA, weighting the relevance of elements in the environment, to the unique complexity of the ATC environment requiring controllers to manage a large amount of information using event-based mental representations. It is possible that relevance of stimuli or events to human performance in other domains, such as driving and piloting, may be important to consider in developing SA measures for studying, for example, in-vehicle highway systems or cockpit automation effects on driver and pilot SA. Research is needed to identify other task environments in which ‘‘weighted’’ SA measures may be applicable and useful. Such measures may then be used to assess the effects of AA of various human IP functions on SA in different contexts. With a more complete understanding of the effectiveness of AA for supporting IP functions and SA, future research is needed to develop methods for real-time assessment of SA to be used as a basis for triggering control mode
changes in complex, adaptive systems. It is possible that SA-matched AA may produce superior performance results compared to workload-matched AA as a result of considering concurrent operator cognitive states in determining control allocations. The criterion for this type of AA design would entail allocating control to automation only when the controller is fully ‘‘in-the-loop’’ and has achieved sufficient SA. However, if the controller loses ‘‘the picture’’ (their internal situation model), manual control could be reinstated to require perception, comprehension, and projection of traffic management systems. It would be necessary to empirically assess the effectiveness of such an approach to AA for IP functions in ATC.

Acknowledgements

This research was supported by a grant from NASA Langley Research Center (Grant no. NAG-1-03022). Lance Prinzel was the technical monitor. The research was completed while the first author worked as an associate professor in the Department of Industrial Engineering at North Carolina State University and while the fourth author was a graduate student in the same Department. The views and conclusions presented in this paper are those of the authors and do not necessarily reflect the views of NASA.

References

Bailey, N.R., Scerbo, M.W., Freeman, F.G., Mikulka, P.J., Scott, L.A., in press. A comparison of a brain-based adaptive system and a manual adaptable system for invoking automation. Human Factors.
Clamann, M.P., Kaber, D.B., 2003. Authority in adaptive automation applied to various stages of human–machine system information processing. In: Proceedings of the 47th Annual Meeting of the Human Factors and Ergonomics Society. Human Factors and Ergonomics Society, Santa Monica, CA, pp. 543–547.
Clamann, M.P., Wright, M.C., Kaber, D.B., 2002. Comparison of performance effects of adaptive automation applied to various stages of human–machine system information processing. In: Proceedings of the 46th Annual Meeting of the Human Factors and Ergonomics Society. Human Factors and Ergonomics Society, Santa Monica, CA, pp. 342–346.
Dillingham, G.L., 1998. Air traffic control: evolution and status of FAA’s Automation Program. Technical Report GAO/T-RCED/AIMD-9885, United States General Accounting Office, Washington, DC.
Endsley, M.R., 1995a. Toward a theory of situation awareness in dynamic systems. Human Factors 37 (1), 32–64.
Endsley, M.R., 1995b. Measurement of situation awareness in dynamic systems. Human Factors 37 (1), 65–84.
Endsley, M.R., 1996. Automation and situation awareness. In: Parasuraman, R., Mouloua, M. (Eds.), Automation and Human Performance: Theory and Applications. Lawrence Erlbaum, Mahwah, NJ, pp. 163–181.
Endsley, M.R., 2000. Direct measurement of situation awareness: validity and use of SAGAT. In: Endsley, M.R., Garland, D.J. (Eds.), Situation Awareness Analysis and Measurement. Lawrence Erlbaum Associates, Mahwah, pp. 147–173.
Endsley, M.R., Jones, D.G., 1995. Situation awareness requirements analysis for TRACON air traffic control. Technical Report TTU-IE95-01, Technology Center, Federal Aviation Administration, Atlantic City, NJ.


Endsley, M.R., Jones, W.M., 1997. Situation awareness, information dominance, and information warfare. Technical Report 97-01, Endsley Consulting, Belmont, MA.
Endsley, M.R., Kiris, E.O., 1995. The out-of-the-loop performance problem and level of control in automation. Human Factors 37 (2), 381–394.
Endsley, M.R., Rodgers, M.D., 1994. Situation awareness information requirements for en route air traffic control. Technical Report DOT/FAA/AM-94/27, Office of Aviation Medicine, United States Department of Transportation, Federal Aviation Administration, Washington, DC.
Hauss, Y., Eyferth, K., 2003. Securing future ATM-concepts’ safety by measuring situation awareness in ATC. Aerospace Science and Technology 7, 417–427.
Hilburn, B., Molloy, R., Wong, D., Parasuraman, R., 1993. Operator versus computer control of adaptive automation. In: Jensen, R.S., Neumeister, D. (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology. Department of Aviation, The Ohio State University, Columbus, OH, pp. 161–166.
Hilburn, B.G., Jorna, P., Byrne, E.A., Parasuraman, R., 1997. The effect of adaptive air traffic control (ATC) decision aiding on controller mental workload. In: Mouloua, M., Koonce, J.M. (Eds.), Human–Automation Interaction: Research and Practice. Lawrence Erlbaum Associates, Inc., Mahwah, NJ, pp. 84–91.
Hopkin, V.D., 1998. The impact of automation on air traffic control specialists. In: Smolensky, M.W., Stein, E.S. (Eds.), Human Factors in Air Traffic Control. Academic Press, San Diego, CA, pp. 391–419.
Itoh, M., Inagaki, T., 2004. A microworld approach to identifying issues of human–automation systems design for supporting operator’s situation awareness. International Journal of Human–Computer Interaction 17 (1), 3–24.
Kaber, D.B., Endsley, M.R., 2004. The effects of level of automation and adaptive automation on human performance, situation awareness and workload in a dynamic control task. Theoretical Issues in Ergonomics Science 5 (2), 113–153.
Kaber, D.B., Riley, J., 1999. Adaptive automation of a dynamic control task based on secondary task workload measurement. International Journal of Cognitive Ergonomics 3 (3), 169–187.
Kaber, D.B., Onal, E., Endsley, M.R., 2000. Design of automation for telerobots and the effect on performance, operator situation awareness and subjective workload. Human Factors & Ergonomics in Manufacturing 10 (4), 409–430.
Kaber, D.B., Riley, J.M., Tan, K., Endsley, M.R., 2001. On the design of adaptive automation for complex systems. International Journal of Cognitive Ergonomics 5 (1), 37–57.
Kaber, D.B., Wright, M.C., Hughes, L.E., 2002. Automation-state changes and sensory cueing in complex systems control. Final Report, Office of Naval Research Award #N0001401010402, Office of Naval Research, Arlington, VA.
Kaber, D.B., Wright, M.C., Prinzel, L.P., Clamann, M.P., in press. Adaptive automation of human–machine system information processing functions. Human Factors.
Klein, G., 1998. Sources of Power: How People Make Decisions. MIT Press, Cambridge.
Laois, L., Giannacourou, M., 1995. Perceived effects of advanced ATC functions on human activities: results of a survey on controllers and experts. In: Proceedings of the Eighth International Symposium on Aviation Psychology. The Ohio State University, Columbus, OH, pp. 392–397.
Leroux, M., 1993. The role of expert systems in future cooperative tools for air traffic controllers. In: Proceedings of the Seventh International Symposium on Aviation Psychology. The Ohio State University, Columbus, OH, pp. 335–340.
Moray, N., 1986. Monitoring behavior and supervisory control. In: Boff, K.R., Kaufmann, L., Thomas, J.P. (Eds.), Handbook of Perception and Human Performance: Volume II: Cognitive Processes and Performance. Wiley, New York (40-1–40-51).
Moray, N., Inagaki, T., Itoh, M., 2000. Adaptive automation, trust and self-confidence in fault management of time-critical tasks. Journal of Experimental Psychology: Applied 6 (1), 44–58.
National Research Council, 1998. The Future of Air Traffic Control: Human Operators and Automation. National Academy Press, Washington, DC.
Nunes, A., 2003. The impact of automation use on the mental model: findings from the air traffic control domain. In: Proceedings of the Human Factors & Ergonomics Society 47th Annual Meeting. Human Factors and Ergonomics Society, Santa Monica, CA, pp. 66–70.
Parasuraman, R., 2000. Designing automation for human use: empirical studies and quantitative models. Ergonomics 43 (7), 931–951.
Parasuraman, R., Molloy, R., Singh, I.L., 1993a. Performance consequences of automation-induced complacency. International Journal of Aviation Psychology 3, 1–23.
Parasuraman, R., Mouloua, M., Molloy, R., Hilburn, B., 1993b. Adaptive function allocation reduces performance costs of static automation. In: Jensen, R.S., Neumeister, D. (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology. Department of Aviation, The Ohio State University, Columbus, OH, pp. 178–185.
Parasuraman, R., Wickens, C.D., Sheridan, T., 2000. A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics 30 (3), 286–297.
Prinzel, L.J., 2002. Research on hazardous states of awareness and physiological factors in aerospace operations. Technical Publication: NASA/TM-2002-211444, NASA, Washington, DC.
Scallen, S., Hancock, P.A., Duley, J.A., 1995. Pilot performance and preference for short cycles of automation in adaptive function allocation. Applied Ergonomics 26, 397–403.
Scerbo, M.W., 1996. Theoretical perspectives on adaptive automation. In: Parasuraman, R., Mouloua, M. (Eds.), Human Performance in Automated Systems: Theory and Applications. Lawrence Erlbaum Associates, Mahwah, NJ, pp. 37–64.
Sharit, J., Salvendy, G., 1987. A real-time interactive computer model of a flexible manufacturing system. IIE Transactions 19, 167–177.
Vicente, K.J., 1991. Supporting knowledge-based behavior through ecological interface design. Unpublished Doctoral Dissertation, University of Illinois.
Wickens, C.D., 2000. Imperfect and unreliable automation and its implications for attention allocation, information access, and situational awareness. Technical Report ARL-00-10/NASA-00-2, Aviation Research Laboratory, Institute of Aviation, University of Illinois, Savoy, IL.
Wickens, C.D., Hollands, J.G., 2000. Engineering Psychology and Human Performance, third ed. Prentice Hall Inc, Upper Saddle River.