PROCEEDINGS of the HUMAN FACTORS and ERGONOMICS SOCIETY 54th ANNUAL MEETING - 2010


Evaluating the Benefits and Potential Costs of Automation Delegation for Supervisory Control of Multiple UAVs

Tyler Shaw1, Adam Emfield1, Andre Garcia1, Ewart de Visser1, Chris Miller2, Raja Parasuraman1, & Lisa Fern3

1George Mason University, 2Smart Information Flow Technologies, 3San Jose State University Research Foundation/US Army Aeroflightdynamics Directorate

Copyright 2010 by Human Factors and Ergonomics Society, Inc. All rights reserved. 10.1518/107118110X12829370088525

Previous studies have begun exploring the possibility that "adaptable" automation, in which tasks are delegated to intelligent automation by the user, can preserve the benefits of automation while minimizing its costs. One approach to adaptable automation is the Playbook® interface, which has been used in previous research and has shown performance enhancements compared to other automation approaches. However, additional investigations are warranted to evaluate both the benefits and the potential costs of adaptable automation. The present study incorporated a delegation interface into a new display and simulation system, the Multiple UAV Simulator (MUSIM), to allow flexible control over three unmanned aerial vehicles (UAVs) at three levels of delegation abstraction. Task load was manipulated by increasing the frequency of primary and secondary task events. Additionally, participants experienced an unanticipated event that was not a good fit for the higher levels of delegation abstraction. Handling this poor "automation fit" event, termed a "Non-Optimal Play Environment" (NOPE) event, required the use of manual control. Results showed advantages when access to the highest level of delegation abstraction was provided, as long as operators also had the flexibility to revert to manual control: performance was better across both task load conditions, and reaction time to the NOPE event was fastest in this condition. The results extend previous findings showing benefits of flexible delegation of tasks to automation using the Playbook interface and suggest that Playbook remains robust even in the face of poor "automation fit" events.

The use of Unmanned Aerial Vehicles (UAVs) for combat and surveillance missions in a variety of military operations is increasingly prevalent. UAVs such as the Hunter and Shadow are being used in threatening environments that are inaccessible to human military personnel. Due to the high cognitive load imposed and the degree of complexity associated with manual control of these UAVs, control generally requires multiple operators. For example, an AVO (air vehicle operator) may be responsible for aviating and navigating the UAV, while an MPO (mission payload operator) searches for targets and monitors system parameters (Dixon, Wickens, & Chang, 2005). The need for multiple operators for any one UAV limits the number of UAVs that can be deployed for surveillance and reconnaissance missions. Hence, one goal of much ongoing research and development is to define displays and modeling techniques that allow one operator, or a few operators, to control several UAVs. One way this goal can be achieved is to allow UAVs to operate autonomously within an integrated human-robot network (Parasuraman, Sheridan, & Wickens, 2000). However, previous studies exploring the coordination between humans and automated agents have revealed that there are costs associated with high levels of autonomy, such as overreliance, reduced situation awareness, loss of operator skill, and unbalanced mental workload (Endsley & Kiris, 1995; Parasuraman & Riley, 1997).

A better possibility may be to implement adaptable automation (Miller & Parasuraman, 2007), in which operators can delegate tasks to automated aids during task performance. Adaptable automation differs from other adaptive automation concepts in that the decision to automate is made by the user, not the system, thereby providing greater user flexibility, control, and awareness (Miller & Parasuraman, 2007). Thus, automation delegation concepts and interfaces are needed to test the idea of flexible automation, in which human operators are able to delegate tasks to automation at times of their own choosing and receive feedback about their performance (Parasuraman, Galster, & Miller, 2003). The Playbook® interface provides one such human-automation delegation architecture. The Playbook framework is based on a shared model of tasks, which provides a means of human-automation communication about plans, goals, methods, and resource usage (Miller et al., 2004), similar to "calling plays" within a sports team. The Playbook interface allows for effective tasking of robots at varying levels of abstraction during system maintenance, while preserving situation awareness and keeping cognitive load within normal ranges (Miller & Parasuraman, 2007). Previous studies (e.g., Parasuraman et al., 2005; Squire et al., 2004) have used the RoboFlag simulation as a test of the Playbook architecture. RoboFlag is a semiautonomous experimental test bed modeled on real robotic hardware, in which a supervisor controls a team of robots to enter an opponent's territory, capture the flag, and return to their side without losing their own flag (Campbell et al., 2003).
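
To make the shared task-model idea concrete, the sketch below (in Python, with invented class and method names rather than the actual Playbook API) illustrates how a delegated "play" might expand into subtasks that remain visible to, and reassignable by, the operator.

```python
# Illustrative sketch of delegation via "plays" (hypothetical names; not the real Playbook API).
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A leaf-level action that either the human or the automation can perform."""
    name: str
    assigned_to: str = "automation"  # the operator may reassign any task to "human"


@dataclass
class Play:
    """A named, reusable plan template the operator can delegate with one command."""
    name: str
    subtasks: List[Task] = field(default_factory=list)

    def delegate(self) -> List[Task]:
        # Delegating a play hands its whole subtask structure to the automation,
        # while leaving each subtask visible (and reassignable) to the operator.
        return self.subtasks


# Example: a "monitor named areas of interest" style play expanded into subtasks.
monitor_play = Play(
    name="Monitor NAIs",
    subtasks=[
        Task("route a UAV to NAI 1"),
        Task("route a UAV to NAI 2"),
        Task("route a UAV to NAI 3"),
        Task("slew each sensor to its assigned NAI"),
    ],
)

for task in monitor_play.delegate():
    print(f"{task.name} -> {task.assigned_to}")
```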


Results from the RoboFlag studies showed the benefits of flexible automation compared to less flexible automation. RoboFlag can be viewed as a simplified Playbook interface for a single human operator supervising multiple UAVs (Parasuraman et al., 2003; Squire, Galster, & Parasuraman, 2004). In typical military environments employing UAVs, each UAV operator generally works with multiple displays (e.g., a map display, a task display, a communication display, and a display showing a first-person point-of-view (POV) through the UAV's camera). While the results of the aforementioned studies are encouraging, it is not known whether the results from the RoboFlag studies extend to more sophisticated interfaces representative of current and planned multi-UAV operations. To that end, the current study used the Multiple UAV Simulator (MUSIM), in which one operator controls 3 UAVs. While controlling the UAVs, the operator is provided with an aerial 2-dimensional map view, a first-person POV from each UAV camera, and a separate control panel to execute high-level UAV functions. The MUSIM environment more closely resembles the UAV control and operating environment in the field, as operators must work with all displays to achieve mission success. Extending the delegation concept to simulations like this is essential if implementation of this technology is to be achieved. Fern and Shively (2009) carried out the initial study involving MUSIM, comparing performance under three levels of delegation abstraction, that is, three levels at which automation could be invoked and tasked according to the human operator's specifications. These levels were: Tools only (complete manual mode), Scripts (intermediate level of abstraction), and Plays (highest level of abstraction). Delegation capabilities were inclusive at each level: having access to the Scripts level of delegation meant participants could also use Tools if they desired, and having access to Plays meant Scripts and Tools were also available. The results showed benefits for the highest level of adaptable automation, as performance on the primary and secondary tasks in that study was superior to performance when either Scripts and Tools, or Tools alone, were available. However, operators of UAVs can experience varying levels of task load, and that variable was not examined in the Fern and Shively (2009) study. Accordingly, one goal of the current study was to investigate the efficacy of Playbook by contrasting two levels of cognitive load. In this study, the MUSIM system was employed to continue testing the efficacy of the Playbook interface. Using two levels of task load, participants completed 6 missions, each containing one unexpected NOPE event that required participants to use Tools mode (manual control) to eliminate the threat. Two competing hypotheses were tested. First, consistent with the literature on automation over-reliance and complacency (Lee & See, 2004; Parasuraman, Mouloua, & Singh, 1993), participants could become over-reliant on Plays and would therefore be unable to switch to manual mode in a timely manner to deal with the emerging enemy threat. In this case, we predicted a cost for using Plays, the maximal level of flexibility employed in this study, and an advantage for using Tools, or manual mode.


The alternative, competing hypothesis is that the flexibility of Playbook, as shown in a previous RoboFlag study (Parasuraman et al., 2005), would allow the operator to deal with the unexpected event efficiently, which would predict an advantage for Plays over the other two levels. This experiment was designed to test these hypotheses.

Method

Participants and Design

Seven male and eight female undergraduates participated in this experiment. Participants were given experimental credit for participating. A 2 × 3 within-subjects design was employed, defined by the factorial combination of 2 levels of task load (low, high) and 3 levels of control mode (Tools, Plays/Tools, and Plays/Scripts/Tools), creating a total of 6 conditions. All control modes are described in the next section.

Apparatus

Task environment. The Multiple UAV Simulator (MUSIM) task environment, created by personnel at the U.S. Army's Aeroflight Dynamics Directorate, was employed in this study. In MUSIM, participants have control over three UAVs (a 1:3 operator/vehicle ratio) with heterogeneous capabilities to carry out mission objectives. In each scenario, the primary objectives for participants were to monitor 3 named areas of interest (NAIs) to 1) track potential enemy threats (which appeared in the form of high mobility multipurpose wheeled vehicles, or humvees), 2) prosecute, or eliminate, those humvees that travelled to an armory location to become weaponized, and 3) eliminate any threats that could potentially be present outside of the pre-planned 3-NAI operating space. A secondary task, which provided the basis for the task load manipulation, was to "paint" civilian vehicles that also emerged from the NAIs. In low task load conditions, 19 civilian vehicles were presented; in high task load conditions, 64 civilian vehicles were presented (frequencies both higher and lower than those used in the Fern and Shively experiment). All trials lasted 10 minutes. MUSIM requires interaction with 3 displays: a 2-dimensional top-down map view, a MultiFunction panel, and a first-person sensor view from each UAV. The three MUSIM displays can be seen in Figure 1. The map view provides an aerial view of the entire MUSIM environment, shows position and waypoint (flight path) information, and allows for manual editing of waypoints. The MultiFunction display is used to monitor status and behavior, specify and equip weapons in Tools mode, and toggle between the different control modes. The UAV sensor view displays are used to interact with objects in the MUSIM environment (i.e., civilian vehicles and humvees). Manipulating the sensor views required the use of a Connexion SpaceExplorer video game controller, which allowed for panning along the x- and y-axes and zooming. An optical mouse was used for interaction with the operator control panels in both the MultiFunction and map displays.
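
As a rough illustration of the task-load manipulation described above, the following sketch encodes the two scenario types; the class and field names are invented for illustration and do not reflect MUSIM's actual configuration format.

```python
# Hypothetical encoding of the two task-load scenarios described above.
# Names are illustrative only; MUSIM's real configuration format is not shown here.
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    task_load: str           # "low" or "high"
    civilian_vehicles: int   # secondary-task ("painting") events emerging from the NAIs
    nais: int = 3            # named areas of interest to monitor
    duration_min: int = 10   # every trial lasted 10 minutes


LOW_LOAD = Scenario(task_load="low", civilian_vehicles=19)
HIGH_LOAD = Scenario(task_load="high", civilian_vehicles=64)

print(LOW_LOAD, HIGH_LOAD, sep="\n")
```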


Two monitors were required to present the 3 displays. The map display was presented on a 17" CRT offset 35° to the left. The sensor display and MultiFunction display were both presented on a 22" monitor located directly in front of the participant.

Control Modes. Tools mode is basic manual control of all vehicle behaviors above basic autopilot route-following (which was available in all modes). This mode requires manually editing and moving waypoints to orbit the NAIs and manually lasing and destroying enemy vehicles. The procedure for prosecuting a target was for the only laser-equipped UAV (Bravo) to lase the weaponized humvee and send coordinates to the only missile-equipped UAV (Charlie), which had prosecution capability. Subsequently, participants had to equip weapons and instruct UAV Charlie to deploy the missile. Scripts mode, an intermediate level of automation, allows the operator to select a "script" for one UAV at a time to provide some aggregated control of that UAV's behavior, including setting automatic flight paths, slewing the camera sensor to an NAI, and automating the process of lasing a vehicle (with Bravo) prior to destruction. Destroying targets in Scripts mode (i.e., equipping and instructing UAV Charlie) still needs to be performed manually, but the lasing process (including passing coordinates) has been automated. Plays mode provides multi-UAV automation. In Plays mode, all UAVs can be controlled simultaneously, including setting automatic flight paths, slewing sensors to appropriate locations, and destroying targets. There were 3 Plays available: 1) Monitor NAIs, which automatically tasks all 3 UAVs to monitor all 3 NAIs; 2) Prosecute Target, which automatically lases and prosecutes weaponized targets (after a final human authorization for missile launch); and 3) Track Target/Reconfigure Team, which tasks 2 UAVs to monitor the 3 NAIs while the remaining UAV tracks a potential enemy threat. In all modes of control, manual manipulation of the payload sensor was required to track targets and mark civilian vehicles. In both Scripts and Plays control modes, the various behaviors involving monitoring NAIs can only be applied to pre-planned NAIs, that is, those listed in the system prior to the beginning of the mission. Hence, Plays and Scripts were "optimized" for a set of expected conditions: monitoring, painting, and prosecuting appropriate targets that appeared near the 3 "downtown" NAIs that operators expected to be watching during their mission. In addition, each mission also included one event for which plays were not optimized: the appearance of a "pop-up" target well outside the downtown area (notionally, a mortar shelling troops). This target was very high priority but, because it did not exist at a pre-designated NAI, operators had to use manual controls to get a UAV into position to find it.
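
To summarize how the prosecution workflow changes with the level of delegation, here is a hedged sketch in Python; the step descriptions paraphrase the text above, and the function names are invented, not MUSIM commands.

```python
# Rough sketch of the target-prosecution workflow at each delegation level.
# Function names are illustrative only; they are not actual MUSIM commands.

def prosecute_with_tools(target):
    """Tools mode: every step is performed manually by the operator."""
    return [
        f"manually slew Bravo's sensor and lase {target}",
        f"manually pass {target} coordinates from Bravo to Charlie",
        "manually equip Charlie's missile",
        f"manually instruct Charlie to fire on {target}",
    ]

def prosecute_with_scripts(target):
    """Scripts mode: lasing (including the coordinate hand-off) is automated,
    one UAV at a time; equipping and firing remain manual."""
    return [
        f"script: Bravo lases {target} and passes coordinates to Charlie",
        "manually equip Charlie's missile",
        f"manually instruct Charlie to fire on {target}",
    ]

def prosecute_with_play(target):
    """Plays mode: the 'Prosecute Target' play tasks the UAVs end to end,
    pausing only for the final human authorization to launch."""
    return [
        f"play: lase and prosecute {target}",
        "human: authorize missile launch",
    ]

for mode in (prosecute_with_tools, prosecute_with_scripts, prosecute_with_play):
    print(mode("weaponized humvee"))
```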


Procedure

All participants were welcomed to the experiment and provided informed consent prior to training. The MUSIM environment was then introduced via computer-based training that included Microsoft PowerPoint slides and an introductory training video describing MUSIM scenario features, such as the SpaceExplorer and the different displays. Training materials were based on, and similar to, those used in the Fern and Shively (2009) experiment. Time was allotted for participants to familiarize themselves with the SpaceExplorer controller. Once comfortable with the controller, participants were introduced to the three different control modes via training videos. After viewing the training videos, participants were given practice trials under the three different control modes. Training time ranged from 45 to 60 minutes, and the entire experiment, including training and testing, lasted approximately 2 hours and 30 minutes.


Figure 1. MUSIM interface displays showing a) map display, b) MultiFunction control panel, and c) UAV sensor view.

Results

Three performance measures were used to determine the efficacy of the 3 control modes under the two task load conditions: target tracking, secondary task performance (painting civilian vehicles), and the ability to successfully prosecute the unexpected event. For all 3 metrics, both accuracy and reaction time (i.e., time to respond from the first appearance of the target vehicle) were examined. Prosecute-target events occurred very infrequently and were therefore left out of the analysis. Each dependent measure was subjected to a 2 (task load) × 3 (control mode) analysis of variance (ANOVA).

Tracking Performance

In low task load conditions, 2 humvees appeared that could be tracked (1 of which became weaponized and required prosecution). In high task load conditions, 6 humvees appeared that could be tracked (2 of which became weaponized). Tracking accuracy was defined as the number of vehicles actually tracked as a proportion of the number of vehicles that could potentially be tracked. An ANOVA of participants' humvee tracking accuracy revealed a significant main effect for control mode, F(2, 28) = 4.3, p < .05. Tracking accuracy was significantly better in conditions in which participants had all control modes available, Plays/Scripts/Tools (M = 61.7, SE = 8.6), than with Tools only (M = 46.1, SE = 6.3) or Plays/Tools (M = 49.4, SE = 6.2). The effects of control mode and task load can be viewed graphically in Figure 2. The ANOVA also revealed a main effect for task load, F(1, 14) = 23.083, p < .05, such that tracking accuracy was superior in low task load trials (M = 66.7, SE = 7.8) compared with high task load trials (M = 38.1, SE = 4.4). A similar task load effect was found for reaction time, F(1, 14) = 39.54, p < .05: time to track a target was significantly faster in low task load conditions (M = 71.76, SE = 4.6) than in high task load conditions (M = 120.7, SE = 6.3). None of the other sources of variance in the analyses of tracking accuracy and time to track attained significance.

Figure 2. Percentage of humvees successfully tracked as a function of control mode for both low and high task load.
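
For readers who want to reproduce this style of analysis on comparable data, the sketch below shows one way the 2 × 3 repeated-measures ANOVA on tracking accuracy could be run; the file name and column names are assumptions for illustration, and statsmodels is used in place of whatever statistics package the authors actually employed.

```python
# Sketch of the 2 (task load) x 3 (control mode) repeated-measures ANOVA on tracking accuracy.
# File and column names are assumptions for illustration only.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long-format data: one row per participant x task_load x control_mode cell,
# with counts of humvees tracked and humvees that could have been tracked.
df = pd.read_csv("musim_tracking.csv")  # columns: participant, task_load, control_mode, tracked, trackable

# Accuracy = humvees actually tracked as a proportion of humvees that could be tracked.
df["accuracy"] = df["tracked"] / df["trackable"]

aov = AnovaRM(
    df,
    depvar="accuracy",
    subject="participant",
    within=["task_load", "control_mode"],
).fit()
print(aov)
```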

Secondary Task Performance

Accuracy on the secondary task (civilian vehicle painting) was defined as the number of vehicles actually painted as a proportion of the number of vehicles that could potentially be painted. An ANOVA of participants' civilian vehicle painting performance revealed a significant main effect for task load, F(1, 14) = 30.70, p < .05, such that painting accuracy was higher under low (M = 43.8, SE = 3.3) than under high task load (M = 26.3, SE = 2.4). None of the other sources of variance for painting accuracy or time to paint attained significance.

NOPE Event Performance

In all trials, one NOPE event occurred that required participants to prosecute a weaponized vehicle which appeared well outside of the pre-planned NAIs. Regardless of which control mode the participant was using, navigating the UAV to this new target location required the use of Tools mode. An ANOVA of the unexpected event performance revealed a significant reaction time main effect for control mode, F(2, 28) = 13.48, p < .05, such that participants were faster in dealing with the unexpected event in conditions with Plays/Scripts/Tools (M = 175.11, SE = 3.64) than with either Tools (M = 214.22, SE = 6.19) or Plays/Tools (M = 189.49, SE = 6.00). In addition, a main effect for task load was also observed, F(1, 14) = 11.45, p < .05, such that participants were quicker to prosecute the unexpected event under low (M = 241.9, SE = 14.3) than under high (M = 269.2, SE = 13.96) task load. The effects of control mode and task load can be viewed graphically in Figure 3.

Figure 3. Reaction time (seconds) to prosecute the unexpected event as a function of control mode for both low and high task load. Error bars are standard error.

Discussion

The overarching goal of the current study was to test the efficacy of adaptable, delegation-style automation using a new task environment requiring an operator to control three UAVs, and to examine potential costs under conditions in which unexpected events occurred that were a "poor fit" for the automation. Previous studies have shown an advantage for flexible delegation interfaces using the Playbook architecture in a simulated robotic game of "capture the flag" (Parasuraman et al., 2005; Squire et al., 2004). This study extends that research by using a more sophisticated display that more closely mirrors the operation and control of field-tested UAVs. One of the goals of this investigation was to determine the efficacy of a Playbook interface under varying task load conditions. Results indicated that performance was best under conditions in which maximum flexibility was available to the user (Plays/Scripts/Tools), and that this result persisted under high task demand. Presumably, flexible automation delegation allows participants to manage higher task demand by delegating other tasks to an automated agent while, at least within this context, maintaining sufficient awareness of the situation and of the lower levels of control to use them effectively when the need arises. The effect that offloading tasks may have on perceived mental workload was not directly examined in this study, and future studies using the MUSIM interface should incorporate subjective measures of mental workload (e.g., the NASA-TLX).
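
If subjective workload is added in follow-up work, one simple option is the unweighted "Raw TLX" score, i.e., the mean of the six NASA-TLX subscale ratings; the sketch below assumes ratings on a 0-100 scale and uses made-up example values.

```python
# Minimal sketch: unweighted (Raw TLX) workload score as the mean of the six
# NASA-TLX subscales, each rated 0-100. Ratings shown are made-up examples.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings: dict) -> float:
    """Return the mean of the six subscale ratings (0-100)."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

example = {"mental": 70, "physical": 20, "temporal": 65,
           "performance": 40, "effort": 60, "frustration": 35}
print(raw_tlx(example))  # 48.33...
```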


The results pertaining to performance in the face of an unexpected NOPE event showed an advantage, in dealing with this event, for conditions in which the maximal level of delegation was available. When a "poor automation fit" event occurred, the NOPE task itself was managed more efficiently when play calling was available, even though plays were not used for this event. Far from providing support for a loss of efficiency during intervals when plays were "non-optimal," this finding shows that the availability of plays to aid in other, concurrent tasks helped operators react more quickly to events for which plays were not suitable. This finding supports the notion that the delegation commands available in this work domain were flexible enough to allow the operator to deal with the unexpected event in a more time-efficient manner. Generally, the results support the notion that flexible delegation allows operators to adapt more efficiently to unanticipated situations, using high levels of abstract automation control when effective but reverting to lower-level control when needed. Adaptable delegation approaches may attenuate some of the negative effects of "automation surprises" (Sarter, Woods, & Billings, 1997) and overreliance, and minimize the risks of traditional automation design. However, studies examining the effects of flexible automation on subjective mental workload suggest that workload may not be attenuated and may actually increase as operators decide what to automate (Kirlik, 1993; Squire et al., 2004). This observation underscores the need to further examine the costs and benefits of flexible delegation. Regardless, adaptable automation should encourage human engagement with automation, as it is a method of true supervisory control, and should mitigate effects such as complacency and reduced situation awareness. Future studies should incorporate both performance and subjective metrics to examine these variables.

Acknowledgements

This work was supported by a Small Business Innovation Research grant from the U.S. Army Aviation Applied Technology Directorate, Contract # W911W6-08-C-0066. Special thanks to Jay Shively, our Project Director, and to Lisa Fern, both of the U.S. Army's Aeroflight Dynamics Directorate (AFDD). AFDD provided extensive guidance and access to the MUSIM simulator as well as prior experimental materials and results to support this work. Thanks also to David Musliner and Josh Hammell of SIFT, who modified MUSIM as required for these experiments.

References

Campbell, M., D'Andrea, R., Schneider, D., Chaudhry, A., Waydo, S., Sullivan, J., Veverka, J., & Klochko, A. (2003). RoboFlag games using systems based hierarchical control. In Proceedings of the American Control Conference. Denver, CO.

Dixon, S. R., Wickens, C. D., & Chang, D. (2005). Mission control of multiple unmanned aerial vehicles: A workload analysis. Human Factors, 47, 479–487.

Endsley, M., & Kiris, E. (1995). The out-of-the-loop performance problem and level of control in automation. Human Factors, 37, 381–394.


Fern, L., & Shively, R. J. (2009). A comparison of varying levels of automation on the supervisory control of multiple UASs. In Proceedings of AUVSI's Unmanned Systems North America. Washington, D.C.

Kirlik, A. (1993). Modeling strategic behavior in human-automation interaction: Why an "aid" can (and should) go unused. Human Factors, 35, 221–242.

Miller, C., Goldman, R., Funk, H., Wu, P., & Pate, B. (2004). A playbook approach to variable autonomy control: Application for control of multiple, heterogeneous unmanned air vehicles. In Proceedings of FORUM 60, the Annual Meeting of the American Helicopter Society (pp. 2146–2157). Alexandria, VA: American Helicopter Society.

Miller, C. A., & Parasuraman, R. (2007). Designing for flexible interaction between humans and automation: Delegation interfaces for supervisory control. Human Factors, 49, 57–75.

Parasuraman, R., Galster, S., & Miller, C. (2003). Human control of multiple robots in the RoboFlag simulation environment. In Proceedings of the 2003 IEEE International Conference on Systems, Man and Cybernetics. Washington, D.C.

Parasuraman, R., Galster, S., Squire, P., Furukawa, H., & Miller, C. (2005). A flexible delegation interface enhances system performance in human supervision of multiple autonomous robots: Empirical studies with RoboFlag. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 35, 481–493.

Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 30, 286–297.

Parasuraman, R., & Byrne, E. A. (2003). Automation and human performance in aviation. In P. Tsang & M. Vidulich (Eds.), Principles of Aviation Psychology (pp. 311–356). Mahwah, NJ: Erlbaum.

Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39, 230–253.

Sarter, N. B., Woods, D. D., & Billings, C. E. (1997). Automation surprises. In G. Salvendy (Ed.), Handbook of Human Factors and Ergonomics (2nd ed., pp. 1926–1943). New York, NY: Wiley.

Squire, P. N., Galster, S. M., & Parasuraman, R. (2004). The effects of levels of automation in the human control of multiple robots in the RoboFlag simulation environment. In Proceedings of the 2004 Human Performance, Situation Awareness, and Automation Technology Conference II. Daytona Beach, FL.