
Proceedings of the 8th Annual International Conference on Industrial Engineering – Theory, Applications and Practice, Las Vegas, Nevada, USA, November 10-12, 2003

A MODEL FOR PREDICTING HUMAN TRUST IN AUTOMATED SYSTEMS

Mohammad T. Khasawneh1, Shannon R. Bowling2, Xiaochun Jiang3, Anand K. Gramopadhye2, and Brian J. Melloy2

1 Department of Systems Science and Industrial Engineering, State University of New York at Binghamton, Binghamton, New York 13902-6000. Corresponding author: [email protected]

2 Department of Industrial Engineering, Advanced Technology Systems Laboratory, Clemson University, Clemson, South Carolina 29634-0920

3 Department of Industrial and Systems Engineering, North Carolina A&T State University, 1601 E Market Street, Greensboro, North Carolina 27411

Abstract: As a result of recent advancements in computer technology, hybrid inspection systems, those in which humans and computers work cooperatively, have found increasing application in quality control processes. These systems are preferable because they take advantage of the superiority of humans in rational decision-making and adaptability to new circumstances. Although the classical performance issues of speed and accuracy have been studied in this regard, the subjective measure of trust in automation is gaining increasing importance. Trust has a significant influence on the use of automation, especially in the context of function allocation, an important consideration in achieving quality inspection performance. Thus, in order to improve inspection performance, this research addresses the issue of trust within the context of a manufacturing inspection task. Specifically, this research develops a model of human trust for hybrid inspection systems using a quantitative approach that relates machine properties to an operator's perceptions thereof.

1. INTRODUCTION

The most important implication of this research is that it extends the work of Muir (1987), Lee and Moray (1992, 1994), Muir (1994), and Muir and Moray (1996) on trust in human-machine systems. Bonnie Muir (1987) wrote a groundbreaking paper on the subject by adapting trust paradigms from the psychology and sociology literature to human-machine contexts. In an effort to move from the realm of interpersonal interactions to interactions between humans and machines, Muir crossed the two taxonomies of Barber (1983) and Rempel et al. (1985) to form what she considered a more complete two-dimensional taxonomy of human-computer trust.

Muir's work was followed by two empirical studies by Lee and Moray (1992, 1994), who refined her trust taxonomy through experiments on college students using a simulated pasteurization plant. They suggested that the trust dimensions of Barber (1983), Rempel et al. (1985), and Zuboff (1988) are actually complementary. They considered a supervisory control task in a simulated juice pasteurization plant, and their investigation expected to find a correlation between trust and the allocation of tasks to automation. However, their results showed the opposite: as trust declined, the use of the automatic controller went up. This may have been due to the fact that most subjects began with manual control and looked for automation help when pre-programmed faults began occurring (Eidelkind, 1995). The results of the first experiment led to the presumption that allocation decisions are based more on a loss of confidence in one's own ability to perform flawlessly than on a sudden increase in trust in the automation (Lee and Moray, 1992). The follow-up study (Lee and Moray, 1994) was successful in showing that task allocation is affected by trust as well as self-confidence. However, just as in the previous study, regardless of the quality of automation, subjects were still more inclined to start manually and were unlikely to switch allocations, tending to stay with their original decision (Eidelkind, 1995). Issues of trust as an intervening variable are particularly interesting in hybrid systems, where systems may fail due to an incorrect intervention decision on the part of a human operator.
Research from both the social science and engineering viewpoints agrees that trust is a multidimensional concept, reflecting a set of interrelated perceptions such as the reliability and predictability of an entity, and the actions of a human involving the use of an automated system (Llinas et al., 1998).


Origins of the work done in trust can be found in the social science literature, which has essentially looked at trust between humans. Deutsch (1958) claimed that trust consists of expectation (predictability) and motivational relevance, whereas Rotter (1967) defined trust in terms of the expectancy of one individual or group that statements of another individual or group can be relied on. Barber (1983) defined trust as the subjective expectation of future performance and described three types of expectations related to the three dimensions of trust proposed: persistence of natural and moral laws, technically competent performance, and fiduciary responsibility. Rempel et al. (1985) developed a time-based model and concluded that trust would progress in three stages over time, from predictability, to dependability, to faith.

Recent work pertaining to trust in machines has essentially drawn from the aforementioned early work on trust in humans (e.g., Muir, 1987 and 1994). It has expressly looked at trust in process-control systems (Sheridan, 1988; Lee and Moray, 1992; Muir and Moray, 1996; Jian et al., 2000). For example, Lee and Moray (1994) and Muir and Moray (1996) studied issues of human trust in simulated semi-automated pasteurization plants and measured trust subjectively using rating scales and objectively by logging participants' actions. Their studies showed that an operator's decision to utilize either automated or manual control depended on their trust in the automation. Moreover, their results showed that trust was dependent on current and prior levels of system performance, the presence of faults, and prior levels of trust. Similar findings were also reported by Zuboff (1988) and Sheridan (1988). Since then, various researchers have tried to understand the role trust plays in system performance for a wide range of complex automated systems, such as air traffic control (Masalonis and Parasuraman, 1999) and antiaircraft warfare (Jian et al., 2000). The results of all the recent studies point to one important conclusion: trust is an important intervening variable between an automated system and its use and subsequent performance. That is, people may or may not use a system because of their trust in it, which in turn is driven by their experience using or relying on the system. To understand and predict the use of automation, then, it is necessary to specify the factors and system characteristics that affect an operator's trust.

Beyond the aforementioned work, limited research has been done on the concept of trust. Furthermore, the majority of the work that has been done is theoretical and has not been substantiated by empirical studies. Although similar in content to previous studies, the current research takes advantage of further advances in computer technology to study issues related to human-computer interaction in a simulated industrial environment. In particular, this research developed a model of trust in hybrid inspection that will enable us to understand system output by linking human trust to changes in system variables. This model was guided by the two earlier ones developed by Barber (1983) and Rempel et al. (1985), both of which were later integrated by Muir (1994), to explain how trust changes in process control situations.
Barber's (1983) model defines trust in terms of a taxonomy of three specific expectations, while Rempel et al.'s explains trust based on experience. Thus, they inherently model trust as a function of time, describing it as a hierarchy, with trust at any one stage based on the outcome of earlier stages. The proposed model, however, represents trust in terms of system parameters. Specifically, the model developed proposes that human trust in computerized inspection systems can be predicted by assessing the computer's responses. To study this issue, a hybrid inspection simulator (Jiang et al., 2002) of printed circuit boards containing six categories of defects will be considered. These categories are: missing component, wrong component, inverted component, misaligned component, trace defects, and board defects. In real life, each one of these six is associated with a different severity level. For example, a slightly misaligned component is not as severe as a missing component. The occurrence of these different types of defects determines the functionality of a printed circuit board. Therefore, to illustrate the severity of the different possible defects that can occur on a printed circuit board, a defect weight will be assigned to each, as shown in Table 1.

Table 1. Sample Defects and Their Associated Weights

Defect Name                                                               Weight
Missing or Wrong Component / Soldering Joints / Open or Short               3
Inverted Capacitor/IC/Transistor / Significantly Misaligned Component       3
Slightly Misaligned Component / Inverted Resistor                           1
Slight Copper Overlay / Slight Board Defect / Wrong Board Number            1

Based on the total number of defects present, the defect weight on any of the boards selected for this study can range from 0 (100% good) to 6 (100% bad). Although a higher upper limit for the defect weight is possible, only this weight range will be considered in this research. In a hybrid inspection system, the computer's classification of the board as "good" or "bad" depends on the total weight of the defects present. This classification scheme was incorporated into the hybrid inspection simulator. For example, based on this table, if the total weight of defects present on a board is 0 (100% good board) and the computer decides to reject the board with 100% confidence, the error made by the computer is considered very severe.
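To make the weighting and classification scheme concrete, the following sketch shows how a board's total defect weight could be computed from Table 1 and mapped to a "good"/"bad" decision. It is an illustration only, not the simulator's actual code; the defect keys and the rejection threshold are hypothetical, since the paper does not specify the cutoff used by the computer.

```python
# Minimal sketch of the defect-weighting scheme in Table 1.
# The weights follow the table; reject_threshold is a hypothetical
# parameter chosen for illustration only.

DEFECT_WEIGHTS = {
    "missing_component": 3,
    "wrong_component": 3,
    "soldering_joint_defect": 3,
    "open_or_short": 3,
    "inverted_capacitor": 3,
    "inverted_ic": 3,
    "inverted_transistor": 3,
    "significantly_misaligned_component": 3,
    "slightly_misaligned_component": 1,
    "inverted_resistor": 1,
    "slight_copper_overlay": 1,
    "slight_board_defect": 1,
    "wrong_board_number": 1,
}

def total_defect_weight(defects):
    """Sum the weights of all defects found on a board (0 means 100% good)."""
    return sum(DEFECT_WEIGHTS[d] for d in defects)

def classify_board(defects, reject_threshold=3):
    """Return ('good' or 'bad', total weight) for a board's defect list."""
    weight = total_defect_weight(defects)
    decision = "bad" if weight >= reject_threshold else "good"
    return decision, weight

# Example: one slightly misaligned component plus one inverted resistor.
print(classify_board(["slightly_misaligned_component", "inverted_resistor"]))
# -> ('good', 2)
```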


For the purpose of this study, the overall human trust in the computer was modeled as a function of the computer's response and the actual state of the printed circuit board. The following section presents the methodology that was used to achieve the objectives of this research.

2. METHODOLOGY

2.1 Participants

Twelve participants, both graduate and undergraduate students enrolled in the Department of Industrial Engineering at Clemson University, were used in this experiment. Student participants can be used in lieu of inspectors because, as Gallwey and Drury (1986) have shown, minimal differences exist between inspectors and student participants on simulated tasks. The participants were screened for 20/20 vision, corrected if necessary, and were paid $5.00/hour for their time. Guidelines set forth by the Institutional Review Board of Clemson University for human factors studies were strictly followed.

2.2 Stimulus Material

The task was a simulated visual inspection of a printed circuit board implemented on a Pentium III computer with a 19-inch, high-resolution (1024 x 768) monitor. The input devices were a Microsoft standard keyboard and a Microsoft mouse. The task consisted of inspecting simulated PCB images, developed using Adobe PhotoShop 5.5, for the six categories of defects: missing components, wrong components, inverted components, misaligned components, trace defects, and board defects. Four of these defect categories could occur on any of four individual components: resistors, capacitors, transistors, or integrated circuits. Figure 1 shows a sample of the different defects that could occur on the printed circuit boards under investigation.

Figure 1. Illustrations of Sample Defects: (b) Wrong Resistor, (c) Inverted Resistor

2.3 Inspection Task

The hybrid inspection system used in this research can run in different modes. For the purpose of this study, the hybrid inspection simulator was modified to operate in a supervisory-control mode, in which the human operator is in charge only of monitoring the performance of the inspection system. In this mode, the computer performed the search for defects and made a decision on the state of the board (i.e., the total weight of defects). During the visual search, boards containing one, two, three, or no defects were inspected by the computer, whose task was to locate all potential defects, name them, and determine the total weight of defects present on the board. After the computer completed the visual inspection task, the participants were provided with information on the actual state of each board in addition to the computer's response, and based on this information, they were asked to rate their trust in the system. Once a board was inspected, the image of the next board was presented to the participants. Each inspection task consisted of 200 randomly ordered PCBs.

2.4 Experimental Design

As stated earlier, the independent variables of interest in this study were the computer's response (C) and the actual board state (A). The values for each of these variables were manipulated using the hybrid inspection simulator and ranged from zero to six (a total of 49 possible combinations). Each treatment combination was replicated forty times. All participants went through all the experimental conditions. The order in which the participants were exposed to the treatment combinations was randomized. The duration of each experimental session was about 30 minutes. A sketch of this design is given below.
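The following sketch illustrates how the 49 treatment combinations, each replicated forty times and presented in randomized order, could be generated. It is not the authors' software, only a minimal illustration of the design described in Section 2.4.

```python
# Sketch of the experimental design: the computer's response (C) and the actual
# board state (A) each range from 0 to 6, giving 49 treatment combinations,
# each replicated 40 times and presented in random order.
import itertools
import random

LEVELS = range(7)       # defect weights 0 (100% good) through 6 (100% bad)
REPLICATIONS = 40

trials = [
    {"computer_response": c, "actual_state": a}
    for c, a in itertools.product(LEVELS, LEVELS)
    for _ in range(REPLICATIONS)
]
random.shuffle(trials)  # randomize presentation order

print(len(trials))      # 49 x 40 = 1960 trials in total
print(trials[0])        # e.g. {'computer_response': 4, 'actual_state': 1}
```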


2.5 Experimental Procedure

The study took place over an 11-day period. Day One was devoted to training the participants, and during the next 10 days, data were collected on the criterion tasks. On the first day of the experiment, each participant was required to complete a consent form and was then introduced to subjective rating concepts before filling out a set of questionnaires. Following this step, instructions were read to the participants to ensure their understanding of the experiment. Next, all were trained and given three separate tests before beginning the experiment. Initially, the participants were introduced to basic PCB inspection terminology and familiarized with the computer program. Following this step, the participants were quizzed on their knowledge of the operation of the software, and correct answers were supplied for incorrect responses. The participants were then trained to recognize different types of defects by being shown instances of each, including names and probable locations. Then, training was provided on the guidelines used to classify a PCB as conforming or nonconforming, based on the weight of defects present on the board.

2.6 Data Collection

Data were collected on subjective measures using questionnaires. The first part of the questionnaire asked participants for demographic information, including age, gender, and education. During the experiment, participants were asked to rate their trust in the system for every board on a 0-100 scale, and they were then asked to rate their overall trust in the system at the end of each experimental session. To ensure reliability, the participants used in this study were introduced to subjective rating concepts before the experiment. This was achieved using material adapted and modified from Lee and Moray's study (1992).

3. RESULTS

Statistical analyses were performed on the participants' average trust ratings to determine whether and how trust changed as the computer's response and the state of the board changed. The following sections present a description of the analyses performed and the results obtained.

3.1 Analysis of Variance

An ANOVA was conducted on the participants' trust in the system. This analysis revealed a significant computer × actual interaction (F(1, 45) = 110.14, p < 0.01) and significant main effects for both the computer's response (F(1, 45) = 80.49, p < 0.01) and the actual board state (F(1, 45) = 80.92, p < 0.01).

3.2 Regression Analysis

Since the interaction effect was significant, indicating the importance of both factors and their effect on human trust, further analyses were performed to depict this relationship. Therefore, regression analysis was performed in order to obtain the model that best describes the changes in human trust as a function of the computer's response (C) and the actual state of the board (A). A summary of the results for a multiple regression model that consists of all possible quadratic terms is shown in Table 2. As can be seen, the C^2 and A^2 terms were not significant.

Table 2. Summary of the Regression Analysis Results

Parameter    Estimate    Standard Error    t Value    Pr > |t|
Intercept    99.94       10.93             9.14

Parameter    Estimate    Standard Error    t Value    Pr > |t|
Intercept    98.10       7.07              13.88
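As an illustration of the analysis described in Section 3.2, the sketch below fits a full quadratic regression of trust on the computer's response and the actual board state, followed by a reduced model without the squared terms. It is a minimal sketch on synthetic placeholder data, not a reproduction of the paper's results; the column names and the statsmodels-based workflow are assumptions made for illustration.

```python
# Sketch of the regression analysis in Section 3.2: trust modeled as a function
# of the computer's response (C), the actual board state (A), their interaction,
# and quadratic terms. The data below are synthetic placeholders; the estimates
# reported in the paper came from the participants' actual ratings.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1960  # 49 treatment combinations x 40 replications
computer = rng.integers(0, 7, n)   # computer's response, 0-6
actual = rng.integers(0, 7, n)     # actual board state, 0-6

# Placeholder trust ratings on the 0-100 scale, highest when the computer's
# response agrees with the actual state of the board.
trust = np.clip(100 - 12 * np.abs(computer - actual) + rng.normal(0, 5, n), 0, 100)
df = pd.DataFrame({"trust": trust, "computer": computer, "actual": actual})

# Full quadratic model; the paper reports that the squared terms were not
# significant, which would motivate refitting without them.
full_model = smf.ols(
    "trust ~ computer + actual + computer:actual"
    " + I(computer**2) + I(actual**2)",
    data=df,
).fit()
print(full_model.summary())

# Reduced model retaining only the main effects and the interaction.
reduced_model = smf.ols("trust ~ computer + actual + computer:actual", data=df).fit()
print(reduced_model.params)
```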