Learning the Structure of Tacit Skills

Claude Sammut
School of Computer Science and Engineering
University of New South Wales
Sydney 2052, Australia
[email protected]

Abstract

Behavioural cloning seeks to build a model by learning from the traces of a skilled operator's behaviour. These models should be both robust and explainable. The original cloning experiments sought to build situation-action rules that, together, behaved as a reactive controller. As a result of our research to date, it has become clear that it is difficult to represent tacit skills simply as flat situation-action rules. Complex tasks have an inherent structure and it appears to be necessary that a behavioural clone reflects that structure.

INTRODUCTION

The aim of this project is to investigate the use of behavioural cloning to model the behaviour of an expert skilled in a particular task, such as a pilot flying an aircraft. Behavioural cloning seeks to build a model by learning from the traces of a skilled operator's behaviour. Thus, if a human is capable of performing some task, rather than ask him or her to explain how the task is performed, we ask to be shown how. Machine learning is used to create a symbolic description of the skill where introspection by the human operator fails because the task is performed subconsciously.

The first demonstration of behavioural cloning was by Michie, Bain and Hayes-Michie (1990) on the task of pole balancing. Later, Sammut, Hurst, Kedzier and Michie (1992) modified a flight simulation program to log the actions taken by a human subject as he or she flew an aircraft. The log file was used to create the input to an induction program. The output from the induction program was tested by running the simulator in autopilot mode, where the autopilot code was derived from the decision tree formed by induction.

There are two main goals in this research. The first is that the subcognitive activities required to perform a highly skilled task can be modelled in such a way that the skill is explainable and that the model is operational. By explainable we mean that the output from the learning program should be in a form that can be read and understood by a human. The second goal is that the model should be sufficiently explicit that a computer implementation of it can perform the task in a robust manner. That is, the model should be general enough that it is capable of operating in a wide variety of conditions.

The original cloning experiments sought to build situation-action rules that, together, behaved as a reactive controller. While we were able to build such controllers, this approach failed to achieve our goals of explainability and robustness. To solve these problems we are now investigating the goal structures of tacit skills. Thus, rather than learning simple situation-action rules, we now attempt to learn the goals of the operator and what actions must be taken to achieve those goals. In the following sections we briefly describe our original experiments, discuss the limitations of the methods used, and then describe the new methods designed to avoid these limitations.

LEARNING TO FLY

A basic flight simulation program running on a Silicon Graphics workstation was used in our initial experiments. The central control mechanism of the simulator is a loop that interrogates the aircraft controls and updates the state of the simulation according to a set of equations of motion. Before repeating the loop, the instruments in the display are updated. The display update was modified so that when the pilot performs a control action by moving the control stick or changing the thrust or flaps settings, the state of the simulation is written to a log file. A number of pilots each flew the same flight plan 30 times to acquire data for learning. In the original experiments, the "pilots" were not qualified to fly a real aircraft.
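A minimal sketch of such a logging loop is shown below. The structure, field names and helper functions are illustrative assumptions for exposition, not the simulator's actual code.

    #include <stdio.h>

    /* Illustrative subset of the logged state: position, orientation, rates of
       change and control settings (the real log records 17 numeric and three
       discrete attributes). */
    typedef struct {
        double north, east, altitude;          /* position                   */
        double elevation, azimuth, twist;      /* orientation                */
        double climb_rate, azimuth_rate;       /* rates of change            */
        double elevator, rollers;              /* stick-driven controls      */
        int    thrust, flaps;                  /* keystroke-driven controls  */
    } State;

    /* Hooks into the simulator; assumed to exist elsewhere. */
    void read_controls(State *s);
    void update_equations_of_motion(State *s);
    void update_instruments(const State *s);

    static int controls_changed(const State *a, const State *b)
    {
        return a->elevator != b->elevator || a->rollers != b->rollers ||
               a->thrust   != b->thrust   || a->flaps   != b->flaps;
    }

    static void log_event(FILE *log, const State *s)
    {
        fprintf(log, "%g %g %g %g %g %g %g %g %d %d\n",
                s->north, s->east, s->altitude, s->elevation, s->azimuth,
                s->twist, s->elevator, s->rollers, s->thrust, s->flaps);
    }

    void simulate(FILE *log)
    {
        State state = {0}, prev = {0};
        for (;;) {
            read_controls(&state);               /* interrogate the controls     */
            update_equations_of_motion(&state);  /* advance the simulation       */
            if (controls_changed(&state, &prev)) /* pilot performed an action?   */
                log_event(log, &state);          /* record the simulation state  */
            prev = state;
            update_instruments(&state);          /* refresh the display          */
        }
    }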
At the start of a flight, the aircraft points North, down the runway. The subject is required to fly a well-defined flight plan that consists of the following manoeuvres:

1. take off and fly to an altitude of 2,000 feet;
2. level out and fly to a distance of 32,000 feet from the starting point;
3. turn right to a compass heading of approximately 330°;
4. at a North/South distance of 42,000 feet, turn left to head back towards the runway;
5. line up on the runway;
6. descend;
7. land on the runway.

During a flight, up to 1,000 control actions could be recorded. The data recorded in each event consisted of 17 numeric attributes and three discrete attributes. The numeric attributes included the orientation and position of the aircraft as well as the rates of change of these values and the settings of the controls. The elevation of the aircraft is the angle of the nose relative to the horizon. The azimuth is the aircraft's compass heading and the twist is the angle of the wings relative to the horizon. The elevator angle is changed by pushing the mouse forward (positive) or back (negative). The rollers are changed by pushing the mouse left (positive) or right (negative). Thrust and flaps are incremented and decremented in fixed steps by keystrokes. The angular effects of the elevator and rollers are cumulative. For example, in straight and level flight, if the stick is pushed left, the aircraft will roll anti-clockwise and will continue rolling until the stick is centred. The thrust and flaps settings are absolute.

Originally, Quinlan's C4.5 (Quinlan, 1993) program was used to generate flight rules from the data. Subsequent experiments have also used a variety of regression tree algorithms. Even though induction programs can save an enormous amount of human effort in analysing data, in real applications it is usually necessary for the user to spend some time preparing the data. The learning task was simplified by restricting induction to one set of pilot data at a time. Thus, an autopilot has been constructed for each of the three subjects who generated training data. The reason for separating pilot data is that each pilot can fly the same flight plan in different ways. For example, straight and level flight can be maintained by adjusting the throttle. When an airplane's elevation is zero, it can still climb since higher speeds increase lift. Adjusting the throttle to maintain a steady altitude is the preferred way of achieving straight and level flight. However, another way of maintaining constant altitude is to make regular adjustments to the elevators, causing the airplane to pitch up or down. The data from each flight were segmented into the seven stages described previously.

In the flight plan described, the pilot must achieve several successive goals, corresponding to the end of each stage. Each stage requires a different manoeuvre. Having already defined the sub-tasks and told the human subjects what they are, the learning program was given the same advantage. In each stage, four separate decision trees were constructed, one for each of the elevator, rollers, thrust and flaps. A program filtered the flight logs, generating four input files for the induction program. The dependent variable, or class value, is the attribute describing a control action. Thus, when generating a decision tree for flaps, the flaps attribute is treated as the class value and the other columns in the data file, including the settings of the elevator, rollers and thrust, are treated as ordinary attributes. After processing the data as described above, they can be submitted to C4.5 to be summarised as rules that can be executed in a controller.

To test the induced rules, they were used as the code for an autopilot. A post-processor converted C4.5's decision trees into if-statements in C so that they could be incorporated into the flight simulator easily. Hand-crafted C code determined which stage the flight had reached and decided when to change stages. The appropriate rules for each stage were then selected in a switch statement. Each stage had four independent if-statements, one for each action.
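The overall shape of the resulting autopilot might be sketched as follows. The stage names follow the flight plan, but the transition thresholds and the bodies of the induced if-statements are illustrative assumptions, not the code actually generated by the post-processor.

    /* Sketch of the hand-crafted stage dispatcher wrapping the induced rules.
       Thresholds and rule bodies are illustrative only. */

    enum Stage { TAKEOFF, LEVEL_OUT, TURN_RIGHT, TURN_LEFT, LINE_UP, DESCEND, LAND };

    /* Simulation state and controls, assumed to be shared with the simulator. */
    extern double altitude, distance, elevation;
    extern double elevator, rollers;
    extern int thrust, flaps;

    static enum Stage stage = TAKEOFF;

    /* Hand-crafted code decides when the next stage begins. */
    static void update_stage(void)
    {
        switch (stage) {
        case TAKEOFF:   if (altitude >= 2000.0)  stage = LEVEL_OUT;  break;
        case LEVEL_OUT: if (distance >= 32000.0) stage = TURN_RIGHT; break;
        /* ... transitions for the remaining stages ... */
        default: break;
        }
    }

    /* One induced if-statement per control; e.g. an elevator tree for take-off. */
    static void takeoff_elevator(void)
    {
        if (altitude < 100.0)      elevator = -0.1;
        else if (elevation < 11.0) elevator = -0.05;
        else                       elevator = 0.0;
    }

    static void takeoff_rollers(void) { rollers = 0.0; }
    static void takeoff_thrust(void)  { thrust = 100; }
    static void takeoff_flaps(void)   { flaps = 20; }

    /* Called once per simulation cycle in autopilot mode. */
    void autopilot(void)
    {
        update_stage();
        switch (stage) {
        case TAKEOFF:
            takeoff_elevator(); takeoff_rollers();
            takeoff_thrust();   takeoff_flaps();
            break;
        /* ... one case per stage, each with its four if-statements ... */
        default: break;
        }
    }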

PROBLEMS ARISING FROM EARLY EXPERIMENTS AND THEIR SOLUTION

The learning scheme just described produced behavioural clones that could successfully pilot the simulated aircraft through the flight plan described earlier. Despite the success of the original experiments, a number of problems were identified.

1. The behavioural clones were not very robust with respect to changes in flying conditions.

2. The clones were only applicable to a particular flight plan and were not flexible enough to fly any other plan.

3. The decision trees that were produced were very large. Thus we did not achieve our goal of having the output of learning understandable.

Robustness

Originally, the flight simulator did not model atmospheric effects. That is, the simulation did not include turbulence or wind drift. Under these conditions, it was sufficient for a clone simply to reproduce actions similar to those of the trainer when the aircraft reached a particular point in the flight. If the simulation were perturbed in any way, the clone could not cope because the original training data did not contain sufficient information for the induction program to generalise.

This limitation was avoided by introducing atmospheric effects into the simulation. Turbulence was approximated by small random displacements to the flight path. The degree of turbulence is an adjustable parameter that can be used to experiment with the robustness of the clone. Wind drift was approximated by systematic displacements of the aircraft. With the introduction of atmospheric effects, the simulation now has a degree of variation that requires the trainer to provide examples of how to correct for unexpected deviations from a trajectory. Therefore, the data contain sufficient information for the induction algorithm to generalise from the pilot's behaviour.

Experiments with the new simulator were conducted with the aid of a qualified pilot. Rather than attempt to learn a complete flight plan, clones were built for separate manoeuvres such as straight-and-level flight, turning, etc. The attributes for induction are now deviations from target values, rather than the absolute values used in the original experiments. In the case of straight-and-level flight, the pilot was asked to maintain a given heading and altitude, compensating for turbulence. This was repeated for a number of different headings and altitudes and different degrees of turbulence. The data collected from these flights were input to C4.5 as in the previous experiments.

It was found that the most effective scheme for learning to perform a manoeuvre is to break the task into a "goal seeking" phase and a "station keeping" phase. That is, we determined bounds around the required target values. As long as the aircraft was within those bounds, one set of decision trees was used to keep it there. Another set of decision trees was used when the aircraft strayed outside the bounds due to severe disturbances, or at the beginning of the manoeuvre. Thus, the training data were divided into two sets and two sets of trees were induced separately. A sketch of how such a two-phase controller can be organised is given below.

Using this method we were able to produce robust clones that can tolerate disturbances and are flexible enough that a wide range of target values can be specified and achieved. The degree of tolerance to disturbances depends on the training data. The clone will perform successfully as long as the disturbances are no larger than anything seen in the training data. If a disturbance is larger than anything the clone was trained on, it may not be able to cope. Thus, to produce a truly robust clone, it must be trained on a wide variety of possible situations.
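The following sketch shows one way the two induced rule sets could be dispatched at run time. The variable names, bounds and rule bodies are assumptions made for illustration; the real controllers are the induced decision trees.

    #include <math.h>

    /* Current state, manoeuvre targets and controls, assumed to be shared
       with the simulator. */
    extern double altitude, azimuth;
    extern double target_alt, target_hdg;
    extern double elevator, rollers;

    #define ALT_BOUND 100.0    /* illustrative tolerances around the targets */
    #define HDG_BOUND 5.0

    /* The induced trees operate on deviations from the targets. */
    static void station_keeping(double alt_err, double hdg_err)
    {
        /* Small corrections while within the bounds. */
        if (alt_err > 20.0)       elevator = 0.05;   /* nose down slightly */
        else if (alt_err < -20.0) elevator = -0.05;  /* nose up slightly   */
        else                      elevator = 0.0;

        if (hdg_err > 1.0)        rollers = 0.05;    /* roll left to reduce heading  */
        else if (hdg_err < -1.0)  rollers = -0.05;   /* roll right to increase heading */
        else                      rollers = 0.0;
    }

    static void goal_seeking(double alt_err, double hdg_err)
    {
        /* Larger corrections when outside the bounds or starting the manoeuvre. */
        elevator = alt_err > 0.0 ? 0.2 : -0.2;
        rollers  = hdg_err > 0.0 ? 0.2 : -0.2;
    }

    void fly_manoeuvre(void)
    {
        double alt_err = altitude - target_alt;
        double hdg_err = azimuth - target_hdg;

        if (fabs(alt_err) <= ALT_BOUND && fabs(hdg_err) <= HDG_BOUND)
            station_keeping(alt_err, hdg_err);
        else
            goal_seeking(alt_err, hdg_err);
    }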

Flexibility

Our original clones were only capable of flying a particular flight plan. We could not specify a different mission and expect the decision trees to fly the correct plan. This is because all of the data were collected from complete flights. In discovering how to make the behavioural clones more robust, we concentrated on particular manoeuvres. We also specified the task in terms of deviations from given target values rather than absolute values. These modifications made it possible to construct more flexible clones. We are now able to build a library of behaviours for different manoeuvres and invoke the behaviours as required to meet the goals of different stages of the flight plan.

Michie and Sammut (1995) have suggested a two-level architecture in which an expert system acts as a high-level decision maker that invokes low-level behaviours as required during the performance of a task. To assist this approach, we are developing a suite of machine learning algorithms that are embedded in a Prolog programming environment. The output of a learning algorithm can be asserted as a procedure in Prolog and thus it is easy to incorporate a learned behaviour into a larger program. We intend to use this as part of a larger investigation into the applicability of inductive logic programming to behavioural cloning.

One of the difficulties in learning a manoeuvre is how to characterise the trajectory which the aircraft should follow. Srinivasan and Camacho (1996) have developed a method which uses a combination of inductive logic programming and traditional statistics to accomplish this. Most ILP systems are capable of using background knowledge to learn the description of a new concept. Srinivasan's P-Progol (Muggleton, 1995) includes a number of useful built-in predicates as background knowledge, including a regression package and predicates for describing arcs, circles, lines, etc. Progol learned to combine these predicates to construct a description of a turn. This can, in turn, be used to specify the target trajectory for the clone.

Simplifying Behavioural Clones by Learning to Achieve Goals

In trying to understand why the original cloning method produced very large decision trees, we formulated the following hypothesis. By inputting the full state of the simulation along with the action performed, the induction program is generating a large set of situation-action rules. The clone has no knowledge of what it is supposed to achieve. It simply recognises what state it is in and acts according to its rules. Unfortunately, the state space of a flight is enormous. Thus a robust controller must necessarily have a very large set of rules to cover many possible situations. This standard form of behavioural cloning has proved effective in small domains, but the task specification must be altered to handle complex domains like piloting an aircraft.

Constructing more informative attributes, such as descriptions of trajectories, can simplify the description of the state. However, this does not entirely solve the problem. Following a suggestion of Donald Michie, we have reformulated the problem into a two-stage process (Bain & Sammut, 1996):

• The first stage learns to model the effects of controls on state variables.
• The second stage learns which controls to apply to achieve specified goals.

An example of an effects rule is:

    Elevators = -0.28  →  ElevationSpeed = 3
    Elevators = -0.19  →  ElevationSpeed = 1
    Elevators =  0.0   →  ElevationSpeed = 0
    Elevators =  0.9   →  ElevationSpeed = -1

This gives the appropriate elevator settings required to obtain a particular elevation speed. An example of a goal rule is:

    Distance > -4007
    Height > 1998
    Height > 1918
    Height > 67
    Distance …
    else …
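One plausible way to combine the two learned stages at run time is to let a goal rule supply the desired change in a state variable and then invert the effects model to find the control value that produces it. In the sketch below, the table entries are those of the effects rule shown above; the selection code and its names are illustrative assumptions rather than the paper's actual mechanism.

    #include <stddef.h>

    typedef struct { double elevators; double elevation_speed; } EffectsRule;

    /* Learned model of the effect of the elevator on elevation speed,
       taken from the effects rule above. */
    static const EffectsRule effects[] = {
        { -0.28,  3.0 },
        { -0.19,  1.0 },
        {  0.00,  0.0 },
        {  0.90, -1.0 },
    };

    /* Choose the elevator setting whose predicted effect is closest to the
       elevation speed requested by a goal rule. */
    double elevator_for(double desired_elevation_speed)
    {
        size_t best = 0;
        double best_err = -1.0;
        for (size_t i = 0; i < sizeof effects / sizeof effects[0]; i++) {
            double err = effects[i].elevation_speed - desired_elevation_speed;
            if (err < 0.0) err = -err;
            if (best_err < 0.0 || err < best_err) { best_err = err; best = i; }
        }
        return effects[best].elevators;
    }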