A Hierarchical Distributed Planning Framework for Simulated Battlefield Entities

Jeremy Baxter and Richard Hepplewhite
DERA, St Andrews Road, Malvern, Worcs. WR14 3PS, UK
(jbaxter, rth)@signal.dera.gov.uk

Abstract

We describe a hierarchy of agents that incorporates a variety of planning systems. Agents plan activities and execute them within a framework designed to support the devolved planning and execution of high level tasks, in a manner similar to that found in military command structures. The framework allows the more reactive behaviours of agents at the lower levels of the hierarchy to be guided by more deliberative planning from the agents above them, so that the group as a whole exhibits both reactive and deliberative behaviour. Examples are given of a constraint-based planner and an 'anytime' route planner, both of which operate within the framework. The way in which execution is monitored, and how agents carry out re-planning, is also described.

1. Introduction

We are researching AI techniques suitable for providing intelligent opposing forces within battlefield simulations. In particular we are looking at using agent-based techniques to plan and execute the actions of tanks and groups of tanks. This paper describes how we use a command hierarchy of agents to continually plan and execute small unit tasks. In particular it concentrates on the planning and execution frameworks that we use and on how goals and plans are distributed amongst agents. The paper gives examples of the characteristics of planners which are useful in this domain, in particular the ability to adjust their performance and to produce plans in an anytime manner.

Considerable work has been done on Semi-Automated Forces (Courtemanche and Ceranowicz 1995; McEnany and Marshall 1994), which use simple automatic responses to allow an operator to control several battlefield entities at once. These systems rely heavily on specialised rules and scripts and can show unrealistic behaviour (Meliza and Varden 1995) if left unsupervised. To produce agents capable of operating unsupervised for long periods of time, more flexible approaches to automation are needed (Rosenbloom et al. 1994; Hepplewhite and Baxter 1997).

Planning within the battlefield domain is an extremely complex task. Most of the variables that characterise the problem (position, time, speed etc.) are continuous. The actions of numerous other battlefield agents, both friendly and hostile, need to be considered, since any plans that are made are likely to provoke intelligent counter-moves from opposing forces. The future, and to some extent the present, state of the battlefield is uncertain due to the possible destruction of other agents and potentially inaccurate sensor information. Simulations include hundreds or thousands of entities, so even with the use of distributed computing the amount of computing power available to each agent is limited. Furthermore, in the training arena, responses must be made in real time, which further limits the complexity possible for individual agents. It is also essential that the agents co-operate so that they act as a unit. Effective co-operation in a military context results from a hierarchical command and control (C2) structure, which gives a framework for both planning and execution and includes the re-allocation of roles following attrition.

Figure 1. Military Command Hierarchy (a higher level objective enters at the squadron commander; command and control flows down through the troop commanders to the individual tanks as lower level objectives, while intelligence flows back up)

2. The Command Structure

The command hierarchy used by the agents is shown in Figure 1 and is based upon the military command structure. It serves three main purposes. First, by organising agents into groups along the same lines as the military formations they are trying to emulate, the chance of producing plausible group behaviour is increased. Second, it provides a framework to guide communication between agents. Third, it allows the planning of complex group orders to be divided into several smaller problems. As the diagram shows, a single high level objective passed to the squadron commander agent is split into three lower level objectives for its troop commanders, who in turn each produce three lower level objectives for the tanks under their command. The commander agents consider ways of achieving their objective over longer time scales and distances than the level below them, but in less detail. Communication of orders passes straight down the hierarchy, while intelligence information is shared between peers and communicated to superiors, and therefore flows both across and up the hierarchy.

2.1 Command roles

The commander agents must carry out a number of functions. They are responsible for gathering information about their own situation and passing it up the command chain to their superior and to their peers. Troop commanders, for instance, relay information about the position of their troop to the squadron commander and to the commanders of the other troops in the squadron. The commanders are responsible for giving orders to their subordinates in order to achieve the orders given to them. They are also required to monitor the progress of the group towards achieving those orders and to report to their superior once they have been completed. To fulfil this role they need to be able to reason about how their local situation affects their orders and to plan to achieve them. They also have to ensure that their subordinates know enough about the orders given to the group that they could take over in the event of the commander's demise.

2.2 Hierarchical Planning and Execution

Having a hierarchy of agents assists real-time operation by allowing us to divide the planning problems between levels. This means that long term plans, which are expensive to make but less sensitive to short term changes in the world, can be made and executed by high level agents. These agents rely on the flexible performance of their subordinate agents to refine and carry out the plan. By giving a degree of autonomy to the subordinate agents, they can be left to carry on executing the present plan while the higher level agent plans to update or replace it. Similarly, by relying on the ability of subordinate agents to carry out stop-gap actions when the situation changes significantly, the higher level agents can plan a response without being tied to the short reaction times needed by their subordinates. The whole system therefore resembles a distributed Hierarchical Task Network planner which interleaves planning and execution, as sketched below.
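As a minimal sketch of this recursive decomposition (our own illustration, not the authors' code; all class and method names are invented), a high level order fans out through the hierarchy while each agent keeps some task executing:

```python
# Illustrative sketch only: an order given to a commander is decomposed into
# one lower level objective per subordinate, recursively, while leaf agents
# execute directly. Names (Agent, receive_order, plan) are assumptions.

from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    subordinates: list["Agent"] = field(default_factory=list)
    current_task: str = "remain-stationary"   # agents always execute something

    def receive_order(self, order: str) -> None:
        # The old task keeps running until the new decomposition is ready.
        subtasks = self.plan(order)
        self.current_task = order
        for sub, task in zip(self.subordinates, subtasks):
            sub.receive_order(task)

    def plan(self, order: str) -> list[str]:
        # Placeholder decomposition: one lower level objective per subordinate.
        return [f"{order}/{sub.name}" for sub in self.subordinates]

# A squadron of three troops, each of three tanks, as in Figure 1.
squadron = Agent("squadron", [
    Agent(f"troop-{t}", [Agent(f"tank-{t}{i}") for i in range(3)])
    for t in range(3)
])
squadron.receive_order("advance-to-objective")
```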

3. Control of the Planning Processes

Inside our system, planning is controlled by an overall framework designed to support different types of planners and to control the way in which plans are executed. We do not presently distribute the planning for a single goal; instead we use a model of distributed execution where each agent is given its own goal to pursue (which may involve co-ordination with other agents). The higher level agents only plan to a relevant level of detail before producing a plan that consists of a series of tasks for their subordinates. The subordinates must plan how to execute these tasks within the guidelines set by the superior agent. Internally, each agent can subdivide its task (goal) into a number of subgoals, which in turn can be subdivided until a level is reached where the tasks can either be given to one of its subordinates or be directly executed by the agent itself.

Because of the varied nature of the tasks which agents have to perform, support for several different types of internal planner is provided. The simplest are 'one-shot' planners, which produce a single plan for a goal and have to be fast to prevent delays in execution; typically these perform pre-determined task decompositions based on simple features of the problem. More complex problems are dealt with by 'anytime' planners (Zilberstein and Russell 1993), which produce plans on demand, giving improved plans as more planning time is made available. Our system also incorporates a self-adjusting, constraint-based planner which adjusts its own evaluation criteria as it gains more information about the details of the task being planned.

Initially our support for planners concentrated on the one-shot type, where a task would be planned for and then executed. However, as more complex behaviours were developed it became clear that this is only appropriate where the planning time is very short, so that execution can start quickly. In cases such as complex route finding we found that the 'anytime' model suited our needs much better. This allowed plans to start executing and to be replaced if an improvement was later identified, preventing long periods of inaction by agents while plans were fully refined. In general, execution is carried out in parallel with planning at multiple levels. The sketch below illustrates the distinction between the two planner styles.
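To make the distinction concrete, here is a minimal Python sketch of the two planner styles (the interfaces are our own invention; the paper does not publish the original system's code):

```python
# Sketch of the two planner styles the framework supports. A one-shot
# planner must decompose quickly and returns a single fixed plan; an
# anytime planner always holds an executable plan and improves it for as
# long as time is made available.

import time
from abc import ABC, abstractmethod

class Planner(ABC):
    @abstractmethod
    def best_plan(self):
        """Return the best plan found so far."""

class OneShotPlanner(Planner):
    """Fast pre-determined decomposition: plans once, then executes."""
    def __init__(self, decompose, goal):
        self._plan = decompose(goal)          # must be quick
    def best_plan(self):
        return self._plan

class AnytimePlanner(Planner):
    """Always holds an executable plan; improves it while time allows."""
    def __init__(self, initial_plan, improve):
        self._plan = initial_plan
        self._improve = improve               # returns a better plan, or None
    def refine(self, budget_seconds):
        deadline = time.monotonic() + budget_seconds
        while time.monotonic() < deadline:
            better = self._improve(self._plan)
            if better is None:                # no further improvement possible
                break
            self._plan = better
    def best_plan(self):
        return self._plan
```

The executing agent can treat both identically through best_plan, taking whatever plan exists whenever execution demands one.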

3.1 Speculative Planning

Standard AI planning systems assume that the world is static and produce a complete plan containing sequences of actions that will be executed far into the future. The danger of this approach in a highly dynamic and uncertain environment is that the assumptions under which a plan is made will not hold when the time comes to execute it, and the planning effort will have been wasted. Our system therefore normally operates on the assumption that an action is best planned in detail only when it is to be executed immediately; that is, plans are kept as abstract as possible for as long as possible.

This works well if the planning time is short compared to the execution time; where this is not the case, a mechanism exists for agents to undertake speculative planning. Our agents are always executing some sort of plan, even if it is a very simple one such as remaining stationary. Planning can be taking place at all levels of the hierarchy simultaneously, and several planners may be operating within an agent at the same time on different levels of the same task. A prioritisation mechanism ensures that speculative planning can only take place when the current high level task is being actively pursued, and speculative planning is allowed for only one future task at a time, in the order in which the tasks would be executed. Providing this speculation mechanism has allowed us to balance tasks that take a long time to plan compared to their execution against those that take a short time to plan but a long time to execute.

For example, a troop commander may wish a group of tanks to defend a particular area. The bulk of the planning time is taken up by generating and comparing potential positions within the area assigned to the troop, while the bulk of the execution time is normally consumed by driving to the assigned area. By assuming that the driving task will be completed successfully, the agent can plan the subsequent task in parallel with it and hopefully be ready to execute the next action (taking up a good defensive position) by the time the first completes. The danger with this approach is that subsequent tasks may be overtaken by events and rendered inappropriate or unnecessary, so that the planning effort is wasted. We presently hand-select those tasks suitable for speculative planning, but automatic selection, based on expected planning and execution times and the probability of success for the current task, would be an interesting area for future work. A sketch of the prioritisation rule is given below.
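The following fragment is our reading of the prioritisation rule described above (all names are invented): speculation only runs while the current task is actively pursued, and only for the next unplanned future task, in execution order, if it has been hand-marked as suitable.

```python
# Sketch of the speculation rule: at most one future task may be planned
# speculatively, taken in the order the tasks would be executed, and only
# while the current high level task is being actively pursued.

from dataclasses import dataclass

@dataclass
class FutureTask:
    name: str
    has_plan: bool = False
    speculative_ok: bool = False   # presently hand-selected

def next_planning_target(queue, current_task_active):
    """queue: future tasks in the order they would be executed."""
    if not current_task_active:
        return None                # plan only for the task about to run
    for task in queue:
        if not task.has_plan:
            return task if task.speculative_ok else None
    return None

# 'defend area' example: driving executes now, while the subsequent
# 'take up position' task is planned speculatively in parallel.
queue = [FutureTask("take-up-defensive-position", speculative_ok=True)]
target = next_planning_target(queue, current_task_active=True)
```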

3.2 A Self Adjusting Planner

Since the goals given to agents are flexible enough to allow them to plan many different ways of achieving them, we have made use of self-adjusting planners capable of balancing many potentially conflicting constraints. This allows a high level agent to issue an order along with expectations about how it is to be achieved, and allows the subordinate agent to plan and re-plan, trading off different constraints to keep within the guidelines set by its superior. This flexibility prevents most local problems (such as the unexpected discovery of a new enemy vehicle) from propagating all the way to the top of the command chain, since they can be dealt with locally. In other cases flexibility is needed to ensure that the abstract plan made by the higher level agent can be adhered to while optimising its execution in the more detailed context considered by the subordinate commander.

An example of one such self-adjusting planner is the squadron assault planner (Hepplewhite and Baxter 1999). The planner automatically ascertains the difficulty of the planning problem and then adjusts appropriate parameters in an attempt to overcome the limitations; this makes the planner very robust to the particular situation in which it must plan. The squadron commander agent is given the general position of an enemy force which is to be assaulted. The agent must determine a suitable fire support position and an assault approach route for its own force (Figure 2). The general requirements for the fire support group are that it should be hull down to the enemy, be within weapons range but not too close, and have a reasonably concealed route to the position. Similarly, the assaulting group should approach the objective perpendicular to the fire support group's line of fire, have a concealed route to their form-up position, and break cover to the enemy at a reasonable distance.

Figure 2. Assault Example (diagram showing the area of responsibility, the assault approach, the fire support axis and fire support position, and the enemy position; scale 0-2 km, north indicated)

As terrain is highly variable, it is possible for the planner to fail during its initial attempt. This is generally caused by the assumptions made about visibility thresholds, distances, etc. A planner must therefore be able to detect the reason for a failure so that it can take corrective measures before re-attempting the planning (i.e. the planner can automatically adjust itself). To detect the cause of a planning failure, the planner must be able to find out which of the factors it is considering cannot be satisfied or optimised. To achieve this a constraint-based planner (based on Logan 1997) has been implemented, which can be configured with a number of constraints. Each possible plan generated contains a number of costs that the planner tries to maximise in order. Currently each constraint can either be bounded or be a value optimisation function. For the fire support position selection, the constraints include the distance from the enemy, which should be no less than 1 km but no more than 2.5 km; there must be a route to the position; the cost of the route should be minimised; and the cover available at the position should be maximised. When the candidate points have been costed, the planner can examine how well the constraints have been satisfied and decide whether the "best" option is acceptable. If, for example, the route constraint is not satisfied, the planner adjusts itself to generate a different set of options. The planner keeps track of the adjustments that have been made, so that it can recognise when there will never be a fully satisfactory solution to the problem. From this it can decide either to execute the plan or to inform its superior of its failure. The sketch below illustrates this evaluate-and-adjust loop for the fire support example.
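A minimal sketch of this scheme (not the authors' implementation; all names are invented, and the relaxation factors are arbitrary) for the fire support example:

```python
# Bounded and optimisation constraints for fire support selection: distance
# bounded to 1-2.5 km, a route must exist, route cost is minimised and cover
# maximised in order. On failure the bounds are relaxed and the attempt
# repeated, with the adjustments counted so that persistent failure can be
# reported to the superior instead of looping forever.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    enemy_distance_km: float
    route_cost: Optional[float]   # None means no route was found
    cover: float                  # larger is better

def select_fire_support(candidates, min_km=1.0, max_km=2.5, max_adjustments=3):
    for _ in range(max_adjustments + 1):
        feasible = [c for c in candidates
                    if c.route_cost is not None
                    and min_km <= c.enemy_distance_km <= max_km]
        if feasible:
            # Optimise the remaining constraints in order:
            # lowest route cost first, then best cover.
            return min(feasible, key=lambda c: (c.route_cost, -c.cover))
        min_km *= 0.9                 # self-adjust the bounds and retry
        max_km *= 1.1
    return None                       # no satisfactory solution: tell superior

best = select_fire_support([Candidate(2.7, 25.0, 0.9), Candidate(1.8, 40.0, 0.6)])
```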

3.3 An Anytime Planner

The troop commander motion planner is of the 'anytime' type. It produces an initial plan for the complete route and continuously refines it until it reaches the optimal route, given its constraints and costing scheme. At any time during the planning process the current best plan can be taken and executed. The troop commander makes plans over a series of points based upon 'significant' terrain features, currently abstracted ridge lines. This enables troop commanders to consider large scale moves, such as driving round a hill, as a single step, and reduces the search space significantly, enabling planning to occur in a reasonable amount of time. Troop plans are currently based on consideration of the degree of concealment provided by a route from known enemy positions.

Figure 3. Example of ridge line abstraction.

Figure 4. Plan cost against planning time for the A* variant (plan cost falls rapidly from around 25,000 in the first seconds of search, then improves steadily to around 9,000 over roughly 160 seconds).

To represent a ridge in the planning system a simple line is not sufficient; some idea of the size of the ridge is also important, as are potential crossing points. The ridge lines are abstracted manually and a simple algorithm is used to extract the edge points. An example is shown in Figure 3.

The cost function uses a combination of traversal time and exposure to enemy units. To model the effect of terrain, traversal time is based on the slope of the terrain profile. The speed reduction depends on how close the slope is to the maximum gradient the vehicle can climb; only upward slopes have an effect, and 'impassable' slopes are given an arbitrarily slow speed. The exposure of a route is calculated by using line of sight to find what percentage of a route segment (sampled at 50 m intervals) is visible from observer positions. The costs are combined by multiplying the cost of exposed sections by a constant indicating the relative importance of speed and concealment.

The search is a variant of A* modified to work with complete routes over a fully connected graph, and is required to find the lowest cost solution in the available time. The search starts by considering the cost of the direct route from the start to the goal and then considers all single stage routes (routes which travel via a single intermediate node). This gives an upper bound on the route cost and also identifies the direction in which the search can be carried out with the lowest apparent branching factor. A* search then proceeds in this direction. The heuristic function used is the straight line distance divided by the maximum speed of the vehicle. In general the branching factor at each node would appear to be the number of nodes in the graph; in practice it is considerably lower, since nodes can be rejected if the expected path cost through them exceeds the cost of the current lowest cost plan. When a lower cost route to a node is found, the planner checks the (known) cost of directly completing the route to see if this provides a cheaper route than has been found so far, and updates its plan accordingly. This means that at any time the best known route can be executed, allowing the troop commander to respond quickly when necessary. The search terminates when no node has any possible extensions that could yield a lower cost route. Figure 4 shows how the initially high cost of a route improved over time, with a very fast initial reduction as the search quickly identified how to avoid areas which were very exposed to the enemy, followed by a steady improvement as the route was refined. A simplified sketch of this anytime search is given below.
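The following is a simplified reconstruction of the search, not the original code: it keeps a best-so-far route that is executable at any time, seeds the bound with the direct and single-intermediate routes, and prunes partial routes against that bound. It omits the branching-direction heuristic and the ridge crossing points; the function names are our own.

```python
# Anytime route search over a fully connected set of waypoints. best_route
# can be taken and executed at any moment during the search.

import heapq, math

def anytime_route(nodes, start, goal, cost, max_speed):
    """nodes: intermediate waypoints as (x, y) tuples; cost(a, b) gives the
    concealment-weighted traversal time between two points."""
    def h(p):  # admissible heuristic: straight-line distance at maximum speed
        return math.dist(p, goal) / max_speed

    # Seed the upper bound with the direct route and all single-stage routes.
    best_route, best_cost = [start, goal], cost(start, goal)
    for n in nodes:
        c = cost(start, n) + cost(n, goal)
        if c < best_cost:
            best_route, best_cost = [start, n, goal], c

    frontier = [(h(start), 0.0, [start])]
    while frontier:
        f, g, route = heapq.heappop(frontier)
        if f >= best_cost:                # nothing left can beat the best route
            break
        for n in nodes:
            if n in route:
                continue
            g2 = g + cost(route[-1], n)
            finish = g2 + cost(n, goal)   # known cost of completing directly
            if finish < best_cost:        # a cheaper complete route: keep it
                best_route, best_cost = route + [n, goal], finish
            if g2 + h(n) < best_cost:     # prune nodes that cannot improve
                heapq.heappush(frontier, (g2 + h(n), g2, route + [n]))
    return best_route, best_cost
```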

4. Plan Execution and Monitoring

The battlefield domain is highly dynamic, with numerous semi-independent agents altering the world. In addition, the information available about the world is restricted by the capabilities of a vehicle's sensors and may be inaccurate. The success of actions cannot be guaranteed because agents may be destroyed or damaged during execution. Changing objectives and the interruption of actions are also features of the problem. The dynamism of the world, combined with uncertain and incomplete information, means that it is extremely rare for a plan to be executed in its entirety. This leads to a requirement that the plan execution system should be fully aware of the assumptions used when the plan was made and be able to either adapt or reject an action by comparing the present situation with the situation envisaged when the action was incorporated into the plan. In such a fluid situation the plan execution system needs a means of rapidly assessing plan failure. In our application this means that the plan execution system has to monitor several features: first, and most obviously, the potential achievement of the present goal; second, whether any of the constraints on the plan have been violated; and third, whether any of the assumptions used in the planning process no longer apply.

4.1 Monitoring Assumptions and Constraints

An assumption represents some feature of the environment that the planner has identified as crucial to the success of the action being performed. For example, a movement plan for a tank agent comes with an assumption about the acceptable level of risk. While the tank agent is considering actions for inclusion within a movement plan, it predicts the threat which each known opposing agent would pose to it during that action (opposing agents are assumed to keep a constant heading and velocity). The threat level predicted for the most dangerous opponent to an action is recorded and used as an assumption about the largest individual threat level that will be perceived during the action. If, during execution of that action within the plan, an entity is encountered which poses a greater threat than was foreseen, either due to a known entity moving to a more dangerous position or a previously unknown entity being detected, then re-planning takes place.

A constraint represents some state of the world the planner was attempting to avoid, and its violation requires an adjustment to the plan. Examples are that an agent should be in a loose formation with respect to other friendly agents, or should remain within a specified corridor or area. Violation of a constraint triggers a re-assessment of the plan so that actions that restore the desired state can be considered. In an ideal system the particular constraint or assumption violated would be used to guide the planner in a plan repair process, rather than requiring a full re-plan as is currently the case in our system. Care has to be taken to prevent transient violations of constraints from causing unnecessary re-planning episodes. For example, in the transition period between actions the maximum threat assumption has to be set to the greater of the levels associated with the two actions, to allow time for any reduction in threat predicted by the execution of the new action to be detected. A minimal sketch of this monitoring scheme follows.
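A minimal sketch of the monitoring scheme (the representation is our own; the original system's data structures are not published here): each plan carries the assumptions and constraints under which it was made, and every execution cycle they are re-checked against the observed situation.

```python
# Each plan records its planning-time assumptions (crucial environmental
# features, e.g. the largest foreseen threat) and constraints (world states
# to be avoided, e.g. leaving the assigned corridor). A violation of either
# triggers re-planning, with the reason available to guide plan repair.

from dataclasses import dataclass, field

@dataclass
class MonitoredPlan:
    actions: list
    assumptions: list = field(default_factory=list)   # predicates over the situation
    constraints: list = field(default_factory=list)

def check(plan, situation):
    """Return None while the plan holds, else the reason to re-plan."""
    for assumption in plan.assumptions:
        if not assumption(situation):
            return "assumption-violated"
    for constraint in plan.constraints:
        if not constraint(situation):
            return "constraint-violated"
    return None

plan = MonitoredPlan(
    actions=["advance"],
    assumptions=[lambda s: s["max_threat"] <= 0.4],   # largest threat foreseen at plan time
    constraints=[lambda s: s["in_corridor"]],          # stay within the assigned corridor
)
print(check(plan, {"max_threat": 0.7, "in_corridor": True}))  # assumption-violated
```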

4.2 Using Generalised Actions

One of the ways we have attempted to prolong the life of plans in dynamic situations is to use generalised actions. The predicted effects of these actions are evaluated in detail by the planner and are then re-evaluated when they are executed, to conform to the situation as it is at the moment of execution rather than the situation envisaged by the planner. This allows minor corrections to be made as an action is executed, and allows the detection of inappropriate actions if the situation has changed drastically. This is similar to the Reactive Action Packages approach (Firby 1989). As an example, one of the actions available to tank agents is 'face threat', where a tank turns to face the direction of hostile fire to ensure any hits it receives will be on its stronger frontal armour. The heading it faces may well be different from the heading that would have been generated when the plan was made, since the opposing tank may have moved to a different position. The action is flexible enough to cope with such minor differences. If the opposing tank has been destroyed or has retreated out of sight, the action is no longer appropriate and would therefore trigger a re-planning episode. One of the dangers of this approach is that a series of minor corrections to actions can accumulate to give an overall outcome radically different from that envisaged by the planner.

In our experience these differences are usually detected by the violation of a constraint or assumption. In the case of tank agents, plans are also fairly short, typically consisting of two to five actions each lasting ten to fifteen seconds, which reduces the opportunity for differences to accumulate. Since the system is to operate in real time, temporary replacements for inappropriate actions are necessary to fill in until a full new plan is available. These replacements need to be linked to the way in which the action failed. For example, if a 'face threat' action fails due to lack of a threat, it is reasonable to replace it with a 'go-aim-goal' action, which causes the tank to head for its present goal position. Actions that fail due to increased threat levels are replaced by 'face threat' actions as a stop-gap measure until a plan accounting for the new or increased threat is ready. The sketch below illustrates this failure-linked replacement.
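The fragment below is an illustrative sketch (all names and the situation representation are our own inventions): a generalised action recomputes its parameters at execution time, and its failure mode selects a stop-gap replacement.

```python
# 'face_threat' re-evaluates the threat at the moment of execution rather
# than using the heading computed at planning time. Its failure mode is
# mapped to a temporary replacement action that fills in until the new
# plan is ready.

def face_threat(situation):
    threat = situation.get("most_dangerous_enemy")     # re-evaluated now,
    if threat is None:                                 # not at plan time
        return "failed-no-threat"
    situation["heading"] = threat["bearing"]           # present frontal armour
    return "ok"

STOPGAP = {
    "failed-no-threat": "go-aim-goal",                 # head for the goal position
    "failed-threat-increased": "face-threat",
}

def execute(action, situation):
    status = action(situation)
    # On failure, return the stop-gap action to run while re-planning.
    return None if status == "ok" else STOPGAP[status]

replacement = execute(face_threat, {"most_dangerous_enemy": None})  # "go-aim-goal"
```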

4.3 Executing Troop Level Plans

The troop route planner described in Section 3.3 produces plans that are based on known enemy positions; during execution it is assumed that no new enemies will be detected. If an enemy which was not previously known about is detected, the troop commander agent can re-plan the route. The behaviour this produces can be seen in Figure 5. The blue troop initially planned to cross the ridge in front of it and proceed towards its goal along the northern edge of the ridge. As it reached the top of the hill, one of the tanks spotted the red troop in the valley and informed the troop commander. The resulting re-plan caused the blue troop commander to adjust its plan to follow the southern slopes of the ridge instead, first crossing the smaller ridge to the south-west. Additional assumptions are made about the boundaries within which the troop can move and the time taken to execute the route. All of these assumptions are handled by the same framework, which ensures that the agent knows which plan they refer to and what actions are appropriate if they are violated.

Figure 5. Initial troop plan and re-plan on spotting the enemy.

5. Conclusions and Further Work

We have described a system consisting of distributed agents organised in a command and control hierarchy. The agents interleave planning and execution so that they can operate in real time within a simulated environment. We have found that this hierarchical decomposition allows us to distribute the planning for a high level goal amongst multiple agents, who are also responsible for its execution. The detailed plans at the lowest level can be planned and executed in a much more dynamic manner than the more complex, but abstract, high level plan, allowing continual adjustment to the needs of a dynamic environment. Key to this process is identifying and monitoring assumptions and constraints on goals which leave subordinate agents enough flexibility to respond to minor changes in the environment, while allowing major changes to be dealt with further up the hierarchy. The inclusion of contingency measures in plans allows agents at the lowest level to perform sensible stop-gap actions while their superiors construct a new plan to achieve the high level goal. To support this type of execution we have used planners which can adjust their own internal parameters to try to satisfy multiple constraints, and 'anytime' planners which always have a solution available and continually improve upon it as more time becomes available.

One aspect of the agents that we are currently pursuing in more depth is the inclusion of formal models of inter-agent co-operation and co-ordination (Jennings 1995; Tambe 1998). The agents described in this paper co-ordinate by passing messages on an ad hoc basis, separately included in any plan that requires them. This can lead to brittle behaviour unless the messages are carefully designed, and the co-ordination has to be planned from scratch for each new task. We have been working on a re-implementation of the agents (Baxter and Horn 2000) in a Java based framework with a general co-ordination mechanism that we hope will be more flexible and can be reused for a wide variety of tasks.

6. References

Baxter, J. W. and Horn, G. S. 2000. Representing and Executing Group Tasks. In Proceedings of the 9th Conference on Computer Generated Forces and Behavioral Representation, 443-449. Orlando, Florida. ISBN 1-930638-07-6.

Courtemanche, A. J. and Ceranowicz, A. 1995. ModSAF Development Status. In Proceedings of the 5th Conference on Computer Generated Forces and Behavioral Representation, 3-13. Orlando, Florida: Institute for Simulation and Training.

Firby, J. 1989. Adaptive Execution in Complex Dynamic Worlds. PhD diss., Dept. of Computer Science, Yale University.

Hepplewhite, R. T. and Baxter, J. W. 1997. Planning and Search Techniques for Intelligent Behaviour of Battlefield Entities. In Steel, S. and Alami, R. (eds.), Recent Advances in AI Planning: 4th European Conference on Planning, ECP'97, Toulouse, France, 247-259. Springer (Lecture Notes in Computer Science, Vol. 1348). ISBN 3-540-63912-8.

Hepplewhite, R. T. and Baxter, J. W. 1999. Terrain-dependent Behaviours for Forces in Synthetic Land Environments. Journal of Defence Science 4(4): 397-403.

Jennings, N. 1995. Controlling Cooperative Problem Solving in Industrial Multi-agent Systems Using Joint Intentions. Artificial Intelligence 75: 195-240.

Logan, B. 1997. Route Planning with Ordered Constraints. In Proceedings of the 16th UK Workshop on Planning and Scheduling, 133-144. ISSN 1368-570.

McEnany, B. R. and Marshall, H. 1994. CCTT SAF Functional Analysis. In Proceedings of the 4th Conference on Computer Generated Forces and Behavioural Representation, 195-207. Orlando, Florida: Institute for Simulation and Training.

Meliza, L. L. and Varden, E. A. 1995. Measuring Entity and Group Behaviours of Semi-Automated Forces. In Proceedings of the 5th Conference on Computer Generated Forces and Behavioral Representation, 181-192. Orlando, Florida: Institute for Simulation and Training.

Rosenbloom, P. S.; Johnson, W. L.; Jones, R. M.; Koss, F.; Laird, J. E.; Lehman, J. F.; Rubinoff, R.; Schwamb, K. B. and Tambe, M. 1994. Intelligent Automated Agents for Tactical Air Simulation. In Proceedings of the 4th Conference on Computer Generated Forces and Behavioural Representation, 69-78. Orlando, Florida: Institute for Simulation and Training.

Tambe, M. 1998. Implementing Agent Teams in Dynamic Multi-agent Environments. Applied Artificial Intelligence 12.

Zilberstein, S. and Russell, S. J. 1993. Anytime Sensing, Planning and Action: A Practical Model for Robot Control. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1402-1407. Chambery, France.

© British Crown Copyright 2000 / DERA. Reproduced with the permission of the Controller of Her Britannic Majesty's Stationery Office.