
Control of Cooperative Systems

Yong Liu(1), David Galati(1), Marwan A. Simaan(1), and Jose B. Cruz, Jr.(2)

(1) Department of Electrical Engineering, University of Pittsburgh, Pittsburgh, PA 15261
(2) Department of Electrical Engineering, The Ohio State University, Columbus, OH 43210

ABSTRACT

In this paper, we consider systems that are controlled by several teams of cooperative controllers. An important resource allocation issue in such systems is the task assignment problem. In general, assigning a task to one team requires considering the actions taken by the other teams. Such resource allocation problems in a multi-team system can be formulated and solved in a game-theoretic framework. As an application of this resource allocation problem, we consider a military air operation scenario in which one force uses suppression of enemy air defense aircraft and bombers against the other force's fixed targets, which are defended by air defense units and ground troops. A Nash ordinal strategy is presented for the top leader of each force to decide on the initial task assignment and team composition. Furthermore, an initial assignment of tasks may not always yield the best outcome as the system evolves over time. Reassigning units to different tasks during the course of the operation may then become necessary to achieve the required objective. We also present results of moving-horizon Nash reassignment strategies for solving the task reassignment problem. Several simulation experiments illustrate the application of these Nash-type strategies.

Keywords: Task assignment, Cooperative control, Game theory, Nash ordinal strategy, Nash reassignment strategies.

1. Introduction

In a large-scale complex system, different units may have different resources, which leads to different capabilities and costs for handling the given tasks. To complete the various tasks more efficiently, the leader (or manager) often has to group the units into teams and allow them to cooperate with each other to enhance overall system performance. Examples of such systems include teams of cooperative robots, unmanned aerial or underwater vehicles, and large enterprises controlled by teams of agents. In these systems, organizing the units into teams is a natural way to reduce the complexity of the system from the leader's perspective. In general, dealing with N teams of M agents each may be much simpler than dealing with N × M agents. As the system operation progresses, a leader may reassess the initial task assignment among the teams and may decide that a different assignment could yield better overall system performance. In that case, a reassignment of tasks and a redeployment of resources will have to be performed. These problems are known as dynamic resource allocation problems in a complex system. For both initial task assignment and task reassignment, a decision made by one team generally requires considering the actions taken by the other teams.

In the presence of an adversary, the problem of cooperative teaming and tasking becomes more complicated. A typical example of such a complex dynamic system is a military engagement between two opposing forces. A control-oriented state-space model for such an enterprise was recently developed in [1,2]. The model assumes that each force has a leader (the top commander) and several agents (the fighting units). These could be unmanned vehicles, robots, or other such entities. There are several tasks that need to be performed on each side of the engagement. For example, a typical task for the attacking force may involve destroying a specific part of a fixed or moving target on the defending side. A typical task on the defending side would be to protect the target and weaken the attacking force. The model allows for teaming on each side for the purpose of accomplishing the required tasks: the fighting units on each side may be teamed up and allocated specific tasks. In that case, a problem arises if some of the teams accomplish their tasks successfully and others do not. A situation of this type occurs when a weak team is assigned a difficult task that it cannot accomplish on its own. It is therefore natural for the leader to consider reassigning teams that are still strong after successfully finishing their own tasks to join the remaining teams. Even if every team is able to complete its task on its own, the associated costs and the overall system performance may vary drastically depending on whether the teams that finish first are reassigned to the remaining tasks or left inactive afterwards.
The leader may therefore consider reassigning the teams that accomplish their tasks first to cooperate with the remaining teams in order to accelerate the accomplishment of the overall mission. The cooperative teaming and task reassignment in the military air operation can be analyzed in the context of a game in which each force makes certain decisions regarding the utilization of available resources in each mission to achieve certain goals. The dynamic reassignment problem and the effectiveness of teaming and tasking in military air operations have been investigated in [3] and [4], respectively. It should be noted that in the previous work [1-4], the initial teaming and tasking for both forces are assumed to be fixed at the beginning. However, in a real situation, the top leaders of both forces may have several possible ways to allocate their resources. Clearly, these initial strategies can also be analyzed in the framework of game theory. In this paper, we present only the relevant aspects of our previous work and then introduce a newly developed game-theoretic solution concept [5], the Nash ordinal strategy, which we use to determine the initial strategies of the top leaders in a military air operation.

The paper is organized as follows. In Section 2, we briefly review the model for a military operation developed in [1,2]. In Section 3, we present the one-step and two-step moving-horizon Nash reassignment strategies [3]. In Section 4, we introduce the properties of Nash ordinal strategies used for deciding the initial team composition and task assignment. In Section 5, we present simulation results that make use of our earlier work [1-4] and shed some light on the importance of team composition and planning in the decision-making process of the Blue top commander. In Section 6, we conclude the paper with some observations.

2. An Attrition Model for a Military Operation

Although considerable work has been done on mathematical models of military operations [3], a discrete-time, state-space, attrition-based model with control variables for a military engagement between two forces was recently developed in [1]. For completeness, this model is briefly reviewed here. There are two opposing forces, referred to as Blue and Red. The Blue force consists of air power, and its objective is to attack fixed targets that are defended by a ground-based Red force. The Blue force consists of Blue Fighter planes (BFs) and Blue Bomber planes (BBs). The objective of the Blue force is to destroy Red Fixed Targets (FTs) such as bridges, refineries, or airports. The Red force consists of Red Troops (RTs), such as tanks and mobile vehicles, and Red air Defenses (RDs), such as surface-to-air missiles (SAMs). To simplify the analysis, the individual elements in each force are grouped into units, and the elements in each unit are referred to as platforms. Thus a unit of BBs with three platforms is a group of three Blue Bombers acting as a unified entity. Each platform in a unit carries a certain number of weapons. Instead of considering individual weapons, we characterize each unit by the average number of weapons per platform that it possesses. Finally, to facilitate the development of the model, we assume that time is discretized into steps. At each step $k$, each unit is fully described by four variables: its x-y coordinates, the number of platforms ($p$) in it, and the average number of weapons per platform ($w$) that it carries. If we let $z_i^X(k)$ denote the state vector of unit $i$ of type $X$, then:

$$ z_i^X(k) = \begin{bmatrix} x_i^X(k) \\ y_i^X(k) \\ p_i^X(k) \\ w_i^X(k) \end{bmatrix}, \quad X \in \{BB, BF, RT, RD\}, \quad i = 1, 2, \ldots, N_X \tag{1a} $$

Here, $N_X$ is the total number of units of type $X$. The fixed targets have fixed x-y coordinates and their platforms do not carry any weapons. Their state vector is therefore represented by:

$$ z_i^{FT}(k) = \begin{bmatrix} x_i^{FT} \\ y_i^{FT} \\ p_i^{FT}(k) \\ 0 \end{bmatrix}, \quad i = 1, 2, \ldots, N_{FT} \tag{1b} $$

The control variables for each unit are divided into three types: (a) the relocate command $r_i^X(k) = \begin{bmatrix} a_i^X(k) \\ b_i^X(k) \end{bmatrix}$, (b) the choice of target $d_i^X(k)$, and (c) the average salvo size $c_i^X(k)$ fired. Several constraints restrict the choice of controls for each unit; these are discussed in detail in [1]. The state equations for the model can therefore be written as:

$$ \begin{bmatrix} x_i^X(k+1) \\ y_i^X(k+1) \end{bmatrix} = \begin{bmatrix} x_i^X(k) \\ y_i^X(k) \end{bmatrix} + \begin{bmatrix} a_i^X(k) \\ b_i^X(k) \end{bmatrix} \tag{2} $$

$$ p_i^X(k+1) = p_i^X(k)\, A_i^X(k) \tag{3} $$

$$ w_i^X(k+1) = w_i^X(k) - c_i^X(k) \tag{4} $$

Equation (2) represents the movement on the x-y coordinates. Equations (3) and (4) are attrition models for the number of platforms and the average number of weapons per platform; they govern the behavior of these variables as the two forces engage in battle. The term $A_i^X(k)$ in (3) represents the fraction of platforms of type $X$ surviving the transition from stage $k$ to stage $k+1$. Since only one-on-one engagement is allowed, once the identities of the attacking and the attacked units are determined from the choice of target controls, this fraction can be written as

$$ A_i^{BB}(k) = 1 - \sum_{j=1}^{N_{RT}} Q_{ij}^{BB\,RT}(k)\, P_{ij}^{BB\,RT}(k)\, \delta\big(\xi_i^{BB}(k), \xi_j^{RT}(k)\big)\, \delta\big(BB_i, d_j^{RT}(k)\big) - \sum_{j=1}^{N_{RD}} Q_{ij}^{BB\,RD}(k)\, P_{ij}^{BB\,RD}(k)\, \delta\big(\xi_i^{BB}(k), \xi_j^{RD}(k)\big)\, \delta\big(BB_i, d_j^{RD}(k)\big) \tag{5} $$

The above expression is written for the case when $X$ represents Blue Bombers; similar expressions can be written for all the other units [1]. The factors $Q_{ij}^{XY}(k)$ and $P_{ij}^{XY}(k)$ represent the engagement and attrition factors between the attacking unit ($j$th unit of $Y$) and the unit being attacked ($i$th unit of $X$), and $\delta(\cdot,\cdot)$ denotes the Kronecker delta. These two factors are determined according to the following expressions:

$$ Q_{ij}^{XY}(k) = \beta_{p\,ij}^{XY} \left(1 - e^{-\mu_{p\,ij}^{XY} \frac{p_j^Y(k)}{p_i^X(k)}}\right) \tag{6} $$

and

$$ P_{ij}^{XY}(k) = 1 - \left(1 - \beta_w P_{K\,ij}^{XY}\right)^{\frac{c_j^Y(k)\, p_j^Y(k)}{p_i^X(k)}\, E\left(\frac{p_j^Y(k)}{p_i^X(k)}\right)} \tag{7} $$

In expression (6), $\beta_{p\,ij}^{XY}$ represents the probability that the $j$th unit of $Y$ acquires the $i$th unit of $X$ as a target, and the term $\mu_{p\,ij}^{XY}$ is a normalizing factor that uniformly scales the platform counts when the units are of different types, so that they can be compared. In expression (7), $\beta_w$ is a weather-dependent modification factor ($0 \le \beta_w \le 1$), $P_{K\,ij}^{XY}$ represents the probability of kill under ideal weather conditions for a single weapon (i.e., an effective salvo size of 1) of the type used by unit $j$ against the type of platform in unit $i$, and $E(\cdot)$ is a factor (called the firing modification factor) that models the inefficiencies of scale that may exist when two forces of unequal sizes are engaged in combat, and modifies the average salvo size that reaches the target accordingly [7]. In our model, we use the following expression for $E(\cdot)$, as suggested by Helmbold [6, 7]:

$$ E\left(\frac{p_j^Y(k)}{p_i^X(k)}\right) = \left(\mu_{p\,ij}^{XY} \frac{p_j^Y(k)}{p_i^X(k)}\right)^{\omega - 1} \tag{8} $$

where the factor $0 \le \omega \le 1$ is referred to as the Weiss parameter.

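To make the roles of (5) through (8) concrete, here is a minimal Python sketch that evaluates the engagement factor, the firing modification factor, the attrition factor, and the resulting survival fraction for one attacked unit. The function names are hypothetical, and the parameter values in the usage lines are made up for illustration. The sketch assumes the Kronecker-delta terms in (5) have already been resolved, so the caller passes only the $(Q, P)$ pairs of attackers that are co-located with, and targeting, the unit in question. No clamping of $A_i^X(k)$ is applied, since (5) as written does not state one.

```python
import math


def engagement_factor(beta_p, mu_p, p_attacker, p_target):
    """Q_ij^XY(k) from (6): attacker j acquires target unit i."""
    return beta_p * (1.0 - math.exp(-mu_p * p_attacker / p_target))


def firing_modification(mu_p, p_attacker, p_target, omega):
    """E(.) from (8), with Weiss parameter 0 <= omega <= 1."""
    return (mu_p * p_attacker / p_target) ** (omega - 1.0)


def attrition_factor(beta_w, p_kill, salvo, mu_p, p_attacker, p_target, omega):
    """P_ij^XY(k) from (7).

    beta_w : weather modification factor, 0 <= beta_w <= 1
    p_kill : single-weapon kill probability P_K,ij^XY under ideal weather
    salvo  : average salvo size c_j^Y(k) fired by the attacking unit
    """
    ratio = p_attacker / p_target
    exponent = salvo * ratio * firing_modification(mu_p, p_attacker, p_target, omega)
    return 1.0 - (1.0 - beta_w * p_kill) ** exponent


def survival_fraction(engagements):
    """A_i^X(k) from (5).

    `engagements` holds (Q, P) pairs only for attackers whose two Kronecker
    deltas in (5) equal 1 (co-located with, and targeting, unit i).
    """
    return 1.0 - sum(q * p for q, p in engagements)


if __name__ == "__main__":
    # Made-up values: one RD unit (4 platforms) engaging one BB unit (2 platforms).
    q = engagement_factor(beta_p=0.9, mu_p=1.0, p_attacker=4, p_target=2)
    p = attrition_factor(beta_w=0.8, p_kill=0.3, salvo=2, mu_p=1.0,
                         p_attacker=4, p_target=2, omega=0.5)
    print(f"Q={q:.3f}  P={p:.3f}  A={survival_fraction([(q, p)]):.3f}")
```

With $\omega = 1$ the firing modification factor is identically 1, so (7) reduces to an effective salvo of $c_j^Y(k)\, p_j^Y(k)/p_i^X(k)$ weapons per target platform; smaller $\omega$ penalizes the larger side for its size.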
After specifying the state equations (2)-(4) for all units, we can write them in the more compact standard form:

$$ z(k+1) = f_k\big(z(k), u^B(k), u^R(k)\big) \tag{9} $$

where

$$ z(k) = \begin{bmatrix} z^{BB}(k) \\ z^{BF}(k) \\ z^{RT}(k) \\ z^{RD}(k) \\ z^{FT}(k) \end{bmatrix}, \quad z^X(k) = \begin{bmatrix} z_1^X(k) \\ \vdots \\ z_{N_X}^X(k) \end{bmatrix}, \quad X \in \{BB, BF, RT, RD, FT\}, $$

$$ u^B(k) = \begin{bmatrix} u^{BB}(k) \\ u^{BF}(k) \end{bmatrix}, \quad u^R(k) = \begin{bmatrix} u^{RT}(k) \\ u^{RD}(k) \end{bmatrix}, \quad u^X(k) = \begin{bmatrix} u_1^X(k) \\ \vdots \\ u_{N_X}^X(k) \end{bmatrix}, \quad u_i^X(k) = \begin{bmatrix} a_i^X(k) \\ b_i^X(k) \\ c_i^X(k) \\ d_i^X(k) \end{bmatrix}. \tag{10} $$
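The compact form (9) applies (2)-(4) to every entry of the stacked state vector. The sketch below, using Python dataclasses and hypothetical names (`UnitState`, `UnitControl`, `step_unit`, `step_force`), shows one such transition given survival fractions $A_i^X(k)$ that have already been computed from (5). It is a simplification under stated assumptions: the control constraints of [1] are not enforced, targets are identified by hypothetical string ids, and fixed targets (which neither move nor fire) would simply use zero relocation and zero salvo.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UnitState:
    """State z_i^X(k) = (x, y, p, w) of one unit, as in (1a)."""
    x: float
    y: float
    p: float  # number of platforms
    w: float  # average number of weapons per platform


@dataclass(frozen=True)
class UnitControl:
    """Control u_i^X(k) = (a, b, c, d) of one unit, as in (10)."""
    a: float                 # relocation along x
    b: float                 # relocation along y
    c: float                 # average salvo size fired
    d: Optional[str] = None  # chosen target (hypothetical string id)


def step_unit(z: UnitState, u: UnitControl, survival: float) -> UnitState:
    """Apply (2)-(4) to one unit, given its survival fraction A_i^X(k)."""
    return UnitState(
        x=z.x + u.a,       # (2): movement in x
        y=z.y + u.b,       # (2): movement in y
        p=z.p * survival,  # (3): platform attrition
        w=z.w - u.c,       # (4): weapons expended
    )


def step_force(states: Dict[str, UnitState],
               controls: Dict[str, UnitControl],
               survivals: Dict[str, float]) -> Dict[str, UnitState]:
    """One transition of (9), applied unit by unit over the stacked vectors."""
    return {uid: step_unit(states[uid], controls[uid], survivals[uid])
            for uid in states}


if __name__ == "__main__":
    z0 = {"BB1": UnitState(0.0, 0.0, 4.0, 6.0)}
    u0 = {"BB1": UnitControl(1.0, -2.0, 2.0, "RD1")}
    print(step_force(z0, u0, {"BB1": 0.5}))
```

Keeping the per-unit update separate from the stacking mirrors the structure of (10): $f_k$ is just the unit-wise map, with all coupling between forces carried by the survival fractions.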

For a time horizon of $K$ time steps, we assume that the control vectors are chosen so as to maximize the objective functions

$$ J^B = \sum_{k=1}^{K} J^B(k) \ \text{for the Blue force} \quad \text{and} \quad J^R = \sum_{k=1}^{K} J^R(k) \ \text{for the Red force} \tag{11a} $$

where

$$ \begin{aligned} J^B(k) &= \sum_{i=1}^{N_{BB}} \alpha_{BBi}\, \hat p_i^{BB}(k) + \sum_{i=1}^{N_{BF}} \alpha_{BFi}\, \hat p_i^{BF}(k) - \sum_{i=1}^{N_{RT}} \alpha_{RTi}\, \hat p_i^{RT}(k) - \sum_{i=1}^{N_{RD}} \alpha_{RDi}\, \hat p_i^{RD}(k) - \sum_{i=1}^{N_{FT}} \alpha_{FTi}\, \hat p_i^{FT}(k) \\ J^R(k) &= -\sum_{i=1}^{N_{BB}} \beta_{BBi}\, \hat p_i^{BB}(k) - \sum_{i=1}^{N_{BF}} \beta_{BFi}\, \hat p_i^{BF}(k) + \sum_{i=1}^{N_{RT}} \beta_{RTi}\, \hat p_i^{RT}(k) + \sum_{i=1}^{N_{RD}} \beta_{RDi}\, \hat p_i^{RD}(k) + \sum_{i=1}^{N_{FT}} \beta_{FTi}\, \hat p_i^{FT}(k) \end{aligned} \tag{11b} $$

and $\hat p_i^X(k)$ is a normalized number of platforms:

$$ \hat p_i^X(k) = \frac{p_i^X(k)}{p_i^X(0)}, \quad k = 0, 1, 2, \ldots, K $$

The expressions in (11b) are linear combinations of normalized platforms and express the objective of each force to maximize the number of its own platforms and minimize the number of platforms of the opposing force.

3. Moving-Horizon Nash Reassignment Strategy

Let us now consider a general task reassignment problem for the Blue force. Assume that there are $m$ distinct fixed targets, each defended by specific units of the Red force. Destroying a fixed target and weakening its defending units is defined as a task for the Blue force. When there is only one fixed Red target, the Blue commander assigns the entire Blue force to that task. When there is more than one Red target, the commander may divide the Blue force into teams and decide which team is assigned to which task. Assume that the Blue force is divided into $n$ teams $\{T_1^B, T_2^B, \ldots, T_n^B\}$. Each team consists of a combination of Blue units (BBs and BFs). The objective function of team $T_i^B$ at stage $k$, denoted by $J_i^B(k)$, is given by a subset of the terms of $J^B(k)$ in (11b). We assume that each team has a pre-assigned task. If some teams accomplish their tasks before others, instead of returning to base, the commander has the option of reassigning them to other tasks, either new or ongoing. Let $I_c(k)$ denote the set of indices of teams that have accomplished their tasks at stage $k$. For $i \in I_c(k)$, let $t_i^B(k)$ denote the task that team $T_i^B$ can be reassigned to. The number of possible assignments of teams that have accomplished their tasks to tasks that have not yet been accomplished can easily grow exponentially. Let $r(i,k) > 0$ be the cost of reassigning team $T_i^B$ ($i \in I_c(k)$) to the new task $t_i^B(k)$ at stage $k$. The optimal reassignment problem at stage $k$ can then be formulated as

$$ \max_{[\tilde u^B(k), \ldots, \tilde u^B(K-1)]} \tilde J_{k,K}^B $$

where

$$ \tilde J_{k,K}^B = \sum_{l=k}^{K} \left( \sum_{i \notin I_c(l)} J_i^B(l) - \sum_{i \in I_c(l)} r(i,l) \right) \tag{12} $$

In the above expression, $\tilde u^B(k) = \begin{bmatrix} u^B(k) \\ t_i^B(k) \end{bmatrix}$ for $i \in I_c(k)$; that is, the control vector in (10) has been augmented by the choice of a new task $t_i^B(k)$. The optimal control actions $[\tilde u^{B*}(k), \ldots, \tilde u^{B*}(K-1)]$ taken by the Blue teams also depend on the controls of the Red force, and hence the problem needs to be considered within the framework of game theory, as will be discussed in the next section; that is, the solution remains game-theoretic in nature. In this paper, we adopt the Nash strategy from game theory [8] to obtain the optimal reassignment controls for any Blue team that has been reassigned. Once $t_i^B(k)$ is determined, the units in team $T_i^B$ move to the location of the new task.
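As an illustration of (11b) and (12), the following sketch computes a Blue stage payoff from normalized platform counts and then accumulates the reassignment objective $\tilde J_{k,K}^B$ over stages $l = k, \ldots, K$. It assumes the team payoffs $J_i^B(l)$ and the index sets $I_c(l)$ are already available (for example, from simulating (9)); the tuple layout, list indexing by stage offset, and function names are hypothetical choices for this sketch, not part of the model in [1].

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# (weight, p_now, p_init) for one unit; a hypothetical data layout.
Unit = Tuple[float, float, float]


def blue_stage_payoff(blue: Sequence[Unit], red: Sequence[Unit]) -> float:
    """J^B(k) from (11b): weighted normalized Blue platforms (BB, BF) minus
    weighted normalized Red platforms (RT, RD, FT), with p-hat = p(k)/p(0)."""
    gain = sum(alpha * p / p0 for alpha, p, p0 in blue)
    loss = sum(alpha * p / p0 for alpha, p, p0 in red)
    return gain - loss


def reassignment_objective(k: int,
                           team_payoffs: List[Dict[int, float]],
                           completed: List[Set[int]],
                           cost: Callable[[int, int], float]) -> float:
    """J-tilde^B_{k,K} from (12).

    team_payoffs[l - k][i] : J_i^B(l) for team i at stage l = k, ..., K
    completed[l - k]       : index set I_c(l) of teams that finished their task
    cost(i, l)             : reassignment cost r(i, l) > 0
    """
    total = 0.0
    for offset, payoffs in enumerate(team_payoffs):
        l = k + offset
        done = completed[offset]
        active = sum(j for i, j in payoffs.items() if i not in done)
        penalty = sum(cost(i, l) for i in done)
        total += active - penalty
    return total
```

In a reassignment search, `reassignment_objective` would be evaluated once per candidate choice of $t_i^B(k)$, with the payoffs produced by simulating that choice against the Red controls.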

Let $n_a(k)$ and $n_b(k)$ be the number of teams to be reassigned and the number of unaccomplished tasks at time $k$, respectively. The number of task choices $t_i^B(k)$ for the $i$th reassigned team is $n_b(k)+1$, i.e., the number of unaccomplished tasks plus the option of returning to base. Thus, the number of all possible combinations of task choices for the reassigned teams at time $k$ is $(n_b(k)+1)^{n_a(k)}$. This number grows exponentially with $n_a(k)$ (and polynomially with $n_b(k)$), adding another layer of complexity to the task reassignment problem. One way to reduce it is to allow the reassigned teams to select only unaccomplished tasks near their current locations, so that the cost of any reassignment path can be ignored in the objective functions.
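The combinatorial growth described above can be seen by enumerating the joint choices directly. This sketch lists every assignment of reassigned teams to open tasks or base, plus a pruned variant that keeps only tasks within a given distance of each team, in the spirit of the nearby-task restriction. The names `BASE`, `all_assignments`, `nearby_assignments`, the position dictionaries, and the `radius` parameter are all hypothetical; the paper does not specify how "near" is measured, so Euclidean distance is an assumption here.

```python
import math
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

BASE = "base"  # hypothetical label for the return-to-base option


def all_assignments(teams: Sequence[str],
                    open_tasks: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Every joint choice of t_i^B(k): each team picks an open task or BASE."""
    choices = list(open_tasks) + [BASE]
    for combo in product(choices, repeat=len(teams)):
        yield dict(zip(teams, combo))


def nearby_assignments(teams: Sequence[str],
                       open_tasks: Sequence[str],
                       team_pos: Dict[str, Tuple[float, float]],
                       task_pos: Dict[str, Tuple[float, float]],
                       radius: float) -> Iterator[Dict[str, str]]:
    """Pruned enumeration: each team only considers open tasks within
    `radius` (Euclidean distance, an assumption) of its current position."""
    per_team: List[List[str]] = []
    for team in teams:
        near = [task for task in open_tasks
                if math.dist(team_pos[team], task_pos[task]) <= radius]
        per_team.append(near + [BASE])
    for combo in product(*per_team):
        yield dict(zip(teams, combo))


if __name__ == "__main__":
    teams, tasks = ["T1", "T2", "T3"], ["A", "B", "C", "D"]
    print(sum(1 for _ in all_assignments(teams, tasks)))
```

With three reassigned teams and four open tasks, the full enumeration yields $5^3 = 125$ candidates, matching $(n_b(k)+1)^{n_a(k)}$; the pruned variant shrinks each team's option list before the product is formed.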


Because of the extensive computations involved, even in cases that do not involve reassignment, determining a solution for problems of this type over the entire time horizon $K$ is not numerically feasible [2]. To deal with this issue, instead of maximizing the objective function $\tilde J_{k,K}^B$ from stage $k$ to the final stage $K$, we consider maximizing the objective functions over a reduced look-ahead moving horizon of length $K_r$ steps ( Kr