Improving a Multiagent Team with a Model-Based Diagnosing Coach

Eliahu Khalastchi, Meir Kalech and Lior Rokach
Ben-Gurion University, Be'er-Sheva, Israel
email: {khalastc,kalech,liorrk}@bgu.ac.il

Abstract

Achieving high performance of a team is challenging. Each agent must be capable of performing its own role under the team's context and constraints. In addition, the agent's beliefs must be consistent with the domain knowledge and its teammates' beliefs. Unfortunately, these requirements may be violated, causing the team to perform poorly. In this paper we propose a diagnosis-based coach which observes the team execution and, using a model-based diagnosis approach, identifies the root causes of the team's poor performance. Based on the diagnosis, it makes recovery changes in the team until the team performs well. We evaluated our diagnosis-based coach in a simulation of the volleyball game and show that its performance is close to an oracle.

1 Introduction

When a group of agents collaborates to achieve a shared goal, they face many challenges [1]. For the agents to be a successful team, each agent must be capable of performing its own role under the team's context and constraints. In addition, the agent's beliefs must be consistent with the domain knowledge and its teammates' beliefs. Inconsistency in the beliefs or the incapability of some agents to fulfil their tasks may lead to a failure while trying to achieve a shared goal [2].

In this work we focus on the beliefs and capabilities of heterogeneous agents which function as a team. We focus on the personal and the shared beliefs of the agents as well as their individual performance. Some of these beliefs are prone to errors. Some individual capabilities of an agent might not be suitable for the task it is assigned to by the team plan. As a result, the agents might produce a faulty execution of the team plan, which may lead to poor performance. We aim to exploit the fact that there are a number of agents in the team. If a belief of an agent is incorrect, the agent can be instructed to receive the correct information from another agent. If the capabilities of an agent are not adequate to successfully execute its task, tasks can be reassigned among the team members.

The challenge is to automatically deduce the root cause of the poor performance of the team given: (1) a team of heterogeneous agents with different sensing, thinking and acting capabilities, (2) the agents operate in a dynamic environment, (3) there is no insight into their internal beliefs or their belief system, (4) there is a priori knowledge about the team plan, and (5) an a posteriori observation of the team's behavior is given.

We assume an external observer agent, hereinafter a coach [3], who can improve the team performance by applying the actions proposed above: reassigning agents' roles or sharing agents' beliefs. As an external observer, the coach has a limited view of the team execution. It can observe the assigned roles of the agents as well as domain observations, but it has no a priori knowledge about the capabilities of the agents nor about their beliefs. Therefore, choosing which actions the coach should take to improve the team is challenging. This challenge is relevant to any domain in which a group of agents has to improve its team performance.

To demonstrate this problem, we adopt the volleyball game domain. As in other team sports, a successful team performance depends on the correct role assignment of agents and the correctness of their individual and shared beliefs about the environment of the game. For instance, an agent in the spiker position should be a quick player with an excellent timing capability. In addition, all the players must have correct beliefs about the ball location and about the roles of their teammates, e.g., who defends which sections or to whom they should pass the ball. A coach agent can observe the team's games and then try to improve the team performance by reassigning players' roles or by instructing players to share information during the next games, e.g., the believed ball location.

Since the coach's decisions are based on game observations, the typical approach to this problem in the AI community is to use some reward-driven mechanism that directs the search for the correct coach actions, such as reinforcement learning [4]. Blame assignment [5] is a very challenging problem in such approaches. Alternatively, we suggest the use of model-based diagnosis approaches for such coaching challenges.

The main contribution of our work is applying a model-based diagnosis [6] (MBD) approach to help the coach improve the team. MBD relies on a model of the diagnosed system, which is utilized to simulate the expected behavior of the system given the operational context (typically, the system inputs). The resulting simulated behavior (typically, the system outputs) is compared to the observed behavior of the system to detect discrepancies that indicate failures. By adopting an MBD approach, the coach will be able to isolate the root cause of the team's poor performance. Specifically, it will know which of the agents' beliefs or limited capabilities caused a team failure. Thus it will be able to make intelligent changes in the team so the team will improve its performance.

An additional contribution of the paper is a methodology for the coach to train a group toward a goal: the team tries to achieve a goal; upon failure, the coach diagnoses the root causes of the failure using MBD and proposes appropriate recovery actions. This process is performed iteratively until the goal is achieved.

MBD for multi-agent systems (MAS) has been proposed in previous work [7] [2]. Our work is orthogonal to this research, since the focus of our work is how to assist a coach in improving the team using diagnosis and recovery. In this sense, research in the area of multi-agent diagnosis may be incorporated into our work.

We evaluated our MBD-based coach in a simulation of the volleyball game. The goal is to achieve a high team performance (measured by withstanding 100 consecutive opponent attacks) while minimizing the number of faults and games. We implemented a competitive approach which relies on heuristic search. We show that the diagnosis approach succeeds in achieving results very close to an oracle-based search that knows the exact failures.

2 Related Work

The use of a coaching agent is not new [8]; a coaching agent's role is to improve the team via communication. It is not a centralized coordinator that instructs the agents exactly what to do at each point in time. Rather, it suggests to the agents how to improve by giving them useful and limited information, general instructions, or by altering the team plan. Work on coaching agents [9] in the RoboCup competition typically adjusts the strategy of the team or finds weaknesses of the opponent team that can be exploited. In this work we do not wish to alter any plan or to model an opponent, but rather to describe and demonstrate how a coach that uses model-based diagnosis can aid a team of agents which executes the team plan in a faulty manner. We expect the team to improve quickly because of the insights provided by such a coach.

Diagnosing multi-agent systems has been studied in the last decade on different types of systems and with different approaches. The first approach focuses on diagnosing teamwork coordination failures [10]. Kalech and Kaminka introduce the notion of social diagnosis. The input of the diagnosis algorithm, besides the observation of the team, is a hierarchy of behaviors that the agents share. Behaviors abstract the expected actions taken by agents in specific situations. If two or more agents select two behaviors that contradict each other, a disagreement arises. A diagnosis of this problem is a set of belief states held by agents that possibly lead them to choose the wrong behaviors. In our problem we also look for faulty beliefs, but we infer them from a model of the relations between the tasks rather than from a model of the coordination.

The next approach addresses the problem of diagnosing a multi-agent plan (MAP) that represents chronological constraints between the actions that agents plan to take. Roos et al. [11] apply MBD methods to detect plan failures using partial observations of a plan in execution that may concern individual components of the plan. In addition, they show the possible consequences of these failures for the future execution of plan elements. A follow-up work by Roos et al. [12] aims to identify equipment and/or agents that should be repaired or adjusted in order to avoid further violation of plan execution. Similarly, Micalizio and Torasso [13] added the ability to understand to what extent a fault affecting the functionalities of an agent affects the global plan. They adopted a relational formalism for modeling both nominal and abnormal execution of actions and a mechanism of failure propagation to capture the interplay between agent diagnosis and plan diagnosis.

Our diagnosis method relies on concepts of the above work. We extend plan diagnosis to deal with teammates that are not assigned to a proper role; thus we also diagnose the capabilities of agents to execute tasks. In addition, our work is orthogonal to research on diagnosis of MAS, since the focus of our work is how to assist a coach in improving the team using diagnosis and recovery. In this sense, research in the area of MAS diagnosis may be incorporated into our work.

3 The Volleyball Domain

Figure 1: The Volleyball Simulator

As a motivating example we chose the game of volleyball. Volleyball is a quick and dynamic game in which a high degree of coordination and collaboration is required from a team of players. We created a volleyball simulator (see figure 1) in which a team of 6 volleyball-playing agents is challenged to withstand 100 consecutive opponent attacks. An opponent attack occurs automatically whenever the ball crosses over to the opponent's court. During an opponent attack the ball bounces over the net at a random speed to a random location inside the team's court. In order to withstand an opponent's attack the team must apply a team plan comprised of three stages: (a) defense, (b) transition to offense, and (c) offense. The team is allowed exactly 3 hits of the ball, and a player is not allowed to hit the ball twice in a row. If the ball touches the ground anywhere but the opponent's court, the team fails to withstand the opponent's attack.

A predefined team plan dictates the team formation and the role of each player. Each role is associated with one or more tasks. The roles are: left spiker, setter, right spiker, left receiver, middle receiver, and right receiver (see figure 1). In figure 1 player 5 is highlighted. Player 5 is the middle receiver; the small light-gray squares represent the areas it is tasked to defend.

Each player possesses several domain-related beliefs, such as the location of the ball, its destination, the location of the other players, the number of team hits, etc. These beliefs are sensor-based. Some agents have better sensors than others and therefore their beliefs may be more accurate. Each player also possesses team-related beliefs: the believed roles of the other teammates. These beliefs are definition-based. In addition, each player possesses different capabilities such as maximum movement speed and accurate timing.

For a team to be successful, (1) all the players must know who the setter is, (2) the setter must know who the left and right spikers are, (3) every player must know exactly where the ball is and where it is headed, and (4) the players' roles should match their maximum speed ability. Each player knows its own role but there is no guarantee it correctly knows the roles of its teammates. In addition, a player's ball location and destination calculations can be imprecise. In figure 1 we highlight player 5's beliefs. Its beliefs about the roles of the other players are correct, but its ball location and destination calculations suffer from severe deviations. This yields a poor team performance.

For the team's aid, a coach agent 𝑎𝑐 is able to observe a game until the team fails, i.e., the ball touches their side of the court, and then 𝑎𝑐 tries to improve the team performance by applying one of the following actions: (1) redefine a player's believed role of another teammate, (2) instruct a player to get the ball's information from another teammate instead of trusting its own sensors to calculate these beliefs, and (3) switch two players. Switching players includes switching their positions in the formation and their associated roles, and informing all the team members of the new roles of the switched players.
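To make the setting concrete, the following Python sketch shows one possible way to represent a player's beliefs and capabilities and the three coach actions. All names and fields here are our own illustration; they are not taken from the simulator's code.

from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

ROLES = ["left_spiker", "setter", "right_spiker",
         "left_receiver", "middle_receiver", "right_receiver"]

@dataclass
class Player:
    pid: int
    role: str                                    # the player's own (true) role
    believed_roles: Dict[int, str] = field(default_factory=dict)  # teammate id -> believed role
    ball_estimate: Optional[Tuple[float, float]] = None           # sensor-based belief
    ball_source: Optional[int] = None            # teammate to trust instead of own sensors
    max_speed: float = 1.0                       # capability
    timing_accuracy: float = 1.0                 # capability

def redefine_role_belief(p: Player, teammate_id: int, role: str) -> None:
    """Coach action (1): fix a player's belief about a teammate's role."""
    p.believed_roles[teammate_id] = role

def set_ball_source(p: Player, teammate_id: int) -> None:
    """Coach action (2): take the ball information from a teammate's sensors."""
    p.ball_source = teammate_id

def switch_players(a: Player, b: Player, team: Dict[int, Player]) -> None:
    """Coach action (3): swap roles (and formation positions) and inform the whole team."""
    a.role, b.role = b.role, a.role
    for p in team.values():
        p.believed_roles[a.pid], p.believed_roles[b.pid] = a.role, b.role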

4 Problem Description

In a given domain 𝐷 with a set of domain observations 𝑜𝑏𝑠𝐷 = {𝑑1, …, 𝑑𝑞}, a team of agents 𝐴 = {𝑎1, …, 𝑎𝑛} collaborates to achieve a shared goal 𝐺 by executing a shared team plan 𝑃𝐴. 𝑃𝐴 defines a set of roles 𝑅𝐴 = {𝑟1, …, 𝑟𝑛} where each role 𝑟𝑗 is assigned to a different agent 𝑎𝑖. A role is associated with tasks 𝑇𝑟𝑗 ⊆ 𝑇𝐴 = {𝑡1, …, 𝑡𝑚}, i.e., an agent that has been assigned to role 𝑟𝑗 is expected to execute each task 𝑡𝑗 ∈ 𝑇𝑟𝑗. Each task 𝑡𝑗 defines the following: (1) a failure condition: 𝑓𝑎𝑖𝑙𝑢𝑟𝑒(𝑡𝑗) is a predicate which is true if the task has failed. The indication of whether a task failed can be derived by a propositional formula on the domain observations: 𝑓𝑎𝑖𝑙𝑡𝑗(𝑜𝑏𝑠𝐷) → 𝑓𝑎𝑖𝑙𝑢𝑟𝑒(𝑡𝑗). (2) Rules of action activation: a rule consists of an action 𝛼 that is activated upon the satisfaction of a precondition. The precondition is a propositional formula on the domain observations: 𝑝𝑟𝑒𝛼(𝑜𝑏𝑠𝐷).

An agent 𝑎𝑖 possesses its own beliefs about the domain. These beliefs change dynamically upon the observations; we denote the beliefs as 𝐵𝑖(𝑜𝑏𝑠𝐷). Thus, an agent 𝑎𝑖 assigned to execute the task 𝑡𝑗 is expected to operate an associated action 𝛼 only if it believes that the precondition is satisfied, i.e., 𝐵𝑖(𝑜𝑏𝑠𝐷) ⊢ 𝑝𝑟𝑒𝛼(𝑜𝑏𝑠𝐷). The possibility of agent 𝑎𝑖 completing the execution of action 𝛼 successfully depends on two factors: (1) the precondition of 𝛼 is consistent with the domain observations; the fact that the agent's beliefs are consistent with the precondition is not sufficient, since it is possible that the agent's beliefs are wrong. (2) The agent's capabilities, denoted by the propositional formula 𝐶𝑖, must be consistent with the capability requirements defined for the task; these requirements are part of the precondition of the task. The effect of an executed action 𝛼 yields a postcondition. The postcondition is a propositional formula on the domain observations: 𝑝𝑠𝑡𝛼(𝑜𝑏𝑠𝐷). A task may be composed of several actions that are operated in sequence. The order is determined by the preconditions and postconditions of the actions as follows: a postcondition may be identical to another action's precondition, and thus the other action is provoked next.

In our problem setting, the capabilities of the agents are initially unknown and the role assignment is random. Moreover, the beliefs of the agents are not guaranteed to be consistent with the domain observations. Thus, 𝑃𝐴 may be poorly executed and failure conditions may be reported. The root causes of these failures cannot be inferred directly.

A special agent 𝑎𝑐 ∉ 𝐴 takes the role of a coach. The general role of a coach agent is to improve the team performance through communication [3]. In this particular problem setting the role of 𝑎𝑐 is to fix the faulty execution of 𝑃𝐴 and to achieve the goal given the following:
1. Agent 𝑎𝑐 knows the initial task assignment of each agent: 𝐵𝑐(𝑅𝐴) ⊢ 𝑅𝐴.
2. Agent 𝑎𝑐 observes the domain correctly: 𝐵𝑐(𝑜𝑏𝑠𝐷) ⊢ 𝑜𝑏𝑠𝐷.
3. Agent 𝑎𝑐 does not know the capabilities of the agents in the team.
4. Agent 𝑎𝑐 cannot observe the beliefs of the agents.

The methodology of 𝑎𝑐 is as follows:
1. While the team's goal is not achieved:
2. Observe the team operation until the team fails.
3. Diagnose the root cause.
4. Apply recovery actions.

Agent 𝑎𝑐 can apply the following recovery actions:
1. Assign an agent a new set of tasks.
2. Instruct an agent to change a belief.

An action of 𝑎𝑐 is deterministic, i.e., there is no uncertainty about the result of any of these actions. The challenge for 𝑎𝑐 is to choose its actions intelligently, fix the execution of 𝑃𝐴, and achieve the goal efficiently. That is, let 𝐹 be the number of failures that invoked 𝑎𝑐 and let 𝑁 be the number of actions 𝑎𝑐 performed. An efficient 𝑎𝑐 agent enables the team's goal to be achieved while minimizing 𝐹 and 𝑁.
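The following is a minimal sketch of this methodology as a control loop. The environment, diagnoser, recovery procedure and goal test are placeholder callables standing in for Sections 5-6, not the paper's actual implementation.

def coach_loop(team, environment, diagnose, recover, goal_achieved):
    """Observe-diagnose-recover loop of the coach; returns (F, N)."""
    failures = 0       # F: number of failures that invoked the coach
    actions = 0        # N: number of recovery actions the coach performed
    while not goal_achieved(team):
        observation = environment.play_until_failure(team)   # observe until the team fails
        failures += 1
        diagnoses = diagnose(observation, team)               # candidate root causes (Section 5)
        for recovery_action in recover(diagnoses, team):      # e.g. reassign tasks, fix a belief
            recovery_action.apply(team)
            actions += 1
    return failures, actions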

5 Model-Based Diagnosing Coach

We propose an MBD technique to assist 𝑎𝑐 in choosing its recovery actions. We are inspired by human behavior, where a human team coach is able to deduce the reason for the team's lack of performance and instruct the players on how to improve. There are two main challenges. The first challenge is to diagnose the correct root causes that led the team to failure. The second challenge is to apply an efficient recovery based on the returned diagnosis. In this work we focus on the diagnosis challenge and provide some insights about the recovery. Our intention is to demonstrate how effective a diagnosis-based approach is compared to other traditional AI approaches, even when using simple forms of diagnosis. Next, we describe the basic MBD notations and terminology for this problem.

5.1 MBD Terminology

MBD problems arise when the normal behavior of a system is violated due to faulty components, as indicated by certain observations. An MBD problem is specified as a triplet 〈𝑆𝐷, 𝐶𝑂𝑀𝑃𝑆, 𝑂𝐵𝑆〉 where 𝑆𝐷 is a system description, 𝐶𝑂𝑀𝑃𝑆 is a set of components and 𝑂𝐵𝑆 is an observation. 𝑆𝐷 takes into account that some components might be abnormal (faulty). This is specified by a unary predicate ℎ(⋅) on components such that ℎ(𝑐) is true when component 𝑐 is healthy, while ¬ℎ(𝑐) is true when 𝑐 is faulty. Denoting the correct behavior of 𝑐 as a propositional formula 𝜑𝑐, 𝑆𝐷 is given formally as 𝑆𝐷 = ⋀_{𝑐∈𝐶𝑂𝑀𝑃𝑆} (ℎ(𝑐) ⇒ 𝜑𝑐). Namely, each component which is healthy follows its correct behavior. A diagnosis problem (DP) arises when, under the assumption that none of the components are faulty, there is an inconsistency between the system description and the observations [13] [5].

Definition 1: [Diagnosis Problem] Given an MBD problem 〈𝑆𝐷, 𝐶𝑂𝑀𝑃𝑆, 𝑂𝐵𝑆〉, a diagnosis problem arises when

𝑆𝐷 ∧ ⋀_{𝑐∈𝐶𝑂𝑀𝑃𝑆} ℎ(𝑐) ∧ 𝑂𝐵𝑆 ⊢ ⊥

A diagnosis algorithm will try to find a set of components 𝛥 ⊆ 𝐶𝑂𝑀𝑃𝑆 that, if assumed faulty, explains the observation.

Definition 2: [Diagnosis] Given an MBD problem 〈𝑆𝐷, 𝐶𝑂𝑀𝑃𝑆, 𝑂𝐵𝑆〉, the set of components 𝛥 ⊆ 𝐶𝑂𝑀𝑃𝑆 is a diagnosis if

𝑆𝐷 ∧ ⋀_{𝑐∈𝛥} ¬ℎ(𝑐) ∧ ⋀_{𝑐∉𝛥} ℎ(𝑐) ∧ 𝑂𝐵𝑆 ⊬ ⊥

5.2 MBD for a Coach

We formalize our problem as an MBD problem by defining the triplet 〈𝑆𝐷, 𝐶𝑂𝑀𝑃𝑆, 𝑂𝐵𝑆〉. The components of our system, represented by 𝐶𝑂𝑀𝑃𝑆, are the beliefs and capabilities of the agents. We specify the unary predicate ℎ(∙) on the components in 𝐶𝑂𝑀𝑃𝑆 to denote their health. The description of our system (𝑆𝐷) consists of task models and additional rules about the health variables. We define two predicates: (1) 𝑎𝑐𝑡(𝛼) is true if action 𝛼 is selected to be executed. (2) 𝑠𝑎𝑖(𝛼) is true if agent 𝑎𝑖 completed the action 𝛼 successfully; this depends on the health of 𝑎𝑖. In our problem, wrong beliefs and inadequate capabilities of agents trigger faulty execution of actions. The system description 𝑆𝐷 includes the following rule:

∀𝑎𝑖 ∀𝛼: 𝑠𝑎𝑖(𝛼) → ⋀_{𝑏∈𝐵𝑖(𝑜𝑏𝑠𝐷)} ℎ(𝑏) ∧ ⋀_{𝑐∈𝐶𝑖} ℎ(𝑐)

In addition, 𝑆𝐷 includes a model of the tasks. The task model of 𝑡𝑗 is described as follows: for each action 𝛼 ∈ 𝑡𝑗 we add: (1) 𝑝𝑟𝑒𝛼(𝑜𝑏𝑠𝐷) → 𝑎𝑐𝑡(𝛼). This rule indicates under which conditions 𝛼 is activated. (2) 𝑠𝑎𝑖(𝛼) → 𝑝𝑠𝑡𝛼(𝑜𝑏𝑠𝐷). This rule indicates the result (postcondition) of a successfully completed action 𝛼. (3) We add at least one rule 𝑠𝑎𝑖(𝛼) → ¬𝑓𝑎𝑖𝑙𝑢𝑟𝑒(𝑡𝑗). This rule indicates that a successful completion of action 𝛼 cannot be true if 𝑓𝑎𝑖𝑙𝑢𝑟𝑒(𝑡𝑗) is true. (4) Finally, we add the rule 𝑓𝑎𝑖𝑙𝑡𝑗(𝑜𝑏𝑠𝐷) → 𝑓𝑎𝑖𝑙𝑢𝑟𝑒(𝑡𝑗). This rule defines conditions on the domain observations that indicate a task failure.

Finally, we formalize the observations of our system: 𝑂𝐵𝑆 = 𝑜𝑏𝑠𝐷. That is, the set of observations consists of the set of predicates describing the observations about the domain. In the volleyball domain these may include the ball location and the roles of the agents. Given 〈𝑆𝐷, 𝐶𝑂𝑀𝑃𝑆, 𝑂𝐵𝑆〉, any model-based diagnosis algorithm, such as a SAT-based [14] or conflict-directed [15] algorithm, can yield a set of diagnosis candidates upon fault occurrence. A diagnosis includes a subset of the health variables (beliefs and capabilities) such that the assumption that they are not healthy explains the observation.

In the volleyball domain, for instance, there are three main tasks. Here we describe the part of 𝑆𝐷 related only to the defend-area task, 𝑡 = 𝑑𝑒𝑓𝑒𝑛𝑑_𝑎𝑟𝑒𝑎:
1. 𝑏𝑎𝑙𝑙_𝑎𝑏𝑜𝑢𝑡_𝑡𝑜_ℎ𝑖𝑡_𝑎𝑟𝑒𝑎(𝑏𝑎𝑙𝑙, 𝑎𝑟𝑒𝑎) → 𝑎𝑐𝑡(𝑚𝑜𝑣𝑒_𝑡𝑜_𝑏𝑎𝑙𝑙𝑎𝑖)
2. 𝑠𝑎𝑖(𝑚𝑜𝑣𝑒_𝑡𝑜_𝑏𝑎𝑙𝑙) → 𝑎𝑡_𝑏𝑎𝑙𝑙_𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝑎𝑖
3. 𝑎𝑡_𝑏𝑎𝑙𝑙_𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛𝑎𝑖 → 𝑎𝑐𝑡(𝑝𝑎𝑠𝑠_𝑏𝑎𝑙𝑙_𝑡𝑜_𝑠𝑒𝑡𝑡𝑒𝑟𝑎𝑖)
4. 𝑠𝑎𝑖(𝑝𝑎𝑠𝑠_𝑏𝑎𝑙𝑙_𝑡𝑜_𝑠𝑒𝑡𝑡𝑒𝑟) → 𝑏𝑎𝑙𝑙_𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛_𝑐ℎ𝑎𝑛𝑔𝑒𝑎𝑖
5. 𝑠𝑎𝑖(𝑝𝑎𝑠𝑠_𝑏𝑎𝑙𝑙_𝑡𝑜_𝑠𝑒𝑡𝑡𝑒𝑟) → ¬𝑓𝑎𝑖𝑙𝑢𝑟𝑒(𝑑𝑒𝑓𝑒𝑛𝑑_𝑎𝑟𝑒𝑎)
6. 𝑏𝑎𝑙𝑙_𝑎𝑡_𝑎𝑟𝑒𝑎_𝑔𝑟𝑜𝑢𝑛𝑑(𝑏𝑎𝑙𝑙, 𝑎𝑟𝑒𝑎) → 𝑓𝑎𝑖𝑙𝑢𝑟𝑒(𝑑𝑒𝑓𝑒𝑛𝑑_𝑎𝑟𝑒𝑎)

If agent 𝑎𝑖 believes the ball is about to hit the area it is assigned to defend, then it will move towards the ball (rule 1). If successful, it will be at the ball's location (rule 2). This provokes the agent to pass the ball to the setter (rule 3). Upon success we can conclude that the ball's location was changed by 𝑎𝑖 (rule 4) and the task has not failed (rule 5). If at any time the ball hits the ground in the defended area, then this defend_area task has failed (rule 6).
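To make the use of such a task model concrete, the sketch below encodes the six defend_area rules as simple Horn implications and forward-chains over them. The atom names and the encoding (including representing the negation in rule 5 as a separate atom) are our own illustration, not the simulator's actual representation. The point is that, together with a success assumption such as s(move_to_ball), the model predicts postconditions that the coach can check against the observation; a missing postcondition turns the success assumption, and hence its health variables, into a conflict.

# Illustrative encoding of the defend_area task model (names are assumptions).
RULES = [
    ("ball_about_to_hit_area", "act(move_to_ball)"),         # rule 1
    ("s(move_to_ball)", "at_ball_location"),                  # rule 2
    ("at_ball_location", "act(pass_ball_to_setter)"),         # rule 3
    ("s(pass_ball_to_setter)", "ball_location_change"),       # rule 4
    ("s(pass_ball_to_setter)", "not_failure(defend_area)"),   # rule 5 (negation kept as an atom)
    ("ball_at_area_ground", "failure(defend_area)"),          # rule 6
]

def forward_chain(facts, rules):
    """Compute the closure of the observed facts under the Horn rules."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for pre, post in rules:
            if pre in derived and post not in derived:
                derived.add(post)
                changed = True
    return derived

# Assume the action was triggered and completed successfully:
predicted = forward_chain({"ball_about_to_hit_area", "s(move_to_ball)"}, RULES)
# The model predicts at_ball_location; if the game log lacks it, the success
# assumption s(move_to_ball), i.e. the agent's health variables, is suspect.
print("at_ball_location" in predicted)  # True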

Figure 2: a faulty play

Example: assume the play illustrated in Figure 2. The ball is received by the right receiver (player 6), who mistakenly believes the setter to be the left receiver (player 4). When it hits the ball towards the left receiver, the setter (player 2) has to run across the court to reach the ball. However quick the setter may be, it will not make it in time; the ball will touch the ground and the team fails to withstand the attack. At first glance it may seem that player 2 is the one who has failed, but a deeper analysis can isolate the root cause: player 6 has a faulty belief about who is assigned to the role of the setter.

The setter's setting task has indeed failed. The diagnosis process propagates through the setting task model. The ball was received (observation). This triggers the setter's move_to_ball action (rule). The expected outcome at_ball_location for the setter was not achieved (observation). Hence, the setter's move_to_ball action is isolated. The modeled health variables of this action are the beliefs and capabilities essential for a successful completion of the action: specifically, the setter's ball location belief and its ability to reach the ball, and the receiver's belief about the setter's identity and its ability to pass the ball to the correctly recognized setter. Given this observation, running an MBD algorithm will infer, in this case, four diagnosis candidates: Δ1 = {player 2's belief about the ball's location}, Δ2 = {player 2's speed capability}, Δ3 = {player 6's belief about the setter's identity}, Δ4 = {player 6's capability to pass the ball}. All the diagnoses include either a belief or a capability, since these are the assumable health variables of the diagnosis problem. Each one of the diagnoses can explain the observation.
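The four candidates above follow from a single conflict: given the failure, not all of the health variables involved in the isolated action can be healthy. Below is a minimal sketch of how such diagnoses could be enumerated from conflicts; the component names are illustrative, and a real implementation would use a SAT-based or conflict-directed engine as cited above.

from itertools import combinations

# Hypothetical health variables of the failed setting task (illustrative names).
COMPS = [
    "p2.ball_location_belief",    # setter's belief about the ball location
    "p2.speed_capability",        # setter's ability to reach the ball in time
    "p6.setter_identity_belief",  # receiver's belief about who the setter is
    "p6.pass_capability",         # receiver's ability to pass accurately
]

# A conflict is a set of components that cannot all be healthy given the
# observation (here: the setter never reached at_ball_location).
conflicts = [set(COMPS)]

def is_diagnosis(delta, conflicts):
    """Delta is a diagnosis if assuming its members faulty hits every conflict."""
    return all(conflict & delta for conflict in conflicts)

def minimal_diagnoses(comps, conflicts):
    """Naive enumeration of subset-minimal diagnoses (fine for small COMPS)."""
    found = []
    for size in range(1, len(comps) + 1):
        for delta in map(set, combinations(comps, size)):
            if is_diagnosis(delta, conflicts) and not any(d <= delta for d in found):
                found.append(delta)
    return found

for d in minimal_diagnoses(COMPS, conflicts):
    print(sorted(d))   # four singleton diagnoses, matching Δ1-Δ4 above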

5.3 Recovery

The MBD algorithm returns a set of diagnoses that explain the observation. In our problem, the diagnosis candidates include faulty beliefs and capabilities of some agents. In classical MBD, there are two methods to discriminate the actual diagnosis (the diagnosis that actually contains the faulty components): testing or probing [17]. Both methods can be run iteratively until focusing on a single diagnosis. In MBD of MAS, discriminating the actual diagnosis has been researched [18] [19] in the context of recovery. The recovery should consider: (1) the probability of the different diagnoses, (2) the cost of fixing the components in the diagnosis, and (3) the risk of fixing components that are not actually faulty. The recovery process should consider domain-specific parameters, but in the context of our problem we would like to mention two aspects: (1) an agent's beliefs are usually obtained through sensor-based computation; if a belief is faulty, a quick recovery can be done by permanently getting the believed values from another agent. (2) A diagnosis that contains an agent's capability actually indicates a limitation of the agent in performing a task; then a quick recovery is to swap the agent's role with that of another agent.
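As a rough illustration of these two quick recoveries, the sketch below maps each diagnosed component to a recovery action. The tuple encoding of a diagnosis and the teammate-selection policy are assumptions made for the example, not the paper's implementation.

def pick_other_teammate(team, owner):
    """Placeholder policy: pick any teammate other than the faulty one."""
    return next(p for p in team if p != owner)

def recover(diagnosis, team):
    actions = []
    for kind, owner, name in diagnosis:           # e.g. ("belief", 2, "ball_location")
        if kind == "belief":
            # faulty belief: get the believed value from a teammate instead of own sensors
            donor = pick_other_teammate(team, owner)
            actions.append(("share_belief", owner, donor, name))
        else:
            # faulty capability: reassign the role to a better-suited teammate
            partner = pick_other_teammate(team, owner)
            actions.append(("switch_roles", owner, partner))
    return actions

# Example: the diagnosis Δ3 above (player 6's belief about the setter's identity)
print(recover([("belief", 6, "setter_identity")], team=[2, 4, 6]))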

6 Experimental Setup

To evaluate our diagnosis-based coach we implemented the volleyball game described above. We wish to show the efficiency of the model-based diagnosing coach compared to an AI approach that would typically be applied to this problem setting. A typical AI approach would be to use a reward-based heuristic decision for the coach's choice of repair actions, as is the case in reinforcement learning algorithms. However, given the deterministic nature of the coach's influence, we can be satisfied with competing against a search algorithm, which is more suitable for this domain and converges faster.

In a search problem we should define the state space and the possible actions in each state. We describe them in the experimental domain of the volleyball game. A searching coach maintains a state that consists of (1) the team roles, (2) the coach's knowledge about every agent's believed team roles, and (3) information about where an agent gets the ball information from. The coach needs to choose the repair actions, i.e., switch roles, redefine role beliefs, and set a different source for calculating the ball's position. Before instructing the team, the searching coach heuristically estimates the states that result from the possible actions that can be taken from the current state. Then, the action that yields the best state is chosen. The heuristic estimation of a state is goal-oriented and may be partially based on previous observations and associated rewards.

A search algorithm which is purely reward dependent is not competitive enough against our approach, which has the additional knowledge provided by the model. Hence, we implemented two types of competitive searching coaches: an oracle-based coach (hereinafter OB) and a semi-oracle-based coach (hereinafter SOB). The OB coach uses an "oracle" that knows the exact capabilities and beliefs of the agents. Thus, the OB coach can immediately define the goal state and apply an A* algorithm. In the A* algorithm we should define the 𝑔 function, which represents the cost from the start, and an admissible heuristic function ℎ, which estimates the cost to reach the goal. To implement A* with the OB coach we set the 𝑔 function as the sum of weighted actions the coach has already performed, and the ℎ (heuristic) function as the exact number of actions needed to reach the goal state. For this reason, OB serves as a theoretical lower bound on the number of necessary actions.

The SOB coach is less dependent on an "oracle". The oracle provides it with partial information about the players' beliefs (the beliefs about the ball location) but not about the capabilities of the players. Since the heuristic function estimates the distance to the goal state, SOB uses a reward function which assigns a reward to each state based on the number of consecutive withstood attacks (due to space limitations we omit the exact description of the reward).

We wish to compare the model-based diagnosis coach (hereinafter the MBD coach) to the other coaches. In particular, we measure how much better it is than the SOB coach and how close its performance can get to the oracle-based coach OB. To this end we run experiments on a team that is injected with a series of faults. We use settings of 2, 4, 7, and 11 different random faults. Each setting is tested 25 times, where the random faults are also randomly located among the team members in each test. Each coach is tested for how many failures occurred and how many actions it took until the team was able to withstand 100 consecutive opponent attacks. The results are averaged over the 25 trials.
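For concreteness, a minimal sketch of the OB coach's search is given below, assuming states are encoded as comparable tuples of role/belief assignments. Only the g definition follows the description above; the mismatch-count heuristic is a stand-in for the oracle's exact action count, and the toy successor function is hypothetical.

import heapq
from itertools import count

def a_star(start, goal, successors, action_weight):
    """Generic A*; states are hashable tuples, successors(state) yields (action, next_state)."""
    tie = count()                                   # tie-breaker so the heap never compares states
    frontier = [(0, next(tie), 0, start, [])]
    closed = set()
    while frontier:
        _, _, g, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan
        if state in closed:
            continue
        closed.add(state)
        for action, nxt in successors(state):
            g2 = g + action_weight(action)           # g: weighted actions already performed
            h2 = sum(a != b for a, b in zip(nxt, goal))  # h: remaining mismatches w.r.t. the known goal
            heapq.heappush(frontier, (g2 + h2, next(tie), g2, nxt, plan + [action]))
    return None

# Toy usage: states are tuples of believed setter identity per player; the goal
# is that every player believes player 2 is the setter.
goal = (2, 2, 2)
def successors(state):
    for i in range(len(state)):
        for who in (2, 4):                           # hypothetical candidate beliefs
            if state[i] != who:
                yield (f"redefine_belief(p{i}, setter={who})", state[:i] + (who,) + state[i+1:])

print(a_star((2, 4, 4), goal, successors, action_weight=lambda a: 1))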

7 Results

Figure 3 depicts the average number of actions taken by a coach until the team was able to withstand 100 consecutive attacks, with respect to the number of faults injected (the X axis). The black bars depict the SOB coach, the light-gray bars depict the MBD coach, and the dark-gray bars depict the OB coach. We can see that the MBD coach performed significantly fewer actions than the SOB coach.

Figure 3: number of actions, SOB vs. MBD vs. OB

As expected, in every case the oracle-based coach is more efficient, since it is in fact the theoretical lower bound on the number of actions. Even so, the MBD coach presents results very close to OB. OB is expected to perform better than MBD since the oracle is able to observe the beliefs and capabilities of the agents, while the MBD coach only infers them by observing the environment. In some very rare cases the MBD coach performed fewer actions than the OB coach. In these cases the team had not yet reached the best configuration but was still able to withstand 100 attacks; the OB coach automatically applies all the actions needed to reach the best team configuration and thus took slightly more actions in these particular cases.

Figure 4: number of games, SOB vs. MBD

Figure 4 depicts the average number of failed games that were played before the coach was able to fix the team, with respect to the number of failures. The MBD coach required significantly fewer games than the SOB coach. The number of failed games for the OB coach is always 1, since it is independent of observing the game; it simply applies A* upon the first failure and then conducts all the actions that fix the team.

8 Conclusions

In this paper we presented how MBD can be applied to assist a multiagent team coach in improving the team. A diagnosis-based coach uses an MBD approach to isolate the root causes of the poor performance of a team. Then, the coach conducts recovery changes in the team to improve its performance. This iterative process repeats until the team succeeds in achieving its goal. We presented an evaluation in the volleyball domain, comparing our coach to a heuristic search approach. We implemented two competitive coaches: an oracle-based coach that knows the exact beliefs and capabilities of the agents, and a semi-oracle coach that knows the agents' beliefs only partially. These coaches represent the typical AI approach to the problem. We showed that our diagnosis-based coach performs very close to the oracle and much better than the semi-oracle, and thus we suggest the use of MBD approaches to improve the performance of a multiagent team.

References

[1] B. J. Grosz and S. Kraus, "Collaborative plans for complex group action," Artificial Intelligence, vol. 86, pp. 269-357, 1996.
[2] M. Kalech and G. A. Kaminka, "On the design of social diagnosis algorithms for multi-agent teams," in International Joint Conference on Artificial Intelligence (IJCAI-03), 2003.
[3] P. Riley, M. Veloso and G. Kaminka, "Towards any-team coaching in adversarial domains," in Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-02), New York, 2002.
[4] A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[5] M. J. Mataric, "Reinforcement learning in the multi-robot domain," in Robot Colonies, Springer, 1997, pp. 73-83.
[6] R. Reiter, "A theory of diagnosis from first principles," Artificial Intelligence, vol. 32, pp. 57-95, 1987.
[7] R. Micalizio and P. Torasso, "Plan diagnosis and agent diagnosis in multi-agent systems," in AI*IA 2007: Artificial Intelligence and Human-Oriented Computing, 2007.
[8] P. Riley, M. Veloso and G. Kaminka, "Towards any-team coaching in adversarial domains," in Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems, 2002.
[9] G. Kuhlmann, W. B. Knox and P. Stone, "Know thine enemy: A champion RoboCup coach agent," in Proceedings of the National Conference on Artificial Intelligence, 2006.
[10] M. Kalech and G. A. Kaminka, "On the design of social diagnosis algorithms for multi-agent teams," in International Joint Conference on Artificial Intelligence, 2003.
[11] N. Roos and C. Witteveen, "Models and methods for plan diagnosis," Autonomous Agents and Multi-Agent Systems, vol. 19, pp. 30-52, 2009.
[12] F. De Jonge, N. Roos and C. Witteveen, "Primary and secondary diagnosis of multi-agent plan execution," Autonomous Agents and Multi-Agent Systems, vol. 18, pp. 267-294, 2009.
[13] R. Micalizio and P. Torasso, "Plan diagnosis and agent diagnosis in multi-agent systems," in AI*IA 2007: Artificial Intelligence and Human-Oriented Computing, 2007, pp. 434-446.
[14] J. de Kleer and B. C. Williams, "Diagnosing multiple faults," Artificial Intelligence, vol. 32, pp. 97-130, 1987.
[15] A. Metodi, R. Stern, M. Kalech and M. Codish, "Compiling model-based diagnosis to Boolean satisfaction," in Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
[16] R. T. Stern, M. Kalech, A. Feldman and G. M. Provan, "Exploring the duality in conflict-directed model-based diagnosis," in Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
[17] A. Feldman, G. Provan and A. van Gemund, "A model-based active testing approach to sequential diagnosis," Journal of Artificial Intelligence Research, vol. 39, p. 301, 2010.
[18] R. Micalizio and P. Torasso, "Team cooperation for plan recovery in multi-agent systems," in Multiagent System Technologies, 2007, pp. 170-181.
[19] R. Micalizio, "Action failure recovery via model-based diagnosis and conformant planning," Computational Intelligence, 2012.