Uncertain Agent Verification through Probabilistic Model-Checking

Uncertain Agent Verification through Probabilistic Model-Checking Paolo Ballarini, Michael Fisher, and Michael Wooldridge University of Liverpool, Liverpool L69 7ZF, UK, {paolo,michael,mjw}@csc.liv.ac.uk

Abstract. In many situations an agent’s behaviour can sensibly be described only in terms of a distribution of probability over a set of possibilities. In such case (agents’) decision-making becomes probabilistic too. In this work we consider a probabilistic variant of a well-known (twoplayers) Negotiation game and we show, first, how it can be encoded into a Markovian model, and then how a probabilistic model-checker such as PRISM can be used as a tool for its (automated) analysis. This paper is meant to exemplify that verification through model-checking can be fruitfully applied also to uncertain multi-agent systems. This, in our view, is the first step towards the characterisation of an automated verification method for probabilistic agents.

1

Introduction

Because of their notorious complexity, multi-agent systems’ run-time behaviour is extremely hard to predict and understand and, as a result, agents’ verification is a hard task. In recent years significant research effort has been invested in the development of formal methods for the specification and verification of multiagent systems [14] and model-checking has been proved to be a possibility in that respect. Bordini et al., for example, have shown how LTL model-checking techniques [13] can be applied, by means of the SPIN model-checker [9], to the verification of a specific class of non-probabilistic rational agents (i.e. BDI agentsystems expressed as AgentSpeak programs) [3]. In [5], Dix et al., argue that in several real-life situations an agent behaviour (i.e. the state an agent is in at a certain time) may be known with a given degree of uncertainty. As a consequence a specific language for probabilistic agents (i.e. a language that allows for a quantification of such uncertainty) is needed and that is what the authors develop throughout their work. Given that probability distributions are sensible means to describe the uncertainty between several possibilities (i.e. several potential successor states of a given, current state) then the type of analysis involved must also be a probabilistic one. In a probabilistic framework, the analyst is interested in discovering the measure with which certain properties (either negative or positive ones) are likely to be true. If we refer to a simple negotiation game where two players, a seller and a buyer, bargain over a single item, then a player’s decision may well be considered a probabilistic one, as an expression of the player’s uncertainty towards their opponent’s behaviour. In such a situation it

II

is relevant to assess how uncertainty affects the negotiation outcome (hence players’ payoff), which is important in being able to answer questions such as: “what is the probability that an agreement will be reached at all?”, “how likely is it that the bargained item will be sold to a given value x?”, and “what type of strategy gets a player a better expected payoff?”. In this work we show how a probabilistic model-checker, such as PRISM[12], can be used as a tool to find answers to those type of questions, once the considered probabilistic agent system has been mapped onto a Markovian model, specifically a Discrete Time Markov Chain (DTMC). So as LTL model-checking, through SPIN, has been proved to be suitable for the analysis of non-probabilistic multi-agent systems, we show here that PCTL model-checking [8], through PRISM, is a proper tool for verifying probabilistic agent systems. What is still lacking, at present, is an automatic translation of probabilistic agent programs, such as, for example, the ones proposed in [5]. into the Reactive Module language, the input formalism for PRISM. This will be the focus of our future work. The remainder of the paper is organised as follows: in the next Section a brief introduction to the PCTL model-checking and to the PRISM tool is presented. In Section 2 we first introduce the probabilistic version of the Negotiation game we have considered, then we describe the results we have obtained by analysing specific PCTL formulae through PRISM. The final section summarises our analysis and lists directions for future work.

2

Alternating-Offers Negotiation Framework

In this section we present the Negotiation framework we have considered in our work and describe the probabilistic extension we have introduced and studied. The Bargaining process we refer to is the Alternating-Offers one discussed in [11] and [6]. In the basic formulation of such a game, two players bargain over a single item by alternatively throwing an agreement proposal and making a decision over the opponent’s last proposed value. Bargainers’ interest is clearly conflicting with the seller aiming to maximise the outcome of negotiation (i.e. the agreed-value) and the buyer aiming to minimise it. Players’ behaviour is characterised by a strategy which essentially determines two aspects: a player’s next offer value and a player’s acceptance condition (which describes if the opponent’s most recent proposal is going to be accepted/rejected). Negotiation analysis aims to study properties of the players’ strategies and, in particular, is interested in addressing the existence of dominant strategies and strategic equilibrium. However this type of analysis is not tractable in incomplete information settings, which is what we are considering in this paper. Nonetheless assessing the effectiveness of strategies for automated negotiation is relevant from an AI perspective. In the following we briefly describe the family of time-dependent strategies for automated negotiation between agents introduced by Faratin et al. in [6], which is the one we use in our probabilistic framework. Players’ strategies depend on 2 intervals, respectively [minb , maxb ], where maxb is the buyer’s reservation price and minb is the buyer’s lowest acceptable offer (i.e. minb = 0), and [mins , maxs ],

III

where mins is the seller’s reservation price and maxs represents the seller’s upper bound of a valid offer. A player’s initial offer is his most profitable value, which is minb , for the buyer and maxs for the seller. The basic idea of this family of strategies is that a player concedes over time and the pace of concession determines the type of negotiation strategy. Time is thought of as a discrete quantity indicating negotiation periods (i.e. players’ turn), T denotes the timedeadline and a player’s offer at time t ∈ [0, T − 1], depicted, respectively, as xtb and xts , is defined as: xtb = minb + αb (t)(maxb − minb ) xts = mins + (1 − αb (t))(maxs − mins )

(1) (2)

where αi (t), i ∈ {b, s} is the time-dependent function in which the constant β determines the conceding pace of player i: t 1 αi (t) = ( )( β ) T By varying β ∈ (−∞, ∞) a whole family of offer functions, also called Negotiation Decision Functions (NDFs), can be obtained (see Figure 1(a) and Figure 1(b)). For β > 1, strategies are referred to as Conceder strategies whereas, for β < 1, strategies belong to the so-called Boulware class. With β = 1, on the other hand, we have linear strategies for which a player’s offer is monotonically incremented (decremented) over time. Finally, the decision making on a received offer is driven by profitability. Hence the buyer will accept the seller’s offer at time t if, and only if, it is more profitable than his next offered value. Formally: xts is (similarly xtb is accepted by s if, and accepted by b if, and only if, xts ≤ xt+1 b ). For the sake of simplicity we have chosen specific settings only if, xtb ≥ xt+1 s for the bargainer,s bounds, namely: minb = 0, maxb = 1100 and mins = 100, maxs = 1100. Such a choice is motivated by the fact that, in our modelling formalism (i.e. the Reactive Module language of the PRISM model-checker) only integer variables are admitted, hence a wide enough interval shall be considered in order to be able to observe the effect of different strategy slopes (having in mind that in the integer domain 1 is the smallest possible slope value). Finally we consider T = 20 as the time deadline for the negotiation process. 2.1

Probabilistic decision making

As a result of the decision mechanism introduced in [6], an agreement between the buyer and the seller is reached only if the offer functions (Figure 1(a) and Figure 1(b), respectively) cross each other within the time deadline (i.e. T ). If this is the case the result of negotiation is positive and a deal is implemented at the crossing point1 . 1

In a monetary negotiation the actual agreement would correspond to rounding up/down to the closest unit of exchange (a penny valued price, if we are referring to the UK market) of such crossing point, depending on the accepting agent.

IV MaxB=1100

MaxS=1100

! = 0.01

! = 100.0 1000

1000

900

! = 10.0

! = 0.2 800 Seller Offers Value ->

Buyer Offers Value ->

900 ! = 5.0

800

! = 0.1

700 !=1

600 500 400

700 !=1

600 500 400

300

! = 5.0

300

! = 10.0

! = 0.2 200

! = 0.1

200 ! = 100.0

100 MinB=0 0

MinS=100 ! = 0.01 0

1

2

3

4

5

6

7

8

9

0

10 11 12 13 14 15 16 17 18 19 T=20

0

1

2

3

4

x= time ->

5

6

7

8

9

10 11 12 13 14 15 16 17 18 19 T=20

x= time ->

(a) Buyer NDF

(b) Seller NDF

Fig. 1. Buyer-Seller Negotiation Decision Functions

In our framework we introduce uncertainty by making a player’s decision a probabilistic function of the offered value. By doing so, we enlarge the semantics of the negotiation model so as to cope with the inherent uncertainty that may characterise a player’s behaviour when it comes to deciding over a deal proposal. Such uncertainty may reflect several factors such as: lack of information about the opponent’s preferences/strategy, a change of a player’s attitude as a result of environmental changes and/or time passing. In order to reduce the complexity of our model, uncertainty, rather than explicitly being a function of time, is quantified with respect to the offered value only (however, since we are adopting time-dependent offer functions the acceptance probability is also, indirectly, dependent on time). Taking inspiration from the NDFs (1) and (2), we introduce the acceptance probability functions, S AP () for the seller and B AP () for the buyer, in the following way: S AP (xtb ) =

8 0 > :

1

if (xtb ≤ mins ) xtb −mins maxs −mins

”

1 βs

if (xtb > mins ) ∧ (xtb < xt+1 ) s if

(3)

) (xtb ≥ xt+1 s

8 0 if (xts ≥ maxb ) > < “ t ”1 t β xs b B AP (xs ) = 1 − if (xts < maxb ) ∧ (xts > xt+1 ) b maxb > : 1 if (xts ≤ xt+1 ) b

(4)

Each definition depends on the parameter βi , i ∈ {b, s}, whose value determines the type of function. A player’s attitude (in decision making) is called conservative if the likelihood of accepting low profitable offers is very small. Thus a conservative attitude, for the buyer, corresponds to βb >> 1, whereas, for the seller, is given by βs

0.8 Buyer Acceptance Probability ->

1.0

! = 0.01

0.9

0.7 0.6 0.5

!=1

0.4 0.3

! = 10.0

0.7

! = 5.0

0.6 0.5

!=1

0.4 0.3

! = 5.0 0.2

! = 10.0

0.2

0.1

0.1

! = 0.2 ! = 0.1

! = 100.0 0.0 0

100

200

300

400

500

600

700

800

900

MaxB

1100

0.0 0

MinS

200

x= Buyer Offers ->

(a) Buyer Acceptance Probability

300

400

500

600

700

800

! = 0.01 900

1000

x= Seller Counter-Offers ->

(b) Seller Acceptance Probability

Fig. 2. Buyer-Seller Acceptance Probability Functions

bility functions allows us to assess the effect of different player’s attitudes on the negotiation outcome (as we will see we verify several configurations of our model corresponding to different combinations, (βs , βb ), of the acceptance probability functions). Finally we point out that, by replacing the decision-making model of Faratin et al. with the probabilistic mechanism determined by (3) and (4), we still maintain the semantics of the original; in fact through (3) and (4) we have that the acceptance of an offer whose utility is higher than the corresponding counter-offer’s one is certain.

3

The PRISM model of negotiation

In this section, we describe the Markovian model of the negotiation framework (with probabilistic behaviour) introduced in the previous section. We developed a Discrete Time Markov Chain (DTMC) which represents the behaviour of two bargainers alternatively throwing offers according to the NDF families (1) and (2) and adopting the probabilistic decision mechanism described by the family of functions (3) and (4). We have used the PRISM model-checker to implement and verify the DTMC model of negotiation. Before describing some details of the DTMC model we provide brief background to the PRISM tool and to the Probabilistic Computational Tree Logic (PCTL), the temporal logic used for the verification of DTMC models (for a more detailed treatment of the subject the interested reader is referred to the vast literature, examples of which are [1, 8, 10, 12]). 3.1

The PCTL logic and PRISM

Markov processes [7] are a subclass of stochastic processes suitable for modelling systems such that the probability of possible future evolutions depends uniquely

MaxS

1200

VI

on the current state rather than on its past history. A system’s timing is also taken care of with Markov chain models leading to either DTMCs, for which time is considered as discrete quantity, or Continuous time Markov chains (CTMC), where time is continuous. Both DTMC and CTMC models can be encoded in PRISM by means of a variant of the Reactive Modules formalism of Alur and Henzinger [1]. For issuing queries, PRISM uses either the Probabilistic Computational Tree Logic (PCTL) [8], if the underlying model is a DTMC, or the Continuous Stochastic Logic (CSL) [2] for referring to CTMC models. These languages are variations of the temporal logics used for more conventional LTS model checking (i.e. the CTL of Clarke et al. [4]). Formally a labelled DTMC is defined as follows: Definition 1. Given a set of atomic propositions AP , a labelled DTMC M is a tuple (S, P, L) where S is a finite set of Pstates, P : S×S → [0, 1] is the transition probability matrix such that ∀s ∈ S, s0 ∈S P(s, s0 ) = 1 and L : S → 2AP is a labelling function. A path in a given DTMC M = (S, P, L) and its probability measure are formally characterised in the following definitions. Definition 2. A path σ from state s0 is an infinite sequence σ = s0 → s1 → . . . → sn → . . . such that ∀i ∈ N, P(si , si+1 ) > 0. Given σ, σ[k] denotes the k-th element of σ. The probability measure of a set of infinite paths with common finite prefix σ ↑ n = s0 → . . . → sn is defined as the product of the probability of the transitions in the prefix σ ↑ n. Definition 3. Let σ ↑ n = s0 → . . . → sn be a finite path of M. The probability measure of the set of (infinite) paths prefixed by σ ↑ n is P rob(σ ↑ n) =

n−1 Y

P(si , si+1 )

i=0

if n > 0, whereas P rob(σ ↑ n) = 1 if n = 0. The PCTL language used to refer to DTMC models is formally introduced in the next definition. Definition 4 (PCTL syntax). For a set of atomic propositions AP , the syntax of PCTL state-formulae (φ) and path-formulae (ϕ) is inductively defined as follows: φ := a | tt | ¬φ | φ ∧ φ | PEp (ϕ) ϕ := φ U ≤t φ where a ∈ AP , p ∈ [0, 1], t ∈ N∗ ∪{∞} and E∈ {≥, >, ≤, 0 (φU φ) and A(φU φ) ≡ P≥1 (φU φ) A two piece line is a good approximation for extreme bargaining tactics (i.e. ψ ∼ 0 or ψ >> 1), which is the type of non-linear tactics we address in this work. Less extreme strategies may be better approximated by multi-piece lines, which would require minimal modifications to our model in order to be coped with.

VIII

probability for each possible negotiation outcome are derived). Model’s configuration: the model we developed is designed to be highly configurable through a number of constants. Before running a verification experiment a configuration is chosen by setting up: the Buyer and Seller NDFs (each of which requires three parameters, first-piece-slope, second-piece-slope, switchtime), the buyer and seller reservation-price and initial-offer, the buyer and seller acceptance attitude (i.e. the parameter β for both acceptance probability functions (3) and (4)) and the time-deadline (which we set to 20 for all experiments). In order to be able to assess the effect of substantially different NDFs (i.e. to compare how NDF’s slopes affect the result of negotiation) we had to allow for a wide enough acceptance interval (i.e. the interval [mins , maxb ]). As a result we have chosen a default settings of [1, 1000] for such an interval. This allows us to sensibly verify the effect of slopes values up to 2 orders of magnitude different (i.e. 1-10-100). This is the reason why the graphs reporting the results of our analysis (see next section) refer to such a setting (the x-axis range is referred to [0, 1000]).

MaxS=1000

Buyer-Conceder(500/1/2) Seller-Conceder(500/1/2)

900 800 700 600 500 400 300 200 MinS=100

0

1 Tsw=2 3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19 T=20

Fig. 3. NDFs’ piecewise linear approximation

4

Analysis

In this section we describe the result of the verification performed through the PRISM model-checker on the DTMC model of Negotiation. We observe that, the probabilistic decision making mechanism encoded within the bargaining DTMC model results in a probability distribution over the set of possible outcomes of the negotiation (i.e. the interval [mins , maxs ] ⊂ N), hence over a strategy profile’s payoff.

IX

We have used the verification facilities of PRISM to evaluate the distribution of probability of two distinct aspects of the model behaviour: the value at which an agreement is reached, and the delay for reaching an agreement. These measurements can be automatically derived by running a number of PCTL formulae verifications (the so-called PRISM ‘experiment’ facility, by means of which a parametric PCTL formula is iteratively verified for each specified value of its parameters). The corresponding PCTL-like PRISM temporal formula for verifying the probability of reaching an agreement at value x is as follows: P =?(tt U (agreement)∧(P U RCHASE = x)),

(5)

whereas the temporal formula for determining the probability of an agreement to be reached at time t is: P =?(tt U (agreement)∧(T IM E = t)).

(6)

The reader should not be misled by formulae (5) and (6), in which the PCTL syntax described by Definition 4 is slightly abused by means of the ’=?’ notation. This is the way PRISM allows for specifying an experiment with respect to one or more parameters. For example, the result of running a PRISM experiment on (5), is that the probability of reaching (at some point in the future) a state in which an agreement is implemented at x, is computed for every value of the x-range which has given as input of the experiment (hence by choosing [mins , maxb ] as input range for an experiment over (5), we end up deriving the distribution of probability over the set of possible agreement values). In the following we report about the results of the model analysis obtained by verification of (5) and (6) through PRISM experiments. These are grouped in three different categories, each one of which copes with a different aspect of the model analysis. Agreement distribution as a function of players NDFs: here we discuss experiments which aim to assess how players’ NDFs affect the probabilistic outcome of negotiation. For this reason, in these experiments we have compared model’s configurations by varying combinations of NDFs while sticking with a specific (fixed) combination of Acceptance Probability Fucntion (specifically the one corresponding to βb = 0.2 and βs = 5). The results we present are grouped according to different combination of NDFs. Each combination is denoted by a pair Fb (xb /yb ), Fs (xs /ys ) where Fa (a ∈ {b, s}) denotes the type of function for player a (i.e. either Lin, Boul, or Conc), whereas xa and ya denote, respectively the first and second piece’s slopes4 . Hence, for example, Lin(10)-Lin(100) denotes a profile for which both players are using a linear offer function, the buyer with slope 10 aqnd the seller with slope 100, whereas the profile Boul(1/100)Conc(100/1) corresponds to the buyer using a Boulware offer function with first slope 1 and second slope 100 and the seller using a conceder tactic with first and second slope respectively 100 and 1. Figure 4(a) and 4(b) compares the cumulative probability distribution (over the default interval, [1, 1000], of possible 4

as linear functions consist of a single piece, then for Lina only xa is used.

X

1

Agreement Cumulative Probability

Agreement Cumulative Probability

1

0.8

0.6

0.4

0.2

0

Lin(10)-Lin(10) Lin(50)-Lin(50) Lin(100)-Lin(100) 0

100

200

300

400

500

600

700

800

900

1000

0.8

0.6

0.4

0.2

0

Lin(100)-Lin(10) Lin(50)-Lin(10) Lin(10)-Lin(50) Lin(10)-Lin(100) 0

100

200

Offer Value

300

400

500

600

700

800

900

Offer Value

(a) Linear Symmetric NDF

(b) Linear Asymmetric NDF

Fig. 4. Lin(x)-Lin(y) NDF profiles

agreements), for several configurations of, respectively, symmetrical and asymmetrical, linear NDFs. Some general indications can be drawn from these pictures. For example if players use symmetrical NDFs then a higher concession pace tends to favour the seller. This is also confirmed by the expectation values which for the symmetrical case show a (slow) increasing trend Exp(Lin(10)) ∼ 498, Exp(Lin(50)) ∼ 499 and Exp(Lin(100)) ∼ 500 confirming the seller’s advantage with fast concession behaviour. In case of asymmetrical NDFs (Figure 4(b)), instead, it is generally more convenient for a player to minimise his concession pace, as this is going to get him a more profitable expected outcome. For example if we compare the curves for profiles Lin(50)-Lin(10), and Lin(100)-Lin(10)) the probability tend to cumulate closer to the supremum of the interval (which is good for the seller). Again this is confirmed by looking at the expectation values, which show the same tendency, with Exp(Lin(50)Lin(10)) ∼ 692, whereas Exp(Lin(100)Lin(10)) ∼ 804. Agreement delay as a function of players NDFs: here we discuss the effect that players’ NDFs have on the delay for reaching an agreement. Again we consider several combinations of NDFs, but this time we deal with the verification of 6. Figure 5 depicts the expected time-to-agreement as a function of the players’ concession pace. It indicates that a faster concession results in a quicker agreement. It should be noted that the discrepancy between the symmetrical case (i.e.

1000

XI

Expected Time to Agreement

Conc(x/1)−Conc(x/1)) and the asymmetrical one (i.e. Conc(x/1)−Conc(1/1)) is a due to the early crossing of NDF curves. In fact, when both player speed up concession at the same pace (symmetrical), NDFs curves intersect earlier (and for x > 25 within the time-deadline T = 20) than they do when only one player (asymmetrical) is increasing the pace of concession.

20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Conc(x/1)-Conc(x/1) Conc(x/1)-Conc(1/1) 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95100

Concession Pace

Fig. 5. Expected Time-to-Agreement as a function of concession pace

Agreement distribution as a function of players uncertainty: here we discuss the effect of players’ uncertainty (i.e. Acceptance Probability functions) on the outcome of negotiation. We performed a number of experiments for the verification of (5), but this time we compared combinations of values for the parameters (βb , βs ) of (3) and (4), while imposing a constant configuration for the NDFs. Figure 6(a) reports about the expected value of an agreement as a function of the β parameters of (3) and (4). We recall that, in that respect, a conservative attitude corresponds to βb > 1 , for the buyer, and to βs < 1 for the seller. The curves in Figure 6(a) allows for comparing the effect of increasing the conservativeness of a player while the other one’s is maintained constant (in this specific case, linear). As expected, we can conclude that, conservativeness (in probabilistic decision making) is desirable for a player, as it improves the expected value of a deal (decreasing it when the buyer becomes more conservative and increasing when the seller becomes more conservative).

XII

1000 900

Expected Time to Agreement

Expected Agreement Value

800 700 600 500 400 300 200 Beta=(x,1) Beta=(1,1/x) Beta=(x,1/x)

100 0

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95100

Acceptance Probability Parameter (beta)

(a) Expected Agreement

20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Beta(x,1) Beta(1,1/x) 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95100

Acceptance Probability parameter (beta)

(b) Expected Time-to-Agreement

Fig. 6. Effect of uncertainty on Agreement value and Agreement delay

Agreement delay as a function of players uncertainty: similarly, here we discuss the effect that players’ probabilistic decision making on the delay required for reaching an agreement. Figure 6(b) reports about the expected time-to-agreement as a function of the β parameter of (3) and (4). Again the indication of these results is confirms the intuitiveness, which is: a more conservative attitude results in a (slightly) longer expected delay. It should be pointed out that the small variability of curves in Figure 6(a) and Figure 6(b) is a direct consequence of the small variability of the acceptance probability functions (3) and (4) corresponding to extreme values of β (i.e. βb >> 1 and βs