Plan Projection as Deduction and Plan Generation

Plan Projection as Deduction and Plan Generation as Abduction in a Context-Sensitive Temporal Probability Logic

Liem Ngo and Peter Haddawy
Department of Electrical Engineering and Computer Science
University of Wisconsin-Milwaukee
Milwaukee, WI 53201
{liem, [email protected]
414 229-4955

March 5, 1995

Abstract

This paper presents a theoretical framework for representing plans in a context-sensitive temporal probability logic and for reasoning about plans by constructing Bayesian networks from these knowledge bases. We provide a sound and complete network construction algorithm for evaluating plans which uses context information to index only the relevant portions of the knowledge base. We formalize plan generation as a process of abduction in our language and provide a sound and complete anytime plan generation algorithm. We show that the provided framework is capable of representing a wide range of previously proposed probabilistic action models.

Submitted to UAI95. 

This work was partially supported by NSF grant IRI-9207262.

1 Introduction

There has recently been great interest in using Bayesian networks to perform plan evaluation [9, 3, 4] as well as plan generation [2]. The use of Bayesian networks as the underlying representation provides a uniform and relatively compact representation of both actions and domain relations. While strong contributions have been made concerning how best to formulate network models of actions and how to construct models for evaluating plans from knowledge bases of network fragments, two major issues remain unaddressed. First is the practical problem that networks for evaluating plans tend to become extremely large and thus inefficient for inference. Second is the theoretical problem that none of the work on generating networks for plan evaluation has provided a formal semantics for the knowledge base from which the networks are generated. We address these issues by representing a class of Bayesian networks with a knowledge base of context-sensitive temporal probability logic sentences. We provide a network construction algorithm for evaluating plans which uses context information to index only the relevant portions of the knowledge base, thus greatly limiting the size of the generated networks. We formalize plan generation as a process of abduction in our language and sketch a plan generation algorithm. We show that the provided framework is capable of representing all the action models discussed above, as well as models used in a wide range of decision-theoretic planners. We use a running example, similar to the one in [4], to illustrate our representation framework. Assume that a robot is secretly attempting to fetch an object. We consider only one possible action the robot can perform: using its gripper to pick up the object. The robot must avoid detection by a sound sensor which can activate an alarm. It has some partial information about the location, size and weight of the object.
We need to equip the robot with a knowledge base of action models and domain knowledge. Figure 1(a) shows a possible Bayesian network model of the pickup action. The nodes represent the variables involved in the performance of pickup. The object location in the next stage is affected by the action and probabilistically depends on the object's size, weight and position before the action is performed. If the pickup action causes the object to fall on the floor, the sound alarm might be activated. Figure 1(b) displays some intrinsic relationships between random variables in one time slice. Figure 1(c) depicts the persistence rules: statuses of random variables in future stages depend on current statuses, if there is no intervening action.

Figure 1: Bayesian network models of (a) a pickup action, (b) intrinsic causal relationships, and (c) persistence relationships. The nodes include object location, object size, object weight, sound sensor, and alarm at times t and t+1.

2 The Representation Language

We have three types of predicates: probabilistic, context, and action predicates. Some of the predicates are timed predicates. A timed predicate always has its first attribute indicating the time at which the event or relationship denoted by the predicate occurs. We model only discrete time, and throughout the paper we represent the set of time points by the set of integers. If t is a time point, t+5 denotes the fifth time point after t. If A is a ground timed atom and its time attribute is t, we say A happens at time t. In our running example, if we can assume that object size and weight never change in the model, then the best way to represent them is by non-timed predicates. This way we avoid specifying their values at every time point and thus greatly simplify the reasoning process. The other p-predicates (alarm, sensor, and location) are timed. Context predicates (c-predicates) have value true or false and are deterministic. They are used to describe the context the agent is in and to eliminate unnecessary probabilistic information from consideration by the agent. An atom formed from a context predicate is called a context atom (c-atom). A context literal is either a c-atom or the negation of a c-atom. A context base is a normal logic program, i.e. a set of universally quantified sentences of the form C0 ← L1, L2, ..., Ln, with n ≥ 0, where ← stands for

implication, comma stands for logical conjunction, C0 is a c-atom, and the Li are context literals. The context base is the component of the knowledge base defining the c-predicates. We use the completed logic programs proposed by Clark [10] for the semantics of the context base. Action predicates are also deterministic and are always timed. A probabilistic predicate (p-predicate) represents a class of similar random variables. p-predicates appear in probabilistic sentences and are the focus of inference processes. An atom formed from a probabilistic predicate is called a probabilistic atom (p-atom). Normally, each random variable in a probability model can take values in a finite set, and in each possible realization of the world that variable can have one and only one value. We capture that property by requiring that each p-predicate have at least one attribute. The last attribute of a p-predicate represents the value of the corresponding variable. For example, the variable alarm can have the value yes or no and can be represented by a two-position predicate: the first position indicates the time and the second indicates the status of the alarm. We must specify the range of values of the random variables represented by a p-predicate, so associated with each p-predicate p must be a statement of the form VAL(p) = {v1, ..., vn}, where v1, ..., vn are constants. Let A = p(t1, ..., tm-1, tm) be a p-atom. We use obj(A) to designate the tuple (p, t1, ..., tm-1) and val(A) to designate tm. So if A is a ground p-atom then obj(A) represents a concrete random variable or object in the model and val(A) is its value. We also define Ext(A), the extension of A, to be the set {p(t1, ..., tm-1, vi) | 1 ≤ i ≤ n}. If A is an atom of predicate p, then VAL(A) means VAL(p). We assume that p-predicates are typed, so that each attribute of a p-predicate is assigned values in some well-defined domain. We denote the set of all such predicate declarations in a knowledge base by PD.
Let A be a (probabilistic or context) atom. We de ne ground(A) to be the set of all ground instances of A.

Example 1 The following are declarations of p-predicates in our robot example.

PD = {location(T: Time, V), VAL(location) = {floor, inGripper},
weight(V), VAL(weight) = {heavy, light},
size(V), VAL(size) = {small, large},
alarm(T: Time, V), VAL(alarm) = {yes, no},
sensor(T: Time, V), VAL(sensor) = {activated, deactivated}}
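As a concrete illustration, the obj, val, and Ext operations over the declarations of Example 1 can be sketched as follows. This is our own encoding, not the authors' implementation: a ground p-atom is a tuple (predicate, attribute_1, ..., value).

```python
# A sketch (an assumed encoding, not the paper's code) of the p-predicate
# machinery, using the VAL declarations of Example 1.

VAL = {
    "location": {"floor", "inGripper"},
    "weight":   {"heavy", "light"},
    "size":     {"small", "large"},
    "alarm":    {"yes", "no"},
    "sensor":   {"activated", "deactivated"},
}

def obj(atom):
    """obj(A): the predicate plus all attributes except the last one."""
    return atom[:-1]

def val(atom):
    """val(A): the last attribute, i.e. the value the variable takes."""
    return atom[-1]

def ext(atom):
    """Ext(A): the atoms differing from A only in the value position."""
    pred = atom[0]
    return {atom[:-1] + (v,) for v in VAL[pred]}

a = ("location", 0, "floor")   # the p-atom location(0, floor)
print(obj(a))                  # ('location', 0): the random variable
print(ext(a))                  # its extension over VAL(location)
```

Here obj(A) identifies the random variable and Ext(A) enumerates its mutually exclusive value assignments, exactly the role these operators play in the semantics below.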

A probabilistic sentence has the form (P(A0 | A1, ..., An) = α) ← L1, ..., Lm, where n ≥ 0, m ≥ 0, 0 ≤ α ≤ 1, the Ai are p-atoms, and the Lj are context literals. The sentence can have free variables, and each free variable is considered to be universally quantified over the entire scope of the sentence. For the above probabilistic sentence S, we define context(S) to be the conjunction L1, ..., Lm, ante(S) to be the conjunction A1, ..., An, and cons(S) to be A0.
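The structure of a probabilistic sentence and its three accessors can be sketched as a small data type. The field names mirror cons(S), ante(S), and context(S) from the text; the tuple encoding of atoms is an assumption of ours.

```python
from dataclasses import dataclass

# A minimal sketch of the sentence form (P(A0 | A1,...,An) = alpha) <- L1,...,Lm.
# Atom and literal encodings (tuples) are assumptions, not the paper's syntax.

@dataclass(frozen=True)
class ProbSentence:
    cons: tuple          # A0, the consequent p-atom
    ante: tuple = ()     # (A1, ..., An), the antecedent p-atoms
    prob: float = 1.0    # alpha, with 0 <= alpha <= 1
    context: tuple = ()  # (L1, ..., Lm), the context literals (and action atom)

# P(location(t+1, inGripper) | location(t, floor)) = .9 <- pickup(t), gripperavailable
s = ProbSentence(
    cons=("location", "t+1", "inGripper"),
    ante=(("location", "t", "floor"),),
    prob=0.9,
    context=(("pickup", "t"), ("gripperavailable",)),
)
print(s.cons, s.prob)
```

A sentence with an empty context and empty antecedent is simply an unconditional probability statement, as used later for initial states.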

3 Representation for Planning Problems

3.1 Actions Representation

Actions are grouped into classes. The actions in a class differ in their action parameters and are called instances of the action class. For example, the class of actions move can have many instances: each describes a specific action at a specific time, handles a specific object, and involves specific source and destination locations. For each action class we introduce a new timed action predicate, and we always assume that the name of the predicate is exactly the name of the action class. The first attribute denotes the time when the action occurs and the remaining attributes denote the parameters of the action. For example, the predicate move(., ., ., .) represents the action class move. The atom move(5, cup, locA, locB) represents the action instance: move, at time 5, the cup at position locA to position locB. Henceforth, we use the term action to refer to action instances. We describe each effect of an action in our language by a probabilistic sentence of the form P(p(t1, ...) | ..., q(...), ...) = α ← a(t2, ...), c1(...), ..., cm(...) where a(...) is an action atom and is the only action atom in the sentence, p and q are p-predicates, and the ci are c-predicates. q and the ci may not be timed, but p and a must be timed. If the above sentence is ground, we call p(...) a consequent of a(...), each q(...) a probabilistic condition of it, and the conjunction c1(...), ..., cm(...) the context of the corresponding effect. Certain conditions may determine whether an action can be executed; we call these executability conditions (or preconditions). If an action is executed, certain conditions may determine the effects that the action will have; we call these ramification conditions [7]. We divide the executability conditions into two types: uncertain and deterministic. For example, the executability conditions of the pickup action might be that the gripper is available and the object is on the floor.
In our planning scenario we can assume that the information about the availability of the gripper is deterministic and the information about the location of the object is uncertain. The deterministic executability conditions are represented by context literals and the uncertain executability conditions are represented by p-atoms. The ramification conditions can also

be classified as uncertain and deterministic. The uncertain conditions appear in conditional probability statements and are represented by p-atoms (some q(...) in the above sentence). Our proposed concept of context-sensitivity can be used to represent deterministic ramification conditions: they are described by context literals. For example, a drive action may be performed on country roads or highways. Depending on the context, the consequents of the action (time of arrival, the fuel consumed, etc.) might take different values with different chances. A more thorough discussion of action context can be found in [6]. We call an action model any set of probabilistic sentences whose context includes an action atom.

Definition 1 An action model is said to be discrete time normative (dt-normative) if it can be represented as a finite set of sentences of the form P(p(t1, ...) | ..., q(t2, ...), ...) = α ← a(t3, ...), ..., c(t4, ...), ... where a is an action predicate, t2 ≤ t1, and t3, t4 < t1. Here q(t2, ...) is a representative condition and c(t4, ...) is a representative c-atom.

Example 2 A possible representation of the pickup action is given in Figure 2. In this formulation, location(t, floor) is the probabilistic executability condition of pickup(t) and gripperavailable is its deterministic executability condition. The atoms weight(.) and size(.) are ramification conditions. An interesting justification for allowing the case t1 = t2 in our dt-normative action models is provided by the last sentence: the pickup action influences the location of the object in the next stage, and if that object falls to the floor, that location of the object will also influence the status of the sound sensor in that stage. Notice that there are no sentences with the condition location(t, inGripper), i.e. for the case in which the executability condition is falsified. The status of location(t+1, .) in that case may be determined by persistence rules of the domain.

In our model, one action can have effects far in the future (delayed effects [14]) and during that time interval other actions can occur. We also allow many actions to directly affect the same variable. For example, a plan to plant rice can start with selecting good seeds, fertilizing the field and frequently killing the weeds. All those actions affect the final yield of rice. We can also represent actions with variable duration straightforwardly in our model, as the following example illustrates.

P(location(t+1, inGripper) | location(t, floor), weight(heavy), size(small)) = .9 ← pickup(t), gripperavailable
P(location(t+1, floor) | location(t, floor), weight(heavy), size(small)) = .1 ← pickup(t), gripperavailable
P(location(t+1, inGripper) | location(t, floor), weight(light), size(small)) = .93 ← pickup(t), gripperavailable
...
and
P(sensor(t+1, activated) | location(t+1, floor), location(t, floor), weight(heavy), size(small)) = .99 ← pickup(t), gripperavailable
...

Figure 2: The representation of pickup in our language.

Example 3 Suppose we drive a car to city A and there is a .5 chance that we will arrive at A after 5 hours and a .5 chance that we will be on the road at time 5 and arrive at A after 6 hours. A possible set of sentences describing this situation is:

P(location(t+5, cityA)) = .5 ← startdrive(t)
P(location(t+5, onRoad)) = .5 ← startdrive(t)
P(location(t+6, cityA) | location(t+5, onRoad)) = 1 ← startdrive(t)
P(location(t+6, onRoad) | location(t+5, onRoad)) = 0 ← startdrive(t)

The first two sentences say that if we depart at time t then there is a 50% chance we will be on the road at time t+5 and a 50% chance we will arrive at A at t+5. The last two sentences are conditioned on the fact that at time t+5 we are still on the road; in that case we will surely arrive at A at t+6.

3.1.1 Representing Graphical Action Models

One popular representation scheme for actions in planning under uncertainty is in the form of <condition, effect, probability value> tuples [9, 8]. Through an example, we show how to convert an action model in that representation to our language. In addition to being able to represent actions with propositional effects, our language can represent actions with simple metric effects, by using function symbols, as long as the corresponding domains are finite. Consider a simplified version of an example from [8] about the action drive on valley road (we make the simplification only to shorten the example). We need to transport potatoes to city B. One possible action (depicted in Figure 3) is to drive through a valley road. The condition of that action is the status of sunshine and the consequents are the fuel level left and the amount of potatoes delivered to B. We assume the probabilistic independence of consequents to make the set of sentences more compact. The corresponding set of sentences is given in Figure 4. We use the predicates fl, p_l and p_d to represent the amount of fuel left in the truck's tank, the amount of potatoes loaded and the amount of potatoes delivered, respectively, at each time point. We require that the sets of values of those predicates be finite; they can be represented by function symbols. Literally, VAL(fl) = {0, f(0), f(f(0)), ..., f^n(0)} and VAL(p_l) = VAL(p_d) = {0, g(0), g(g(0)), ..., g^m(0)}, where n and m are constant integers and f and g are the function symbols associated with each predicate. 0 denotes the lowest level of fuel or potatoes; f and g denote the next higher levels. In this case, V + 9 is shorthand notation for f^9(V) or g^9(V), respectively. If g(V) denotes the amount x of potatoes then V denotes .9x potatoes.

Figure 3: A graphical action model: the action drive on valley road.

3.1.2 Representing Markov Processes

Another important class of action models which is frequently considered is the class of Markov process models. These processes can be represented in our language by Markovian dt-normative action models. The previous model of pickup is Markovian dt-normative.

Definition 2 An action model is Markovian dt-normative, or simply Markovian, if it can be represented as a finite set of sentences of the form P(p(t+1, ...) | ..., q(t1, ...), ...) = α ← a(t, ...), ..., c(t, ...), ... where a is the action predicate, and t1 is either t or t+1. Here q(t1, ...) is a representative condition and c(t, ...) is a representative c-atom.

P(fl(t+5, V) | sunny(t, true), fl(t, V+9), p_l(t, V')) = 1 ← drive(t)
P(p_d(t+5, V') | sunny(t, true), fl(t, V), p_l(t, V'+1)) = .2 ← drive(t)
P(p_d(t+5, V') | sunny(t, true), fl(t, V), p_l(t, V')) = .8 ← drive(t)
P(fl(t+5, V) | sunny(t, false), fl(t, V+9), p_l(t, V')) = .1 ← drive(t)
P(fl(t+5, V) | sunny(t, false), fl(t, V+10), p_l(t, V')) = .9 ← drive(t)
P(p_d(t+5, V') | sunny(t, false), fl(t, V), p_l(t, V'+9)) = 1 ← drive(t)

Figure 4: The action model of drive on valley road.

3.1.3 Completely Quantified and Consistent Action Models

In this paper, we consider only complete models of action. This means that if we have a description specifying that under some conditions an action sets a random variable to a specific value with some probability, then we also require that the probability that the same variable is set to any other possible value under the same conditions by the same action be specified. We formalize this requirement by the following definition.

Definition 3 An action model is called completely quantified if for an arbitrary ground instance P(p(t, ..., v) | ...) = α of some sentence in the set, conforming to the type specifications, there exists another ground instance of some sentence in the same set which can be constructed from the above sentence by replacing v by any other value v' in VAL(p) and possibly α by some α'. A completely quantified action model is consistent if for an arbitrary ground instance P(p(t, ..., v) | ...) = α of some sentence in the set, Σ{α' | α' is the probability value of a ground instance of a sentence which differs from the above sentence only in v and the probability value} = 1.
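The consistency condition of Definition 3 amounts to a simple check: group ground sentences that differ only in the consequent's value, and verify that each group's probabilities sum to 1. A sketch, with our own tuple encoding of sentences:

```python
from collections import defaultdict

# A sketch of Definition 3's consistency check. Sentences are encoded
# (our assumption) as (consequent_atom, antecedents, prob, context).

def is_consistent(sentences, tol=1e-9):
    groups = defaultdict(float)
    for cons, ante, prob, context in sentences:
        # Drop the value position of the consequent: sentences in the same
        # group differ only in v and in the probability value.
        key = (cons[:-1], ante, context)
        groups[key] += prob
    return all(abs(total - 1.0) < tol for total in groups.values())

# Two ground instances of the pickup model of Figure 2 (heavy, small object):
pickup = [
    (("location", 1, "inGripper"),
     (("location", 0, "floor"), ("weight", "heavy"), ("size", "small")),
     0.9, (("pickup", 0), ("gripperavailable",))),
    (("location", 1, "floor"),
     (("location", 0, "floor"), ("weight", "heavy"), ("size", "small")),
     0.1, (("pickup", 0), ("gripperavailable",))),
]
print(is_consistent(pickup))   # True: .9 + .1 = 1
```

Completeness of quantification (every sibling value present) follows from the same grouping: a group covering all of VAL(p) with probabilities summing to 1 witnesses both conditions.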

3.2 Representing Domain Knowledge

Thus far we have only discussed how to describe individual actions, but interaction between actions and other causal relationships between variables in a specific domain are important for projecting the consequences of a plan. At least two kinds of domain knowledge are of interest in plan evaluation: intrinsic causal relationships in the domain and persistence rules. The relationship between object size and object weight is an intrinsic causal relationship not

influenced by any possible action of the robot. We call a variable a derived effect of an action if it is not a consequent but one of its direct causes is a consequent of the action. Variables directly influenced are represented as consequents in the action description. For example, the status of alarm is a derived effect of the action pickup. Persistence rules are an important component of temporal reasoning. The basic assumption of persistence is that the state of variables not affected by actions or other events will tend to remain unchanged over time. In our example, we should have persistence rules for object location, sound sensor and alarm.

Example 4 A possible set of sentences for the persistence of location is:

P(location(t+1, floor) | location(t, floor)) = 1
P(location(t+1, inGripper) | location(t, floor)) = 0
P(location(t+1, floor) | location(t, inGripper)) = .01
P(location(t+1, inGripper) | location(t, inGripper)) = .99

The uncertainty in the future location of the object when it is held in the robot's gripper can be attributed to the uncertainty of exogenous events.

Notice that the conditions of the persistence rules may contain external events and our representation language allows such description. Darwiche and Goldszmidt [3] present a nice discussion on modelling persistence. We can easily represent their approach by sentences in our language.

3.3 The Combining Rules

An important component of a KB is the set of combining rules. Combining rules are used, for example, to infer P(A | B, C) from P(A | B) and P(A | C). They reflect the conception of the KB builders concerning the interaction between distinct causes of the same variable. The necessity of combining rules arises from the fact that in practice conditional probabilities for multiple causes of a given variable are hard to obtain. Several generic combining rules, such as generalized Noisy-OR [15], have been proposed. We define a combining rule as any algorithm that takes as input a set of ground probabilistic sentences with the same consequent {P(A0 | Ai1, ..., Aini) = αi ← ... | 1 ≤ i ≤ m} such that the union of the sets {Ai1, ..., Aini} over i = 1, ..., m is coherent, and produces as output P(A0 | A1, ..., An) = α, where A1, ..., An are all different and {A1, ..., An} is a subset of that union. In building a temporal network to evaluate a plan, we need the combining rules to infer the combined effects of actions and domain causal relationships.

The combining rules are generally dependent on the concrete domain. Some plausible rules are: (1) Actions take precedence over other causes. This means that if the conditions of an action are satisfied, the link matrix entries of each consequent of the action are given by the set of sentences describing the action, irrespective of other causes. (2) When the conditions of an action are not satisfied, the link matrix entries of each of its consequents are determined by the interaction between the intrinsic causal relationships in the domain model and the persistence rules. (3) The combined effect of the intrinsic causal relationships in the domain model and the persistence rules on a variable can be determined by some specialized or generic rules such as generalized Noisy-OR.
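To make the combining-rule interface concrete, here is a sketch of a plain binary noisy-OR, a simple instance of the family of generic rules mentioned above. It is an illustration only, not the generalized Noisy-OR of [15]: it assumes each sentence gives P(A0 = true | Ci = true) = p_i for independently acting causes.

```python
from functools import reduce

# A sketch of a binary noisy-OR combining rule: the effect is absent only
# if every active cause independently fails to produce it, so
# P(A0 = true | C1, ..., Cm all true) = 1 - prod_i (1 - p_i).

def noisy_or(cause_probs):
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), cause_probs, 1.0)

# e.g. combining P(alarm | sensor) = .8 and P(alarm | vibration) = .5
# (vibration is a hypothetical second cause, used only for illustration):
print(round(noisy_or([0.8, 0.5]), 3))   # 0.9
```

A combining rule in the sense of the text would wrap such a function: it takes the set of single-cause sentences for the same consequent atom and emits one sentence conditioned on all the causes together.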

3.4 The Planning Framework

For each planning domain, we can build a KB = <PD, PB, CB, CR>, where PB = ACT ∪ DB. ACT contains the action descriptions and DB contains the domain knowledge. CB is the context base and CR is the set of combining rules. We say that a KB is dt-normative if both ACT and DB are dt-normative, and that a KB is Markovian dt-normative if both ACT and DB are. Similarly, we can define the concepts of completely quantified and consistent KBs. We omit the detailed definitions due to space limitations. In many probabilistic [9] and decision-theoretic [8] planning systems, a planning problem is characterized by an initial state probability distribution. We represent such a distribution by a set of probability sentences without conditions at time 0 in the KB.

Example 5 The starting state of our robot planning problem can be described as: {P(alarm(0, activated)) = 0, P(alarm(0, deactivated)) = 1, P(location(0, floor)) = 1, P(location(0, inGripper)) = 0, P(sensor(0, activated)) = .5, P(sensor(0, deactivated)) = .5}.

We also can represent knowledge about uncertain future events by storing in the KB probability sentences without conditions at future time points. In the robot example, we might know that at some time points (for example, from 6:00 PM to 8:00 PM), there is a chance that somebody works late in the office. Although our framework does not exclude plans with concurrent actions, for simplicity of exposition we will consider only serial plans. A (serial) plan

is a set of ground action atoms such that no two different actions occur at the same time and every action occurs at a nonnegative time. When a plan is performed, there may be some environmental constraints or observations which the plan cannot alter. In our robot example, we can observe or require that the sound sensor be deactivated whenever a plan for the robot is carried out. In that case, we want to override the statements P(sensor(0, activated)) = .5, P(sensor(0, deactivated)) = .5 by the new statements P(sensor(0, activated)) = 0, P(sensor(0, deactivated)) = 1 for plan evaluation or generation purposes. We also might know that the guard will visit the room at time points 4, 9, ..., and that the robot has no means to prevent that. We account for these constraints or observations with the concept of a planning environment [12, 7]. This concept is similar to the notion of evidence in a standard Bayesian network.
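The serial-plan condition above (at most one action per time point, no action before time 0) can be sketched directly. The tuple encoding of action atoms as (name, time, *params) is our assumption:

```python
# A sketch of the serial-plan check: a plan is a set of ground action atoms
# (name, time, *params); it is serial if all times are nonnegative and no
# two actions share a time point.

def is_serial_plan(plan):
    times = [atom[1] for atom in plan]
    return all(t >= 0 for t in times) and len(times) == len(set(times))

plan = {("pickup", 0), ("move", 1, "cup", "locA", "locB")}
print(is_serial_plan(plan))                                         # True
print(is_serial_plan({("pickup", 0), ("move", 0, "x", "a", "b")}))  # False: clash at time 0
```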

Definition 4 A planning problem's environment E is simply a coherent set of p-atoms, i.e. there are no two ground instances A and B of atoms in E such that obj(A) = obj(B) but val(A) ≠ val(B).

Example 6 A possible environment for our robot problem is E = {sensor(0, activated), guardarrives(0, no), guardarrives(1, no), guardarrives(2, no), guardarrives(3, no), guardarrives(4, yes), ...}.

A plan is performed in some context. The context is determined by the context base in the KB in combination with a set of context information pertaining to each planning situation. A set of context information C is simply a set of c-atoms. We define a planning framework as consisting of a KB, an environment E, and a set of context information C, and denote it as <KB, E, C>.

4 Declarative Semantics

4.1 The Relevant Part of a Knowledge Base

It is reasonable to assume that we are only interested in things happening within a finite horizon; this means there exists a large enough positive integer t* such that things happening outside the time interval [0, t*] are not relevant to our semantics. Throughout the rest of the paper, when we talk about a timed atom we always assume that it is at a time t, 0 ≤ t ≤ t*.

For any given planning problem, only a portion of the KB will be relevant. The relevant part of the KB is determined by the given context information C, the environment E and the plan PL.

Definition 5 Given <KB, E, C> and a plan PL, the set of relevant p-atoms (RAS) is defined recursively: (1) ground(E) ⊆ RAS. (2) If S is a ground instance of a probability sentence, conforming to the type constraints, such that context(S) is a logical consequence of completed(C ∪ PL ∪ CB) and ante(S) ⊆ RAS, then cons(S) ∈ RAS. (3) If a p-atom A is in RAS then Ext(A) ⊆ RAS. (4) RAS is the smallest set satisfying the above conditions.
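Definition 5 can be read as a least-fixpoint computation, in the spirit of Herbrand least models. A sketch, under the assumptions that the context check (entailment from completed(C ∪ PL ∪ CB)) is supplied as a callable and ext(A) closes each atom under its sibling values:

```python
# A sketch (our encoding) of the RAS of Definition 5 as a fixpoint:
# sentences are (consequent, antecedents, context) tuples.

def relevant_atoms(env_atoms, sentences, context_holds, ext):
    ras = set()
    for a in env_atoms:                   # (1) ground(E) is relevant, ...
        ras |= ext(a)                     # ... (3) closed under extensions
    changed = True
    while changed:                        # (2)+(4): smallest closed set
        changed = False
        for cons, ante, context in sentences:
            if context_holds(context) and set(ante) <= ras and cons not in ras:
                ras |= ext(cons)          # add the consequent and its extension
                changed = True
    return ras

# Tiny demo: one sentence P(location(1, inGripper) | location(0, floor)) <- pickup(0)
VAL = {"location": {"floor", "inGripper"}}
ext = lambda a: {a[:-1] + (v,) for v in VAL[a[0]]}
sents = [(("location", 1, "inGripper"), (("location", 0, "floor"),), ("pickup0",))]
ras = relevant_atoms({("location", 0, "floor")}, sents, lambda ctx: True, ext)
print(len(ras))   # 4: both values of location at times 0 and 1
```

Iterating to a fixpoint mirrors condition (4): atoms enter RAS only by rules (1)-(3), and nothing else does.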

The RAS is constructed in a way similar to Herbrand least models for Horn programs. Context information is used to eliminate the portion of PB which is not related to the current problem. Completed(C ∪ PL ∪ CB) is the completed logic program with the associated equality theory [10] constructed from C ∪ PL ∪ CB. If (P(A0 | A1, ..., An) = α) ← L1, ..., Lm is a ground instance of a sentence in PB and L1, ..., Lm can be deduced from completed(C ∪ PL ∪ CB), then that sentence asserts that in the present context A0 is directly influenced (or caused) by A1, ..., An. If, in addition, A1, ..., An are relevant atoms then it is natural to consider A0 relevant. Condition (1) is obvious: any ground instance of the environment is relevant. Given <KB, E, C> and a plan PL, the set of relevant probabilistic sentences (RPB) is defined as the set of ground probabilistic sentences S such that context(S) is a logical consequence of completed(C ∪ PL ∪ CB), cons(S) ∈ RAS, and ante(S) ⊆ RAS. The relevant PB contains the basic causal relationships between p-atoms in RAS. In the case of multiple causes represented by multiple sentences, we need combining rules to construct the combined probabilistic influence. Given <KB, E, C> and a plan PL, the combined relevant PB (CRPB) is constructed by applying the corresponding combining rules to each maximally coherent set of sentences in RPB which have the same atom in the consequent. Combined relevant KBs play a role similar to that of completed logic programs [10]. Each sentence in CRPB describes all random variables which directly cause the random variable in its consequent. The effect model of a plan PL in a framework <KB, E, C> is specified by the semantics of CRPB.

4.2 Probabilistic Independence Assumption

In addition to the probabilistic quantities given in a PB, we assume some probabilistic independence relationships specified by the structure of

probabilistic sentences. Probabilistic independence assumptions are used as the main device to construct a probability distribution from local conditional probabilities. We formulate the independence assumption in our framework by using the structure of CRPB.

Definition 6 (Influenced by) Given a set of ground probabilistic sentences, let A and B be two p-atoms. We say A is influenced by B if (1) there exists a sentence S, an atom A' in Ext(A) and an atom B' in Ext(B) such that A' = cons(S) and B' ∈ ante(S), or (2) there exists another p-atom C such that A is influenced by C and C is influenced by B.

Assumption 1 Given an environment E, a set of context information C, a plan PL, and a KB, we can construct CRPB. We assume that if P(A0 | A1, ..., An) = α is in CRPB then for all ground p-atoms B which are not in Ext(A0) and not influenced by A0, A0 and B are probabilistically independent given A1, ..., An.

4.3 Semantics

The RAS contains all atoms relevant to an inference problem. We assume that in such a concrete situation, the belief of an agent can be formulated using possible models on RAS.

Definition 7 Given an environment E, a set of context information C, a plan PL, and a KB, a possible model M of the corresponding CRPB is a set of atoms in RAS such that for all A in RAS, Ext(A) ∩ M has one and only one element.

A probability distribution on the possible models is realized by a probability density assignment to each model. Let P be a probability distribution on the possible models. We define P(A1, ..., An), where A1, ..., An are atoms in RAS, as Σ{P(M) | M is a possible model containing A1, ..., An}. We define P(A0 | A1, ..., An) as 0 if P(A1, ..., An) = 0, or as P(A0, A1, ..., An)/P(A1, ..., An) if P(A1, ..., An) > 0. We say P satisfies a sentence P(A0 | A1, ..., An) = α if P(A0, A1, ..., An) = α · P(A1, ..., An), and P satisfies CRPB if it satisfies every sentence in CRPB.

Definition 8 A probability distribution induced by the environment E, the set of context information C, a plan PL and a KB is a probability distribution on possible models of CRPB satisfying CRPB and the independence assumption implied by CRPB.

We will use this possible world semantics to prove the soundness and completeness of the Bayesian network construction algorithm for plan projection which we discuss next.

5 Plan Projection

In a planning framework <KB, E, C>, plan projection is the problem of calculating the consequences of performing a plan PL. The following definition formalizes the concept of the consequences of a plan.

Definition 9 Given a planning framework <KB, E, C> and a plan PL, let F be a ground formula of probabilistic atoms. An assertion "The probability of F after performing PL in context C and environment E is α" is a logical consequence of the performance of PL if for all probability distributions P* induced by E, C ∪ PL, and KB, P*({M | M is a possible model and F and E are true in M}) = α · P*({M | E is true in M}).

In this paper we consider only the case in which F is a ground atom.

Definition 10 Given a planning framework <KB, E, C> and a plan PL: (1) A plan projection procedure is called sound wrt ground p-atoms if, for any ground p-atom A, whenever the procedure returns a value α, the assertion "The probability of A after performing PL in context C and environment E is α" is a logical consequence of the performance of PL. (2) A plan projection procedure is called complete wrt ground p-atoms if, for any ground p-atom A, whenever the assertion "The probability of A after performing PL in context C and environment E is α" is a logical consequence of the performance of PL, the procedure returns the value α after a finite amount of time.

We have developed a query answering procedure which uses a Bayesian network construction algorithm to construct a Bayesian network and evaluate the posterior probability of a ground or non-ground atomic query. Essentially, the procedure uses a backward chaining process, starting from the query, to build the relevant portion of the Bayesian network, and utilizes a probability propagation algorithm on the network to update probability values. We are able to prove the soundness and completeness of the procedure for a large class of KBs, characterized by the acyclicity property [11], which defines a major class of logic programs [1].
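The backward-chaining construction step can be sketched as a graph traversal: starting from the query's random variable, follow each CRPB sentence back through its antecedents until root variables (those with no matching sentence) are reached. The probability propagation on the resulting network is left to any standard Bayesian-network inference algorithm; the dictionary encoding of CRPB here is our assumption.

```python
# A sketch of backward chaining from a query to the relevant subnetwork.
# crpb maps obj(A0) -> tuple of parent variables (from the combined sentence
# for A0); the returned dict is the parent graph of the Bayesian network.

def build_network(query_obj, crpb):
    graph, stack = {}, [query_obj]
    while stack:
        node = stack.pop()
        if node in graph:
            continue                      # already expanded
        parents = crpb.get(node, ())      # roots have no matching sentence
        graph[node] = parents
        stack.extend(parents)             # chain backward through antecedents
    return graph

# Variables from the running example: sensor(1) depends on location(1) and
# location(0); location(1) depends on location(0), weight, size.
crpb = {
    ("location", 1): (("location", 0), ("weight",), ("size",)),
    ("sensor", 1):  (("location", 1), ("location", 0), ("weight",), ("size",)),
}
net = build_network(("sensor", 1), crpb)
print(sorted(net))   # only the nodes reachable backward from the query
```

Because only sentences reachable backward from the query are touched, the constructed network contains just the relevant portion of the knowledge base, which is the source of the size savings claimed above.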

Theorem 1 Given a planning framework ⟨KB, E, C⟩ and a plan PL, if the dt-normative KB is acyclic, E and C are finite sets of ground atoms, all the domains except time are finite, and CRPB is completely quantified and consistent, then the procedure is (1) sound wrt ground p-atoms, and (2) complete wrt ground p-atoms.

In the full paper, we give simple syntactic rules for checking acyclicity. When the language contains function symbols, some domains may be infinite. We have developed syntactic criteria under which the soundness and completeness properties still hold. These criteria encompass a large class of logic programs, and we expect that they also cover a major class of probabilistic knowledge bases. We present them in the full paper.
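At the ground level, acyclicity in the sense of [11] and [1] amounts to the existence of a level mapping, which for a finite dependency graph is equivalent to the absence of cycles. The check can be sketched by depth-first search over an assumed graph encoding (this is an illustration, not the syntactic rules of the full paper):

```python
# Illustrative acyclicity check: `deps` maps each ground atom to
# the atoms it depends on (the antecedents of sentences with that
# atom as consequent). A GRAY node on the current DFS path that is
# reached again indicates a cycle, hence a non-acyclic KB.

def is_acyclic(deps):
    GRAY, BLACK = 1, 2
    color = {}
    def visit(node):
        color[node] = GRAY
        for parent in deps.get(node, ()):
            state = color.get(parent)
            if state == GRAY:                  # back edge: a cycle
                return False
            if state is None and not visit(parent):
                return False
        color[node] = BLACK
        return True
    return all(visit(n) for n in deps if n not in color)

# A chain alarm <- noise <- pickup is acyclic; mutual dependence is not.
print(is_acyclic({"alarm": ["noise"], "noise": ["pickup"], "pickup": []}))  # True
print(is_acyclic({"a": ["b"], "b": ["a"]}))                                 # False
```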

6 Plan Generation

In this section, we show that plan generation can be rigorously formulated as an abduction process in a formal theory. The abductive view of plan generation was proposed by [5] for deterministic planners in an attempt to apply logic programming techniques to planning. The use of abduction for decision-theoretic planning is discussed in [13].

Definition 11 Given ⟨KB, E, C⟩, we say that a plan PL is related to a ground probabilistic atom A if A is in the RAS constructed from E, C ∪ PL, and KB. A plan PL is related to a ground formula F of probabilistic atoms if PL is related to some atom A in F.

An abductive framework is characterized by a set of abducible atoms, a theory T, and a set of integrity constraints. In a planning problem, the theory T consists of a knowledge base KB, an environment E, and a set of context atoms C. There are different ways to define the set of abducible atoms. In the simplest case, the set of abducible atoms is the set of ground instances of the action atoms. We can also define a larger set by including ground context atoms and/or probabilistic atoms; in that case, we can generate conditional plans: plans which can be performed only when some deterministic or uncertain conditions occur. To keep things simple, in this paper we use the first definition. The integrity constraints can express general constraints that cannot be embedded in the action model. In this paper, our only integrity constraint is that plans must be serial, but integrity constraints could also represent other requirements, such as critical resources which cannot be overconsumed.

Definition 12 Given ⟨KB, E, C⟩ and a ground formula F of probabilistic atoms, we define the set of abducible atoms to be the set of all ground instances of action atoms. A set PL of abducible atoms is called an abductive solution for the planning problem with goal F on KB, E, and C if: (1) PL is a finite set of abducible atoms; (2) PL is consistent, which means that no two different atoms in PL occur at the same time and every atom in PL occurs at a nonnegative time; and (3) PL is related to F.

The purpose of a plan generation algorithm should not be to generate all plans related to a formula F, because if a plan PL is related to F then any consistent superset of PL is also related to F. We instead use the concept of minimal plans, which are similar to the essential plans of [9]. A minimal plan wrt a ground formula F is a plan PL such that, for every subplan of PL that is still related to F, the posterior probability of F after performing that subplan is smaller than the probability after performing PL. A plan generation procedure can perform a spectrum of functions, from basic to complex. At the most basic level, the procedure should be able to generate a plan related to the input formula F. At the most complex level, the procedure could return the optimal plan according to some optimality criterion. We are interested in probabilistic planning algorithms which generate a plan such that the probability of achieving an input formula F exceeds some specified threshold τ. We present our abductive procedure as a planning algorithm for the case in which F is an atom. The procedure is, in fact, capable of generating all minimal plans related to the input atom F. Our Bayesian network construction procedure uses the notion of the set of predecessors of an atom and of the set of sentences which define this set of predecessors.
The following definition formalizes these concepts.

Definition 13 Given ⟨KB, E, C⟩, a ground p-atom A, and a plan PL: (a) The set of supporting atoms of A, denoted by SAS(A), is the smallest subset of RAS such that: (1) A ∈ SAS(A); (2) if B ∈ SAS(A) and P(B | ..., B′, ...) = α is a sentence in RPB then B′ ∈ SAS(A); (3) if B ∈ SAS(A) then Ext(B) ⊆ SAS(A). (b) The set of supporting probabilistic sentences of A, denoted by SPB(A), is the set of all sentences in RPB all of whose atoms are in SAS(A).
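The closure conditions of the definition above can be computed as a simple fixpoint. A minimal sketch, assuming relevant sentences are encoded as (consequent, parents) pairs and Ext is supplied as a function (identity by default); these encodings are illustrative assumptions:

```python
# Fixpoint computation of SAS(A) and SPB(A): start from {A} and
# repeatedly add (i) the parents of any sentence whose consequent
# is already supported, and (ii) the extension Ext(.) of every
# supported atom, until nothing changes.

def supporting_sets(a, rpb, ext=lambda atom: {atom}):
    sas = {a}
    changed = True
    while changed:
        changed = False
        for head, parents in rpb:
            if head in sas and not set(parents) <= sas:
                sas |= set(parents)            # condition (2)
                changed = True
        for atom in list(sas):
            if not ext(atom) <= sas:
                sas |= ext(atom)               # condition (3)
                changed = True
    # SPB(A): sentences all of whose atoms are supported.
    spb = [(h, ps) for h, ps in rpb if {h, *ps} <= sas]
    return sas, spb

rpb = [("alarm", ("noise",)), ("noise", ("pickup",)), ("far", ("away",))]
sas, spb = supporting_sets("alarm", rpb)
# sas == {"alarm", "noise", "pickup"}; the unrelated sentence is excluded
```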

We can prove a theorem allowing us to build only the portion of a Bayesian network relevant to a given problem by performing a backward-chaining reasoning process. Details are in the full paper. Due to space limitations we cannot present the whole abductive procedure in this paper. The procedure receives as input a ground atom A and a threshold τ in the interval [0, 1], and outputs a plan such that the posterior probability of A after performing it is ≥ τ. The procedure consists of several steps. First, the supporting sets SAS(A) and SPB(A) for the empty plan are constructed. If SPB(A) is not empty, a Bayesian network is constructed from SPB(A) and the environment E. This network is used to evaluate the posterior probability of A under the empty plan. If the posterior probability of A is still less than τ, or the network is empty, the procedure calls an abductive procedure, FIND-AB-SOLUTION, which generates a tentative plan and expands SAS(A) and SPB(A) to account for the new actions. A new Bayesian network is constructed and the new posterior probability of A is assessed. The process is repeated until a solution is found or no new tentative plan is generated.
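The top-level loop just described can be sketched as follows. This is an illustrative rendering only, with `evaluate` and `extend` as assumed stand-ins for the Bayesian network evaluation and the FIND-AB-SOLUTION call:

```python
# Anytime plan generation loop: evaluate the goal atom under the
# current (initially empty) plan, and invoke an abductive step to
# propose a larger tentative plan until the goal probability
# reaches the threshold tau or no new candidate plan exists.

def generate_plan(goal, tau, evaluate, extend):
    plan = set()
    while True:
        prob = evaluate(goal, plan)       # posterior of goal given plan
        if prob is not None and prob >= tau:
            return plan                   # threshold reached: success
        bigger = extend(plan)             # tentative extended plan, or None
        if bigger is None or bigger == plan:
            return None                   # no new tentative plan: failure
        plan = bigger

# Toy instantiation: each added action raises the goal probability.
evaluate = lambda goal, plan: 0.3 + 0.3 * len(plan)
extend = lambda plan: plan | {("pickup", len(plan))}
result = generate_plan("holding", 0.8, evaluate, extend)
# result is a two-action plan: 0.3 + 0.3*2 = 0.9 >= 0.8
```

Because every intermediate plan is available when the loop is interrupted, the procedure has the anytime character claimed in the abstract.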

function FIND-AB-SOLUTION(var SAS: set of atoms;
                          var SPB: set of probabilistic sentences;
                          var PL: set of action atoms): {success, failure};
var S: a probabilistic sentence;
    θS, θA: substitutions;
    Q: a target atom;
BEGIN
  Nondeterministically choose an atom Q in SAS ∪ Ext(A) but not in E;
  Nondeterministically choose an action sentence S and θA such that
    there exists an mgu θS of Q and cons(S),
    action(S)θSθA is ground and at a time t ≥ 0,
    there is no action in PL at time t, and
    there exists a computed answer from (context(S)θSθA, C ∪ CB);
  IF such an action action(S)θSθA was chosen THEN
  BEGIN
    EXPAND-SUPPORTING-SETS(SAS, SPB, PL, action(S)θSθA);
    PL := PL ∪ {action(S)θSθA};
  END;
  Nondeterministically choose between:
    1. RETURN: IF some new PL was chosen THEN RETURN success
               ELSE RETURN failure;
    2. RETURN FIND-AB-SOLUTION(SAS, SPB, PL)
END

FIND-AB-SOLUTION is a nondeterministic procedure which returns a new plan on each call. An actual implementation should take the form of a backtracking procedure. In each invocation, the function tries to choose an atom Q in the current SAS(A) (the SAS(A) wrt KB, E, C, and the current plan PL). If there are acceptable action atoms which have Q as consequent, it nondeterministically chooses one action and expands the current SAS(A) and SPB(A) using the new action. EXPAND-SUPPORTING-SETS expands the current SAS(A) and SPB(A) using the new abducible atom. The procedure does not assume any specific combining rule. Because we have preferences over the effects of actions and other causal relationships, we can include those preferences in the procedure to reduce the computation time.
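One way to realize the nondeterminism deterministically is to enumerate, for each open atom, every acceptable action and recurse over the choices. The sketch below is an assumed backtracking rendering, not the paper's procedure; `candidates(atom, plan)` stands in for the mgu, time, and context checks of FIND-AB-SOLUTION:

```python
# Backtracking enumeration of candidate plans: yield the current
# plan, then branch on every acceptable action for every open atom.
# Along each branch, a plan is yielded before any of its supersets,
# so minimal plans are encountered first on that branch.

def enumerate_plans(open_atoms, candidates, plan=frozenset()):
    yield plan
    for atom in open_atoms:
        for action in candidates(atom, plan):
            if action not in plan:                 # keep the plan serial
                yield from enumerate_plans(open_atoms, candidates,
                                           plan | {action})

# Toy use: one open atom, two acceptable timed actions.
cands = lambda atom, plan: [("pickup", 0), ("pickup", 1)]
plans = set(enumerate_plans(["holding"], cands))
# plans contains the empty plan, both singletons, and the two-action plan
```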

6.3 The Correctness and Completeness of the Abductive Procedure

Definition 14 Given a planning framework ⟨KB, E, C⟩. (1) A plan generation procedure is called threshold-sound wrt ground p-atoms if, for any ground p-atom A and probability threshold τ, when the procedure returns a plan PL, the assertion "The probability of A after performing PL in context C and environment E is β" is a logical consequence of performing PL, where β is some value ≥ τ. (2) A plan generation procedure is called threshold-complete wrt ground p-atoms if, for any ground p-atom A and probability threshold τ, when there exists a plan PL such that the assertion "The probability of A after performing PL in context C and environment E is β", with β ≥ τ, is a logical consequence of the performance of PL, then the procedure returns, after a finite amount of time, a plan PL′ such that the assertion "The probability of A after performing PL′ in context C and environment E is β′" is a logical consequence of performing PL′, with β′ ≥ τ.

Theorem 2 Given a planning framework ⟨KB, E, C⟩. If the dt-normative KB is acyclic and E and C are finite sets of ground atoms and all the domains, except time, are finite, then the abductive procedure is (1) threshold-sound wrt ground p-atoms, and (2) threshold-complete wrt ground p-atoms, provided PB is completely quantified.

7 Related Work

In [13], the authors propose a logic for representing probabilistic actions and argue that abduction lays an important groundwork for decision-theoretic and probabilistic planning. Those are also motivations of our work. Our action representation scheme is much richer than theirs: we allow conditional probability sentences, while they use only deterministic rules with possibly uncertain antecedents. We believe our concept of context-sensitivity is also an important contribution to the planning area. Furthermore, we provide a formal semantics for our Bayesian network construction algorithm. The representation of actions by Bayesian networks is investigated in [4], but [4] does not provide a semantics for their Bayesian network construction algorithm. Their concept of an environment model represents the domain relationships, whose structure is required to be unaffected by actions; we allow that structure to be changed by actions. In a recent paper, [2] considers the plan generation problem and the construction of Bayesian networks to evaluate generated plans. Blythe does not provide a logical foundation for the Bayesian network construction process and does not allow domain knowledge other than external events. BURIDAN [9] is a planner under uncertainty. Our plan generation procedure has some similarity to BURIDAN: the evolving SAS and SPB sets play the role of its relevant set of proposition-action pairs, and BURIDAN also has a nondeterministic component for generating plans. The differences lie in the representation of the planning environment and of the effects of actions. We explicitly describe the times and durations of actions, and so we do not need to consider the demotion, promotion, or confrontation cases as BURIDAN does. We allow domain knowledge to be used in planning, while BURIDAN does not. In our representation, we exploit the probabilistic independence between variables and, as a result, our constructed Bayesian networks are much smaller than those BURIDAN constructs for plan evaluation.

References

[1] K. R. Apt and M. Bezem. Acyclic programs. New Generation Computing, pages 335-363, Sept 1991.
[2] J. S. Blythe. Planning with external events. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, July 1994.
[3] A. Darwiche and M. Goldszmidt. Action networks: A framework for reasoning about actions and change under uncertainty. In UAI-94, pages 136-144, Seattle, July 1994.
[4] R. Davidson and M. R. Fehling. A structured, probabilistic representation of action. In UAI-94, pages 154-161, Seattle, July 1994.
[5] K. Eshghi. Abductive planning with event calculus. In Proceedings of the Fifth International Conference on Logic Programming, 1988.
[6] A. Goldman. A Theory of Human Action. Princeton University Press, Princeton, NJ, 1970.
[7] P. Haddawy. Representing Plans Under Uncertainty: A Logic of Time, Chance, and Action, volume 770 of Lecture Notes in Artificial Intelligence. Springer-Verlag, Berlin, 1994.
[8] P. Haddawy and M. Suwandi. Decision-theoretic refinement planning using inheritance abstraction. In 2nd ICPS, pages 266-271, June 1994.
[9] N. Kushmerick, S. Hanks, and D. Weld. An algorithm for probabilistic planning. To appear in Artificial Intelligence.
[10] J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, second edition, 1987.
[11] L. Ngo and P. Haddawy. A theoretical framework for temporal probability model construction. 1995. Submitted to UAI-95.
[12] R. N. Pelavin. A Formal Logic for Planning with Concurrent Actions and External Events. PhD thesis, Univ. of Rochester, Dept. of Computer Science, 1988.
[13] D. Poole and K. Kanazawa. A decision-theoretic abductive basis for planning. In Working Notes of the AAAI Spring Symposium, 1994.
[14] Y. Shoham. Time for action: On the relation between time, knowledge and action. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pages 954-959, 1989.
[15] S. Srinivas. A generalization of the noisy-or model. In UAI-93, pages 208-217, July 1993.
