A SAT-based Approach to Cost Sensitive Temporally Expressive Planning

QIANG LU (1,2,†), RUOYUN HUANG (2,†), YIXIN CHEN (2,§), YOU XU (2), WEIXIONG ZHANG (2), and GUOLIANG CHEN (1)
(1) University of Science and Technology of China
(2) Washington University in St. Louis

Complex features, such as temporal dependencies and numerical cost constraints, are hallmarks of real-world planning problems. In this paper, we consider the challenging problem of cost-sensitive temporally expressive (CSTE) planning, which requires concurrency of durative actions and optimization of action costs. We first propose a scheme to translate a CSTE planning problem into a minimum cost (MinCost) satisfiability (SAT) problem and to integrate it with a relaxed parallel planning semantics for handling true temporal expressiveness. Our scheme finds solution plans that optimize temporal makespan and, at the optimal makespan, also minimize total action costs. We propose two approaches for solving MinCost SAT. The first is based on a transformation of a MinCost SAT problem to a weighted partial Max-SAT (WPMax-SAT) problem, and the second, called BB-CDCL, integrates the branch-and-bound technique with the conflict driven clause learning (CDCL) method. We also develop a variable branching scheme customized for CSTE planning that can significantly improve the search efficiency of BB-CDCL. Our experiments on existing CSTE benchmark domains show that our planner compares favorably to state-of-the-art temporally expressive planners in both efficiency and quality.

Categories and Subject Descriptors: I.2.8 [ARTIFICIAL INTELLIGENCE]: Problem Solving, Control Methods, and Search: Plan execution, formation, and generation
General Terms: Algorithms
Additional Key Words and Phrases: Planning; Temporal expressiveness; Numerical cost constraint; Satisfiability

1. INTRODUCTION

An essential quality of a problem formalism is its modeling capability, and extending the modeling capability of planning formulations has been a continuing endeavor. An important development beyond classical planning is temporal planning, which deals with durative actions occurring over extended intervals of time [Penberthy and Weld 1994]. In particular, both the preconditions and the effects of durative actions can be temporally quantified. Representative temporal planners include ZENO [Penberthy and Weld 1994], TGP [Smith 1999], TLPlan [Bacchus and Ady 2001], TP4 [Haslum and Geffner 2001], LPG [Gerevini and Serina 2002], TALPlanner [Kvarnström and Magnusson 2003], VHPOP [Younes and Simmons 2003], SAPA [Do and Kambhampati 2003], TM-LPSAT [Shin and Davis 2004], SGPlan [Wah and Chen 2006], CPT [Vidal and Geffner 2006], LPG-td [Gerevini et al. 2008], TFD [Eyerich et al. 2009], Crikey [Coles et al. 2009], Crikey3 [Coles et al. 2008], POPF [Coles et al. 2010], and a concurrency version of LPG (LPG-c) [Gerevini et al. 2010]. These planners have been successfully applied to many planning problems.

† Joint first authors with equal contribution.
§ Corresponding author: Department of Computer Science and Engineering, Washington University in St. Louis, Saint Louis, MO 63130, USA.

DOI 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000 ACM Transactions on Intelligent Systems and Technology, Vol. V, No. N, Article A, Publication date: January YYYY.


Despite their success, these planners have two limitations. First, some existing temporal planners, such as ZENO, TGP, TLPlan, TP4, LPG, LPGP, TALPlanner, VHPOP, SAPA, SGPlan, CPT, and LPG-td, cannot deal with temporally expressive problems. Temporal action concurrency was first introduced into the planning domain definition language in PDDL2.1 [Fox and Long 2003], and Cushing et al. [2007] first presented the notion of temporal expressiveness. A planning problem is temporally expressive if all of its solutions require action concurrency, that is, some action must occur within the time interval of another action. Second, most existing temporal planners either attempt to minimize the total duration of the solution plan (i.e., the makespan) or do not consider any quality metric at all. However, for many applications it is desirable to optimize not only the makespan but also the total action costs [Do and Kambhampati 2003], which can represent quantities such as the cost of resources used, the total money spent, or the total energy consumed. Action costs were introduced as a new criterion in the IPC-6 planning competition [IPC 2008].

Both required concurrency and action costs are important features in real-world planning. In this paper, we propose a general planning paradigm called cost-sensitive temporally expressive (CSTE) planning. A CSTE planning problem is a temporally expressive problem in which actions have associated costs. Although CSTE planning problems are complex and difficult to solve, they are important and ubiquitous in many applications. Example CSTE domains include:

(1) Peer-to-Peer network communication. In Peer-to-Peer network communication, one peer's uploading has to be concurrent with one or more other peers' downloading. Besides the required concurrency, modern communication is service oriented; communication actions are charged at different costs, depending on the type of network service used.
A desirable planner must find temporally expressive solutions that also minimize the total action costs; this calls for CSTE planning.

(2) Web service composition. Web service composition (WSC) is the problem of integrating multiple web services to satisfy a particular request [Rao and Su 2004]. Planning has been adopted as one of the major methods for WSC [Carman et al. 2003; Rao and Su 2004]. WSC problems may require CSTE planning, since different web services operate under different conditions and at different rates of cost (some are free). As a result, it is desirable to optimize QoS metrics such as total price, reliability, and reputation. Moreover, temporally concurrent actions are often needed to coordinate multiple web services.

(3) Autonomous systems. Planning for autonomous systems, including robots, rovers, and spacecraft, often requires CSTE planning. Consider a spacecraft control example [Smith 2003] in which the spacecraft is moved by firing thrusters. Multiple operations must be performed within the time interval during which the thrusters are fired, which requires action concurrency. Moreover, operation costs such as energy need to be minimized to make the best use of on-board resources.

(4) Real-time manufacturing systems. In real-time manufacturing systems [Do et al. 2008], action concurrency is often mandatory. For instance, in baking ceramics [Cushing et al. 2007], the kilning and baking actions need to be executed simultaneously, and kilning has to be executed within the time interval of baking. Moreover, to produce more products with less time and material, one needs to optimize the arrangement of concurrent actions to achieve shorter makespans and lower total action costs.

Currently, few automated planners can handle CSTE problems. There are only a few temporally expressive planners, such as TM-LPSAT [Shin and Davis 2004], Crikey [Coles et al. 2008], Crikey3 [Coles et al. 2009], POPF [Coles et al. 2010], LPG-c [Gerevini et al. 2010], and TFD [Eyerich et al. 2009], and most of them only optimize the temporal makespan.

In this paper, we introduce an efficient CSTE planning framework based on a Satisfiability (SAT) transformation, as shown in Figure 1. Central to this approach is a transformation that turns a CSTE instance into an optimization problem with SAT-based constraints, which we call a MinCost SAT formulation. The translation is based on the planning as SAT framework [Kautz and Selman 1992].

Fig. 1. The architecture of our CSTE planner.

We first propose a basic transformation scheme based on our previous work [Huang et al. 2009]. We then extend this scheme with a relaxed parallel planning semantics. The enhanced transformation handles more temporal features than the basic scheme; moreover, it improves search efficiency and lets our solver find solutions with shorter makespans than the basic scheme. To solve the encoded MinCost SAT instance, which is a SAT problem whose objective is to minimize the total cost of the variables assigned to true [Li 2004], we develop BB-CDCL, a branch-and-bound algorithm based on the conflict driven clause learning (CDCL) procedure [Zhang et al. 2001; Mitchell 2005] that solves MinCost SAT problems directly. We also propose an effective heuristic-cost-based variable branching scheme to further improve search efficiency.

Figure 1 shows the architecture of our CSTE planner. Starting from a small makespan N, we repeatedly increase N and solve the corresponding MinCost SAT instance until a solution is found. This scheme is guaranteed to find the minimum makespan. Furthermore, since our BB-CDCL algorithm minimizes the objective function, we also minimize the total action costs at the optimal makespan. Thus, we find a Pareto optimum for this multi-objective optimization problem.

Our results show that this approach performs well. In particular, they show that our SAT-based approach is currently a good choice for CSTE planning, especially for problems whose solutions require a high level of action concurrency, and that it compares favorably against other existing temporally expressive planners.

The rest of this paper is organized as follows. In Section 2, we define cost sensitive temporally expressive (CSTE) planning. In Section 3, we present the SAT-based CSTE planning framework, including the basic translation scheme and the enhanced scheme.
In Section 4, we present our BB-CDCL algorithm, enhanced with a variable branching mechanism, for solving encoded MinCost SAT problems. We present our experimental results on a variety of CSTE planning domains in Section 5. We discuss related work in Section 6 and conclude in Section 7.

2. BACKGROUND

Our method applies to temporal planning problems defined in PDDL2.1 and above [Fox and Long 2003]. In these problems, actions have starting, ending, and overall preconditions, as well as starting and ending effects, all of which are conjunctions of propositions. Each action has a duration, which is a positive integer, and a cost, which is a non-negative real number.


We now formally define cost sensitive temporally expressive (CSTE) planning. A fact f is an atomic proposition that can be either true or false; we use f_t to represent the fact f at time t. A state S is a set of fact propositions, and we use S_t to represent the state at time t. For each fact f_t, S_t includes either f_t or ¬f_t. For convenience, we assume (f_t ∉ S_t) = (¬f_t ∈ S_t).

Definition 2.1. (Cost-sensitive durative action). A cost-sensitive durative action o is defined by a tuple (ρ, µ, π⊢, π↔, π⊣, α⊢, α⊣), where ρ and µ are the duration and cost of o, respectively; π⊢ and π⊣ are the precondition fact sets that must be true at the start and at the end of o, respectively; π↔ is the set of overall facts that must be true throughout the lifetime of o; and α⊢ and α⊣ are the effect fact sets at the start and the end of o, respectively. In this paper, we assume that action durations are positive integers, ρ(o) > 0, and that costs are non-negative real numbers, µ(o) ≥ 0.

Given a durative action o, we write π⊢ for π⊢(o); the same abbreviation applies to π↔, π⊣, α⊢, and α⊣. In PDDL2.1, the annotations of temporal preconditions and overall facts are: 1) π⊢: "(at start f)", 2) π⊣: "(at end f)", and 3) π↔: "(over all f)". The annotations of effects are: 1) α⊢: "(at start f)" and 2) α⊣: "(at end f)".

Given an action o and a sequence of states [S_t, S_{t+ρ(o)−ε}], where ε > 0 and ε → 0, o is valid at time t (denoted o_t) if the following conditions are satisfied: a) ∀f ∈ π⊢, f_t ∈ S_t; b) ∀f ∈ π⊣, f_{t+ρ(o)−ε} ∈ S_{t+ρ(o)−ε}; and c) ∀f ∈ π↔ and ∀t′ ∈ (t, t + ρ(o) − ε], f_{t′} ∈ S_{t′}. Note that f may be a positive or a negative precondition here. The execution of action o at time t affects the states S_{t+ε} and S_{t+ρ(o)}, where ε > 0 and ε → 0.
States S_{t+ε} and S_{t+ρ(o)} satisfy o_t's effects if: 1) for each add-effect f ∈ α⊢, f_{t+ε} ∈ S_{t+ε}; 2) for each delete-effect ¬f ∈ α⊢, f_{t+ε} ∉ S_{t+ε}; 3) for each add-effect f ∈ α⊣, f_{t+ρ(o)} ∈ S_{t+ρ(o)}; and 4) for each delete-effect ¬f ∈ α⊣, f_{t+ρ(o)} ∉ S_{t+ρ(o)}. In our previous work [Huang et al. 2009], we assumed a discrete time horizon (time t takes integer values), which means that ε always equals 1.

Below is an example PDDL definition of a serve action from the P2P domain, which we discuss in detail in Section 5.1:

    (:durative-action serve
      :parameters (?c - computer ?f - file)
      :duration (= ?duration (file-size ?f))
      :condition (and (at start (free ?c))
                      (over all (not (free ?c)))
                      (at start (saved ?c ?f))
                      (over all (serving ?c ?f)))
      :effect (and (at start (not (free ?c)))
                   (at end (free ?c))
                   (at start (serving ?c ?f))
                   (at end (not (serving ?c ?f)))
                   (at start (increase (total-cost) (* (file-size ?f) SERVE_RATE)))))

Definition 2.2. (Cost-sensitive temporal (CST) planning). A cost-sensitive temporal planning problem Π is defined as a tuple (I, F, O, G), where I is the initial state, F is a set of facts, O is a set of cost-sensitive durative actions, and G is a set of goal facts.

Definition 2.3. (Solution plan). Given a CST planning problem Π = (I, F, O, G), a plan P = (p_0, p_1, ..., p_{n−1}) is a sequence of action sets, where each action set p_t ⊆ O indicates the actions executed at time t. P is a solution plan if there exists a state sequence S_0, S_1, ..., S_n satisfying: a) S_0 = I; b) for each action o_t ∈ p_t, o_t is valid at time t, and S_{t+1}, S_{t+ρ(o)} satisfy o_t's effects; c) for all f ∈ G, f_n ∈ S_n.

Given a plan P = (p_0, p_1, ..., p_{n−1}), we first optimize the makespan, defined as the duration of the plan:

$$\max_{t=0,\dots,n-1}\ \max_{o \in p_t}\ \{\, t + \rho(o) - 1 \,\}.$$

Then, at the optimized makespan, we minimize the total action cost, defined as the total cost of the selected actions (an action appearing multiple times is counted multiple times):

$$\sum_{t=0}^{n-1} \sum_{o \in p_t} \mu(o).$$
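Both objectives can be computed directly from a plan. The Python sketch below is a minimal illustration, not the planner's implementation: the `DurativeAction` record, its field names, the two example actions, and the convention that a plan with no actions has makespan 0 are all assumptions. It implements the makespan and total-cost formulas above literally.

```python
from dataclasses import dataclass
from typing import Sequence, Set


@dataclass(frozen=True)
class DurativeAction:
    """Hypothetical minimal action record holding only what the objectives need."""
    name: str
    duration: int  # rho(o), a positive integer
    cost: float    # mu(o), a non-negative real number


Plan = Sequence[Set[DurativeAction]]  # plan[t] is the action set p_t


def makespan(plan: Plan) -> int:
    """max over t and o in p_t of t + rho(o) - 1 (0 for a plan with no actions, our convention)."""
    return max((t + o.duration - 1 for t, step in enumerate(plan) for o in step), default=0)


def total_cost(plan: Plan) -> float:
    """Sum of mu(o) over every selected action; repeated occurrences are counted each time."""
    return sum(o.cost for step in plan for o in step)


# Usage with two hypothetical actions: a 3-step action at t = 0 and a 2-step action at t = 2.
serve = DurativeAction("serve", duration=3, cost=2.0)
fetch = DurativeAction("fetch", duration=2, cost=1.5)
plan = [{serve}, set(), {fetch}]
print(makespan(plan), total_cost(plan))  # 3 3.5
```

The makespan is determined by `fetch` (2 + 2 − 1 = 3), not by the action that starts first.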

The formulation of CST planning is a subset of PDDL2.1. Specifically, the PDDL features supported by our temporal planner include predicate representations, typed representations, untyped representations, grounded representations, and negative preconditions. The PDDL features that are not supported include object fluent representations, schematic representations, ADL conditions, conditional effects, universal effects, derived predicates, numeric state variables, and timed initial literals. The semantics of temporal planning that we consider is as expressive as that defined in PDDL2.1, except for the discrete time setting. Unlike many previous PDDL2.1 planners, which are not temporally expressive, our approach can handle temporal expressiveness.

Definition 2.4. (Temporal dependency). Given two durative actions o and o′, we say that o temporally depends on o′ when one of the following conditions holds: (1) ∃f ∈ π⊢(o) such that f ∈ α⊢(o′) and ¬f ∈ α⊣(o′); (2) ∃¬f ∈ π⊢(o) such that ¬f ∈ α⊢(o′) and f ∈ α⊣(o′).

A temporal dependency among actions may require actions to be executed concurrently.

Definition 2.5. (Required concurrency). A CST planning problem Π has required concurrency if and only if it has at least one solution plan and every solution plan of Π has concurrently executed actions.

Two factors can lead to concurrency in a temporally expressive problem: one is the required concurrent interaction (i.e., concurrent execution) among actions, and the other is enforced deadlines [Coles et al. 2008].

Definition 2.6. (Cost-sensitive temporally expressive (CSTE) planning). A CSTE planning problem is a CST planning problem with required concurrency.

3. SAT-BASED CSTE PLANNING FRAMEWORK

We now formulate CSTE planning as an optimization problem with SAT constraints, known as MinCost SAT. Our overall planning algorithm, the SAT-based CSTE Planning (SCP) framework, is shown in Algorithm 1. The framework follows the bounded SAT solving strategy originally proposed in SATPlan [Kautz and Selman 1992; 1996] and Graphplan [Blum and Furst 1997]. We start from a lower bound of the makespan (N = 1), construct a planning graph, and encode the CSTE problem as a MinCost SAT instance; if the instance is proved unsatisfiable, we increase N by one and repeat, until a solution with the optimal makespan is found. At the optimal makespan, we further solve the instance with respect to the MinCost SAT objective until a minimum-cost solution is found or the search times out.

In this section, we first present the MinCost SAT formulation, and then we introduce a basic encoding method, first presented in our previous work [Huang et al. 2009], that transforms a CSTE problem into MinCost SAT. Further, we discuss a limitation of the basic encoding and propose an enhanced encoding method based on a new action transformation, which fully addresses this limitation.

ALGORITHM 1: A SAT-based CSTE Planning Framework (SCP)
Input: A CSTE planning problem Π
Output: A solution plan
  transform durative actions into simple ones;
  N ← 0;
  repeat
    N ← N + 1;
    encode the problem into a MinCost SAT instance with makespan N;
    optimally solve the encoded MinCost SAT instance;
  until a solution is found or search timeout;
  if a solution is found then
    decode the optimal solution and return;
  else
    return with no solution;
  end

3.1. MinCost SAT and Weighted Partial Max-SAT formulations

We first present the SAT formulations. A SAT problem is defined as Φ = (V, C), where V is a set of Boolean variables and C is a set of clauses. A variable assignment ψ assigns each variable x ∈ V to true or false. Given a variable assignment ψ and a variable x ∈ V, we let v_ψ(x) be 1 when x is true under ψ, and 0 otherwise. Likewise, given a variable assignment ψ and a clause c ∈ C, we let v_ψ(c) be 1 when c is satisfied by ψ, and 0 otherwise. Given a SAT problem Φ = (V, C), a valid solution is a variable assignment ψ such that v_ψ(c) = 1 for each c ∈ C. We use the assignment function notation ψ : V ∪ C → {0, 1} in the following definitions. First, we define MinCost SAT problems [Li 2004].

Definition 3.1. (MinCost SAT problem). A MinCost SAT problem is a tuple Φ_c = (V, C, µ), where V is a set of Boolean variables, C is a set of clauses, and µ : V → ℕ is a cost function. A solution to Φ_c is a variable assignment ψ that minimizes the objective function

$$\mathrm{cost}(\psi) = \sum_{x \in V} \mu(x)\, v_\psi(x),$$

subject to v_ψ(c) = 1, ∀c ∈ C.

Another extended SAT problem that we consider is the weighted partial Max-SAT problem (WPMax-SAT).

Definition 3.2. (WPMax-SAT problem). A WPMax-SAT problem is a tuple Φ_a = (V, C^h, C^s, w), where V is a set of variables, C^h and C^s are the sets of hard and soft clauses, respectively, and w : C^s → ℕ is the weight function of soft clauses. A solution to Φ_a is a variable assignment ψ that maximizes the function

$$\mathrm{weight}(\psi) = \sum_{c \in C^s} w(c)\, v_\psi(c),$$

subject to v_ψ(c) = 1, ∀c ∈ C^h.
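Definitions 3.1 and 3.2, together with the repeat-until loop of Algorithm 1, can be illustrated with a deliberately naive Python sketch. Clauses are lists of non-zero integers in the DIMACS convention (v means x_v is true, −v means x_v is false). The exhaustive search is exponential and only stands in for a real solver such as BB-CDCL. The reduction to WPMax-SAT shown here is one standard construction (keep C as hard clauses, and turn every variable x with µ(x) > 0 into a soft unit clause ¬x of weight µ(x)); it is an assumption and may differ in detail from the conversion in Appendix A. All function names and the `encode` callback are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Clause = List[int]            # DIMACS-style literals: v is x_v, -v is (not x_v)
Assignment = Dict[int, bool]


def satisfied(clause: Clause, psi: Assignment) -> bool:
    """v_psi(c) = 1 iff some literal of c is true under psi."""
    return any(psi[abs(lit)] == (lit > 0) for lit in clause)


def assignments(n_vars: int):
    """Enumerate all 2^n assignments of variables 1..n (exponential; illustration only)."""
    for bits in product([False, True], repeat=n_vars):
        yield {v + 1: b for v, b in enumerate(bits)}


def mincost_sat(n_vars: int, clauses: List[Clause], mu: Dict[int, int]) -> Optional[Tuple[int, Assignment]]:
    """Definition 3.1: minimise sum of mu(x) * v_psi(x) subject to every clause being satisfied."""
    best = None
    for psi in assignments(n_vars):
        if all(satisfied(c, psi) for c in clauses):
            cost = sum(mu.get(v, 0) for v, val in psi.items() if val)
            if best is None or cost < best[0]:
                best = (cost, psi)
    return best


def to_wpmaxsat(clauses: List[Clause], mu: Dict[int, int]):
    """Assumed reduction: hard = C; soft = unit clause (not x) with weight mu(x) for each costly x."""
    hard = [list(c) for c in clauses]
    soft = [([-v], w) for v, w in mu.items() if w > 0]
    return hard, soft


def wpmaxsat(n_vars: int, hard: List[Clause], soft: List[Tuple[Clause, int]]) -> Optional[Tuple[int, Assignment]]:
    """Definition 3.2: maximise the weight of satisfied soft clauses subject to all hard clauses."""
    best = None
    for psi in assignments(n_vars):
        if all(satisfied(c, psi) for c in hard):
            weight = sum(w for c, w in soft if satisfied(c, psi))
            if best is None or weight > best[0]:
                best = (weight, psi)
    return best


def scp(encode: Callable[[int], Tuple[int, List[Clause], Dict[int, int]]], max_n: int):
    """Outer loop of Algorithm 1: raise the makespan N until the MinCost SAT instance is satisfiable."""
    for n in range(1, max_n + 1):  # max_n stands in for the search timeout
        result = mincost_sat(*encode(n))
        if result is not None:
            return n, result
    return None


clauses = [[1, 2], [-1, 3]]         # (x1 or x2) and (x1 -> x3)
mu = {1: 1, 2: 5, 3: 2}
print(mincost_sat(3, clauses, mu))  # (3, {1: True, 2: False, 3: True})
hard, soft = to_wpmaxsat(clauses, mu)
print(wpmaxsat(3, hard, soft)[0])   # 5, i.e. sum(mu) - 3


def toy_encode(n: int):
    """Hypothetical encoder: unsatisfiable (x1 and not x1) until the makespan reaches 2."""
    return 1, ([[1]] if n >= 2 else [[1], [-1]]), {1: 4}


print(scp(toy_encode, 5))           # (2, (4, {1: True}))
```

Because every assignment satisfies soft weight Σµ(x) − cost(ψ) under this reduction, maximizing weight and minimizing cost select the same assignments.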


A WPMax-SAT problem Φ_a amounts to finding a variable assignment such that all hard clauses are satisfied and the total weight of the satisfied soft clauses is maximized. A MinCost SAT problem can be easily converted to a WPMax-SAT problem (see Appendix A for the details of the conversion).

3.2. Basic MinCost SAT Encoding

3.2.1. Transformation of actions. In the first step of our SCP framework (Algorithm 1), we use TGP-style semantics to compile temporal actions into STRIPS actions [Smith 1999]. Specifically, each cost-sensitive durative action o is converted into two simple actions and one propositional fact, written as (o⊢, o⊣, f^o). The transformation is based on the assumption ε = 1. We use the symbol a to denote a simple action that indicates the starting event (a = o⊢) or the ending event (a = o⊣) of o. The fact f^o, when true, indicates that o is being executed. We denote the set of all such facts as F^o = {f^o | o ∈ O}. Each simple action a is defined as a tuple (µ, pre, eff), where µ is the cost of a, pre is the precondition fact set, and eff is the effect fact set. Specifically, given a durative action o = (ρ, µ, π⊢, π↔, π⊣, α⊢, α⊣), the transformed actions are o⊢ = (µ, π⊢, α⊢) and o⊣ = (0, π⊣ ∪ π↔, α⊣). In our basic encoding scheme, this transformation applies only to actions with ρ > 1; a durative action with ρ = 1 is viewed as a simple action and is not transformed.

In summary, we transform a CSTE planning problem Π = (I, F, O, G) into a classical planning problem Π_s = (I, F^s, O^s, G), where F^s = F ∪ F^o and O^s = {o⊢, o⊣ | o ∈ O} ∪ {a_f | f ∈ F^s}. Here a_f = (0, {f}, {f}) is the no-op action for fact f; no-op actions are a standard technique in SAT-based planners that use planning-graph preprocessing to handle the frame problem. The idea of transforming durative actions into simple actions was first proposed in [Long and Fox 2003].
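The action transformation can be sketched in Python as follows. Facts are strings and a delete effect ¬f is written as the string "~f"; that convention, the class names, and the naming scheme for o⊢, o⊣, and f^o are assumptions made for illustration. The sketch follows the definitions above (o⊢ = (µ, π⊢, α⊢), o⊣ = (0, π⊣ ∪ π↔, α⊣), a fact f^o per action, and a no-op a_f = (0, {f}, {f}) for every fact in F^s), but it transforms every action it is given and leaves out the basic encoding's special treatment of ρ = 1.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class DurativeAction:
    """o = (rho, mu, pi_start, pi_overall, pi_end, alpha_start, alpha_end)."""
    name: str
    rho: int
    mu: float
    pi_start: FrozenSet[str] = frozenset()
    pi_overall: FrozenSet[str] = frozenset()
    pi_end: FrozenSet[str] = frozenset()
    alpha_start: FrozenSet[str] = frozenset()
    alpha_end: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class SimpleAction:
    """a = (mu, pre, eff); delete effects are written "~f" (our convention)."""
    name: str
    mu: float
    pre: FrozenSet[str]
    eff: FrozenSet[str]


def transform(o: DurativeAction) -> Tuple[SimpleAction, SimpleAction, str]:
    """Split o into (o_start, o_end, f_o); the whole cost mu(o) is carried by o_start."""
    start = SimpleAction(o.name + "_start", o.mu, o.pi_start, o.alpha_start)
    end = SimpleAction(o.name + "_end", 0.0, o.pi_end | o.pi_overall, o.alpha_end)
    return start, end, "executing_" + o.name


def noop(f: str) -> SimpleAction:
    """No-op a_f = (0, {f}, {f}) that carries f from one step to the next."""
    return SimpleAction("noop_" + f, 0.0, frozenset({f}), frozenset({f}))


def to_classical(facts: Iterable[str], actions: Iterable[DurativeAction]) -> Tuple[Set[str], List[SimpleAction]]:
    """Build F^s = F union F^o and O^s = {o_start, o_end} union {a_f | f in F^s}."""
    fs: Set[str] = set(facts)
    ops: List[SimpleAction] = []
    for o in actions:
        start, end, f_o = transform(o)
        ops += [start, end]
        fs.add(f_o)
    ops += [noop(f) for f in sorted(fs)]
    return fs, ops


# Usage: action o of Figure 2(b), o = (2, 1, {}, {}, {g}, {f}, {h}).
o = DurativeAction("o", rho=2, mu=1.0, pi_end=frozenset({"g"}),
                   alpha_start=frozenset({"f"}), alpha_end=frozenset({"h"}))
start, end, f_o = transform(o)   # start = (1.0, {}, {f}), end = (0.0, {g}, {h})
fs, ops = to_classical({"f", "g", "h"}, [o])
```

Putting the full cost on o⊢ mirrors the cost function of the MinCost encoding, which charges µ(o) only to start variables.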
An advantage of this transformation scheme is that techniques from classical planning, such as mutual exclusion detection, can be applied to the transformed problem. Given the above representation, fact and action mutual exclusion (mutex) constraints must be encoded to ensure the correctness of solutions. An algorithm that detects mutexes between durative actions in temporal planning was proposed in [Smith 1999]. We use a similar, planning-graph-based definition of mutexes for all facts f ∈ F^s and transformed actions a ∈ O^s. A planning graph is a directed, leveled graph with two kinds of nodes (fact nodes and action nodes) and three kinds of edges (precondition-edges, add-edges, and delete-edges) [Blum and Furst 1997]. In a planning graph for any given makespan N, two facts f and h at time t are marked as exclusive of each other if and only if:

(1) [Static mutex] f = ¬h; or
(2) [Dynamic mutex] every action a ∈ add(f) is mutex with every action b ∈ add(h), where add(f) is the set of actions that have f as an add-effect.

Two simple actions a and b at a given time step t are marked as exclusive of each other if and only if:

(1) [Competing needs] ∃f ∈ pre(a) and ∃h ∈ pre(b) such that f and h are mutex at the previous time step;
(2) [Effect interference] ∃f, f ∈ eff(a) ∧ ¬f ∈ eff(b);
(3) [Parallel interference] ∃f, ¬f ∈ eff(a) ∧ f ∈ pre(b), or ∃f, ¬f ∈ eff(b) ∧ f ∈ pre(a); or
(4) [Concurrency interference] ∃f, f ∈ π↔(o) ∧ ¬f ∈ eff(b), where a is o⊢, o⊣, or the no-op action for f^o.

This mutex definition is similar to the classical mutex definition in Graphplan [Blum and Furst 1997], except for the "concurrency interference" mutex, which guarantees that actions executed concurrently with a durative action o do not delete any overall precondition of o. In the basic encoding, we allow two actions to be executed in parallel (that is, not mutex) at the same time step only if they can be executed in any order.
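The fact mutexes and the first three action-mutex conditions above can be checked pairwise. In this Python sketch, literals are strings with a "~" prefix for negation (an assumption), fact mutexes from the previous level are passed in as a set of unordered pairs, and the helper names are hypothetical. Concurrency interference is omitted because it needs extra bookkeeping that links o⊢, o⊣, and the no-op of f^o back to o. Using a generic complement function means a negative precondition ~f is also treated as threatened by an action that adds f, which goes slightly beyond condition (3) as stated.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class SimpleAction:
    """Minimal variant of a simple action (no cost field): eff holds "f" to add, "~f" to delete."""
    name: str
    pre: FrozenSet[str]
    eff: FrozenSet[str]


def neg(lit: str) -> str:
    """Complement of a literal: f <-> ~f."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def competing_needs(a: SimpleAction, b: SimpleAction, fact_mutex: Set[FrozenSet[str]]) -> bool:
    """(1) a precondition of a is mutex with a precondition of b at the previous step."""
    return any(frozenset((f, h)) in fact_mutex for f in a.pre for h in b.pre)


def effect_interference(a: SimpleAction, b: SimpleAction) -> bool:
    """(2) one action adds a fact that the other deletes (neg is an involution, so both directions)."""
    return any(neg(lit) in b.eff for lit in a.eff)


def parallel_interference(a: SimpleAction, b: SimpleAction) -> bool:
    """(3) one action deletes a precondition of the other."""
    return any(neg(f) in a.eff for f in b.pre) or any(neg(f) in b.eff for f in a.pre)


def actions_mutex(a: SimpleAction, b: SimpleAction, fact_mutex: Set[FrozenSet[str]]) -> bool:
    return competing_needs(a, b, fact_mutex) or effect_interference(a, b) or parallel_interference(a, b)


def static_fact_mutex(f: str, h: str) -> bool:
    """Static fact mutex: f = not h."""
    return f == neg(h)


def dynamic_fact_mutex(f: str, h: str, actions: Iterable[SimpleAction],
                       fact_mutex_prev: Set[FrozenSet[str]]) -> bool:
    """Every action adding f is mutex with every action adding h (one action adding both breaks this)."""
    actions = list(actions)
    adders_f = [a for a in actions if f in a.eff]
    adders_h = [b for b in actions if h in b.eff]
    return all(a != b and actions_mutex(a, b, fact_mutex_prev) for a in adders_f for b in adders_h)


a = SimpleAction("a", frozenset(), frozenset({"~f"}))
b = SimpleAction("b", frozenset({"f"}), frozenset({"g"}))
c = SimpleAction("c", frozenset(), frozenset({"f"}))
print(parallel_interference(a, b), effect_interference(a, c), actions_mutex(b, c, set()))  # True True False
```

Here a deletes the precondition of b, a and c disagree on f, while b and c can run in either order and are therefore not mutex.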


3.2.2. Transform to MinCost SAT. For a CSTE problem instance Π = (I, F, O, G) and its transformation Π_s = (I, F^s, O^s, G), given a makespan N, we define a MinCost SAT problem Φ with the following variable set V and clause set C. The variable set V includes two types of variables:

(1) action variables x_{a,t}, 0 ≤ t < N, a ∈ O^s;
(2) fact variables x_{f,t}, 0 ≤ t ≤ N, f ∈ F^s.

Each variable in V represents the assignment of an action or a fact at time t. The clause set C has the following clauses:

(1) Initial state (for all f ∈ I): x_{f,0}. All initial state facts must be true at time 0.
(2) Goal state (for all g ∈ G): x_{g,N}. All goal state facts must be true at time N.
(3) Preconditions of simple actions (for all a ∈ O^s, 0 ≤ t < N): for each precondition fact f ∈ pre(a), we add a clause x_{a,t} → x_{f,t}. If an action is true at time t, then each precondition fact of the action must be true at time t.
(4) Add-effects of simple actions (for all f ∈ F^s, 0 < t ≤ N):
$$x_{f,t} \rightarrow \bigvee_{\{a \,\mid\, f \in add(a)\}} x_{a,t-1}.$$
If a fact f is true at time t, then some action that has f as an add-effect must be true at time t − 1.
(5) Durative actions (for all o ∈ O and t with 0 ≤ t < t + ρ < N):
$$x_{o^{\vdash},t} \leftrightarrow x_{o^{\dashv},t+\rho-1}, \qquad x_{o^{\vdash},t} \rightarrow \bigwedge_{t+1 \le t' \le t+\rho-1} x_{f^o,t'}, \qquad x_{o^{\vdash},t} \rightarrow \bigwedge_{t+1 \le t' \le t+\rho-1}\ \bigwedge_{f \in \pi_{\leftrightarrow}} x_{f,t'}.$$
If a start action o⊢ is true at time t, then the end action o⊣ must be true at time t + ρ − 1, and vice versa. If o⊢ is true at time t, then the fact f^o and all the overall facts in π↔ must be true throughout the execution interval (t, t + ρ). These constraints enforce that o is executed in [t, t + ρ). Note that these constraints need not be encoded for actions whose duration ρ equals 1.
(6) Action mutexes (0 ≤ t < N): for each pair of mutex actions (a_1, a_2), ¬x_{a_1,t} ∨ ¬x_{a_2,t}. These clauses ensure that mutex actions are not true at the same time.
(7) Fact mutexes (0 ≤ t ≤ N): for each pair of mutex facts (f_1, f_2), ¬x_{f_1,t} ∨ ¬x_{f_2,t}. These clauses ensure that mutex facts are not true at the same time.
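Clause classes (1) to (7) can be generated mechanically. The Python sketch below maps each pair (name, t) to a DIMACS variable id and emits the clauses for a given makespan N. The input format (simple actions as (pre, add) sets, durative actions as tuples naming their o⊢, o⊣, and f^o), the helper names, and two boundary rules are assumptions: a start whose matching end variable x_{o⊣,t+ρ−1} would not exist (t + ρ − 1 ≥ N) is forbidden, and so is an end variable with no possible start. Clause (4) is applied to exactly the facts passed in `facts`.

```python
from typing import Dict, Iterable, List, Set, Tuple

Clause = List[int]  # DIMACS-style: positive id = variable true, negative id = variable false


class VarMap:
    """Hypothetical helper: gives x_{name,t} a fresh positive id on first use."""

    def __init__(self) -> None:
        self.ids: Dict[Tuple[str, int], int] = {}

    def __call__(self, name: str, t: int) -> int:
        return self.ids.setdefault((name, t), len(self.ids) + 1)


def encode_basic(
    n: int,
    facts: Set[str],
    init: Set[str],
    goal: Set[str],
    simple: Dict[str, Tuple[Set[str], Set[str]]],        # a -> (pre(a), add(a)) for every a in O^s
    durative: List[Tuple[str, str, str, int, Set[str]]],  # (o_start, o_end, f_o, rho, pi_overall), rho > 1
    action_mutex: Iterable[Tuple[str, str]],
    fact_mutex: Iterable[Tuple[str, str]],
) -> Tuple[VarMap, List[Clause]]:
    x, cnf = VarMap(), []
    action_mutex, fact_mutex = list(action_mutex), list(fact_mutex)
    cnf += [[x(f, 0)] for f in init]                                   # (1) initial state
    cnf += [[x(g, n)] for g in goal]                                   # (2) goal state
    for t in range(n):                                                 # (3) preconditions
        for a, (pre, _) in simple.items():
            cnf += [[-x(a, t), x(f, t)] for f in pre]
    for t in range(1, n + 1):                                          # (4) add-effects
        for f in facts:
            adders = [x(a, t - 1) for a, (_, add) in simple.items() if f in add]
            cnf.append([-x(f, t)] + adders)
    for start, end, f_o, rho, overall in durative:                     # (5) durative actions
        for t in range(n):
            if t + rho - 1 >= n:        # assumption: no start whose end step lies past the horizon
                cnf.append([-x(start, t)])
                continue
            s, e = x(start, t), x(end, t + rho - 1)
            cnf += [[-s, e], [-e, s]]                                  # x_start <-> x_end
            for tp in range(t + 1, t + rho):                           # t+1 <= t' <= t+rho-1
                cnf += [[-s, x(f, tp)] for f in (f_o, *overall)]
        for t in range(min(rho - 1, n)):  # assumption: an end step with no possible start is false
            cnf.append([-x(end, t)])
    for t in range(n):                                                 # (6) action mutexes
        cnf += [[-x(a1, t), -x(a2, t)] for a1, a2 in action_mutex]
    for t in range(n + 1):                                             # (7) fact mutexes
        cnf += [[-x(f1, t), -x(f2, t)] for f1, f2 in fact_mutex]
    return x, cnf


# Usage: p holds initially; simple action a (pre p, add q) reaches the goal q in one step.
simple = {"a": ({"p"}, {"q"}), "noop_p": ({"p"}, {"p"}), "noop_q": ({"q"}, {"q"})}
x, cnf = encode_basic(1, {"p", "q"}, {"p"}, {"q"}, simple, [], [], [])
true_vars = {x("p", 0), x("a", 0), x("q", 1)}  # every other variable is false
print(all(any((abs(lit) in true_vars) == (lit > 0) for lit in c) for c in cnf))  # True
```

The check at the end confirms that the one-step plan {a at t = 0} corresponds to a satisfying assignment of the generated clauses.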

Since the transformed problem is a classical planning problem with additional temporal constraints, the encoding above shares many similarities with the well-known planning as SAT encoding [Kautz and Selman 1992; Kautz 2004], except for the clauses in class (5). For a given valid assignment ψ with corresponding plan sequence P = (p_0, p_1, ..., p_{N−1}) and states S_0, S_1, ..., S_N, we have:

- Clauses in class (1) guarantee that S_0 = I.
- For each action o_t ∈ p_t, the clauses in class (3) for actions a = o⊢ and a = o⊣, together with the overall-fact clauses in class (5), make sure that o_t is valid at time t. Clauses in class (4) ensure that S_{t+1} and S_{t+ρ} satisfy o_t's add-effects. As in the SATPlan encoding for classical planning [Kautz 2004], our mutex clauses guarantee the consistency of delete effects: clauses in classes (6) and (7) guarantee that S_{t+1} and S_{t+ρ} satisfy o_t's delete effects. Therefore, for each action o_t ∈ p_t, o_t is valid at time t, and S_{t+1} and S_{t+ρ} satisfy o_t's effects.
- Clauses in class (2) ensure that f_N ∈ S_N for all f ∈ G.

According to Definition 2.3, P is a solution plan of the CSTE problem. Therefore, any plan derived from a satisfying assignment of Φ is a solution plan for the CSTE problem Π.

The encoding above produces a standard SAT problem. The cost of each variable x ∈ V in the MinCost SAT problem Φ_c = (V, C, µ) is defined as:

$$\mu(x) = \begin{cases} \mu(o), & \text{if } x = x_{o^{\vdash},t} \text{ for some action } o \in O \text{ and time step } t,\\ 0, & \text{otherwise.} \end{cases}$$

In other words, for each action o with cost µ(o), every corresponding start variable x_{o⊢,t} has cost µ(o), and all other variables have zero cost.

Our approach is not only effective for handling temporally expressive semantics but can also accommodate other kinds of parallelism in temporal planning. According to the analysis in [Rintanen 2007], whether a temporal planning problem can be compiled into a classical planning problem in polynomial time is determined by whether self-overlapping is allowed. Our approach supports self-overlapping. Suppose that in a plan an action o (with duration ρ) has two instances, starting at times t and t′ (t < t′), respectively. To initiate them, we have distinct variables x_{o⊢,t} and x_{o⊢,t′} indicating the different starting times of the two instances. The f^o facts, along with all related conditions, are enforced to be true from t + 1 to t + ρ − 1 and from t′ + 1 to t′ + ρ − 1, even if the two durations overlap.
Thus, the invariant conditions of the two action instances do not exclude each other.

3.3. Enhanced MinCost SAT encoding

3.3.1. Limitations of the basic encoding. The key limitation of the basic encoding is the assumption that ε = 1. Our previous definition of transformed actions is based on the classical planning as SAT framework [Blum and Furst 1997], in which the execution of an action at time t affects the state at time t + 1. For example, given a simple action a = (1, ∅, {f}), if a is true at time t, then f is true at t + 1. Defining transformed actions under the assumption ε = 1 greatly simplifies the transformation process, since existing techniques of the planning as SAT framework can be adopted directly. However, this definition does not exactly match the semantics of durative actions in temporal planning. As discussed in Section 2, the execution of a durative action o at time t affects the state S_{t+ε} (ε → 0). This definition of starting effects allows two actions o and o′ to be executed in the same time layer, in immediate succession (o_t, o′_{t+ε}).

Although the assumption ε = 1 makes temporal planning problems easy to convert into simple planning problems, it may cause the basic encoding to fail on certain instances. In the example of Figure 2(a), there are two actions o = (1, 1, ∅, ∅, ∅, {f}, {¬f}) and o′ = (1, 1, {f}, ∅, ∅, ∅, {g}). Since o has the starting effect f, if o is true at time t, then f is true at t + ε, so o′ can be executed at time t + ε. Given the initial state ∅ and the goal state {g}, there should be a solution (o_0, o′_{0+ε}). However, the basic encoding cannot transform o (with ρ = 1) into a simple action, since its starting and ending effects contain the inconsistent facts f and ¬f. Thus, the basic encoding cannot handle this problem.
In the example of Figure 2(b), there are two actions $o = (2, 1, \emptyset, \emptyset, \{g\}, \{f\}, \{h\})$ and $o' = (1, 1, \{f\}, \emptyset, \emptyset, \emptyset, \{g\})$. Given the initial state $\emptyset$ and the goal state $\{h\}$, $(o_0, o'_{0+\epsilon})$ is a solution. In the basic encoding, since we assume $\epsilon = 1$, the ending precondition $g$ cannot be satisfied at $t+1$, which means that $o$ is always false. Thus, the SAT instance produced by the basic encoding has no solution.

ACM Transactions on Intelligent Systems and Technology, Vol. V, No. N, Article A, Publication date: January YYYY.


Qiang Lu et al.

Fig. 2. Two examples of CSTE problems. (a) $o$: add $f$ at start, delete $f$ at end; $o'$: pre $f$, add $g$. (b) $o$: add $f$ at start, pre $g$ and add $h$ at end; $o'$: pre $f$, add $g$. Time points $t$, $t+1$, $t+2$.

Fig. 3. The enhanced encoding of the two examples of CSTE problems: in both (a) and (b), $o'$ starts at $t+\epsilon$, right after $o$ starts at $t$.

3.3.2. Enhanced transformation of actions. To address the above limitations, we enhance the basic encoding by removing the assumption $\epsilon = 1$ for the starting-effect states $S_{t+\epsilon}$; instead, we let $\epsilon \to 0$. We achieve this by adopting a more relaxed parallel semantics and a concurrent enabling relation in the new encoding. In the examples of Figure 2, if $o$ is true at $t$, then $f$ is true at $t+\epsilon$. Thus, $o'$ can be true at $t+\epsilon$, which means that $o$ and $o'$ can be true at the same makespan under the fixed order $o \prec o'$ ($o$ must be executed before $o'$), as shown in Figure 3. In the basic encoding, the strict parallel semantics requires that actions assigned true at the same makespan can be executed in any order. Clearly, $o$ and $o'$ violate this semantics, since they can only be executed in the fixed order $o \prec o'$. To allow $o$ and $o'$ to be true at the same makespan, the new encoding permits two actions to be true at the same makespan if they can be executed in a fixed partial order. This relaxed parallel semantics was first proposed for nonmonotonic logic programs [Dimopoulos et al. 1997] and has been studied for ∃-step plans [Rintanen et al. 2006]. We now formally define our new transformation of actions. Given a durative action $o = (\rho, \mu, \pi_\vdash, \pi_\leftrightarrow, \pi_\dashv, \alpha_\vdash, \alpha_\dashv)$,

(1) If $\rho > 1$, we transform it into a starting action $o_\vdash = (\mu, \pi_\vdash, \alpha_\vdash)$, an ending action $o_\dashv = (0, \pi_\dashv \cup \pi_\leftrightarrow, \alpha_\dashv)$, and a fact $f^o$ indicating that $o$ is being executed.
(2) If $\rho = 1$, we transform it into a simple action

$$a^o = \Big(\mu,\ \pi_\vdash \cup \pi_\leftrightarrow \cup \pi_\dashv,\ \big(\alpha_\vdash \setminus \{\ell \in \alpha_\vdash \mid \neg\ell \in \alpha_\dashv\}\big) \cup \alpha_\dashv\Big),$$

where $\ell$ ranges over literals ($f$ or $\neg f$), i.e., every starting effect whose opposite is an ending effect is removed.

This transformation may drop some starting add-effects or delete-effects. We therefore add concurrent enabling clauses to represent the dropped starting add-effects and mutex clauses to represent the dropped starting delete-effects. Note that we cannot handle problems in which opposite facts appear across $\pi_\vdash$, $\pi_\leftrightarrow$, and $\pi_\dashv$ when $\rho = 1$, e.g., $f \in \pi_\vdash$ and $\neg f \in \pi_\leftrightarrow$ or $\neg f \in \pi_\dashv$.

The new transformed simplified planning problem is defined as $\Pi^{s'} = (I, F^{s'}, O^{s'}, G)$, where $F^{s'} = F^{s} = F \cup F^{o}$ and $O^{s'} = \{o_\vdash, o_\dashv \mid \rho > 1, o \in O\} \cup \{a^o \mid \rho = 1, o \in O\} \cup \{a_f \mid f \in F^{s'}\}$. Note that $O^{s'}$ is the simple action set transformed from the CSTE problem $\Pi$ under the new rules, and $a_f$ is the no-op action for $f$.
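The two transformation rules can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: literals are strings with a leading "-" for negation, and the class names `Durative`, `Simple`, and `Transformed`, as well as the name chosen for the running fact $f^o$, are hypothetical. The sketch returns the dropped starting effects separately so that they can feed the concurrent enabling and mutex computations.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Lit = str  # hypothetical literal encoding: "f" is a fact, "-f" its negation


def neg(lit: Lit) -> Lit:
    """Negate a literal."""
    return lit[1:] if lit.startswith("-") else "-" + lit


@dataclass(frozen=True)
class Durative:
    """o = (rho, mu, pi_start, pi_over, pi_end, alpha_start, alpha_end)."""
    name: str
    rho: int
    mu: float
    pi_start: FrozenSet[Lit] = frozenset()
    pi_over: FrozenSet[Lit] = frozenset()
    pi_end: FrozenSet[Lit] = frozenset()
    alpha_start: FrozenSet[Lit] = frozenset()
    alpha_end: FrozenSet[Lit] = frozenset()


@dataclass(frozen=True)
class Simple:
    """Simple action a = (mu, pre, eff)."""
    name: str
    mu: float
    pre: FrozenSet[Lit]
    eff: FrozenSet[Lit]


@dataclass
class Transformed:
    actions: Tuple[Simple, ...]
    running_fact: Optional[str]  # f^o, only when rho > 1
    dropped: FrozenSet[Lit]      # starting effects removed when rho = 1


def transform(o: Durative) -> Transformed:
    if o.rho > 1:
        # Rule (1): o_start = (mu, pi_start, alpha_start), o_end = (0, pi_end | pi_over, alpha_end)
        start = Simple(o.name + "_start", o.mu, o.pi_start, o.alpha_start)
        end = Simple(o.name + "_end", 0, o.pi_end | o.pi_over, o.alpha_end)
        return Transformed((start, end), "running_" + o.name, frozenset())
    # Rule (2): drop every starting effect whose opposite is an ending effect
    dropped = frozenset(l for l in o.alpha_start if neg(l) in o.alpha_end)
    eff = (o.alpha_start - dropped) | o.alpha_end
    pre = o.pi_start | o.pi_over | o.pi_end
    return Transformed((Simple(o.name, o.mu, pre, eff),), None, dropped)
```

For the action $o$ of Figure 2(a), `transform` yields a single simple action whose effects are $\{\neg f\}$ and reports $f$ as a dropped starting add-effect.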

A SAT-based Approach to Cost Sensitive Temporally Expressive Planning


In order to represent the partial order of actions encoded at the same makespan, we define a concurrent enabling relation between any two transformed actions.

Definition 3.3. Given two simple actions $a$ and $b$ and a fact $f$, $a$ concurrently enables $b$ on $f$ (denoted $\mathrm{CEnab}(a, b, f)$) if and only if:
(1) $f \in eff(a) \wedge f \in pre(b)$, where $a = o_\vdash$ and $\rho > 1$; or $f \in \alpha_\vdash \wedge f \in pre(b)$, where $a$ is transformed from $o$ and $\rho = 1$;
(2) $\nexists g: \neg g \in eff(a) \wedge g \in eff(b)$; and
(3) $\nexists g: \neg g \in eff(a) \wedge g \in pre(b)$.

In words, $a$ concurrently enables $b$ on $f$ when a starting add-effect of $a$ supplies the precondition $f$ of $b$ (Rule (1)), where $a$ is either a transformed starting action or a simple action transformed from a unit-duration action, and $a$ and $b$ are not exclusive of each other (Rules (2) and (3)).

After transforming durative actions into simple actions, we construct a planning graph and develop a new algorithm to detect mutex constraints. In a planning graph for any given makespan $N$, two facts $f$ and $h$ at time $t$ are marked as mutually exclusive if and only if:
(1) [Static mutex] $f = \neg h$, or
(2) [Dynamic mutex] every action that adds $f$ is mutex with every action that adds $h$.

Two simple actions $a$ and $b$ at a given makespan $t$ are marked as mutually exclusive if and only if:
(1) [Competing needs] there exist $f \in pre(a)$ and $h \in pre(b)$ such that $f$ and $h$ are mutex at the previous makespan,
(2) [Effect interference] $\exists f: f \in eff(a) \wedge \neg f \in eff(b)$,
(3) [Parallel interference] $\exists f: \neg f \in eff(a) \wedge f \in pre(b)$ and $\exists h: \neg h \in eff(b) \wedge h \in pre(a)$,
(4) [Concurrency interference] $\exists f: f \in \pi_\leftrightarrow(o) \wedge \neg f \in eff(b)$, where $a$ is $o_\vdash$, $o_\dashv$, or the no-op action for $f^o$, or
(5) [Partial interference] $\exists f: \neg f \in eff(a) \wedge f \in pre(b)$ and $\nexists h: \mathrm{CEnab}(b, a, h)$.

When detecting mutex constraints in the special case $\rho = 1$, we temporarily add the dropped starting effects back into the transformed action's effects.
For example, consider two actions $a = (1, \emptyset, \{f\})$ and $b = (1, \{f\}, \{g\})$, where $a$ is transformed from $o = (1, 1, \emptyset, \emptyset, \emptyset, \{\neg f\}, \{f\})$. When detecting mutex constraints we take $eff(a) = \{\neg f, f\}$, so $a$ deletes the precondition $f$ of $b$, and we mark $a$ and $b$ as mutex.

Compared with the mutex definition in our basic encoding, we relax “parallel interference” to “$a$ deletes a precondition of $b$ and $b$ deletes a precondition of $a$” and add the “partial interference” mutex. These interference mutexes allow two actions to be executed at the same makespan in a fixed order ($a \prec b$) if and only if 1) $a$ and $b$ do not interfere with each other, or else 2) $a$ concurrently enables $b$ on a fact $f$ ($\mathrm{CEnab}(a, b, f)$) and $a$ does not delete any precondition of $b$.

Our parallel semantics, defined by the above mutex rules, differs in two ways from the semantics studied in previous work [Blum and Furst 1997; Rintanen et al. 2006]. First, our mutex definition is more relaxed than the one in Graphplan [Blum and Furst 1997]. In Graphplan, two actions are “interference” mutex “if one of the actions deletes a precondition or add-effect of the other”, so two parallel actions can be executed at the same makespan only if they can be executed in any order. In contrast, we allow two actions to be executed at the same makespan in a fixed order if one concurrently enables the other. Specifically, we divide the interference mutex into “effect interference” and “parallel interference” mutexes and relax the latter to “$a$ deletes a precondition of $b$ and $b$ deletes a precondition of $a$”. Second, our parallel semantics is stricter than that of ∃-step plans [Rintanen et al. 2006]. ∃-step plans are used for classical planning problems, which do not distinguish starting from ending effects. Our concurrent enabling relation lets an action enable another only through its starting add-effects, whereas ∃-step plans consider all add-effects (both starting and ending effects).
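As a rough sketch of Definition 3.3 and of the interference mutexes, the following Python code checks CEnab and the effect, parallel, and partial interference rules for a pair of simple actions. It deliberately omits the competing-needs and concurrency-interference rules, which need planning-graph context, and it evaluates Rules (2) and (3) of Definition 3.3 on $eff(a)$ exactly as written. Because the text states partial interference for an ordered pair, the sketch checks both orders; that symmetric reading, the `Act` record, and the string encoding of literals are our assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


def neg(lit: str) -> str:
    """Negate a literal; "-f" encodes the negation of f (hypothetical encoding)."""
    return lit[1:] if lit.startswith("-") else "-" + lit


@dataclass(frozen=True)
class Act:
    name: str
    pre: FrozenSet[str]
    eff: FrozenSet[str]
    # Starting add-effects inspected by Rule (1): eff(o_start), or alpha_start when rho = 1.
    start_adds: FrozenSet[str] = frozenset()
    # Starting effects dropped by the rho = 1 transformation; restored for mutex detection.
    dropped: FrozenSet[str] = frozenset()


def deletes_some(a: Act, facts: FrozenSet[str]) -> bool:
    """True if eff(a) contains the negation of some literal in facts."""
    return any(neg(f) in a.eff for f in facts)


def cenab(a: Act, b: Act) -> Set[str]:
    """All facts f such that CEnab(a, b, f) holds (Definition 3.3)."""
    if deletes_some(a, b.eff) or deletes_some(a, b.pre):  # Rules (2) and (3)
        return set()
    return set(a.start_adds & b.pre)                       # Rule (1)


def _one_way(a: Act, b: Act) -> bool:
    """Action mutex rules (2), (3) and (5) for the ordered pair (a, b)."""
    ea, eb = a.eff | a.dropped, b.eff | b.dropped
    if any(neg(f) in eb for f in ea):                      # effect interference
        return True
    a_del_b = any(neg(f) in ea for f in b.pre)
    b_del_a = any(neg(f) in eb for f in a.pre)
    if a_del_b and b_del_a:                                # parallel interference
        return True
    return a_del_b and not cenab(b, a)                     # partial interference


def interference_mutex(a: Act, b: Act) -> bool:
    """Mutex check that omits competing needs and concurrency interference."""
    return _one_way(a, b) or _one_way(b, a)
```

On the pair from the example above, `interference_mutex` reports a mutex because the restored effect $\neg f$ of $a$ deletes the precondition $f$ of $b$ and $b$ does not concurrently enable $a$.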
For the example in Figure 2(b), our encoding finds the concurrent enabling relation $\mathrm{CEnab}(o_\vdash, a^{o'}, f)$, whereas ∃-step plans would find more enabling relations, such as $\mathrm{Enab}(o_\vdash, a^{o'}, f)$ and $\mathrm{Enab}(a^{o'}, o_\dashv, g)$. Thus, our concurrent enabling relation is stricter than that of ∃-step plans, which is necessary for finding correct temporal plans.

3.3.3. Transform to MinCost SAT. Based on the new action transformation and mutex definition, we now define the enhanced MinCost SAT encoding. For a CSTE problem instance $\Pi = (I, F, O, G)$ and its transformation $\Pi^{s'} = (I, F^{s'}, O^{s'}, G)$, given a makespan $N$, we define a MinCost SAT problem $\Phi$ with the following variable set $V'$ and clause set $C'$. The variable set $V'$ includes two types of variables:

(1) action variables $x_{a,t}$, $0 \le t < N$, $a \in O^{s'}$;
(2) fact variables $x_{f,t}$, $0 \le t \le N$, $f \in F^{s'}$.

Each variable in $V'$ represents the assignment of an action or a fact at time $t$. The clause set $C'$ contains the following clauses:
(1) Initial state (for all $f \in I$): $x_{f,0}$.
(2) Goal state (for all $g \in G$): $x_{g,N}$.
(3) Preconditions of simple actions (for all $a \in O^{s'}$, $0 \le t < N$): for each precondition fact $f \in pre(a)$, if there exists an action $a'$ satisfying $\mathrm{CEnab}(a', a, f)$, we add the clause
$$x_{a,t} \rightarrow x_{f,t} \vee \bigvee_{\{a' \mid \mathrm{CEnab}(a', a, f)\}} x_{a',t}.$$
Otherwise, we add
$$x_{a,t} \rightarrow x_{f,t}.$$
If an action is true at time $t$, then each of its precondition facts, or some action that concurrently enables it on that fact, must be true at time $t$.
(4) Add-effects of simple actions (for all $f \in F^{s'}$, $0 < t \le N$):
$$x_{f,t} \rightarrow \bigvee_{\{a \mid f \in add(a)\}} x_{a,t-1}.$$
(5) Durative actions (for all $o \in O$ and $t$ with $0 \le t < t + \rho < N$):
$$x_{o_\vdash,t} \leftrightarrow x_{o_\dashv,t+\rho-1}, \qquad x_{o_\vdash,t} \rightarrow \bigwedge_{t+1 \le t' \le t+\rho-1} x_{f^o,t'}, \qquad x_{o_\vdash,t} \rightarrow \bigwedge_{t+1 \le t' \le t+\rho-1}\ \bigwedge_{f \in \pi_\leftrightarrow} x_{f,t'}.$$
(6) Action mutexes ($0 \le t < N$): for each pair of mutex actions $(a_1, a_2)$: $\neg x_{a_1,t} \vee \neg x_{a_2,t}$.
(7) Fact mutexes ($0 \le t \le N$): for each pair of mutex facts $(f_1, f_2)$: $\neg x_{f_1,t} \vee \neg x_{f_2,t}$.
(8) Concurrent enabling circular mutexes ($0 \le t < N$): for each concurrent enabling action cycle $(a_1, a_2, \ldots, a_m)$ satisfying $\mathrm{CEnab}(a_1, a_2, f_1), \ldots, \mathrm{CEnab}(a_i, a_{i+1}, f_i), \ldots, \mathrm{CEnab}(a_m, a_1, f_m)$, we add the exclusion clause $\neg x_{a_1,t} \vee \neg x_{a_2,t} \vee \cdots \vee \neg x_{a_m,t}$.
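Clauses (3) and (8) are the two clause families that depend on the CEnab relation. A minimal Python sketch of how they might be generated is shown below. It assumes DIMACS-style integer literals, a hypothetical `VarPool` helper that numbers variables, a precomputed CEnab map, and a simple depth-first enumeration of elementary cycles, in which each cycle is reported once, rooted at its alphabetically smallest action. Skipping self-enabling pairs is also our assumption; the text does not discuss them.

```python
from itertools import count
from typing import Dict, List, Set, Tuple

Clause = List[int]                          # DIMACS-style signed variable ids
CEnabMap = Dict[Tuple[str, str], Set[str]]  # (a', a) -> {f | CEnab(a', a, f)}


class VarPool:
    """Hypothetical helper mapping (name, t) to a fresh positive variable id."""

    def __init__(self) -> None:
        self.ids: Dict[Tuple[str, int], int] = {}
        self._next = count(1)

    def __call__(self, name: str, t: int) -> int:
        if (name, t) not in self.ids:
            self.ids[(name, t)] = next(self._next)
        return self.ids[(name, t)]


def precondition_clauses(pre: Dict[str, Set[str]], ce: CEnabMap, N: int, x: VarPool) -> List[Clause]:
    """Clauses (3): x_{a,t} -> x_{f,t} or some x_{a',t} with CEnab(a', a, f)."""
    clauses: List[Clause] = []
    for t in range(N):
        for a, facts in pre.items():
            for f in sorted(facts):
                enablers = sorted(ap for (ap, b), fs in ce.items() if b == a and f in fs)
                clause = [-x("a:" + a, t), x("f:" + f, t)]
                clause += [x("a:" + ap, t) for ap in enablers]  # empty -> plain x_{a,t} -> x_{f,t}
                clauses.append(clause)
    return clauses


def simple_cycles(edges: Dict[str, Set[str]]) -> List[List[str]]:
    """Enumerate each elementary cycle once, rooted at its smallest node."""
    nodes = sorted(set(edges) | {v for vs in edges.values() for v in vs})
    rank = {v: i for i, v in enumerate(nodes)}
    cycles: List[List[str]] = []

    def dfs(start: str, node: str, path: List[str]) -> None:
        for nxt in sorted(edges.get(node, ())):
            if nxt == start:
                cycles.append(list(path))
            elif rank[nxt] > rank[start] and nxt not in path:
                path.append(nxt)
                dfs(start, nxt, path)
                path.pop()

    for s in nodes:
        dfs(s, s, [s])
    return cycles


def circular_clauses(ce: CEnabMap, N: int, x: VarPool) -> List[Clause]:
    """Clauses (8): forbid every concurrent enabling cycle at every step t."""
    edges: Dict[str, Set[str]] = {}
    for (ap, a), fs in ce.items():
        if fs and ap != a:  # self-enabling pairs are skipped (our assumption)
            edges.setdefault(ap, set()).add(a)
    return [[-x("a:" + a, t) for a in cyc] for t in range(N) for cyc in simple_cycles(edges)]
```

For the two mutually enabling unit-duration actions discussed next in the text, the sketch produces one precondition clause per action and a single circular clause forbidding both actions at the same step.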



The circular mutex clauses (8) ensure that actions do not circularly enable each other. For example, given two actions $o = (1, 1, \{f\}, \emptyset, \emptyset, \{g\}, \emptyset)$ and $o' = (1, 1, \{g\}, \emptyset, \emptyset, \{f\}, \emptyset)$, the initial state $\emptyset$, and the goal state $\{f, g\}$, there should be no solution. Without circular mutex clauses, however, the two concurrent enabling relations $\mathrm{CEnab}(a^{o}, a^{o'}, g)$ and $\mathrm{CEnab}(a^{o'}, a^{o}, f)$ would yield an incorrect plan, $(o_0, o'_{0+\epsilon})$ or $(o'_0, o_{0+\epsilon})$.

Compared with the basic encoding (Section 3.2.2), the enhanced encoding differs in three ways. First, the precondition clauses (Clauses 3) contain additional literals for concurrent enabling relations. Second, there are fewer mutex clauses than before, because the mutex definition is more relaxed. Third, we add circular mutex clauses for concurrent enabling relations (Clauses 8).

Based on the concurrent enabling definition, the enhanced encoding overcomes the limitation of the basic encoding described in Section 3.3.1. For the example in Figure 2(b), Figure 4(b) shows the relaxed planning graph constructed at makespan 2. We only show the planning graph at makespan 2 because it has “leveled off” after makespan 2 [Blum and Furst 1997]. Since $\mathrm{CEnab}(a_1, a_2, f)$ holds, $a_2$ can be true at makespan 0, and we add the two precondition clauses $x_{a_2,0} \rightarrow x_{f,0} \vee x_{a_1,0}$ and $x_{a_2,1} \rightarrow x_{f,1} \vee x_{a_1,1}$ (note that $x_{f,0}$ is not shown in Figure 4(b)). The partial assignment $\psi$ with $v_\psi(x_{a_1,0}, x_{a_2,0}, x_{a_3,1}, x_{f,1}, x_{g,1}, x_{h,2}) = (1, 1, 1, 1, 1, 1)$ satisfies the encoded SAT problem. Thus, the decoded plan $(o_0, o'_{0+\epsilon})$ is a solution of this example. In our implementation, we set $\epsilon = 10^{-\lceil \log N \rceil - 1}$, where $N$ here denotes the total number of actions in the solution plan. Similarly, the plan $(o_0, o'_{0+\epsilon})$ is a solution of the example in Figure 2(a). To show the difference between our two encoding schemes, Figure 4(a) presents the relaxed planning graph constructed by the basic encoding.
Since $a_3$ is not encoded at makespan 1, $x_{a_3,1}$ cannot be true, so $x_{a_1,0}$ is false by the durative action clause $x_{a_1,0} \leftrightarrow x_{a_3,1}$. Then $x_{f,1}$ is false by the add-effect clause $x_{f,1} \rightarrow x_{a_1,0}$. Propagating in the same way, $x_{a_2,1}$, $x_{g,2}$, $x_{a_3,2}$, $x_{a_1,1}$, $x_{f,2}$, and $x_{a_2,2}$ are all false. For planning graphs with larger makespans, similar propagation shows that no satisfying assignment exists either. Thus, no assignment satisfies the basic encoding for this example.

In summary, since the enhanced encoding based on concurrent enabling relations can describe the situation in which $o'$ is executed immediately after $o$, it solves problems that the basic encoding cannot. Furthermore, as the planning graphs in Figure 4 show, the enhanced encoding yields a more compact planning graph, so it may find solutions with shorter makespans than the basic encoding. For instance, given a CSTE problem with two actions $o = (1, 1, \emptyset, \emptyset, \emptyset, \{f\}, \emptyset)$ and $o' = (2, 1, \{f\}, \emptyset, \emptyset, \emptyset, \{g\})$, the initial state $\emptyset$, and the goal state $\{g\}$, the optimal solution found by the basic encoding is $(o_0, o'_1)$ with a makespan of 3, whereas the enhanced encoding finds $(o_0, o'_{0.01})$ with a makespan of 2.01.

4. A BRANCH-AND-BOUND ALGORITHM FOR SOLVING MINCOST SAT

In this section, we develop a branch-and-bound based CDCL (BB-CDCL) algorithm for optimally solving MinCost SAT problems. BB-CDCL can solve MinCost SAT instances from both the basic and the enhanced encodings. On top of the standard branch-and-bound procedure, we introduce a key planning-specific technique, a variable branching scheme based on heuristic costs, which significantly improves problem-solving efficiency.

4.1. The BB-CDCL algorithm

Here we give an overview of the BB-CDCL procedure, which integrates two popular search schemes: CDCL and branch-and-bound. The conflict-driven clause learning (CDCL) algorithm [Zhang et al. 2001; Mitchell 2005] is a modern variant of the DPLL algorithm [Davis et al. 1962; Zhang and Malik 2002] that adopts clause learning and constraint propagation techniques, and it is used in many modern SAT solvers [Marques-Silva et al. 1996; Moskewicz et al. 2001]. Although branch-and-bound has been studied and applied to SAT solving [Planes 2003; Alsinet et al. 2003; Fu and Malik 2006; Larrosa et al. 2009], planning-specific variable ordering techniques have not been extensively studied in SAT solvers.
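To make the branch-and-bound half of this combination concrete, the sketch below solves a tiny MinCost SAT instance by depth-first search with unit propagation and cost-bound pruning. It is a simplified illustration, not BB-CDCL itself: there is no clause learning, no non-chronological backtracking, and no heuristic branching; it branches on the lowest-numbered free variable and tries the zero-cost value first. Pruning on the cost of a partial assignment is sound here only because all variable costs are assumed nonnegative.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]
Assignment = Dict[int, bool]


def unit_propagate(clauses: List[Clause], assign: Assignment) -> bool:
    """Extend assign in place by unit propagation; return False on a conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            open_lits = []
            satisfied = False
            for lit in clause:
                val = assign.get(abs(lit))
                if val is None:
                    open_lits.append(lit)
                elif val == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not open_lits:
                return False                  # every literal is false: conflict
            if len(open_lits) == 1:           # unit clause forces its last literal
                assign[abs(open_lits[0])] = open_lits[0] > 0
                changed = True
    return True


def cost(assign: Assignment, mu: Dict[int, float]) -> float:
    """MinCost SAT objective: total cost of the variables assigned true."""
    return sum(mu.get(v, 0) for v, val in assign.items() if val)


def min_cost_sat(clauses: List[Clause], n_vars: int,
                 mu: Dict[int, float]) -> Optional[Tuple[float, Assignment]]:
    """Depth-first branch-and-bound; returns (tau, incumbent) or None if UNSAT."""
    tau = float("inf")                        # cost of the incumbent solution
    incumbent: Optional[Assignment] = None

    def search(assign: Assignment) -> None:
        nonlocal tau, incumbent
        if not unit_propagate(clauses, assign):
            return                            # conflict: backtrack
        if cost(assign, mu) >= tau:
            return                            # bound: cannot beat the incumbent
        free = [v for v in range(1, n_vars + 1) if v not in assign]
        if not free:
            tau, incumbent = cost(assign, mu), dict(assign)
            return
        for value in (False, True):           # try the zero-cost value first
            search({**assign, free[0]: value})

    search({})
    return None if incumbent is None else (tau, incumbent)
```

For example, for the clauses $(x_1 \vee x_2)$ and $(\neg x_1 \vee x_3)$ with costs $\mu = \{1{:}\,1, 2{:}\,5, 3{:}\,1\}$, `min_cost_sat([[1, 2], [-1, 3]], 3, {1: 1, 2: 5, 3: 1})` first finds a cost-5 solution with $x_2$ true and then returns the cheaper cost-2 solution with $x_1$ and $x_3$ true.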

Fig. 4. Comparison of relaxed planning graphs between the basic and enhanced encoding schemes for the example in Figure 2(b): (a) relaxed planning graph of the basic encoding; (b) relaxed planning graph of the enhanced encoding, including the CEnab link from $a_1$ to $a_2$. Each panel lays out initial facts, actions, facts, and goal facts over time steps. The simple actions $a_1$, $a_2$, and $a_3$ represent $o_\vdash$, $a^{o'}$, and $o_\dashv$, respectively. For simplicity, no-ops are represented by dots. The ignored actions and facts at makespans 0, 1, and 2 are those that cannot be true at the corresponding makespans in any satisfying assignment.

The BB-CDCL procedure is shown in Algorithm 2. The algorithm maintains a propagation queue containing all literals pending propagation, together with a representation of the current assignment. A variable is free if it has not been assigned a value; initially, all variables are free. BB-CDCL repeatedly propagates the literals in the propagation queue and returns a conflict if there is one (Line 5). When a conflict occurs, analyze() examines it to generate a learned clause [Eén and Sörensson 2003] (Line 7); backtrack() then undoes the assignment until exactly one of the literals in the learned clause becomes unassigned (Line 12). If no conflict occurs, the procedure calls cost_propagate() to update the heuristic costs of the variables changed by the last propagation (Line 15). It then either prunes the search node if the current cost reaches or exceeds $\tau$, the cost of the incumbent (current best) solution (Lines 16-17), or calls decide() to select a free variable, assign it true or false, and insert it into the propagation queue (Line 24); then a new iteration begins. Each time a satisfying solution is found, i.e., no free variable remains (Line 19), the procedure updates the incumbent solution together with the solution counter num and the threshold $\tau$, and then backtracks (Lines 20-22). BB-CDCL keeps searching the whole space until all satisfying solutions have been either visited or pruned, in order to find the one that minimizes cost(), the objective function of the MinCost SAT problem. The procedure stops when a top-level conflict is found (Lines 8-9).

4.2. Heuristic cost based variable branching

The variable branching scheme is one of the most important components of a SAT solver; a good branching scheme can improve search efficiency significantly. In our basic BB-CDCL procedure, the variable branching scheme is the same as that in MiniSat [Zhang and Malik 2002; Eén and Sörensson 2003], an improved version of VSIDS (Variable State Independent Decaying Sum) [Moskewicz et al. 2001]. VSIDS works as follows:
(1) Each variable $x$ has a priority value $p(x)$, initialized to 0.
(2) $\delta_p$ is a priority increment, initialized to 1.
(3) In decide(), with a constant probability $P_0$, randomly select an unassigned variable $x$; with probability $1 - P_0$, select the unassigned variable with the highest priority value. Assign the selected variable to true.
(4) Whenever analyze() in BB-CDCL generates a learnt clause, for each variable $x$ in the new learnt clause, update $p(x)$ to $p(x) + \delta_p$. Then multiply $\delta_p$ by a constant $\theta > 1$.


ALGORITHM 2: BB-CDCL($\Phi_c$)
Input: MinCost SAT problem $\Phi_c$
Output: a solution with minimum cost
 1  cost_init();
 2  τ ← ∞;
 3  num ← 0;
 4  while true do
 5      conflict ← propagate();
 6      if conflict then
 7          learnt ← analyze(conflict);
 8          if conflict is of top-level then
 9              return num > 0 ? SAT : UNSAT;
10          else
11              add learnt to the clause database;
12              backtrack();
13          end
14      else
15          cost_propagate();
16          if cost(ψ) ≥ τ then
17              backtrack();
18          else
19              if all variables are assigned then
20                  num++;
21                  τ ← cost(ψ);
22                  backtrack();
23              else
24                  decide();
25              end
26          end
27      end
28  end

(5) Periodically divide all priority values by a large constant $\gamma$ and reset $\delta_p$ to 1.

In MiniSat, $P_0 = 0.02$, $\theta = 1.2$, and $\gamma = 100$. VSIDS is competitive with other variable branching heuristics for SAT solving [Eén and Sörensson 2003].

MinCost SAT problems differ from SAT problems in that they have the optimization goal of minimizing the total variable cost. Hence, the variable branching mechanism can be improved by taking variable costs into account. We present a branching scheme for CSTE planning that is customized and more effective than VSIDS. Specifically, we propose a heuristic function that evaluates each variable's potential cost, and we integrate this heuristic evaluation into the VSIDS branching scheme.

4.2.1. Heuristic cost evaluation. Our heuristic cost function is customized for CSTE planning and is based on the relaxed planning graph [Blum and Furst 1997; Hoffmann and Nebel 2001]. At each decision point during the search, corresponding to a partial assignment $\psi$, we maintain for each variable $x$ (whether assigned or unassigned in $\psi$) a value $h(x)$ that is a lower bound on the minimum total action cost of any solution plan that 1) reaches the assignment $v_\psi(x) = 1$ from the initial state $I$, and 2) is consistent with the partial assignment $\psi$.

Definition 4.1. Given a partial assignment $\psi$, for each variable $x \in V$, the lower bounding function $h(x)$ is defined as follows:
— if $v_\psi(x) = 0$, then $h(x) = \infty$;
— if $v_\psi(x_{f,t}) = 1$ or $x_{f,t}$ is unassigned, then
$$h(x_{f,t}) = \begin{cases} \min_{\forall a,\, f \in add(a)} h(x_{a,t-1}), & t > 0,\\ 0, & t = 0 \text{ and } f \in I,\\ \infty, & t = 0 \text{ and } f \notin I; \end{cases}$$
— if $v_\psi(x_{a,t}) = 1$ or $x_{a,t}$ is unassigned, then
$$h(x_{a,t}) = \mu(x_{a,t})\,\alpha(x_{a,t}) + \max_{f \in pre(a)} \min\Big(\{h(x_{f,t})\} \cup \{h(x_{a',t}) \mid a' \text{ s.t. } \mathrm{CEnab}(a', a, f)\}\Big),$$
where $\alpha(x_{a,t}) = 0$ if $v_\psi(x_{a,t}) = 1$, and $\alpha(x_{a,t}) = 1$ otherwise.

ALGORITHM 3: cost_init()
Input: $\Pi^s = (I, F^s, O^s, G)$, $\Phi_c = (V, C, \mu)$, $N$
for all $x_{f,0} \in V$ do
    set $h(x_{f,0}) = 0$ if $f \in I$, and $h(x_{f,0}) = \infty$ otherwise;
end
for t = 0 to N do
    for all $x_{a,t} \in V$ do
        compute $h(x_{a,t})$ using Definition 4.1;
    end
    for all $x_{f,t+1} \in V$ do
        compute $h(x_{f,t+1})$ using Definition 4.1;
    end
end

For a variable $x$ assigned false, no solution plan satisfying $\psi$ can reach $v_\psi(x) = 1$, so $h(x) = \infty$. For a fact variable $x_{f,t}$ that is not assigned false, the lower bound is the minimum estimate over the action variables $\{x_{a,t-1} \mid f \in add(a)\}$. A necessary condition for $x_{a,t}$ to be true is that every precondition of action $a$ is satisfied, and each precondition $f$ is satisfied either because $f$ is true or because some action that concurrently enables $a$ on $f$ is true. Hence the cost of satisfying a single precondition $f$ is bounded below by the minimum of $h(x_{f,t})$ and the $h$ values of all action variables that concurrently enable $a$ on $f$, and $h(x_{a,t})$ takes the maximum of these bounds over all preconditions of $a$. The factor $\alpha(x)$ ensures that the variable cost $\mu(x)$ is counted only once, in either $h(x)$ or $\mathrm{cost}(\psi)$.

The heuristic function applies to MinCost SAT instances from both the basic and the enhanced encodings. Since the basic encoding has no concurrent enabling relations, for its SAT instances the heuristic cost $h(x_{a,t})$ of an action variable only considers $h(x_{f,t})$ and ignores the costs $h(x_{a',t})$ of concurrent enabling action variables.

Algorithms 3 and 4 show how the $h(x)$ values are initialized and maintained for all $x \in V$, respectively. To initialize $h(x)$, we first set $h(x_{f,0}) = 0$ if $f \in I$ and $h(x_{f,0}) = \infty$ otherwise; we then compute the initial values of the variables from level 0 to $N$ following Definition 4.1. Algorithm 4 updates the $h$ values each time propagation completes without a conflict. It uses a priority queue $U$ to store all variables whose $h$ values need to be updated after a constraint propagation.

Our heuristic function integrates the max-heuristic rule with the concurrent enabling relations. Max-heuristics have been extensively studied in SAT-based optimization solvers such as MinCostChaff [Fu and Malik 2006] and DPLLBB [Larrosa et al. 2009].
Compared with the generic max-heuristics in MinCostChaff and DPLLBB, our heuristic function is customized for CSTE planning: it takes durative actions and concurrent enabling relations into account.
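A forward-pass sketch of Definition 4.1, in the spirit of Algorithm 3 (cost_init()), is given below in Python. It recomputes every $h$ value from scratch rather than incrementally as Algorithm 4 does. Two details are our own assumptions rather than statements from the text: the maximum over an empty precondition set is taken as 0, and dependencies between actions in the same layer (through CEnab) are resolved by iterating to a fixpoint that starts from $\infty$, so circularly enabling actions cannot lower each other's cost without external support. The data layout and the function name are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

INF = math.inf
Actions = Dict[str, Tuple[float, Set[str], Set[str]]]  # name -> (mu, pre, add); no-ops included
Key = Tuple[str, str, int]                             # ("a" | "f", name, t)


def init_costs(init: Set[str], actions: Actions, N: int,
               assign: Dict[Key, bool], cenab: Dict[Tuple[str, str], Set[str]]) -> Dict[Key, float]:
    """Compute h(x) for every variable by a forward pass over layers 0..N."""
    facts = set(init)
    for _, pre, add in actions.values():
        facts |= pre | add
    enablers: Dict[Tuple[str, str], List[str]] = {}     # (a, f) -> [a' | CEnab(a', a, f)]
    for (ap, a), fs in cenab.items():
        for f in fs:
            enablers.setdefault((a, f), []).append(ap)

    h: Dict[Key, float] = {}
    for f in facts:                                     # layer 0: facts of I cost nothing
        key = ("f", f, 0)
        h[key] = 0.0 if f in init and assign.get(key) is not False else INF

    for t in range(N):
        for a in actions:
            h[("a", a, t)] = INF
        changed = True
        while changed:                                  # fixpoint over same-layer CEnab links
            changed = False
            for a, (mu, pre, _) in actions.items():
                key = ("a", a, t)
                if assign.get(key) is False:
                    continue                            # false variables keep h = INF
                alpha = 0 if assign.get(key) is True else 1
                need = 0.0                              # max over an empty pre(a) taken as 0
                for f in pre:
                    options = [h[("f", f, t)]]
                    options += [h[("a", ap, t)] for ap in enablers.get((a, f), [])]
                    need = max(need, min(options))
                new = mu * alpha + need
                if new < h[key]:
                    h[key] = new
                    changed = True
        for f in facts:                                 # facts at t+1 from actions at t
            key = ("f", f, t + 1)
            if assign.get(key) is False:
                h[key] = INF
            else:
                h[key] = min((h[("a", a, t)] for a, (_, _, add) in actions.items() if f in add),
                             default=INF)
    return h
```

On the enhanced encoding of Figure 2(b), with $a_1$, $a_2$, $a_3$ as in Figure 4, explicit no-ops, $\mu = 1$ for $a_1$ and $a_2$, and $\mu = 0$ for $a_3$, the goal variable $x_{h,2}$ receives $h = 2$, the total cost of $o$ and $o'$. Assigning $x_{a_1,0}$ true lowers it to 1, because the committed cost of $a_1$ is then counted in $\mathrm{cost}(\psi)$ instead.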


ALGORITHM 4: cost_propagate()
Input: $\Pi^s = (I, F^s, O^s, G)$, $\Phi_c = (V, C, \mu)$
initialize $U$ as a priority queue sorted by $t$, containing the variables changed by the last propagation;
while $U \ne \emptyset$ do
    get $x$ from $U$; $U \leftarrow U \setminus \{x\}$;
    if $x = x_{a,t} \in V$ then
        if $v_\psi(x_{a,t})$ = false then newcost ← ∞;
        else compute newcost using Definition 4.1;
        end
        if newcost ≠ $h(x_{a,t})$ then
            $h(x_{a,t})$ ← newcost;
            for all $f \in add(a)$ do $U \leftarrow U \cup \{x_{f,t+1}\}$; end
        end
    else if $x = x_{f,t} \in V$ then
        if $v_\psi(x_{f,t})$ = false then newcost ← ∞;
        else compute newcost using Definition 4.1;
        end
        if newcost ≠ $h(x_{f,t})$ then
            $h(x_{f,t})$ ← newcost;
            for all $a$ such that $f \in pre(a)$ do $U \leftarrow U \cup \{x_{a,t}\}$; end
        end
    end
end

4.2.2. New variable branching scheme. Integrating the new heuristic costs into the VSIDS heuristic, we obtain the following variable branching rule for BB-CDCL:

(1) Each variable $x$ has a priority value $p(x)$, initialized as
$$p(x) = \begin{cases} h(x), & \text{if } h(x) \ne \infty,\\ 0, & \text{otherwise,} \end{cases}$$
where $h(x)$ is initialized in Algorithm 3.
(2) $\delta_p$ is a priority increment, initialized to 1.
(3) In decide(), with a constant probability $P_0$, randomly select an unassigned variable $x$; with probability $1 - P_0$, select the unassigned variable with the highest priority value. Assign the selected variable to false.
(4) Whenever analyze() in BB-CDCL generates a learnt clause, for each variable $x$ in the new learnt clause, increase $p(x)$ to $p(x) + \delta_p$. Then multiply $\delta_p$ by a constant $\theta > 1$.
(5) Whenever no conflict occurs after calling propagate(), we call cost_propagate() to update the heuristic costs (Algorithm 2, Line 15). For each variable $x$ whose heuristic cost has changed from $h_{old}(x)$ to $h(x)$, we update its priority value as
$$p(x) \leftarrow \begin{cases} p(x) - h_{old}(x) + h(x), & \text{if } h_{old}(x) \ne \infty \text{ and } h(x) \ne \infty,\\ p(x) + h(x), & \text{if } h_{old}(x) = \infty \text{ and } h(x) \ne \infty,\\ p(x), & \text{if } h(x) = \infty. \end{cases}$$
(6) Periodically divide all priority values by a large constant $\gamma$ and reset $\delta_p$ to 1.

In our implementation, we set $P_0 = 0.02$, $\theta = 1.2$, and $\gamma = 100$.
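The bookkeeping of rules (1)-(6) can be sketched as a small Python class. It covers only priority maintenance and variable selection; the surrounding solver, the learnt clauses, and the $h$ values are supplied by the caller, and the class and method names are hypothetical. Variables are positive integers and learnt clauses are lists of signed literals, as in DIMACS.

```python
import math
import random
from typing import Dict, Iterable, List, Tuple

INF = math.inf


class CostAwareVSIDS:
    """Priority bookkeeping for the heuristic-cost branching rule (sketch)."""

    def __init__(self, h: Dict[int, float], p0: float = 0.02, theta: float = 1.2,
                 gamma: float = 100.0, seed: int = 0) -> None:
        # Rule (1): p(x) = h(x) when finite, else 0
        self.p = {x: (hx if hx != INF else 0.0) for x, hx in h.items()}
        self.h = dict(h)
        self.delta = 1.0                       # rule (2): priority increment
        self.p0, self.theta, self.gamma = p0, theta, gamma
        self.rng = random.Random(seed)

    def decide(self, unassigned: Iterable[int]) -> Tuple[int, bool]:
        """Rule (3): random pick with probability p0, else max priority; value False."""
        cands = list(unassigned)
        if self.rng.random() < self.p0:
            x = self.rng.choice(cands)
        else:
            x = max(cands, key=lambda v: self.p[v])
        return x, False

    def on_learnt(self, clause: List[int]) -> None:
        """Rule (4): bump variables of a learnt clause, then grow the increment."""
        for lit in clause:
            self.p[abs(lit)] += self.delta
        self.delta *= self.theta

    def on_cost_change(self, x: int, new_h: float) -> None:
        """Rule (5): shift p(x) by the change of its heuristic cost h(x)."""
        old = self.h[x]
        if new_h != INF:
            self.p[x] += new_h - old if old != INF else new_h
        self.h[x] = new_h                      # p(x) is unchanged when new_h is infinite

    def rescale(self) -> None:
        """Rule (6): divide all priorities by gamma and reset the increment."""
        for x in self.p:
            self.p[x] /= self.gamma
        self.delta = 1.0
```

With `p0=0.0`, `decide` becomes deterministic and returns the variable with the largest priority together with the value False, which makes the rule easy to test in isolation.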



Compared with VSIDS, our BB-CDCL heuristic gives higher priority to variables with higher heuristic costs. Since the heuristic cost of a variable $x$ is a lower bound on the total cost of any solution plan that reaches the assignment $v_\psi(x) = 1$, branching early on variables with high heuristic costs and assigning them false is likely to steer the search away from high-cost solution plans. The remaining search space then has a lower heuristic cost and is more likely to lead to low-cost solution plans. For example, suppose two actions $a_1$ and $a_2$ both add a goal fact $f$, so that there may be two different solution assignments satisfying the goal variable $x_{f,N}$. If $h(x_{a_1,N-1}) > h(x_{a_2,N-1})$ and we first assign $x_{a_1,N-1}$ to false, then, provided no other action (including the no-op for $f$) can add $f$ at step $N$, our SAT solver will set $x_{a_2,N-1}$ to true through propagation.

5. EXPERIMENTAL RESULTS

We report our experimental results from two perspectives. First, we test seven temporally expressive planners to evaluate the performance of our CSTE planner. Second, we study the effectiveness of our new variable branching scheme by evaluating the efficiency of several MinCost SAT and Max-SAT solvers. All experiments are run on a workstation with a dual-core AMD Opteron 2200 processor and 2GB of memory, using the Sun Java 1.6 and Python 2.6 run-time systems. The time limit for each instance is 1800 seconds.

5.1. Testing domains

Our experiments are performed on a P2P domain that we developed and on several other CSTE domains [Coles et al. 2008]. Since the original domain definitions in [Coles et al. 2008] do not specify action costs, we examine those actions and assign them reasonable numerical costs. Besides the original problems, we also generate some larger instances with a problem generator that we developed. We do not use all the domains in [Coles et al. 2008], because some of them cannot scale to large problems (e.g., the Match domain) and a few of them have variable-duration actions (e.g., the Café domain). The domains included in the experiments are briefly described below.
(1) Peer-to-Peer Domain. This domain models file transfers in Peer-to-Peer (P2P) networks, in which each computer, called a peer, may upload data to or download data from another peer. One critical issue in P2P networks is that a substantial amount of inter-peer communication traffic is unnecessarily duplicated. For systems with consistent and intensive data sharing between peers, communication latency is a potential bottleneck of overall network performance. Network design mechanisms, particularly proxy caching (a.k.a. gateway caching), have been proposed to reduce duplicated data transmission, and making good use of the proxy cache is critical for optimizing data transmission. We build the action models based on a study of data communication in P2P networks. Alternatively, action models can be acquired automatically from a set of observed successful plans, for example with the ARMS and LAMP algorithms [Yang et al. 2007; Zhuo et al. 2010]. There are at least two different types of optimization in P2P networks.
The first is approached from the user's point of view: each individual user wants all the needed data within the shortest possible time [Bhattacharya and Ghosh 2007]. The other is approached from the point of view of a network service provider, such as an Internet service provider (ISP), which owns the network but does not control individual peers; its main concern is to reduce the overall communication load. These two performance metrics typically conflict. We adopt performance metrics that lie between the two: the network owner knows each peer's needs, and the objective is to minimize the overall makespan of all data deliveries to all peers, as well as the total communication load caused by the different actions, including serving and downloading. Cast as a planning problem, this task is temporally expressive. The main constraint is that, to satisfy a file request from a peer $p_1$, the same file has to be offered by another peer $p_2$. Peer $p_1$ can execute the download action to obtain the file when 1) there is a route between $p_1$ and $p_2$, and 2) $p_2$ is serving the file throughout the transfer. As such, the serve and download actions require concurrency in any valid plan. In addition, the proxy cache, which stores local copies of files, guarantees that while $p_2$ is serving a file, any peer routed to $p_2$ can download the file very quickly. Since the upload bandwidth of a peer is typically much narrower than its download bandwidth, the more peers download a particular file while it is being served, the larger the overall network throughput and the shorter the time span of the solution plan; the optimality goal therefore favors such sharing. In the serve action, for example, the processing time of a file is proportional to its size. We assume that a peer actively sharing a file uses up its upload bandwidth, so it cannot share another file simultaneously. This assumption does not impose a real restriction, since a time-sharing scheme could be introduced to extend our method. A predicate “serving”, added as a starting effect, indicates that the peer is sharing a file. While a peer shares a file, the connecting route guarantees that any other peer routed to the uploading peer can get the file in constant time, because download speed is much higher. The instances are generated randomly with different parameter settings, and the size of each file object is chosen randomly from four to eight units. The goal state of each instance is that each peer obtains all requested files.
There are two types of problem settings with different network topologies: one is loosely connected and the other is highly connected. We assume that downloading is cheaper than uploading and thus has a lower action cost.
(2) Matchlift domain. In a Matchlift problem [Coles et al. 2008], an electrician enters a building to fix fuses during a power outage. Since there is no light, the electrician needs to light a match in order to make repairs in a dark room. The required concurrency between lighting a match and mending a fuse makes the problem temporally expressive. Furthermore, before mending a fuse, the electrician may need to travel through the building by taking the elevator to the appropriate floor, and then find and enter the correct room. The original Matchlift domain [Coles et al. 2008] has a flaw: an electrician's position is not updated until the end of a durative action. This can greatly increase the number of electricians needed in Crikey and Crikey3, to the point that electricians eventually exist everywhere. To make Crikey and Crikey3 work properly in this domain, we fix the flaw and use the revised version of the Matchlift domain in this set of experiments. We generate all instances randomly using different parameters for the numbers of floors, rooms, matches, and fuses. Each instance has the same number of fuses and matches, which makes these instances easier: we can always find a valid plan in which exactly one fixing action is concurrent with each match-lighting action. The action costs are set to reflect how much energy is consumed; for example, actions for operating the elevator have higher costs than the others.
(3) Matchlift-Variant domain. The original Matchlift domain only requires one electrician to do the repairs, and there are always enough matches available, so it exhibits a relatively weak form of required concurrency.
We make a revised Matchlift domain (called the Matchlift-Variant domain), which requires more concurrency due to two changes. First, the number of matches is less than the number of fuses, so that multiple electricians need to share one match. Second, we reduce the duration of the ‘mend fuse’ action so that an electrician is able to conduct more mending actions during one match’s lighting, which also results in higher concurrency. The setting for action costs is the same as in the Matchlift domain. (4) Driverslogshift domain. The Driverslogshift domain [Coles et al. 2008] is an extended version of the Driverslog domain from IPC-3 [IPC 2002]. It has the same problems as those defined in the original Driverslog domain, as long as the worker is in the ‘working’ status. The working status of each individual worker is modeled as a durative action with a fixed duration. After the working action is over, the worker needs to take a rest, which takes a constant duration. The working action has to be concurrent with the other actions of the worker, which is why the problem is temporally expressive. The possible actions of a worker are driving trucks between locations, loading/unloading the trucks, and walking between locations. Compared with P2P and Matchlift, this domain has long durative actions, so its problem instances have much longer makespans than those in the Matchlift and P2P domains. Therefore, it is relatively difficult to solve instances in this domain optimally. We set action costs to reflect energy usage: driving a truck and loading/unloading packages have higher action costs than other actions.

The high concurrency of CSTE planning problems makes them very different from most other temporal planning problems we have seen. Figure 5 illustrates the temporal dependencies (Definition 2.4) in instances from several domains. All these instances have comparable problem sizes: the instance of the P2P domain has 90 facts and 252 actions, and the instance of the Matchlift domain [Coles et al. 2008] has 216 facts and 558 actions. Figure 5 (I) is an instance of the Trucks domain, which is temporally simple and thus has all actions isolated. In Figure 5 (II), for the Matchlift domain, each action has up to two actions temporally depending on it. In Figure 5 (III), for the P2P domain, each action has up to five actions temporally depending on it.

ACM Transactions on Intelligent Systems and Technology, Vol. V, No. N, Article A, Publication date: January YYYY.

Qiang Lu et al.

[Figure 5: temporal dependency graphs for (I) a Trucks instance, (II) a Matchlift instance, and (III) a P2P instance]

Fig. 5. This figure partially illustrates temporal dependencies of actions for instances in three domains: Trucks, Matchlift and P2P. Each node represents a temporal action. Each edge represents a temporal dependency between two temporal actions.

5.2. Results of optimizing makespans

Seven planners are tested and compared in our experiments. We test five temporally expressive planners: Crikey [Coles et al. 2009] (runnable Java JAR), Crikey3 [Coles et al. 2008] (statically linked binary for x86 Linux), POPF [Coles et al. 2010], LPG-c [Gerevini and Serina 2002], and Temporal Fast Downward (TFD) [Eyerich et al. 2009]. We also test SCP and SCP2: SCP uses the basic encoding and SCP2 uses the enhanced encoding. In this set of experiments, neither SCP nor SCP2 minimizes action costs. For each makespan, SCP and SCP2 use MiniSAT2 [Eén and Sörensson 2003], integrated with our new branching scheme, as their SAT solver to either prove unsatisfiability or find a first satisfiable solution, disregarding the objective in the MinCost SAT formulation. We do so because the other planners optimize makespan but not action costs, and we want a fair comparison of runtime. Note that, according to our empirical results, SCP2 always gives the optimal makespan.

A SAT-based Approach to Cost Sensitive Temporally Expressive Planning


Table I. Number of solved problems for each planner in each domain.

Domain              #   Crikey  Crikey3  POPF  LPG-c  TFD  SCP  SCP2
P2P                14      0       7      14      0     0    13    14
Matchlift          11     11      11      10      2    11     7     7
Matchlift-Variant  12      7       6       7      3     3    12    12
Driverslogshift     9      9       9       9      9     0     9     9
Σ                  46     27      33      40     14    14    41    42

Column “#” shows the number of instances in each domain.

[Figure 6: number of solved problems (y-axis) versus time limit (x-axis) for Crikey, Crikey3, POPF, LPG-c, TFD, SCP, and SCP2]

Fig. 6. Overall results of all planners in all domains. We show the number of solved problems with regard to increasing time limits.

We first show the overall results of search time and number of solved problems in Table I and Figure 6. As shown in Table I, our planners (SCP and SCP2) solve more problems than the other planners. Figure 6 shows that SCP and SCP2 search much faster than Crikey, LPG-c, and TFD. POPF runs faster than our planners; however, SCP and SCP2 solve more problems than POPF (SCP solves 41, SCP2 42, and POPF 40 problems). Further, SCP2 solves more problems and runs faster than SCP. We show the makespan results for each domain in Figure 7. As shown in the figure, SCP finds solutions with shorter makespans than the other planners (Crikey, Crikey3, POPF, LPG-c, and TFD) in most problems. SCP2 performs even better than SCP: it finds shorter makespans than SCP in 16 instances. This is due to the relaxed parallel plan semantics in SCP2; without this semantics, SCP may not find the optimal makespan in certain cases. More detailed results of search time and makespan are presented in Tables II, III, IV, and V. Since all planners except SCP2 do not consider action costs, we only present search time and makespan in these tables. We show the results of total action costs in Section 5.3. The detailed results for each domain are presented as follows.
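A curve like the one in Figure 6 is obtained by counting, for each time limit, how many instances a planner solves within that limit. The sketch below shows this counting for a single domain only (P2P), using the SCP and SCP2 solving times from Table II; Figure 6 itself pools all four domains. We assume "solved within t" means a solving time no greater than t, with timeouts never counted; the function name `solved_within` is illustrative.

```python
from bisect import bisect_right
from typing import List, Optional


def solved_within(times: List[Optional[float]], limits: List[float]) -> List[int]:
    """For each time limit, count the instances solved within that limit.
    Unsolved instances (timeouts) are given as None and never counted."""
    solved = sorted(t for t in times if t is not None)
    # bisect_right counts the sorted times that are <= limit.
    return [bisect_right(solved, limit) for limit in limits]


# Solving times (s) on the 14 P2P instances, transcribed from Table II.
scp2_p2p = [0.6, 3.2, 17.1, 0.3, 1.4, 1.6, 10.4, 21.8, 33.8,
            72.7, 23.2, 33.4, 323.2, 387.1]
scp_p2p = [0.8, 3.2, 16.6, 0.7, 1.8, 3.8, 18.3, 25.4, 81.5,
           35.0, 10.0, 22.9, 959.4, None]   # Instance 14 timed out

limits = [10, 100, 400, 1000, 1800]
print(solved_within(scp2_p2p, limits))  # [5, 12, 14, 14, 14]
print(solved_within(scp_p2p, limits))   # [6, 12, 12, 13, 13]
```

Sorting once and using binary search keeps each query logarithmic, which matters little here but scales to dense time grids when plotting.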


[Figure 7: solution makespan (y-axis) versus instance (x-axis) in four panels: (a) P2P, (b) Matchlift, (c) Matchlift-Variant, (d) Driverslogshift]

Fig. 7. Comparisons of the solution makespans.

P2P domain. The results for the P2P domain are shown in Table II. If a planner solves all instances, row ‘Σ’ in Table II gives the sum of its solving times or makespans over all instances, for easy comparison between solvers. Crikey, LPG-c, and TFD are not included because they fail to solve any instance in this domain. Instances 1 to 9 have simple topologies: each peer is connected to no more than two other peers, and in the initial state only leaf peers (those connected to only one peer) have files to share. There is less concurrency in this setting. Crikey fails to solve any instance in this category. Crikey3 solves 7 out of 14 instances. It is faster on the three simplest instances (Instances 1 to 3) but slower than SCP and SCP2 on the other four instances it solves. Overall, the makespans of solutions found by Crikey3 are up to four times longer than those found by SCP and SCP2. Instances 10 to 14 have more complicated network topologies. Nearly all nodes in these networks are connected to more than one other node, and every peer has some files needed by all others. In these cases, high concurrency is required to derive a plan. Both Crikey and Crikey3 fail to solve any of these instances: Crikey3 times out and Crikey reports no solution, possibly due to their incompleteness. POPF and SCP2 solve all the instances. Since POPF tries to optimize makespan, its makespans are never longer than those of Crikey3 and are often much shorter. POPF also runs much faster than Crikey3, SCP, and SCP2. However, SCP and SCP2 find shorter makespans than POPF in 6 instances. Compared with SCP, SCP2 finds even shorter makespans in these 6 instances and runs faster on the large instances (Instances 13 and 14).


Table II. Experimental results in the P2P domain.

 P   C,F  | Crikey3 T      H | POPF T     H | SCP T       H | SCP2 T     H∗
 1   4,4  |       0.1     22 |    0.0    22 |     0.8    22 |    0.6     22
 2   5,5  |       0.1     32 |    0.0    32 |     3.2    32 |    3.2     32
 3   6,6  |       0.2     40 |    0.1    40 |    16.6    40 |   17.1     40
 4   6,6  |       1.4     72 |    0.0    34 |     0.7    27 |    0.3     26
 5   5,5  |       7.2    100 |    0.1    43 |     1.8    34 |    1.4     31
 6   6,6  |     111.5    150 |    0.2    56 |     3.8    39 |    1.6     36
 7   7,6  |  Time Out        |    0.3    73 |    18.3    54 |   10.4     49
 8   6,7  |     287.7    200 |    0.3    66 |    25.4    49 |   21.8     46
 9   7,7  |  Time Out        |    0.4    79 |    81.5    60 |   33.8     55
10   5,25 |  Time Out        |    0.8    32 |    35.0    32 |   72.7     32
11   6,18 |  Time Out        |    0.6    20 |    10.0    20 |   23.2     20
12   6,24 |  Time Out        |    1.1    23 |    22.9    23 |   33.4     23
13   6,30 |  Time Out        |    1.9    31 |   959.4    31 |  323.2     31
14   7,35 |  Time Out        |    3.1    36 |  Time Out     |  387.1     36
 Σ        |  n/a             |    9.0   587 |  n/a          |  929.9    479

Column ‘P’ is the instance ID. Columns ‘C’ and ‘F’ are the numbers of peers and files, respectively, in the networks. Columns ‘T’ and ‘H’ are the solving time and makespan of solutions, respectively. ‘Time Out’ means that the solver exceeds the time limit of 1800 s, and ‘-’ means no solution is found. If a planner solves all the instances, row Σ gives the sum of all numbers in the corresponding column (‘T’ or ‘H’). We use H∗ for SCP2 since SCP2 gives the optimal makespan H.

Table III. Experimental results in the Matchlift domain.

Instances (P: L,M,R,U): 1: 2,3,4,3; 2: 3,2,9,2; 3: 2,3,4,3; 4: 3,3,9,3; 5: 3,4,9,4; 6: 3,5,9,5; 7: 3,6,9,6; 8: 3,7,9,7; 9: 4,4,16,4; 10: 4,5,16,5; 11: 4,6,16,6.

Each entry is T/H for Instances 1 to 11 in order; Σ gives the sums for planners that solve all instances.
Crikey:      3.0/13, 0.7/11, 2.9/23, 9.6/19, 52.8/35, 24.4/39, 131.9/37, 73.8/42, 60.4/39, 234.1/28, 376.7/47; Σ 970.3/333
Crikey3:     0.1/18, 0.3/14, 0.1/28, 0.1/34, 0.1/43, 0.4/47, 0.4/58, 1.9/58, 0.1/43, 0.1/58, 0.7/58; Σ 4.3/459
POPF:        0.0/13, 0.0/9, 0.0/23, 0.0/19, 0.1/27, 0.1/29, 0.1/33, 0.1/32, 0.1/29, 0.1/29, 0.2/35; Σ 0.9/278
LPG-c:       0.1/17 and 12.6/9 (the only two instances solved); Σ n/a
TFD:         0.1/13, 0.1/9, 0.1/23, 0.1/19, 0.1/25, 8.3/34, 1.2/42, 20.8/35, 7.3/39, 0.1/30, 0.1/33; Σ 38.3/302
SCP:         0.6/13, 0.3/9, 20.8/23, 2.6/19, 390.8/25, 1472.2/27, Time Out, Time Out, 1196.4/32, Time Out, Time Out; Σ n/a
SCP2 (T/H∗): 0.9/13, 1.3/9, 17.0/23, 5.0/19, 303.9/25, 573.9/26, Time Out, Time Out, 366.1/27, Time Out, Time Out; Σ n/a

The numbers in columns ‘L’, ‘M’, ‘R’ and ‘U’ represent the numbers of floors, matches, rooms and fuses, respectively, used in generating the instances.

Matchlift domain. The results for the Matchlift domain are shown in Table III. On all instances, POPF is the fastest to find solutions, and it finds shorter makespans than Crikey, Crikey3, LPG-c, and TFD in most instances. SCP and SCP2 take longer than Crikey, Crikey3, POPF, and TFD, and they time out on four instances. However, on the instances it solves, SCP2 finds makespans no longer than those of any other planner; SCP does the same except on Instance 9. Compared with SCP, SCP2 runs faster on larger instances (Instances 5, 6, and 9) and finds shorter makespans on Instances 6 and 9.

Matchlift-Variant domain. The results are shown in Table IV. All instances are generated with increasing numbers of fuses and electricians. All the other settings, such as the numbers of floors, rooms, and matches, are set randomly. Instances with the same numbers of fuses and electricians may still have different degrees of concurrency, due to different numbers of matches and other resources. For example, although Instances 7 and 8 have the same parameters, Instance 8 is more difficult than Instance 7 because the fuses are distributed differently among rooms.


Table IV. Experimental results in the Matchlift-Variant domain.

Instances (P: E,M,U): 1: 2,2,4; 2: 2,1,4; 3: 2,3,5; 4: 2,2,5; 5: 2,4,6; 6: 2,2,6; 7: 3,2,7; 8: 3,2,7; 9: 3,4,8; 10: 3,2,8; 11: 4,3,8; 12: 4,1,8.

Each entry is T/H; Σ gives the sums for planners that solve all instances.
Crikey:      10.9/14, 147.7/13, 6.1/19, 106.0/25, 20.3/23, 121.6/25, 167.1/17, Time Out, Time Out, Time Out, Time Out; Σ n/a
Crikey3:     5.1/17, 7.5/16, 0.1/23, 41.8/27, 0.1/33, 42.0/27, Time Out, Time Out, Time Out, Time Out, Time Out, Time Out; Σ n/a
POPF:        5.3/19, 17.6/13, 0.0/23, 41.8/27, 0.1/27, Time Out, 13.4/24, Time Out, 0.2/23, Time Out, Time Out, Time Out; Σ n/a
LPG-c:       0.4/13, 460.9/21, and 0.1/33 (the only three instances solved); Σ n/a
TFD:         0.5/19, 5.9/29, and 104.7/33 (the only three instances solved); Σ n/a
SCP:         1.0/15, 0.5/13, 2.9/19, 13.8/22, 194.6/23, 20.4/22, 14.6/18, 13.5/18, 647.1/23, 50.2/22, 384.9/18, 1.1/13; Σ 1344.5/226
SCP2 (T/H∗): 1.0/14, 0.8/13, 5.5/19, 56.3/21, 284.1/23, 44.8/21, 12.8/17, 21.0/17, 1325.0/23, 110.1/21, 135.2/17, 2.3/13; Σ 1999.0/219

The numbers in columns ‘E’, ‘M’ and ‘U’ represent the numbers of electricians, matches, and fuses, respectively.

Table V. Experimental results in the Driverslogshift domain.

 P  D,P,T | Crikey T     H | Crikey3 T    H | POPF T     H | LPG-c T      H | SCP T      H | SCP2 T     H∗
 1  2,2,2 |    16.7    122 |     0.1    224 |    0.0   118 |    336.7    712 |    4.5   102 |    3.0    102
 2  2,2,2 |     4.0    122 |     0.1    122 |    0.0   122 |      2.2    244 |    9.1   122 |   11.6    122
 3  2,3,2 |    18.8    122 |     0.1    225 |    0.0   122 |    712.5    346 |   18.9   122 |   42.9    122
 4  2,3,2 |    19.8    122 |     0.2    323 |    0.1   224 |    365.3    122 |   20.0   122 |   17.7    122
 5  2,4,2 |    38.3    102 |     0.1    238 |    0.0   102 |    653.3    224 |   11.5   102 |   11.7    102
 6  2,4,3 |    10.4    122 |     0.1    326 |    0.0   224 |     85.8    224 |    4.0   123 |    2.4    122
 7  3,6,3 |   201.4    102 |     0.2    102 |    0.0   102 |      9.3    102 |  264.1   102 |  112.6    102
 8  3,6,4 |   180.4    102 |     0.2    125 |    0.0   102 |      2.2    102 |  228.1   102 |  155.9    102
 9  3,7,3 |   159.5    102 |     0.2    125 |    0.0   102 |     15.9    102 |  191.2   102 |  141.8    102
 Σ        |   649.3   1018 |     1.3   1810 |    0.3  1218 |   2183.2   2178 |  751.3   999 |  499.6    998

The numbers in columns ‘D’, ‘P’ and ‘T’ represent the numbers of drivers, packages, and trucks, respectively.

As shown in Table IV, SCP2 finds optimal makespans on all instances tested, whereas Crikey, Crikey3, and POPF run out of time on many instances and generate suboptimal plans on most of the instances they finish. For the instances they solve, POPF has the worst makespans. LPG-c and TFD perform poorly in this domain, each solving only three instances. Comparing with the optimal makespans found by SCP2, we see that SCP fails to find the optimal solution for Instances 1, 4, 6, 7, 8, 10, and 11.

Driverslogshift domain. POPF is again the fastest among all planners. Crikey3 runs slightly slower than POPF, while its makespans are much worse than those of all other planners except LPG-c. As shown in Table V, the optimal makespans provided by SCP2 are typically much shorter than those of Crikey3, POPF, and LPG-c. For example, the optimal makespan for Instance 6 in Table V is about one third of the makespan reported by Crikey3 and about half of the makespans reported by POPF and LPG-c. SCP does not find the optimal solution on Instance 6. TFD is not included because it fails to solve any instance in this domain.

Overall, our experiments on all the CSTE domains show that: 1) our planners, SCP and SCP2, solve the problems efficiently and compare favorably with existing temporally expressive planners such as Crikey, Crikey3, LPG-c, and TFD; 2) SCP2 is guaranteed to find the optimal makespans, while none of the other planners is. SCP finds optimal makespans on most instances, and on the instances where it does not, it usually still finds better makespans than the other planners.
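The per-instance claims above can be checked directly against the tables. The short script below is a convenience sketch: the lists are transcribed from the H and H∗ columns of Table IV, SCP2's H∗ values are taken as the optimum, and `suboptimal_instances` is an illustrative helper. It also reproduces the Instance 6 ratios cited for the Driverslogshift domain from Table V.

```python
# Makespans H on Matchlift-Variant Instances 1-12, transcribed from Table IV.
scp_h = [15, 13, 19, 22, 23, 22, 18, 18, 23, 22, 18, 13]
scp2_h = [14, 13, 19, 21, 23, 21, 17, 17, 23, 21, 17, 13]  # optimal H*


def suboptimal_instances(h, h_opt):
    """1-based IDs of instances whose makespan exceeds the optimal one."""
    return [i for i, (a, b) in enumerate(zip(h, h_opt), start=1) if a > b]


print(suboptimal_instances(scp_h, scp2_h))  # [1, 4, 6, 7, 8, 10, 11]
print(sum(scp_h), sum(scp2_h))              # 226 219 (row Σ of Table IV)

# Driverslogshift Instance 6 (Table V): SCP2's H* = 122 versus
# Crikey3 (326) and POPF (224).
print(round(122 / 326, 2), round(122 / 224, 2))  # 0.37 0.54
```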


Table VI. Numbers of variables and clauses in the P2P domain and the Matchlift-Variant domain.

      |            P2P              |      Matchlift-Variant
  P   |   #VAR   #Clause       T    |   #VAR   #Clause        T
  1   |   4428     18052     0.6    |   3076     17758      1.0
  2   |  11700     51428     3.2    |   1931      8921      0.8
  3   |  27814    132648    17.1    |   5882     44999      5.5
  4   |   3876     16856     0.3    |   5097     34036     56.3
  5   |   7175     33737     1.4    |   9408     92656    284.1
  6   |  11934     60612     1.6    |   5326     37510     44.8
  7   |  19485    103707    10.4    |   5886     46443     12.8
  8   |  18310    102873    21.8    |   6306     52344     21.0
  9   |  25767    147766    33.8    |  14431    189557   1325.0
 10   |  57200    529947    72.7    |   7887     69399    110.1
 11   |  33798    225429    23.2    |  10455    117585    135.2
 12   |  52362    422329    33.4    |   3935     24157      2.3

Column ‘#VAR’ is the number of variables, column ‘#Clause’ is the number of clauses, and column ‘T’ is the overall solving time of SCP2.

Number of variables and clauses. One may be concerned about the size of the SAT encoding and the time cost of SCP2, issues that any optimal planner faces. In Table VI, we list the numbers of variables and clauses of each instance (in the last iteration) for the P2P and Matchlift-Variant domains. The solving time is presented to show the difficulty of the instances. As with other SAT problems, the size of the encoding does not necessarily reflect the difficulty of a problem. For example, the numbers of variables and clauses of Instance 4 in the Matchlift-Variant domain are both smaller than those of Instance 3, yet Instance 4 takes about 10 times longer to solve. In general, the SCP2 encodings of the current problem instances have up to tens of thousands of variables and hundreds of thousands of clauses, which is within the capability of current SAT solvers. Various improvements to SAT-based planning could also be applied to SCP2. We plan to further study techniques such as encoding in new formulations [Robinson et al. 2008; Huang et al. 2010] and deriving constraint-based pruning clauses [Chen and Yang 2010].

5.3. Results of minimizing the total action costs

Under the proposed SCP2 framework, we consider three strategies for solving MinCost SAT instances and minimizing action costs at the optimal makespan: 1) using the BB-CDCL algorithm with the VSIDS variable branching scheme (denoted as SCP2^bb), 2) using BB-CDCL with our heuristic-cost-based variable branching scheme (denoted as SCP2^bbh), and 3) using a transformation from MinCost SAT to WPMax-SAT (weighted partial Max-SAT, introduced in Appendix A) and the generic WPMax-SAT solver WBO [Manquinho et al. 2009] (denoted as SCP2^max). The reduction from MinCost SAT to WPMax-SAT is commonly used in solving MinCost SAT problems. WBO, the winner of the weighted partial Max-SAT (industrial track) category in the Max-SAT 2010 Competition [Max-SAT 2010], is an anytime solver, which keeps finding better solutions as it progresses. Figures 8 and 9 show the solution quality and the solving time of the different strategies. For each strategy, we present the results (total action costs and search time) of the first solution and of the best-cost solution found within 1800 seconds. For example, SCP2^bb_1 and SCP2^bb_b represent the first solution and the best-cost solution of SCP2^bb, respectively. Similar notation applies to SCP2^bbh and SCP2^max. Note that SCP2^bbh_1 actually represents the result of the original MiniSAT, which uses our new variable branching scheme and quits at the first solution. Thus, the search time of SCP2^bbh_1 is the same as that of SCP2 reported in Section 5.2.

Comparing the results of SCP2^bb_b to SCP2^bbh_b in Figures 8 and 9, we can see that the new variable branching scheme is better than the original VSIDS scheme: SCP2 finds better solutions and is faster with the new variable branching scheme. Comparing the results of SCP2^bbh_1 to SCP2^bbh_b, we find that they give the same solutions in the same time in many instances (about 33 of the 46 instances). This suggests that our new variable branching scheme often guides the search to the best solution the first time it finds a satisfying assignment. SCP2^bbh_b also performs competitively against SCP2^max_b: it finds better solutions in 12 instances and lower summed total action costs in the P2P, Matchlift, and Driverslogshift domains. For most instances, SCP2^bbh_b also takes less time than SCP2^max_b to find its best solution.

In conclusion, our new variable branching scheme significantly improves the basic BB-CDCL algorithm in both search time and solution quality. Furthermore, the BB-CDCL algorithm with the new variable branching scheme performs competitively against the state-of-the-art WPMax-SAT solver WBO. Detailed results for all instances in the CSTE planning domains tested are presented in Tables VII, VIII, IX, and X in Appendix B. Since the costs of the solutions found by the other temporally expressive planners are also of interest, we also list the cost results of Crikey, Crikey3, POPF, LPG-c, and TFD in Appendix B. Surprisingly, Crikey, Crikey3, POPF, and LPG-c find solutions with lower total action costs than SCP2^bbh_b and SCP2^max_b on a few problems. This is because SCP2 minimizes costs only at the optimal makespan, whereas for these problems lower costs can be achieved with a longer makespan.

[Figure 8: total action costs (y-axis) versus instance (x-axis) for SCP2^bb_1, SCP2^bb_b, SCP2^bbh_1, SCP2^bbh_b, SCP2^max_1, and SCP2^max_b, in four panels: (a) P2P, (b) Matchlift, (c) Matchlift-Variant, (d) Driverslogshift]

Fig. 8. Comparisons of the total action costs by different solvers. The y-axis shows the solution quality in terms of total action cost.
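The MinCost SAT to WPMax-SAT reduction behind SCP2^max can be sketched as follows. This is the textbook encoding, assumed here rather than copied from Appendix A: every CNF clause becomes a hard clause, and every variable x with cost c becomes a soft unit clause ¬x of weight c, so a minimum-weight set of violated soft clauses corresponds to a minimum-cost satisfying assignment. The function name `mincost_to_wcnf` and the small instance are hypothetical; the output follows the classic DIMACS WCNF format (a `p wcnf` header carrying a "top" weight that marks hard clauses) used by Max-SAT evaluations before 2022.

```python
from typing import Dict, List


def mincost_to_wcnf(num_vars: int, clauses: List[List[int]],
                    costs: Dict[int, int]) -> str:
    """Encode MinCost SAT (minimize the total cost of true variables subject
    to CNF `clauses`) as weighted partial Max-SAT in classic DIMACS WCNF:
    'p wcnf <vars> <clauses> <top>', then '<weight> <literals> 0' lines."""
    # One soft unit clause (-v) per costly variable: it is violated, at price
    # `cost`, exactly when v is assigned True.
    soft = [(c, [-v]) for v, c in sorted(costs.items()) if c > 0]
    # Hard clauses get weight `top`, strictly larger than all soft weights combined.
    top = sum(c for c, _ in soft) + 1
    lines = [f"p wcnf {num_vars} {len(clauses) + len(soft)} {top}"]
    for weight, clause in [(top, cl) for cl in clauses] + soft:
        lines.append(" ".join(map(str, [weight, *clause, 0])))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical instance: (x1 or x2) and (not x1 or x3); costs 5, 3, 1.
    print(mincost_to_wcnf(3, [[1, 2], [-1, 3]], {1: 5, 2: 3, 3: 1}))
```

For this instance the script prints a header `p wcnf 3 5 10`, the two hard clauses with weight 10, and the soft clauses `5 -1 0`, `3 -2 0`, `1 -3 0`. The cheapest model sets only x2 true, violating a single soft clause of weight 3.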


[Figure 9: search time (y-axis) versus instance (x-axis) for SCP2^bb_1, SCP2^bb_b, SCP2^bbh_1, SCP2^bbh_b, SCP2^max_1, and SCP2^max_b, in four panels: (a) P2P, (b) Matchlift, (c) Matchlift-Variant, (d) Driverslogshift]

Fig. 9. Comparison of the anytime performance of different solvers.
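To illustrate the branch-and-bound side of BB-CDCL, here is a toy depth-first branch and bound for MinCost SAT. It is deliberately simplified and is not the authors' BB-CDCL: there is no unit propagation or clause learning, the only bound is the cost of the partial assignment, and the "try False first" value ordering is merely a cost-aware stand-in for the heuristic-cost-based branching scheme evaluated above. The names `clause_status` and `bb_mincost` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]


def clause_status(clause: Clause, assign: Dict[int, bool]) -> Optional[bool]:
    """True if some literal is true, False if every literal is assigned false,
    None if the clause is still open under the partial assignment."""
    open_literal = False
    for lit in clause:
        var = abs(lit)
        if var not in assign:
            open_literal = True
        elif assign[var] == (lit > 0):
            return True
    return None if open_literal else False


def bb_mincost(num_vars: int, clauses: List[Clause], costs: Dict[int, int]
               ) -> Tuple[Optional[int], Optional[Dict[int, bool]]]:
    """Toy depth-first branch and bound for MinCost SAT: minimize the total
    cost of variables set to True subject to `clauses`. Costs must be >= 0.
    No unit propagation and no clause learning (so this is not CDCL)."""
    best_cost: Optional[int] = None
    best_model: Optional[Dict[int, bool]] = None
    assign: Dict[int, bool] = {}

    def search(var: int, cost: int) -> None:
        nonlocal best_cost, best_model
        # Bound: with non-negative costs, a partial cost that already reaches
        # the incumbent cannot lead to a strictly better solution.
        if best_cost is not None and cost >= best_cost:
            return
        # Prune: some clause is already falsified by the partial assignment.
        if any(clause_status(cl, assign) is False for cl in clauses):
            return
        if var > num_vars:
            # Every variable is assigned and no clause is false: a model.
            best_cost, best_model = cost, dict(assign)
            return
        # Value ordering: try the zero-cost value (False) before True.
        for value in (False, True):
            assign[var] = value
            search(var + 1, cost + (costs.get(var, 0) if value else 0))
            del assign[var]

    search(1, 0)
    return best_cost, best_model


if __name__ == "__main__":
    # Hypothetical instance: (x1 or x2) and (not x1 or x3); costs 5, 3, 1.
    print(bb_mincost(3, [[1, 2], [-1, 3]], {1: 5, 2: 3, 3: 1}))
    # -> (3, {1: False, 2: True, 3: False})
```

Each improved incumbent tightens the bound for the rest of the search, which is what makes branch and bound anytime: stopping early still leaves the best model found so far. A real BB-CDCL solver adds propagation, learned clauses, and stronger lower bounds on top of this skeleton.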

6. RELATED WORK

6.1. Temporal planning

The works most related to our research are temporally expressive planners that can handle concurrency, such as TM-LPSAT [Shin and Davis 2004], LPG-c [Gerevini et al. 2010], Crikey [Coles et al. 2009], Crikey3 [Coles et al. 2008], POPF [Coles et al. 2010], and TFD [Eyerich et al. 2009]. Whereas our method compiles planning problems into SAT problems, TM-LPSAT and LPG-c use different compilation methods to encode planning problems. TM-LPSAT compiles temporal metric problems with continuous time into linear programs with SAT constraints (LP-SAT) and uses an LP-SAT solver [Wolfman and Weld 1999] to find solutions. LPG-c [Gerevini et al. 2010] introduces a revised representation, the temporal action graph (TA-graph) with concurrency, which supports action concurrency. A recent work [Hu 2007] theoretically studies the compilation of temporally expressive problems into a constraint satisfaction formulation. Crikey, Crikey3, POPF, and TFD perform state-based heuristic search. Crikey and Crikey3 use enforced hill climbing (EHC), followed by best-first search if EHC fails. POPF, built on Crikey3, avoids many of the overheads incurred by splitting actions, by using a forward search over partial-order rather than total-order plans. TFD adds a time increment of ε > 0 after each action insertion to support concurrency. Unlike our planner, none of these temporally expressive planners optimizes action costs.


6.2. Cost sensitive planners

Several pieces of related work optimize makespans or action costs for classical planning, but none of them can handle temporally expressive planning problems. For classical STRIPS planning without durative actions and concurrency, there exist planners that optimize the total action costs. Most planners minimizing the total action costs use heuristic state-space search, such as LAMA [Richter and Westphal 2008; 2010], HSP∗0 and HSP∗F [Haslum 2008], FF(h_a) [Keyder and Geffner 2008], and CO-PLAN [Robinson et al. 2008]. Research has also been carried out on optimizing the number of actions [Vidal and Geffner 2004; Haslum and Geffner 2000; Büttner and Rintanen 2005; Helmert et al. 2008], which is a special case of optimizing the total action costs. SAT-based classical STRIPS planners have also been extended to optimize action costs. Three recent representative works, Plan-A [Chen et al. 2008], SATPlan≺ [Giunchiglia and Maratea 2007], and COS-P [Robinson et al. 2010], find plans that minimize the total action costs for classical planning. Plan-A completely searches the space of the SAT instance translated from a deterministic planning problem to minimize its total costs. SATPlan≺ finds solutions with lower total action costs by using OPTSAT [Giunchiglia and Maratea 2006], a tool for solving SAT-constrained optimization problems. COS-P modifies the SAT solver RSAT2.02 to create an effective weighted partial Max-SAT procedure (PWM-RSat) for problems where all soft constraints are unit clauses. It uses PWM-RSat to optimize costs at the makespan at which it successfully solves the encoded WPMax-SAT problem. Recent temporal planners that aim at optimizing both makespan and action costs include MO-GRT [Refanidis et al. 2001] and SAPA [Do and Kambhampati 2003]. MO-GRT extends heuristic state-space search to temporal planning.
SAPA is a domain-independent heuristic forward-chaining planner that can handle durative actions, metric resource constraints, and deadline goals. It is designed to deal with multi-objective metrics, such as makespan and action costs. Nevertheless, neither MO-GRT nor SAPA can handle temporally expressive domains.

6.3. SAT solvers

Translating metric planning problems into SAT formulations and calling MinCost SAT or Max-SAT solvers on them is another option for optimizing action costs [Robinson et al. 2010]. There are several MinCost SAT solvers, such as Scherzo [Coudert 1996], Bsolo [Manquinho and Marques-Silva 2002], Eclipse [Li 2004], and MinCostChaff [Fu and Malik 2006]. MinCostChaff is the first MinCost SAT solver that incorporates many modern SAT techniques [Fu and Malik 2006; Moskewicz et al. 2001]. Experimental results show that MinCostChaff is orders of magnitude faster than other MinCost SAT solvers. However, MinCostChaff is slower than Max-SAT solvers, since its lower bounding function does not perform well [Fu and Malik 2006]. Several state-of-the-art Max-SAT solvers have been developed for the Max-SAT Competition. One of the best WPMax-SAT solvers is WBO [Manquinho et al. 2009], the winner of the weighted partial Max-SAT (industrial track) category in the Max-SAT 2010 Competition [Max-SAT 2010]. WBO uses a new unified framework that aggregates and extends Pseudo-Boolean Optimization (PBO) and Maximum Satisfiability (Max-SAT). It proposes a new unsatisfiability-based algorithm that can be orders of magnitude more efficient than existing algorithms. Besides WBO, there are other efficient Max-SAT solvers, such as SAT4J [SAT4J 2004], MaxSolver [Xing and Zhang 2005], IncWMaxSatz [Darras et al. 2007], W-MaxSatz [Li et al. 2006; 2007; Li et al. 2009], and Clone [Pipatsrisawat and Darwiche 2007].

7. CONCLUSIONS AND FUTURE WORK

In this paper, we proposed and developed a SAT-based framework for cost-sensitive temporally expressive (CSTE) planning, an important but difficult class of planning problems that is potentially useful for many applications. Our work was motivated by the observations that high action concurrency is a main characteristic of temporally expressive planning problems and that it is often desirable to optimize action costs. Such high concurrency and cost sensitivity were exemplified by the new P2P communication network domain we introduced in this paper. We proposed a general framework for CSTE planning that translates a CSTE planning problem into a MinCost SAT problem, an optimization problem with SAT-clause constraints. Based on a basic translation scheme, SCP, we introduced a new translation scheme, SCP2, which leverages and enhances the basic scheme with a relaxed parallel planning semantics. Furthermore, we developed two approaches for solving MinCost SAT problems under this framework: one uses a generic WPMax-SAT solver, and the other uses the branch-and-bound technique based on the CDCL algorithm. We developed a planning-specific, heuristic-cost-based variable branching scheme to improve the branch-and-bound procedure. Our experimental results in several CSTE domains showed that our solver based on the enhanced encoding was able to find solutions with the minimum makespans and low total action costs. Compared with the basic translation scheme SCP, our new translation scheme SCP2 solved problems faster and found solutions with shorter makespans. We showed that the proposed variable branching scheme is very effective in improving search efficiency. The proposed planners compare favorably against existing temporally expressive planners. Compared with heuristic search based approaches, our framework, as an anytime algorithm, has at least three advantages. The first advantage is its flexibility: given more time, our planner can keep searching for better solutions with lower total action costs at the optimal makespan. Second, we can leverage the extensive research on SAT. We can expect performance enhancements by adopting more efficient WPMax-SAT or MinCost SAT solvers. In addition, we can easily utilize additional SAT instance evaluation strategies [Rintanen et al. 2006] to further improve the overall performance.
The third advantage is its capability in handling high concurrency. Our results showed that, for problems with high concurrency, our framework has clear advantages over the existing temporally expressive planners. Advancing the state of the art of temporally expressive planning is a continuing effort. While we developed an efficient CSTE planner with encouraging experimental results in several problem domains, our work can be further extended. One limitation of our current method is that it is not expressive enough for numerical constraints and other complex PDDL2.1 features. For example, compared with POPF, a major limitation of our current implementation is that it does not support variable action durations. To fully support PDDL2.1 and to take advantage of SAT-based planning techniques, we plan to enhance our planner to handle more complex temporal constraints and richer semantics. Another interesting direction for future work is to extend our encoding to handle problems with large action durations. For a temporal planning problem with very large action durations, our planner may create planning graphs at very large makespans, so the encoded MinCost SAT instance may be too large to be solved efficiently. We plan to extend our encoding to address this issue, through ideas such as merging multiple steps into one or encoding partial-order logic of actions. We will also study how to integrate other reduction methods, such as partial-order reduction [Chen and Yao 2009; Chen et al. 2009], into our approach.

ACKNOWLEDGMENTS

We thank Miguel Ramírez for suggesting the application of WPMax-SAT solvers to MinCost SAT problems. We also thank the anonymous reviewers of our previous submission for pointing out the limitations of the SCP encoding. This research was supported by the China Scholarship Council, the National Natural Science Foundation of China (No. 61033009), NSF grants DBI-0743797, IIS-0713109, and CNS-1017701, and a Microsoft Research New Faculty Fellowship.

REFERENCES

ALSINET, T., MANYA, F., AND PLANES, J. 2003. Improved branch and bound algorithms for Max-SAT and weighted Max-SAT. In Proc. of the Catalan Conference on Artificial Intelligence.

ACM Transactions on Intelligent Systems and Technology, Vol. V, No. N, Article A, Publication date: January YYYY.


BACCHUS, F. AND ADY, M. 2001. Planning with resources and concurrency: A forward chaining approach. In Proc. of the International Joint Conference on Artificial Intelligence. 417–424.

BHATTACHARYA, A. A. AND GHOSH, S. 2007. Self-optimizing peer-to-peer networks with selfish processes. In Proc. of the International Conference on Self-Adaptive and Self-Organizing Systems. 340–343.

BLUM, A. AND FURST, M. L. 1997. Fast planning through planning graph analysis. Artificial Intelligence 90, 1-2, 281–300.

BÜTTNER, M. AND RINTANEN, J. 2005. Satisfiability planning with constraints on the number of actions. In Proc. of the International Conference on Automated Planning and Scheduling. 292–299.

CARMAN, M., SERAFINI, L., AND TRAVERSO, P. 2003. Web service composition as planning. In Proc. of the Workshop on Planning for Web Services, International Conference on Automated Planning and Scheduling.

CHEN, J. AND YANG, Y. 2010. Localising temporal constraints in scientific workflows. Journal of Computer and System Sciences 76, 6, 464–474.

CHEN, Y., LV, Q., AND HUANG, R. 2008. Plan-A: A cost-optimal planner based on SAT-constrained optimization. In Proc. of the Workshop on International Planning Competition, International Conference on Automated Planning and Scheduling.

CHEN, Y., XU, Y., AND YAO, G. 2009. Stratified planning. In Proc. of the International Joint Conference on Artificial Intelligence. 1665–1670.

CHEN, Y. AND YAO, G. 2009. Completeness and optimality preserving reduction for planning. In Proc. of the International Joint Conference on Artificial Intelligence. 1659–1664.

COLES, A., FOX, M., HALSEY, K., LONG, D., AND SMITH, A. 2009. Managing concurrency in temporal planning using planner-scheduler interaction. Artificial Intelligence 173, 1, 1–44.

COLES, A., FOX, M., LONG, D., AND SMITH, A. 2008. Planning with problems requiring temporal coordination. In Proc. of the National Conference on Artificial Intelligence. 892–897.

COLES, A. J., COLES, A., FOX, M., AND LONG, D. 2010. Forward-chaining partial-order planning. In Proc. of the International Conference on Automated Planning and Scheduling. 42–49.

COUDERT, O. 1996. On solving covering problems. In Proc. of the Annual ACM IEEE Design Automation Conference. 197–202.

CUSHING, W., KAMBHAMPATI, S., MAUSAM, AND WELD, D. S. 2007. When is temporal planning really temporal? In Proc. of the International Joint Conference on Artificial Intelligence. 1852–1859.

CUSHING, W., KAMBHAMPATI, S., AND TALAMADUPULA, K. 2007. Evaluating temporal planning domains. In Proc. of the International Conference on Automated Planning and Scheduling.

DARRAS, S., DEQUEN, G., DEVENDEVILLE, L., AND LI, C.-M. 2007. On inconsistent clause-subsets for Max-SAT solving. In Proc. of the International Conference on Principles and Practice of Constraint Programming. 225–240.

DAVIS, M., LOGEMANN, G., AND LOVELAND, D. 1962. A machine program for theorem-proving. Communications of the ACM 5, 394–397.

DIMOPOULOS, Y., NEBEL, B., AND KOEHLER, J. 1997. Encoding planning problems in nonmonotonic logic programs. In Proc. of the European Conference on Planning. 169–181.

DO, M. B. AND KAMBHAMPATI, S. 2003. SAPA: A multi-objective metric temporal planner. Journal of Artificial Intelligence Research 20, 155–194.

DO, M. B., RUML, W., AND ZHOU, R. 2008. Planning for modular printers: Beyond productivity. In Proc. of the International Conference on Automated Planning and Scheduling.

EÉN, N. AND SÖRENSSON, N. 2003. An extensible SAT-solver. In Proc. of the International Conference on Theory and Applications of Satisfiability Testing. 502–518.

EYERICH, P., MATTMÜLLER, R., AND RÖGER, G. 2009. Using the context-enhanced additive heuristic for temporal and numeric planning. In Proc. of the International Conference on Automated Planning and Scheduling. 130–137.

FOX, M. AND LONG, D. 2003. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research 20, 61–124.

FU, Z. AND MALIK, S. 2006. Solving the minimum-cost satisfiability problem using SAT based branch-and-bound search. In Proc. of the International Conference on Computer-Aided Design. 852–859.

GEREVINI, A., SAETTI, A., AND SERINA, I. 2008. An approach to efficient planning with numerical fluents and multicriteria plan quality. Artificial Intelligence 172, 8-9, 899–944.

GEREVINI, A., SAETTI, A., AND SERINA, I. 2010. Temporal planning with problems requiring concurrency through action graphs and local search. In Proc. of the International Conference on Automated Planning and Scheduling.

GEREVINI, A. AND SERINA, I. 2002. LPG: A planner based on local search for planning graphs with action costs. In Proc. of the International Conference on AI Planning and Scheduling. 13–22.

GIUNCHIGLIA, E. AND MARATEA, M. 2006. OPTSAT: A tool for solving SAT related optimization problems. In Proc. of the European Conference on Logics in Artificial Intelligence. Vol. 4160. 485–489.


GIUNCHIGLIA, E. AND MARATEA, M. 2007. SAT-based planning with minimal-#actions plans and "soft" goals. In Proc. of the Advances in Artificial Intelligence. Vol. 4733. 422–433.

HASLUM, P. 2008. Additive and reversed relaxed reachability heuristics revisited. In Proc. of the Workshop on International Planning Competition, International Conference on Automated Planning and Scheduling.

HASLUM, P. AND GEFFNER, H. 2000. Admissible heuristics for optimal planning. In Proc. of the International Conference on Automated Planning and Scheduling. 140–149.

HASLUM, P. AND GEFFNER, H. 2001. Heuristic planning with time and resources. In Proc. of the Workshop on Planning with Resources, International Joint Conference on Artificial Intelligence.

HELMERT, M., HASLUM, P., AND HOFFMANN, J. 2008. Explicit-state abstraction: A new method for generating heuristic functions. In Proc. of the National Conference on Artificial Intelligence. 1547–1550.

HOFFMANN, J. AND NEBEL, B. 2001. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research 14, 253–302.

HU, Y. 2007. Temporally-expressive planning as constraint satisfaction problems. In Proc. of the International Conference on Automated Planning and Scheduling. 192–199.

HUANG, R., CHEN, Y., AND ZHANG, W. 2009. An optimal temporally expressive planner: Initial results and application to P2P network optimization. In Proc. of the International Conference on Automated Planning and Scheduling.

HUANG, R., CHEN, Y., AND ZHANG, W. 2010. A novel transition based encoding scheme for planning as satisfiability. In Proc. of the National Conference on Artificial Intelligence.

IPC. 2002. The third international planning competition. http://planning.cis.strath.ac.uk/competition/.

IPC. 2008. The sixth international planning competition. http://ipc.informatik.uni-freiburg.de/.

KAUTZ, H. 2004. SATPlan04: Planning as satisfiability. In Proc. of the Workshop on International Planning Competition, International Conference on Automated Planning and Scheduling.

KAUTZ, H. AND SELMAN, B. 1992. Planning as satisfiability. In Proc. of the European Conference on Artificial Intelligence.

KAUTZ, H. AND SELMAN, B. 1996. Pushing the envelope: Planning, propositional logic, and stochastic search. In Proc. of the National Conference on Artificial Intelligence. 1194–1201.

KEYDER, E. AND GEFFNER, H. 2008. The FF(ha) planner for planning with action costs. In Proc. of the Workshop on International Planning Competition, International Conference on Automated Planning and Scheduling.

KVARNSTRÖM, J. AND MAGNUSSON, M. 2003. TALplanner in the third international planning competition: Extensions and control rules. Journal of Artificial Intelligence Research 20, 343–377.

LARROSA, J., NIEUWENHUIS, R., OLIVERAS, A., AND RODRÍGUEZ-CARBONELL, E. 2009. Branch and bound for boolean optimization and the generation of optimality certificates. In Proc. of the International Conference on Theory and Applications of Satisfiability Testing. 453–466.

LI, C. M., MANYA, F., MOHAMEDOU, N., AND PLANES, J. 2009. Exploiting cycle structures in Max-SAT. In Proc. of the International Conference on Theory and Applications of Satisfiability Testing. 467–480.

LI, C. M., MANYA, F., AND PLANES, J. 2006. Detecting disjoint inconsistent subformulas for computing lower bounds for Max-SAT. In Proc. of the National Conference on Artificial Intelligence. 86–91.

LI, C. M., MANYA, F., AND PLANES, J. 2007. New inference rules for Max-SAT. Journal of Artificial Intelligence Research 30, 321–359.

LI, X. Y. 2004. Optimization algorithms for the minimum-cost satisfiability problem. PhD Thesis, Department of Computer Science, North Carolina State University, North Carolina.

LONG, D. AND FOX, M. 2003. Exploiting a graphplan framework in temporal planning. In Proc. of the International Conference on Automated Planning and Scheduling. 52–61.

MANQUINHO, V., MARQUES-SILVA, J., AND PLANES, J. 2009. Algorithms for weighted boolean optimization. In Proc. of the International Conference on Theory and Applications of Satisfiability Testing. 495–508.

MANQUINHO, V. M. AND MARQUES-SILVA, J. P. 2002. Search pruning techniques in SAT-based branch-and-bound algorithms for the binate covering problem. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 21, 505–516.

MARQUES-SILVA, J. P. AND SAKALLAH, K. A. 1996. GRASP: A new search algorithm for satisfiability. In Proc. of the International Conference on Computer-Aided Design. 220–227.

MAX-SAT. 2010. The Sixth Max-SAT Evaluation. http://www.maxsat.udl.cat/10/.

MITCHELL, D. G. 2005. A SAT solver primer. Bulletin of the European Association for Theoretical Computer Science 85, 112–132.

MOSKEWICZ, M. W., MADIGAN, C. F., ZHAO, Y., ZHANG, L., AND MALIK, S. 2001. Chaff: Engineering an efficient SAT solver. In Proc. of the Annual ACM IEEE Design Automation Conference. ACM, 530–535.

PENBERTHY, J. S. AND WELD, D. S. 1994. Temporal planning with continuous change. In Proc. of the National Conference on Artificial Intelligence. 1010–1015.


PIPATSRISAWAT, K. AND DARWICHE, A. 2007. Clone: Solving weighted Max-SAT in a reduced search space. In Proc. of the Australian Conference on Artificial Intelligence. 223–233.

PLANES, J. 2003. Improved branch and bound algorithms for Max-2-SAT and weighted Max-2-SAT. In Proc. of the International Conference on Principles and Practice of Constraint Programming. 991.

RAO, J. AND SU, X. 2004. A survey of automated web service composition methods. In Proc. of the International Workshop on Semantic Web Services and Web Process Composition. 43–54.

REFANIDIS, I. AND VLAHAVAS, I. 2001. A framework for multi-criteria plan evaluation in heuristic state-space planning. In Proc. of the Workshop on Planning with Resources, International Joint Conference on Artificial Intelligence.

RICHTER, S. AND WESTPHAL, M. 2008. The LAMA planner: Using landmark counting in heuristic search. In Proc. of the Workshop on International Planning Competition, International Conference on Automated Planning and Scheduling.

RICHTER, S. AND WESTPHAL, M. 2010. The LAMA planner: Guiding cost-based anytime planning with landmarks. Journal of Artificial Intelligence Research.

RINTANEN, J. 2007. Complexity of concurrent temporal planning. In Proc. of the National Conference on Artificial Intelligence.

RINTANEN, J., HELJANKO, K., AND NIEMELÄ, I. 2006. Planning as satisfiability: Parallel plans and algorithms for plan search. Artificial Intelligence 170, 12-13, 1031–1080.

ROBINSON, N., GRETTON, C., PHAM, D. N., AND SATTAR, A. 2008. A compact and efficient SAT encoding for planning. In Proc. of the International Conference on Automated Planning and Scheduling. 296–303.

ROBINSON, N., GRETTON, C., AND PHAM, D. N. 2008. CO-PLAN: Combining SAT-based planning with forward-search. In Proc. of the Workshop on International Planning Competition, International Conference on Automated Planning and Scheduling.

ROBINSON, N., GRETTON, C., PHAM, D. N., AND SATTAR, A. 2010. Partial weighted MaxSAT for optimal planning. In Proc. of the Pacific Rim International Conference on Artificial Intelligence. 231–243.

SAT4J. 2004. http://www.sat4j.org/.

SHIN, J. AND DAVIS, E. 2004. Continuous time in a SAT-based planner. In Proc. of the National Conference on Artificial Intelligence. 531–536.

SMITH, D. E. 1999. Temporal planning with mutual exclusion reasoning. In Proc. of the International Joint Conference on Artificial Intelligence. 326–337.

SMITH, D. E. 2003. The case for durative actions: A commentary on PDDL2.1. Journal of Artificial Intelligence Research 20, 149–154.

VIDAL, V. AND GEFFNER, H. 2004. CPT: An optimal temporal POCL planner based on constraint programming. In Proc. of the Workshop on International Planning Competition, International Conference on Automated Planning and Scheduling. 59–60.

VIDAL, V. AND GEFFNER, H. 2006. Branching and pruning: An optimal temporal POCL planner based on constraint programming. Artificial Intelligence 170, 298–335.

WAH, B. W. AND CHEN, Y. 2006. Constraint partitioning in penalty formulations for solving temporal planning problems. Artificial Intelligence 170, 187–231.

WOLFMAN, S. A. AND WELD, D. S. 1999. The LPSAT engine and its application to resource planning. In Proc. of the International Joint Conference on Artificial Intelligence. 310–316.

XING, Z. AND ZHANG, W. 2005. MaxSolver: An efficient exact algorithm for (weighted) maximum satisfiability. Artificial Intelligence 164, 47–80.

YANG, Q., WU, K., AND JIANG, Y. 2007. Learning action models from plan examples using weighted MAX-SAT. Artificial Intelligence 171, 107–143.

YOUNES, H. L. S. AND SIMMONS, R. G. 2003. VHPOP: Versatile heuristic partial order planner. Journal of Artificial Intelligence Research 20, 405–430.

ZHANG, L., MADIGAN, C. F., MOSKEWICZ, M. H., AND MALIK, S. 2001. Efficient conflict driven learning in a boolean satisfiability solver. In Proc. of the International Conference on Computer-Aided Design. 279–285.

ZHANG, L. AND MALIK, S. 2002. The quest for efficient boolean satisfiability solvers. In Proc. of the International Conference on Computer-Aided Verification. 17–36.

ZHUO, H. H., YANG, Q., HU, D. H., AND LI, L. 2010. Learning complex action models with quantifiers and logical implications. Artificial Intelligence 174, 1540–1569.


Online Appendix to: A SAT-based Approach to Cost Sensitive Temporally Expressive Planning

QIANG LU (1,2,†), RUOYUN HUANG (2,†), YIXIN CHEN (2,§), YOU XU (2), WEIXIONG ZHANG (2), and GUOLIANG CHEN (1)

(1) University of Science and Technology of China; (2) Washington University in St. Louis

Appendix A. CONVERT MINCOST SAT TO WEIGHTED PARTIAL MAX-SAT

This appendix shows how to convert a MinCost SAT problem into a WPMax-SAT instance. Given a MinCost SAT instance $\Phi_c = (V, C, \mu)$, we construct a WPMax-SAT instance $\Phi_a = (V, C^h, C^s, w)$. The hard clause set $C^h$ is identical to the clause set $C$ of the original MinCost SAT problem. The soft clause set is $C^s = \{\neg x \mid x \in V\}$, and the weight of each soft clause $c = \neg x \in C^s$ is $w(c) = \mu(x)$. According to Definition 3.2, given a variable assignment $\psi$, the objective function of the WPMax-SAT instance is

$$
\begin{aligned}
\mathrm{weight}(\psi) &= \sum_{c \in C^s} w(c)\, v_\psi(c) \\
&= \sum_{x \in V} \mu(x)\, v_\psi(\neg x) \qquad (\because C^s = \{\neg x \mid x \in V\}) \\
&= \sum_{x \in V} \mu(x)\, \bigl(1 - v_\psi(x)\bigr) \\
&= \sum_{x \in V} \mu(x) - \sum_{x \in V} \mu(x)\, v_\psi(x) \\
&= \sum_{x \in V} \mu(x) - \mathrm{cost}(\psi),
\end{aligned}
$$

where the last step uses the MinCost SAT objective $\mathrm{cost}(\psi) = \sum_{x \in V} \mu(x)\, v_\psi(x)$.
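The construction and the identity above can be checked mechanically on small instances. The following Python sketch is an illustration only, not the implementation used for the experiments: it builds $C^h$ and $C^s$ from $(V, C, \mu)$ and verifies $\mathrm{weight}(\psi) = \sum_{x \in V}\mu(x) - \mathrm{cost}(\psi)$ by enumerating every assignment. The signed-integer literal encoding (DIMACS-style, $+i$ for $x_i$, $-i$ for $\neg x_i$) and all function names (`to_wpmaxsat`, `satisfied`, `cost`, `weight`, `all_assignments`) are assumptions made for the sketch, and brute force is only practical for tiny instances. The test data is the three-variable instance from the example that follows in this appendix.

```python
from itertools import product


def to_wpmaxsat(variables, clauses, mu):
    """Build (C^h, C^s) from a MinCost SAT instance (V, C, mu).

    Literals are signed ints (hypothetical DIMACS-style encoding):
    +i stands for x_i and -i for the negation of x_i.
    """
    hard = [tuple(c) for c in clauses]          # C^h = C
    soft = [((-x,), mu[x]) for x in variables]  # C^s = {not x}, with w(not x) = mu(x)
    return hard, soft


def satisfied(clause, assign):
    """True if some literal of the clause is true under assign (var -> 0/1)."""
    return any(assign[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)


def cost(assign, mu):
    """MinCost SAT objective: sum of mu(x) over variables assigned 1."""
    return sum(mu[x] * assign[x] for x in mu)


def weight(assign, soft):
    """WPMax-SAT objective: total weight of the satisfied soft clauses."""
    return sum(w for clause, w in soft if satisfied(clause, assign))


def all_assignments(variables):
    """Enumerate every 0/1 assignment to the given variables."""
    for bits in product((0, 1), repeat=len(variables)):
        yield dict(zip(variables, bits))


# Example instance: C = {x1 or x2, x2 or not x3}, mu(x1..x3) = 5, 10, 7.
V = [1, 2, 3]
C = [(1, 2), (2, -3)]
mu = {1: 5, 2: 10, 3: 7}
hard, soft = to_wpmaxsat(V, C, mu)
total = sum(mu.values())  # 22

# The identity weight(psi) = sum(mu) - cost(psi) holds for every assignment.
assert all(weight(a, soft) == total - cost(a, mu) for a in all_assignments(V))

# Both problems range over the assignments that satisfy C (= C^h).
feasible = [a for a in all_assignments(V) if all(satisfied(c, a) for c in hard)]
best_cost = min(feasible, key=lambda a: cost(a, mu))
best_weight = max(feasible, key=lambda a: weight(a, soft))
print(best_cost, cost(best_cost, mu), weight(best_weight, soft))
# prints {1: 1, 2: 0, 3: 0} 5 17
```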

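Because $\sum_{x \in V} \mu(x)$ is a constant that does not depend on $\psi$, and both problems admit exactly the same feasible assignments (since $C^h = C$), maximizing $\mathrm{weight}(\psi)$ is equivalent to minimizing $\mathrm{cost}(\psi)$. Since $\mathrm{cost}(\psi)$ is the objective function of the MinCost SAT problem $\Phi_c$, solving a MinCost SAT problem is equivalent to solving the constructed WPMax-SAT problem $\Phi_a$. For example, consider the following MinCost SAT problem $\Phi_c = (V, C, \mu)$:

| Clause $C$ | $\mu(x)$ |
|---|---|
| $x_1 \lor x_2$ | $\mu(x_1) = 5$ |
| $x_2 \lor \neg x_3$ | $\mu(x_2) = 10$ |
| | $\mu(x_3) = 7$ |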


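The corresponding WPMax-SAT problem $\Phi_a = (V, C^h, C^s, w)$ is:

| Clause set | Clause | $w(c)$ |
|---|---|---|
| $C^h$ | $x_1 \lor x_2$ | $\infty$ |
| $C^h$ | $x_2 \lor \neg x_3$ | $\infty$ |
| $C^s$ | $\neg x_1$ | 5 |
| $C^s$ | $\neg x_2$ | 10 |
| $C^s$ | $\neg x_3$ | 7 |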
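The table above maps naturally onto the plain-text input format read by many weighted partial Max-SAT solvers. The paper does not state how instances were passed to WBO, so the following Python sketch is only an assumption: it follows the WCNF convention of the Max-SAT Evaluations, in which the header is `p wcnf <variables> <clauses> <top>`, each clause line starts with its weight and ends with `0`, and clauses whose weight equals `top` are hard. Here `top` plays the role of the $\infty$ weight and is set to one more than the total soft weight. The function name `to_wcnf` is hypothetical, and the sketch assumes non-negative integer costs $\mu(x)$; a variable with $\mu(x) = 0$ gets no soft clause, since a weight-0 soft clause cannot change the optimum.

```python
def to_wcnf(num_vars, clauses, mu):
    """Serialize a MinCost SAT instance (variables 1..num_vars) as WCNF text.

    Hypothetical helper. Hard clauses (the original C) get weight `top`;
    each variable x with mu[x] > 0 contributes the soft unit clause -x
    (i.e. "not x") with weight mu[x]. Literals are signed ints.
    """
    soft = [(mu[x], (-x,)) for x in range(1, num_vars + 1) if mu.get(x, 0) > 0]
    # Any weight larger than the sum of all soft weights marks a clause as hard.
    top = sum(w for w, _ in soft) + 1
    lines = [f"p wcnf {num_vars} {len(clauses) + len(soft)} {top}"]
    for clause in clauses:
        lines.append(" ".join(str(v) for v in (top, *clause, 0)))
    for w, clause in soft:
        lines.append(" ".join(str(v) for v in (w, *clause, 0)))
    return "\n".join(lines) + "\n"


print(to_wcnf(3, [(1, 2), (2, -3)], {1: 5, 2: 10, 3: 7}), end="")
# prints:
# p wcnf 3 5 23
# 23 1 2 0
# 23 2 -3 0
# 5 -1 0
# 10 -2 0
# 7 -3 0
```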

The optimal solution to $\Phi_c$ is $v_\psi(x_1) = 1$, $v_\psi(x_2) = 0$, $v_\psi(x_3) = 0$, with objective value $\mathrm{cost}(\psi) = 5$. The optimal solution to $\Phi_a$ is the same $\psi$, with $\mathrm{weight}(\psi) = 10 + 7 = 17 = 22 - 5$, as the derivation above predicts. Solving either problem therefore yields a solution to the other.

Appendix B. EXPERIMENTAL RESULTS OF TOTAL COSTS AND SOLVING TIME

Here we present detailed results in terms of both total action costs and problem-solving time. Under the SAT-based framework for CSTE planning, we compare solvers using different strategies: the basic BB-CDCL algorithm ($\mathrm{SCP2}^{bb}_b$), BB-CDCL with our new variable branching scheme ($\mathrm{SCP2}^{bbh}_b$), and the Max-SAT solver WBO ($\mathrm{SCP2}^{max}_b$). Since readers may also be interested in the total action costs of the solutions found by other temporally expressive planners (Crikey, Crikey3, POPF, LPG-c, and TFD), we list their cost results in this section as well. 'T' and 'C' denote the running time and the total action cost of a solution, respectively.

Table VII. Comparison of time and total action costs of different algorithms in the P2P domain.

| P | Crikey3 C | POPF C | $\mathrm{SCP2}^{bb}_b$ T | $\mathrm{SCP2}^{bb}_b$ C | $\mathrm{SCP2}^{bbh}_b$ T | $\mathrm{SCP2}^{bbh}_b$ C | $\mathrm{SCP2}^{max}_b$ T | $\mathrm{SCP2}^{max}_b$ C |
|---|---|---|---|---|---|---|---|---|
| 1 | 4110 | 520.0 | 0.7 | 2230.0 | 0.7 | 830.0 | 1.1 | 720.0 |
| 2 | 5190 | 700.0 | 3.5 | 5400.0 | 3.6 | 1220.0 | 14.2 | 1110.0 |
| 3 | 7340 | 1050.0 | 24.5 | 9740.0 | 22.1 | 1910.0 | 23.3 | 2140.0 |
| 4 | 12110 | 1320.0 | 1780.4 | 1700.0 | 2.3 | 1360.0 | 7.8 | 1320.0 |
| 5 | 20190 | 2200.0 | 923.3 | 3270.0 | 1.7 | 2700.0 | 75.1 | 2310.0 |
| 6 | 30290 | 3300.0 | 6.1 | 5220.0 | 2.8 | 4030.0 | 130.0 | 3730.0 |
| 7 | Time Out | 3960.0 | 15.9 | 8190.0 | 17.3 | 7820.0 | 11.8 | 8050.0 |
| 8 | 35340 | 3850.0 | 58.0 | 6440.0 | 50.1 | 4240.0 | 154.0 | 4450.0 |
| 9 | Time Out | 4620.0 | 66.4 | 8710.0 | 58.4 | 5670.0 | 113.0 | 5580.0 |
| 10 | Time Out | 3500.0 | 125.4 | 5980.0 | 96.4 | 3710.0 | 195.9 | 3610.0 |
| 11 | Time Out | 2700.0 | 25.8 | 4950.0 | 29.0 | 2800.0 | 25.1 | 2800.0 |
| 12 | Time Out | 3600.0 | 198.1 | 6240.0 | 26.5 | 3700.0 | 38.9 | 3600.0 |
| 13 | Time Out | 4500.0 | 831.1 | 8440.0 | 479.9 | 4600.0 | 1186.6 | 8850.0 |
| 14 | Time Out | 5400.0 | 1420.4 | 10050.0 | 530.0 | 10350.0 | 454.7 | 10200.0 |
| Σ | n/a | 41220.0 | 5479.6 | 86560.0 | 1320.9 | 54940.0 | 2431.6 | 58470.0 |


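Table VIII. Comparison of time and total action costs of different algorithms in the Matchlift domain.

| P | Crikey C | Crikey3 C | POPF C | TFD C | $\mathrm{SCP2}^{bb}_b$ T | $\mathrm{SCP2}^{bb}_b$ C | $\mathrm{SCP2}^{bbh}_b$ T | $\mathrm{SCP2}^{bbh}_b$ C | $\mathrm{SCP2}^{max}_b$ T | $\mathrm{SCP2}^{max}_b$ C |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 332 | 312 | 332.0 | 332 | 0.8 | 332.0 | 0.9 | 332.0 | 0.7 | 332.0 |
| 2 | 150 | 130 | 110.0 | 110 | 1.2 | 110.0 | 1.2 | 110.0 | 1.1 | 110.0 |
| 3 | 352 | 332 | 564.0 | 352 | 21.8 | 456.0 | 19.9 | 352.0 | 35.7 | 352.0 |
| 4 | 452 | 432 | 552.0 | 452 | 6.3 | 452.0 | 5.6 | 452.0 | 5.6 | 452.0 |
| 5 | 644 | 524 | 946.0 | 644 | 557.5 | 1146.0 | 548.7 | 644.0 | 361.7 | 1646.0 |
| 6 | 906 | 584 | 966.0 | 1096 | 1674.5 | 568.0 | 1001.1 | 566.0 | 554.0 | 1918.0 |
| 7 | 804 | 684 | 1106.0 | 1008 | Time Out | Time Out | Time Out | Time Out | Time Out | Time Out |
| 8 | 854 | 734 | 964.0 | 1154 | Time Out | Time Out | Time Out | Time Out | Time Out | Time Out |
| 9 | 644 | 624 | 1546.0 | 1144 | 125.7 | 2244.0 | 1743.0 | 1946.0 | 1727.6 | 2164.0 |
| 10 | 936 | 714 | 834.0 | 734 | Time Out | Time Out | Time Out | Time Out | Time Out | Time Out |
| 11 | 1005 | 684 | 1116.0 | 804 | Time Out | Time Out | Time Out | Time Out | Time Out | Time Out |
| Σ | 7079 | 5754 | 9036.0 | 7830 | n/a | n/a | n/a | n/a | n/a | n/a |

The LPG-c column of this table is only partially filled; its entries are C = 352 and C = 160, with Σ n/a.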

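Table IX. Comparison of time and total action costs of different algorithms in the Matchlift-Variant domain.

| P | Crikey3 C | POPF C | $\mathrm{SCP2}^{bb}_b$ T | $\mathrm{SCP2}^{bb}_b$ C | $\mathrm{SCP2}^{bbh}_b$ T | $\mathrm{SCP2}^{bbh}_b$ C | $\mathrm{SCP2}^{max}_b$ T | $\mathrm{SCP2}^{max}_b$ C |
|---|---|---|---|---|---|---|---|---|
| 1 | 200 | 220.0 | 1.4 | 220.0 | 1.1 | 220.0 | 1.2 | 220.0 |
| 2 | 334 | 474.0 | 0.7 | 254.0 | 0.7 | 254.0 | 0.7 | 254.0 |
| 3 | 372 | 403.0 | 41.0 | 396.0 | 8.3 | 392.0 | 6.4 | 392.0 |
| 4 | 424 | 444.0 | 633.2 | 353.0 | 68.8 | 342.0 | 45.7 | 342.0 |
| 5 | 462 | 594.0 | 651.2 | 482.0 | 220.6 | 482.0 | 168.7 | 482.0 |
| 6 | 444 | Time Out | 1195.3 | 362.0 | 22.2 | 562.0 | 34.5 | 362.0 |
| 7 | Time Out | 1116.0 | 510.5 | 646.0 | 21.1 | 396.0 | 18.5 | 396.0 |
| 8 | Time Out | Time Out | 73.5 | 522.0 | 18.9 | 290.0 | 21.5 | 290.0 |
| 9 | Time Out | 876.0 | 1042.8 | 780.0 | 692.0 | 534.0 | 399.6 | 554.0 |
| 10 | Time Out | Time Out | 211.7 | 916.0 | 296.2 | 416.0 | 209.2 | 416.0 |
| 11 | Time Out | Time Out | 225.5 | 816.0 | 178.4 | 476.0 | 186.6 | 476.0 |
| 12 | Time Out | Time Out | 615.0 | 358.0 | 2.2 | 358.0 | 2.1 | 358.0 |
| Σ | n/a | n/a | 5201.8 | 6105.0 | 1530.5 | 4722.0 | 1094.7 | 4542.0 |

The Crikey, LPG-c, and TFD columns of this table are only partially filled; their C entries are: Crikey 220, 374, 392, 444, 482, 464, 290, followed by four Time Out entries; LPG-c 354, 354, 362; TFD 414, 492, 734. Σ is n/a for all three.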

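Table X. Comparison of time and total action costs of different algorithms in the Driverslogshift domain.

| P | Crikey C | Crikey3 C | POPF C | LPG-c C | $\mathrm{SCP2}^{bb}_b$ T | $\mathrm{SCP2}^{bb}_b$ C | $\mathrm{SCP2}^{bbh}_b$ T | $\mathrm{SCP2}^{bbh}_b$ C | $\mathrm{SCP2}^{max}_b$ T | $\mathrm{SCP2}^{max}_b$ C |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2850 | 2600 | 3250.0 | 4650 | 6.7 | 3700.0 | 3.7 | 2800.0 | 6.8 | 2700.0 |
| 2 | 1450 | 1400 | 1450.0 | 2750 | 32.1 | 13200.0 | 12.3 | 2650.0 | 98.0 | 1450.0 |
| 3 | 4450 | 3200 | 3250.0 | 4900 | 83.7 | 18100.0 | 42.5 | 3250.0 | 73.5 | 11750.0 |
| 4 | 3950 | 3700 | 3500.0 | 4600 | 70.7 | 17550.0 | 30.8 | 17200.0 | 80.0 | 10100.0 |
| 5 | 3900 | 3600 | 3950.0 | 4100 | 20.5 | 5500.0 | 2.8 | 3950.0 | 38.5 | 3950.0 |
| 6 | 2700 | 2500 | 2800.0 | 4300 | 2.5 | 7300.0 | 2.7 | 3700.0 | 5.2 | 3700.0 |
| 7 | 6400 | 4700 | 5000.0 | 5000 | 787.9 | 27800.0 | 114.8 | 5200.0 | 356.0 | 19600.0 |
| 8 | 7500 | 5050 | 5100.0 | 5150 | 573.2 | 22300.0 | 376.5 | 7300.0 | 464.5 | 25050.0 |
| 9 | 6900 | 5050 | 5100.0 | 4600 | 798.5 | 23750.0 | 110.8 | 6350.0 | 658.2 | 21650.0 |
| Σ | 40100 | 31800 | 33400.0 | 40050 | 2375.8 | 139200.0 | 697.1 | 52400.0 | 1780.5 | 99950.0 |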
