Integrating Temporal Logics and Model Checking Algorithms

Teodor Rus and Eric Van Wyk
Department of Computer Science
The University of Iowa, Iowa City, Iowa, USA
rus,[email protected]

December 6, 1996

Abstract

Temporal logic and model checking algorithms are currently used as system verification techniques in various programming environments, including those for real-time and parallel programs. The diversity of systems and environments implies a diversity of logics and algorithms, but there are no tools to aid the logician or practitioner in experimenting with different varieties of temporal logics and model checkers. Such tools would give users the ability to modify and extend a temporal logic and model checker as the problem domain changes. We have developed a set of tools that provide these capabilities by placing the model checking problem in an algebraic framework. Our methodology thus provides a "temporal logic test bed" that allows quick prototyping of, and easy extension to, logics and model checkers. Our tools generate a model checking algorithm as an algebraic mapping (an embedding of one algebra into another algebra by derived operations) with the temporal logic as the source algebra and the sets of states of a model as the target algebra. We demonstrate these tools by extending CTL and its model checker so that formulas can quantify the paths over which the satisfaction of the temporal operators is defined, by allowing propositions to label the edges as well as the nodes in the model. We use this logic and its model checker to analyze program process graphs used during the parallelization phase of an algebraic compiler.

1 Introduction

There is currently much research in the area of temporal logic and model checking [CES86]. It seems only reasonable that logicians and practitioners should have tools that allow them to experiment with various temporal logic notations and semantics; however, no tools provide this capability. We propose a methodology which allows users to easily implement and modify temporal logics and model checking algorithms, and thus to change and extend the notation and semantics of their own temporal logic. This methodology allows us to generate tools that foster interactive experimentation with temporal logic. We demonstrate this by extending CTL and its model checker to allow the quantification of paths for the temporal operators based on propositions labeling the edges in the model. This model checker is used by a parallelizing algebraic compiler to analyze the program process graphs of a parallelized program.

Model checking is a formal verification technique used to ensure that some system, be it a concurrent program or a representation of a physical system, satisfies a given specification. The concurrent program or other system is represented by a model that describes how the state of the program or system changes in time. A model (or Kripke structure) M = ⟨S, E, P⟩ is a directed graph with a finite set of states S, a finite set of edges E, and a proposition labeling function P which maps atomic propositions from the set AP to the set of states in S on which those propositions are true. We use the notation E(s), s ∈ S, to denote the set of successors of s in S. The specification is written as a temporal logic formula over the propositions labeling the model. Model checking is the problem of finding on which states s in a model M a formula f written in temporal logic is satisfied. In this paper we examine CTL, Computation Tree Logic, a branching time temporal logic commonly used for verification. CTL was developed by Emerson, Clarke and Sistla in [CES86], where they also develop a model checking algorithm.

CTL formulas are defined by the following rules:

1. true, false and any atomic proposition ap ∈ AP are CTL formulas.
2. if f1 and f2 are CTL formulas, so are ¬f1, f1 ∨ f2, and f1 ∧ f2.
3. if f1 and f2 are CTL formulas, so are AX f1, EX f1, A[f1 U f2], and E[f1 U f2].

The temporal operators of CTL are X (next-time) and U (until). The formula AX f1 (EX f1) is satisfied on a state if all (one or more) successors satisfy f1. The formula A[f1 U f2] (E[f1 U f2]) is satisfied on a state if on all (one or more) paths beginning on this state there is a state on which f2 holds and f1 holds on all states before this state. The formal satisfaction rules to determine whether a state s in a model M satisfies a formula f, denoted M, s ⊨ f, or s ⊨ f when M is assumed, are given below:

s ⊨ ap          iff  s ∈ P(ap)
s ⊨ ¬f          iff  not s ⊨ f
s ⊨ f1 ∧ f2     iff  s ⊨ f1 and s ⊨ f2
s ⊨ EX f1       iff  ∃t ∈ S, (s, t) ∈ E ∧ t ⊨ f1
s ⊨ AX f1       iff  ∀t ∈ S, (s, t) ∈ E ⇒ t ⊨ f1
s ⊨ A[f1 U f2]  iff  for all paths (s0, s1, s2, …), s = s0 and
                     ∃i[i ≥ 0 ∧ s_i ⊨ f2 ∧ ∀j[0 ≤ j < i ⇒ s_j ⊨ f1]]
s ⊨ E[f1 U f2]  iff  there exists a path (s0, s1, s2, …), s = s0 and
                     ∃i[i ≥ 0 ∧ s_i ⊨ f2 ∧ ∀j[0 ≤ j < i ⇒ s_j ⊨ f1]]
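These satisfaction rules can be computed over whole sets of states at once, using the standard least-fixpoint characterizations of the until operators. Below is a minimal sketch of such a set-based checking procedure; the model representation and function names are illustrative, not the authors' tools.

```python
# Minimal explicit-state CTL checking over a Kripke structure.
# S is the state set, E the edge set, and P maps each atomic
# proposition to the set of states on which it holds.
# (Representation and names are illustrative, not from the paper.)

def sat_EX(S, E, T):
    """States with at least one successor in T."""
    return {s for s in S if any((s, t) in E for t in T)}

def sat_EU(S, E, T1, T2):
    """E[f1 U f2]: least fixpoint of T2 ∪ (T1 ∩ EX(result))."""
    result = set(T2)
    while True:
        new = result | (T1 & sat_EX(S, E, result))
        if new == result:
            return result
        result = new

def sat_AU(S, E, T1, T2):
    """A[f1 U f2]: least fixpoint of T2 ∪ (T1 ∩ AX(result))."""
    result = set(T2)
    while True:
        # AX(result): at least one successor, and all successors in result
        ax = {s for s in S
              if (succ := {t for t in S if (s, t) in E}) and succ <= result}
        new = result | (T1 & ax)
        if new == result:
            return result
        result = new

# Example: p holds on s0 and s1; q holds on s2.
S = {"s0", "s1", "s2"}
E = {("s0", "s1"), ("s1", "s2"), ("s2", "s2")}
P = {"p": {"s0", "s1"}, "q": {"s2"}}
assert sat_EU(S, E, P["p"], P["q"]) == {"s0", "s1", "s2"}
```

Each iteration adds the states that can reach the current result set in one step, so the loop terminates after at most |S| rounds.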

A traditional implementation of a model checking algorithm for CTL is not easily extended to more general versions of CTL: code would have to be rewritten. To illustrate our methodology we extend CTL so that propositions labeling the edges of the model can be used to quantify the paths examined in determining the satisfaction of CTL formulas. We call this logic CTLe. Note that this extended CTL is developed only to show how simply a logic and model checker can be extended, not to propose it as a new temporal logic. However, this extension of CTL with path formulas has important applications in reasoning about dynamic systems (such as processes in parallel programs). We use this logic to reason about the parallel processes discovered in a sequential program by a parallelizing algebraic compiler. To label the edges of a model with propositions, we extend the model to M = ⟨S, E, P_s, P_e⟩, where S and E are as before, P_s maps state atomic propositions in AP_s to the set of states on which they hold, and P_e maps atomic edge propositions in AP_e to the set of edges on which they hold.

To allow path quantification, we extend the syntax and semantics of CTL. To create CTLe we define path formulas to be non-temporal formulas over the edge propositions, constructed by the following rules:

1′. true, false and any atomic edge proposition ap_e ∈ AP_e are CTLe path formulas.
2′. if f1 and f2 are CTLe path formulas, so are ¬f1, f1 ∨ f2, and f1 ∧ f2.

If an edge (s, t) ∈ E satisfies a path formula f_e for a model M we write M, (s, t) ⊨ f_e, or (s, t) ⊨ f_e. Satisfaction of path formulas is defined below:

(s, t) ⊨ ap_e     iff  (s, t) ∈ P_e(ap_e)
(s, t) ⊨ ¬f       iff  not (s, t) ⊨ f
(s, t) ⊨ f1 ∧ f2  iff  (s, t) ⊨ f1 and (s, t) ⊨ f2

In this extension, we also allow the temporal operators to be subscripted with path formulas to specify the conditions which must be met by the paths. Thus we add the following CTLe syntax rule:

3′. if f1 and f2 are CTLe formulas, and f_e is a CTLe path formula, then AX_{f_e} f1, EX_{f_e} f1, A[f1 U_{f_e} f2], and E[f1 U_{f_e} f2] are also CTLe formulas.

The formula AX_{f_e} f1 (EX_{f_e} f1) is satisfied on a state if all (one or more) successors satisfy f1 and the edges to these successors satisfy the path formula f_e. The formula A[f1 U_{f_e} f2] (E[f1 U_{f_e} f2]) is satisfied on a state if on all (one or more) paths beginning on this state there is a state on which f2 holds, f1 holds on all states before this state, and each edge in the path before the state on which f2 holds satisfies f_e.

The satisfaction rules of these CTLe formulas are given by:

s ⊨ EX_{f_e} f1       iff  ∃t ∈ S, (s, t) ∈ E ∧ ((s, t) ⊨ f_e ∧ t ⊨ f1)
s ⊨ AX_{f_e} f1       iff  ∀t ∈ S, (s, t) ∈ E ⇒ ((s, t) ⊨ f_e ∧ t ⊨ f1)
s ⊨ A[f1 U_{f_e} f2]  iff  for all paths (s0, s1, s2, …), s = s0 and
                           ∃i[i ≥ 0 ∧ s_i ⊨ f2 ∧ ∀j[0 ≤ j < i ⇒ (s_j ⊨ f1 ∧ (s_j, s_{j+1}) ⊨ f_e)]]
s ⊨ E[f1 U_{f_e} f2]  iff  there exists a path (s0, s1, s2, …), s = s0 and
                           ∃i[i ≥ 0 ∧ s_i ⊨ f2 ∧ ∀j[0 ≤ j < i ⇒ (s_j ⊨ f1 ∧ (s_j, s_{j+1}) ⊨ f_e)]]
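The change to a set-based checking procedure is small: the next-step operator simply refuses edges whose labels do not satisfy the path formula. A minimal sketch, with an illustrative representation in which each edge carries a set of propositions and a path formula is a predicate over that set (none of these names are from the paper's tools):

```python
# Sketch of the CTLe rules: edges carry proposition sets, and the
# temporal operators follow only edges whose labels satisfy a path
# formula, here modeled as a predicate over the edge's label set.
# (Representation and names are illustrative.)

def sat_EX_e(S, E, labels, path_ok, T):
    """EX_{fe} f1: some successor in T reached over an edge satisfying fe."""
    return {s for s in S
            for t in T if (s, t) in E and path_ok(labels[(s, t)])}

def sat_EU_e(S, E, labels, path_ok, T1, T2):
    """E[f1 U_{fe} f2]: the usual least-fixpoint iteration, restricted
    to edges that satisfy the path formula fe."""
    result = set(T2)
    while True:
        new = result | (T1 & sat_EX_e(S, E, labels, path_ok, result))
        if new == result:
            return result
        result = new

# Example: only edges labeled "d" (a data dependency) may be followed.
S = {"n1", "n2", "n3"}
E = {("n1", "n2"), ("n2", "n3")}
labels = {("n1", "n2"): {"d"}, ("n2", "n3"): {"c"}}
is_data = lambda props: "d" in props
# n1 reaches n2 over a "d" edge, but n3 is only reachable via a "c" edge.
assert sat_EU_e(S, E, labels, is_data, S, {"n3"}) == {"n3"}
assert sat_EU_e(S, E, labels, is_data, S, {"n2"}) == {"n1", "n2"}
```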

The remainder of this paper describes the algebraic specification of CTL, how the algebraic methodology is used to extend CTL to CTLe, and how we use CTLe to examine abstract process graphs representing programs in parallelizing compilers. Section 2 describes the structure of our abstract process graphs and gives some CTLe formulas that provide insight into the structure of generated parallel programs. Section 3 gives the algebraic specification of CTL. Section 4 presents the algebraic temporal logic and model checker integration techniques using CTL as an example. Section 5 shows how, in this algebraic framework, CTL and its model checker can be extended to CTLe.

2 A program abstraction model

An abstract process graph is a directed graph in which nodes represent computations found in a source program by a parallelizing compiler and edges represent control and data dependency relationships between the computations. Our abstract process graph is not a traditional "control flow graph" whose edges direct the flow of control from one computation to the next. The nodes of an abstract process graph represent sequential processes, i.e., stand-alone computations; the edges represent the minimal restrictions on the execution order of the computations represented by the nodes required to ensure a correct execution of the computation represented by the graph. Thus, the edges are a set of execution order requirements that a scheduler must honor.

To determine how a source program is broken into the pieces that are represented by the nodes of an abstract process graph, we define units of computation to be the types of computations that the programmer chooses to be executed sequentially. Constructs in the source program that are not units of computation are either too lightweight to stand as individual processes and must be combined with other program constructs, or have units of computation as their components and are therefore represented as graphs whose nodes are units of computation and whose edges are control and data dependencies. For example, if assignment statements are units of computation, then an assignment statement in a source program will be represented as a node, whereas the expression on its right hand side is too lightweight to be a process; a sequence of assignment statements is represented as a collection of nodes, each representing an individual assignment statement.

Since computations are represented by language constructs, the rules specifying valid language constructs allow us to provide the following formal definition of the unit of computation: a unit of computation is any valid construct recognized by one of the rules marked by the programmer as specifying units of computation. That is, the programmer specifies which rules generate units of computation, and the compiler in turn generates sequential code from any construct recognized by these rules. By allowing the programmer to specify which constructs will be defined as units of computation, the programmer can control the granularity of the processes executing the parallel program generated by the compiler. Thus, the programmer can monitor program performance at the source language level.

Once a computation from the source program has been identified as a unit of computation, it is represented as a node in the abstract process graph representation. A node is a tuple ⟨Types, State, Transition⟩ where:

- Types is the set of types of the variables and constants used in the computation;
- State is the tuple ⟨V, σ⟩, where V is the set of variables in the computation;
- Transition : σ → σ′ implements the computation by mapping σ, the values of the variables V before the computation, to σ′, the values of the variables in V after the computation.

By V_W and V_R we denote the sets of variables which are written (i.e., defined) and read (i.e., referenced) in the computation v → v′, respectively. Thus, if the assignment statement has been specified as a unit of computation, an assignment statement x = i * y, where x and y are floating point variables and i is an integer variable, would be represented by a node in which x′, y′ and i′ represent the new values of the variables after the computation. The value of Types is specified at compilation time using universal constants (that can be chosen to be the type names) and is preserved by program execution.
Each time a graph is constructed, two special nodes, denoted e and x, are created which perform no computation but stand for the entry and exit points of the computation represented by the graph. Hence, when a unit of computation is discovered and its computation is represented by a single node, say n, the graph for this computation contains three nodes, e, n, x, and two edges, e → n and n → x; i.e., this graph has the shape e → n → x. The e → n and n → x edges are control dependency edges, since e initiates the execution of n and n must complete before x. In the construction of graphs from nodes and other graphs we restrict our presentation in this paper to functional, branching, and repetition compositions. These are graph composition operations performed by a parallelizing compiler and are expressed by setting precedence relationships (control and data dependencies) between their component graphs. We construct these relationships as edges between the nodes of the abstract process graph. For that we

assume that the language construct denoting a computation obtained by composing units of computation at nodes n1 and n2 defines a control relationship between n1 and n2, denoted n1 ≺ n2, and use the notation: if n_i is a node of the abstract process graph then V_W(n_i) is the set of variables written (defined) by the transition of the computation at n_i, and V_R(n_i) is the set of variables read (referenced) by the transition of the computation at n_i. Let n1 ≺ n2 be the control relation between n1 and n2. Then we have:

1. The computation represented by the node n2 is data flow dependent on the computation represented by the node n1, written n1 →_f n2, if V_W(n1) ∩ V_R(n2) ≠ ∅.
2. The computation represented by the node n2 is data anti dependent on the computation represented by the node n1, written n1 →_a n2, if V_R(n1) ∩ V_W(n2) ≠ ∅.
3. The computation represented by the node n2 is data output dependent on the computation represented by the node n1, written n1 →_o n2, if V_W(n1) ∩ V_W(n2) ≠ ∅.
The consistency of the computations at n1 and n2 requires that if n1 ! n2 , d 2 ff; a; og, then an instance of the computation at n1 be execute before an instance of the computation at n2 . In the case of array references, the intersection in the above de nitions involves the analysis of the array index expressions, by solving a set of linear equations derived from the array index expressions or using Banerjee's inequalities [Ban88]. The distance for a data dependency is determined for each loop enclosing both nodes in the dependency and describes the number of iterations of that loop between the instances of the nodes that cause the dependency. For example, in a single loop with index variable I if two nodes both reference the array A[I], then the nodes reference the same array element in the same iteration of the loop and thus the distance is 0. In the case where one node writes to A[I+1] and a second node later in the loop body reads from A[I], then the data ow dependency goes from one iteration to the next iteration and thus has a distance of 1. In the general linear case, the distance may be a function of the index variable and therefore not have a constant distance. This occurs for references like A[I] and A[2*I] where the distance is the same value as the index variable I. The data dependency edges are labeled with the relevant information d 2 ff; a; og and the distance D, if it is constant. d

The simplest graph construction results from the functional composition of two unit of computation nodes, n1 and n2 . This graph will consist of four nodes: e; n1 ; n2, and x, and control dependency edges e ! n1 , e ! n2, n1 ! x, and n2 ! x. Formally, if g1 and g2 are process representation graphs then g1  g2 is a the graph representing the functional composition of the computations represented by g1 and g2 and has the shape in Figure 1. If there is a data dependency ( ow, anti, or output) from g1 to g2 requiring the sequential execution of some component nodes, a data dependency edge is added, g1 ! g2, d 2 ff; a; og, which ensures the correct execution order of the computation at the nodes. If there was no such data dependency, there would be no edge between g1 and g2 , indicating that they could be executed in parallel. The graph representing the functional composition g1  g2 has no control ow edge from g1 to g2 since it is not necessary. The data dependency edges ensure d

6

m

e  ?? @@R gm gm 2 1 @@R m ?? x

Figure 1: Sequential composition the correct execution order or there is no data dependency and the computation at the nodes of g1 and g2 can be executed concurrently. This allows us to keep a minimal set of edges representing restrictions on the process execution order. The branch composition of two graphs g1 and g2 and a predicate p is denoted by Br(p; g1; g2) and has the shape in Figure 2, where true and false are control dependencies.

em

 ? m p true?? @@false Rgm gm 1 2 @@R m ?? x

Figure 2: Branch composition An enumeration loop represents the repeated execution of a computation body with di erent values associated with a variable called the loop index. The operator that constructs a loop from a loop index and a body is denoted by Lp(h[i]; g[i]) where i is the loop index, h[i] is the graph representing the loop index range, and g[i] is the computation performed by the body. A loop can also be seen as the repeated functional composition of the loop body, each repetition running with a new value of the loop index. That is, Lp(h[i]; g[i]) = (h[l]  g[l])  : : :  (h[u]  g[u]) where i 2 [l::u] is the range of values taken by i. The copies of the nodes in the body of the loop could then be labeled with their value of the loop index variable. We could then add all the appropriate data dependency edges between nodes in the loop body instantiations to ensure the correct execution order of the nodes and allow the loop header node to initiate the execution of all iterations at once. Thus all copies of the the loop body could execute in parallel with the appropriate restrictions resulting from the data dependency between them. However, representing loops this way is clearly not practical because the size of the process graph grows exponentially in the depth of the largest loop 7

nest. In addition, because the range of loop index may not be known at compile time, this representation is not always possible. Therefore, the graph Lp(h[i]; g[i]) has the shape in : g represents the instantiation of all iterations of the loop body. We Figure 3 where h ?! also need add data dependency edges to ensure correct execution order of the loop iterations. i l::u

em

 ?m h ?i : l::u gm ? xm Figure 3: Enumerated repetition composition Our abstract process graphs are signi cantly di erent from most program ow graphs found in optimizing and parallelizing compilers. First, nodes do not represent basic blocks - segments of target code that contain no branching statements. The nodes of an abstract process graph represent source language constructs and their creation is controlled by the programmer by de ning the set of units of computation. This allow the programmer to control the granularity of the sequential processes executing the parallel program. We also aim to provide descriptive data dependency edges so that the control dependence edges implicitly provided by the textual representation of the computation can be marked as irrelevant to the execution order of the statements in the program. Abstract process graphs are, however, very similar to the Kripke structures used in model checking. In fact, by representing some of the information labeling the nodes and edges in the abstraction graph as atomic propositions, the projections of our graphs containing the nodes, edges, and atomic propositions are Kripke structures. Thus, many of the questions that optimizing and parallelizing compilers ask about programs can be represented as temporal logic formulas. These questions can then be answered by a model checking algorithm. The node propositions we are using further and their meaning are listed below: ` each node n has a unique label proposition, ` e label for entry nodes x label for exit nodes unit label for unit of computation nodes func label for functional composition nodes branch label for branch condition nodes loop label for do loop header nodes n

n

8

The edge propositions we are using further and their meaning are listed below:  any control dependency true a true branch control dependency false a false branch control dependency loop a loop control dependency d any data dependency f any data ow dependency a any data anti dependency o any data output dependency D0 any constant distance data dependency with distance 0 for a loop header node labeled ` D+ any constant distance data dependency with positive distance for a loop header node labeled ` D? any non constant distance data dependency for a loop header node labeled ` ;`

;`

;`

One of the fundamental questions we can then ask is whether two nodes n1 and n2 can be executed in parallel, or more accurately, can instances of the computations represented by nodes n1 and n2 can be executed in parallel. The required conditions for parallel execution that are given as temporal logic formulas are sucient conditions that in general may be too restrictive. That is, there may be times when a formula for parallelism is not satis ed, but parallel execution is actually possible. For demonstration and correctness reasons, we always err on the side of safety. Two nodes, n1 and n2, not components of any loop or branch statements can be executed in parallel if there is no path labeled with data dependency edges between the two nodes. Thus, n1 and n2 may execute in parallel if n1 j= :E [ true Uf g n2 ] and n2 j= :E [ true Uf g n1 ]. However, it is rarely the case that there are no loops or branch statements in a program. For a slightly more intricate case, consider nodes n1 and n2 enclosed inside a single loop labeled n0 . All iterations of the loop can execute in parallel if there are no data dependencies that cross loop iterations. That is, if there are no edges labeled with positive (D+ 0 ) or unknown distance (D? 0 ) data dependency edges beginning at any node in the body of the loop n0 , then all iterations, and thus all instances of n1 and n2 in di erent iterations, can execute in parallel. This is the case if d

d

;n

;n

n0 j= AXfg(:EXf

D

+;n0 _D?;n0 g true)

We know that n1 and n2 are contained in the single loop beginning at node n0 if

entry j= EXfg (n0 ^ EXfg n1 ^ EXfg n2 ) 9

There are many other more intricate and complicated questions a parallelizing compiler must ask about the structure of the source program being parallelized. Writing CTLe formulas provides a concise and expressive manner to ask these questions. Presented above are only some of the less complex formulas to illustrate the need for the CTLe logic and model checker and thus the need for tools that allow the easy speci cation and generation of these logics and model checkers.

3 Algebraic speci cation of CTL The CTLe temporal logic and its model checker described above are not standard and must be custom written by the parties needing this particular model checker. For this reason, we have developed a set of tools that allow the easy speci cation and implementation of temporal logics and model checkers. These tools implement the model checker by specifying the temporal logic formulas and the sets of states of the model as algebras, and the model checker as an embedding morphism between them that maps a temporal logic formula to the set of states in the model on which it holds. In the algebraic methodology, a problem may be formulated as an expression of an algebra and its solution may be obtained by mapping the expression to another algebra. That is, the process of solving a problem can be viewed as simply calculating the homomorphic image of the problem given in the source algebra into the target algebra, called the problem solution. For example, a problem stated as \x + y  z" in the algebra of expressions over variables, where values of the variables x; y; z are x = 6; y = 3; and z = ?9, can be solved by mapping this expression into an algebra of integers. To compute the mapping, we must specify how each operation in the source algebra can be implemented by a target operation in the target algebra. That is, since a source operation operates on zero or more source expressions to create other source algebra expression it does not necessarily have a similar operation in the target algebra. We must de ne, for each source operation, a derived operation in the target algebra that is composed of one or more target algebra operations. The derived operation is chosen such that given target images of the source operation arguments it constructs the target image of the result of the source operation applied on those arguments. 
The mapping of an expression from the source to the target algebra is then computed by identifying the free generators and operations used to create the source expression and building the target expression by employing the derived target operations associated with each free generator and operation used in the source expression. For example, the expression \x + y  x" in the algebra of expressions is constructed from three free generators, or nullary operations: x : ; ! E , y : ; ! E , z : ; ! E , by two binary operations:  : E  E ! E , and + : E  E ! E . Thus, to create the source to target algebra mapping, we must construct derived target operations that implement the source operations in the target algebra. The assignment x = 6; y = 3; and z = ?9 tells 10

us to create constant target operations that generate the values 6, 3, and -9 for the three free-generators \x", \y", and \z". The source operations + : E  E ! E and  : E  E ! E are implemented in the target algebra as integer addition and multiplication. Thus, the embedding morphism identi es the generators x; y; z, in the source expression and compute their corresponding target images to be 6; 3; ?9. Then the embedding morphism identi es the source operation  that creates the expression y  z. The target derived operation associated with  is computed, taking as operands the images of y and z (3 and -9) and generating ?27. Finally, the last source operation of + is employed to create x + y  z. The associated target derived operation is integer addition and is computed on the image of the arguments, i.e., 6 + ?27 giving ?21 is computed. Thus, by associating each free generator and operation in the source algebra with a derived operation in the target algebra, we can construct the problem solving homomorphism. We typically use this methodology to implement algebraic compilers which view the source and target languages as algebras, and the compiler is an embedding morphism of the source language algebra into the target language algebra. In this paper, we describe how this methodology can also be applied to temporal logic and model checking. To use this methodology for temporal logic model checking we must give an algebraic speci cation of the CTL formulas and the sets of states of the model. This requires the speci cation of the operators and the carrier set for the CTL algebra A and the sets algebra A : For a given model M, the carrier set of A , FM, is the set of CTL formulas generated from the atomic propositions which appear on the states of M using the operators employed in the temporal logic. Thus, the free generators, or nullary operations, for A are true, false, ap1, ap2 , ..., ap which generate the CTL formulas denoted by their operator names. 
The other operations for creating CTL formulas from other CTL formulas are ^, _, :, AX , EX , AU , and EU . For example, the generator ap3 generates the CTL formulas \ap3" and from CTL formulas f1 = ap1 ^ ap5 and f2 = AXap4 , the operator AU generates the CTL formula \A[ap1 ^ ap5 UAXap4 ]". Algebraically, CTL is de ned as the algebra CT L

Sets

CT L

CT L

n

A

CT L

= : i

Similarly, we specify the language of the set of sets of states of a model M = as a universal algebra A with carrier set SM = 2 nullary operators ;, S , s 2 S , E (s ) 2 S , P (ap ) 2 AP , and binary operators \, [, and n (set di erence). Thus, the algebra for the set of sets of states of M is S

S

Sets

i

i

i

A

Sets

= : i

i

i

i

i

A homomorphism between two algebras requires that they be similar, that is for each operation of arity n in the source algebra, there is an operation of arity n in the target algebra. Since the two algebras A and A are not similar, we can not implement the model checker as a homomorphism, but we can implement it as an embedding morphism. Thus, CT L

Sets

11

for each operation of arity n in A we create a derived operation of arity n in the algebra A that implements the source operations in the target algebra. Thus to implement the model checker as an embedding morphism we must specify a derived operation in the target algebra for each operation in the source algebra. The sub-algebra of A composed of the target carrier set and the set of derived operations is similar to A . Thus we can view the model checker as either an embedding morphism from A to A or as a homomorphism from A to the speci ed sub-algebra of A . For example, the source operation ^ can be implemented by the target operation \ (set intersection). Given two CTL formulas f1 and f2 and the set of states on which they hold, S1 and S2, the set of states on which the formula f1 ^ f2 holds is the set S1 \ S2. Here, the source operation is implemented by a single target operation, but the temporal CTL operators can not be implemented by a single target operation. Thus, we will create derived operations that are some combination of the original target operators to implement these temporal operators in the target algebra. In the next section, we discuss the speci cation of the target derived operations as a programming language over sets and how the speci cation is used to generate the model checker implementation. CT L

Sets

Sets

CT L

CT L

CT L

Sets

Sets

4 Model checker generation from speci cations To generate a model checker, as well as an algebraic compiler, or other embedding morphism, we must specify the derived target operations associated with each source algebra operation in a format that allows our tools to read this speci cation and generate the implementation of the model checker, compiler, or embedding morphism [RW96]. For this speci cation, we use the fact that the signature of the source algebra operations can be used to generate a set of BNF grammar rules which de ne the language of the source algebra. The signature of A can be used to de ne a BNF grammar whose language is the set of CTL formulas. Thus, the problem of discovering which operations where used to build up an expression in the source algebra is the parsing problem of the CTL formulas. Our tools use this speci cation to generate a pattern matching parser [Kna94] that recognizes the language of the A source algebra. In the integer expression algebra example in the previous section, the BNF grammar derived from the signature of the source operations yields the ambiguous grammar: E ::= E \+" E; E ::= E \*" E; E ::= x; E ::= y; E ::= z; CT L

CT L

We can dis-ambiguate this grammar by splitting the carrier set E into the three level of generation E (Expression), T (Term), and F (Factor), to get the algebraic operations x : ; ! F , y : ; ! F , z : ; ! F , ? : T ! F , ? : E ! T ,  : T  F ! T , and + : E  T ! E . This gives us the familiar non-ambiguous grammar: E ::= E \+" T; E ::= T; T ::= T \*" F; T ::= F; F ::= x; F ::= y; F ::= z; From this non-ambiguous grammar the tools generate a pattern matching parser that recognizes the language of expressions over variables x, y, and z. 12

We have already seen the algebraic specification of CTL as A_CTL and the model state sets as A_Sets. While the BNF grammar generated directly from the signature of A_CTL is ambiguous, it can be disambiguated by splitting the carrier set FM on the levels of generation in the same way that we disambiguated the expression algebra above. Thus, to keep the discussion simple, we present the ambiguous grammar and the derived operations associated with the rules of that grammar.
In specifying the derived operations, we must describe how the target operations are to be used to create the target expression corresponding to the source expression. Thus, the derived operations are written in a pseudo-programming language for sets. The expected operations ∩, ∪, and \ as well as some conditional set description notations are included. We also allow set assignment statements and while-loop constructs. In these derived operations we use parameters denoted @i, i = 0, 1, ..., to represent the target images of their operands. The parameter @0, which often appears on the left hand side of an assignment statement, refers to the target set associated with the left hand side non-terminal symbol of the BNF grammar rule for that derived operation. @1, @2, and @3 are the names for the target algebra constructs that are images of the source algebra constructs generated by a grammar rule with the 1st, 2nd, or 3rd non-terminal symbol, respectively, on the right hand side of the grammar rule.
Below is a sampling of the specified derived target operations. The source operation true, whose signature is true : ∅ → Formula, can be written as the following BNF rule and derived operation:

Formula ::= "true"
    @0 = S

Since the CTL formula true holds on all states in the model, the target expression associated with Formula is S, the set of all states in the model. Similarly, since the formula false is not satisfied on any state in the model, we have the following BNF rule and derived operation for the operation false:

Formula ::= "false"
    @0 = ∅

Below, the set of states associated with the atomic proposition ap is P(ap), the set of states on which ap holds.

Formula ::= "ap"
    @0 = P(ap)

For the CTL operation ¬, with signature ¬ : Formula → Formula, we have the BNF rule and derived operation:

Formula ::= "¬" Formula
    @0 = S \ @1

For the operation ∨, with signature ∨ : Formula × Formula → Formula, we have the BNF rule and derived operation:

Formula ::= Formula "∨" Formula
    @0 = @1 ∪ @2

The set of states on which f1 ∨ f2 is satisfied is the union of the set of states on which f1 is satisfied and the set of states on which f2 is satisfied. In a similar manner, the CTL operation ∧ gives the BNF rule and derived operation:

Formula ::= Formula "∧" Formula
    @0 = @1 ∩ @2

For the next-time temporal operators AX and EX we have the following:

Formula ::= "AX" Formula
    @0 = { r ∈ S | E(r) ⊆ @1 }

This derived operation conditionally uses the target operation ∪ over singleton sets {r} to generate the set of all states r such that all successors of r are in the set @1. Similarly, for EX, we have the BNF rule and derived operation:

Formula ::= "EX" Formula
    @0 = { r ∈ S | E(r) ∩ @1 ≠ ∅ }

Here, the derived operation finds all states r which have at least one successor in @1. The derived target operations for the temporal operators AU and EU require a more complicated use of the ∪ target operation. Here we see the use of assignment statements and loops to control the application of ∪. For AU we have:

Formula ::= "A" "[" Formula "U" Formula "]"
    let Z, Z' be sets;
    Z = ∅; Z' = @2;
    while (Z ≠ Z') do
        Z = Z';
        Z' = Z' ∪ { r ∈ S | r ∈ @1 ∧ E(r) ⊆ Z };
    @0 = Z;

This derived operation implements the least fixed point solution to the equation Z = lfp(@2 ∪ (@1 ∩ { r ∈ S | E(r) ⊆ (@1 ∩ Z) })). Similarly, for EU we have:

Formula ::= "E" "[" Formula "U" Formula "]"
    let Z, Z' be sets;
    Z = ∅; Z' = @2;
    while (Z ≠ Z') do
        Z = Z';
        Z' = Z' ∪ { r ∈ S | r ∈ @1 ∧ E(r) ∩ Z ≠ ∅ };
    @0 = Z;

From this specification the algebraic tools can automatically generate the model checker algorithm.

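The derived operations above translate directly into executable set computations. The sketch below is our own illustrative encoding, not the generated checker: a model is represented by its state set S, successor map E, and labeling P, and the EX, AX, and EU operations are implemented with the same least-fixed-point while loop as the derived operations.

```python
# Illustrative CTL model checking by set computation (names are ours).
S = {"s0", "s1", "s2"}
E = {"s0": {"s1"}, "s1": {"s2"}, "s2": {"s2"}}   # successor map E(r)
P = {"p": {"s0", "s1"}, "q": {"s2"}}             # states labeled by each ap

def check_EX(S1):
    """EX f: states with at least one successor satisfying f."""
    return {r for r in S if E[r] & S1}

def check_AX(S1):
    """AX f: states all of whose successors satisfy f."""
    return {r for r in S if E[r] <= S1}

def check_EU(S1, S2):
    """E[f1 U f2] by the least-fixed-point loop of the derived operation."""
    Z, Zp = set(), set(S2)
    while Z != Zp:
        Z = set(Zp)
        Zp = Zp | {r for r in S if r in S1 and E[r] & Z}
    return Zp

# E[p U q]: s2 satisfies q; s1 and s0 reach it along p-states.
assert check_EU(P["p"], P["q"]) == {"s0", "s1", "s2"}
assert check_EX(P["q"]) == {"s1", "s2"}
```

The while loop terminates because Z' grows monotonically inside the finite state set S, which is exactly why the derived-operation language only needs assignment and while-loop constructs.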
5 Extending the temporal logic and model checker

As stated earlier, the algebraic tools provide a framework which allows users of temporal logic to modify and experiment with temporal logics and model checkers in their various forms. We now demonstrate how easily this can be accomplished when the temporal logic and model checker are specified and implemented within the framework of this algebraic methodology. We extend the CTL algebra A_CTL to allow path quantification of the temporal operators. Also, we extend the model target algebra A_Sets to allow edge propositions, and we modify and add the derived operations that specify the new model checker. We must add to A_CTL operations to create the path formulas used in path quantification. Thus, new free generators for the edge propositions and operators to create path formulas must be added. The path formula free generators are true_e, false_e, ap_e1, ap_e2, ..., and ap_em for the true and false path formulas and the edge propositions ap_ej, 1 ≤ j ≤ m, that label the edges of the model M. We also add path versions of the boolean operators ∨, ∧, and ¬ and subscript them with a "p" to distinguish them from the CTLe boolean operators ∨, ∧, and ¬. (Note that in the actual implementation, the subscripts are dropped, since the parser can determine which logical operation is being employed from the context of the operator. This is similar to operator overloading in programming languages.) The path formulas thus require us to split the carrier set FM of A_CTLe into CTLe formulas and path formulas. Thus, the signatures of the new operators above specify the path formulas, not CTLe formulas. We also modify the temporal operators AX, EX, AU, and EU to allow path quantification. The non-path-quantified operations are removed, since they can be implemented by the path quantifying operations by using true_e as the path formula.

EX and AX must be modified from their previous unary form to become binary operators which create a CTLe formula from a path formula and a CTLe formula. Also, AU and EU become ternary operators, working on three operands, a path formula and two CTLe formulas, to create a CTLe formula. Thus A_CTLe can be specified as an algebra with these extended generators and signatures.
We also extend the target algebra to accommodate the operations on edge propositions and path formulas. Thus, the carrier set of A_M is extended to a split carrier set of sets of states and sets of edges. We also add free generators for the empty set of edges ∅_e, for the entire set of edges E_e, for each edge e_i ∈ E_e, and for the set of edges labeled by each edge proposition, P_e(ap_ej) for each ap_ej ∈ EP_e. The edge set operations ∩_e, ∪_e, and \_e are also added. The extended algebra A_M can thus be specified with these added generators and operations.
With the source and target algebras appropriately extended, we then need to specify the derived target operations that define the model checker. Note that after splitting the carrier set into CTLe formulas and path formulas, the BNF grammar corresponding to the algebra's signature is again ambiguous. For simplicity, we again work with the ambiguous grammar, since the disambiguation process is clear and was seen above. The derived target operations for the new operation generators are given below. These are similar to the previous derived operations and are self-explanatory.

PathFormula ::= "true_e"
    @0 = E_e

PathFormula ::= "false_e"
    @0 = ∅_e

PathFormula ::= "ap_ej"
    @0 = P_e(ap_ej)

PathFormula ::= "¬_p" PathFormula
    @0 = E_e \ @1

PathFormula ::= PathFormula "∨_p" PathFormula
    @0 = @1 ∪ @2

PathFormula ::= PathFormula "∧_p" PathFormula
    @0 = @1 ∩ @2

The macro below for the path quantified EX operator will find all states r which have a successor r' such that r' is in @2 and (r, r') is in @1.

StateFormula ::= "EX" PathFormula StateFormula
    @0 = { r ∈ S | ∃r' ∈ S, (r, r') ∈ @1 ∧ r' ∈ @2 }

The macro for the AX operator will find all states r such that all successors r' of r are in @2 and all edges (r, r') are in @1.

StateFormula ::= "AX" PathFormula StateFormula
    @0 = { r ∈ S | ∀r' ∈ E(r), (r, r') ∈ @1 ∧ r' ∈ @2 }

For AU we have:

StateFormula ::= "A" "[" StateFormula "U" "{" PathFormula "}" StateFormula "]"
    let Z, Z' be sets;
    Z = ∅; Z' = @3;
    while (Z ≠ Z') do
        Z = Z';
        Z' = Z' ∪ { r ∈ S | r ∈ @1 ∧ ∀r' ∈ E(r), ((r, r') ∈ @2 ∧ r' ∈ Z) };
    @0 = Z;

Similarly, for EU we have:

StateFormula ::= "E" "[" StateFormula "U" "{" PathFormula "}" StateFormula "]"
    let Z, Z' be sets;
    Z = ∅; Z' = @3;
    while (Z ≠ Z') do
        Z = Z';
        Z' = Z' ∪ { r ∈ S | r ∈ @1 ∧ ∃r' ∈ E(r), ((r, r') ∈ @2 ∧ r' ∈ Z) };
    @0 = Z;

The algebraic tools then use this specification to generate the model checker for the extended logic CTLe.
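These extended derived operations can be sketched the same way as before (an illustrative encoding of ours, not the generated checker): the model keeps an edge-labeling P_e alongside the states, and the path-quantified EX and EU restrict attention to edges satisfying the path formula.

```python
# Illustrative CTLe-style checking with edge propositions (names are ours).
S = {"s0", "s1", "s2"}
Pe = {"a": {("s0", "s1"), ("s1", "s2")},      # edges labeled by each
      "b": {("s2", "s2")}}                    # edge proposition

def check_EX_e(A1, S2):
    """EX {pf} f: some successor r' with (r, r') satisfying pf, r' in f."""
    return {r for r in S if any((r, rp) in A1 and rp in S2 for rp in S)}

def check_EU_e(S1, A2, S3):
    """E[f1 U {pf} f2]: least fixed point restricted to pf-labeled edges."""
    Z, Zp = set(), set(S3)
    while Z != Zp:
        Z = set(Zp)
        Zp = Zp | {r for r in S
                   if r in S1 and any((r, rp) in A2 and rp in Z for rp in S)}
    return Zp

q_states = {"s2"}
p_states = {"s0", "s1"}
# s1 reaches q over an a-labeled edge; s0 follows through p-states on a-edges.
assert check_EU_e(p_states, Pe["a"], q_states) == {"s0", "s1", "s2"}
assert check_EX_e(Pe["b"], q_states) == {"s2"}
```

Passing the set of all edges as the path-formula argument recovers the unquantified CTL operators, which is why the original unary and binary forms could be removed from the extended algebra.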

References

[Ban88] Utpal Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic, Boston, 1988.

[CES86] E.M. Clarke, E.A. Emerson, and A.P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems, 8(2):244-263, 1986.

[Kna94] J.L. Knaack. An Algebraic Approach to Language Translation. PhD thesis, The University of Iowa, Department of Computer Science, Iowa City, IA 52242, December 1994.

[RW96] T. Rus and E. Van Wyk. Algebraic implementation of model checking algorithms. In Third AMAST Workshop on Real-Time Systems, Proceedings, pages 267-279, March 1996. Available upon request from A. Cornell, BYU, Provo, Utah.
