
Proc. of IEEE Comp. Soft. & Applications Conf. (COMPSAC), pp. 149-154, Sept. 1991

Synthesizing Procedural Abstractions from Formal Specifications

Betty H.C. Cheng
Department of Computer Science
Michigan State University
East Lansing, Michigan 48824

Abstract

This paper describes the development of the Seed system, which demonstrates that the building blocks of a large software system can be correctly synthesized from user-supplied formal specifications using techniques amenable to automation. Seed accepts a formal specification of a problem written in predicate logic and generates annotated program source code satisfying the specification. In addition to primitive programming language constructs, Seed is capable of synthesizing recursive and non-recursive procedures and functions, and abstract data types.

1 Introduction

Research concerning the use of formal methods in software tools seeks to facilitate the development of correct software [14, 8, 11, 13]. Our objective is to build a software development environment consisting of tools that will support the use of formal methods in all phases of the development of large software products, including design, specification, implementation, and maintenance. In the first stage of this project, we have designed and implemented Seed, a tool that develops programs from formal specifications while verifying their correctness [4].

In the Seed project, we have shown that the building blocks of a large software system can be correctly synthesized, in an automated manner, from user-supplied formal specifications. Seed accepts a predicate logic specification of a problem and generates program source code satisfying the specification. Figure 1 gives a pictorial overview of the Seed system.

[Figure 1: Overview of Seed. Components shown: Specification Editor, Specification, Decomposition, ADT Processing, Software Component Library, Domain-Specific Facts, Synthesis of Programming Structures, Composition of Results, Annotated Program. Legend: Next Step, Optional.]

Once a specification is entered, it undergoes pre-processing. The specification may be decomposed into logical tasks that are synthesized as functions or procedures. If an abstract data type (ADT) is to be generated, the ADT specification is tested for sufficient completeness [10, 6]. After pre-processing is completed, programming statements are synthesized in order to satisfy the specification. The rules for choosing which programming language structures to synthesize are stored in a rule base. Background knowledge and domain-specific information are entered into a fact base. During synthesis, Seed uses the fact base to disambiguate rule applications. Synthesized programming statements are verified for correctness; this often requires the use of a theorem prover that interfaces with Seed. During the verification of a statement, new logical expressions may need to be satisfied through further synthesis. This method leads to the simultaneous construction of a program and its proof of correctness in a bottom-up manner. Once Seed has successfully synthesized a procedure, function, or ADT, the specification and resultant code are stored in a module library. If no application of rules will yield a satisfactory program, Seed informs the user, who may then modify the specification and restart the synthesis.

A program specification is expressed in terms of pre- and postconditions that describe the initial and final states, respectively, of a program. Seed's synthesis rules are based in part on Dijkstra's [7] and Gries' [9] use of weakest precondition for correct program development. The choice of a programming statement $S$ for a postcondition $R$ is verified by finding the weakest precondition (wp) of $S$ with respect to $R$, where $wp(S, R)$ describes the set of states in which execution of $S$ can begin such that, upon termination of $S$, $R$ will be satisfied. All rules are coded in Prolog, as it can be readily applied to predicate logic specifications. The resultant code is annotated with its pre- and postconditions for informative purposes.

This paper describes the rules used to develop primitive programming structures and procedural abstractions that encapsulate these structures. See [4, 3] for further details concerning the synthesis of abstract data types, decomposition of specifications, and a discussion of related work.

2 Primitive Structures

Predicate logic specifications may contain conjunctive, equality, inequality, disjunctive, and quantified expressions. For each type of logical expression, we have developed rules to synthesize one or more primitive programming statements that achieve the behavior specified by the expression.

2.1 Assignments

A predicate logic expression of the form $x = expr$, occurring in a postcondition $R$, can be satisfied with an assignment statement, $x := expr$, where $expr$ is evaluated and its value is stored at location $x$. We assume that equality expressions are written with the orientation that $x$ is greater than $expr$ according to the following partial ordering on symbols:

arrays and names, variables $\succ$ primitive operators (e.g. $+, -, *, /$) $\succ$ constants

After synthesizing an assignment statement, we must find the weakest precondition (wp) of the assignment with respect to the postcondition $R$. This is expressed as $wp(x := expr, R) = R^{x}_{expr}$, which represents the postcondition $R$ with every occurrence of $x$ replaced by the expression $expr$. This is termed a textual substitution of $x$ by $expr$ in expression $R$. Figure 2 summarizes the rules for synthesis of an assignment statement.

Given an expression of the form $x = expr$, occurring in a postcondition $R$, we perform the following steps in developing an assignment statement.
1. Check that the LHS $x$ of the equation is a nonconstant term and is greater than the RHS $expr$ with respect to the partial ordering on the symbols.
2. Synthesize the statement $x := expr$.
3. Find the wp of the assignment statement with respect to the postcondition $R$.

Figure 2: Synthesizing an assignment statement

For specifications involving arrays, we consider arrays as functions [9]. Using this approach, $a[i]$ represents the application of function $a$ to a parameter $i$. Following this line of reasoning, the assignment $a[i] := e$ redefines the function $a$: at point $i$ the value of the function is now $e$; at all other points the function is unchanged. The new function is represented as

$$(a; i{:}e)[j] = \begin{cases} e & \text{if } i = j \\ a[j] & \text{if } i \neq j \end{cases}$$

where the new function is applied to parameter $j$. Because an assignment to an array element may be treated the same as an assignment to a simple variable, the corresponding wp is expressed as

$$wp(a[i] := e, R) = R^{a}_{(a; i:e)}.$$

2.2 Alternatives

A predicate logic expression containing one or more occurrences of the disjunctive operator $\vee$ can be satisfied with an alternative statement (if-fi). A specification $R$ of the form $R_1 \vee \ldots \vee R_n$ is implemented by the statement

    if B1 -> S1
    || B2 -> S2
       ...
    || Bn -> Sn
    fi

Each disjunct $R_i$ may be implemented by a guarded command $B_i \rightarrow S_i$, where the statement $S_i$ is executed only if the guard $B_i$ is true. The symbol || separates the guarded commands. The wp of the alternative statement is that at least one guard be true and that every guard $B_i$ must logically imply the wp of its corresponding statement list $S_i$ with respect to the postcondition $R$. Symbolically, the wp is expressed as

$$(\exists i :: B_i) \wedge (\forall i :: B_i \rightarrow wp(S_i, R)),$$

where `::' indicates that the range of the quantified variable $i$ is not pertinent to the current discussion. If none of the guards is true, then the wp is not satisfied; therefore, the execution of the alternative statement is aborted. The alternative statement is successfully executed if at least one guard $B_i$ is true. Analogously, a disjunctive expression has the semantics that, if one disjunct has the value true, then the entire expression has the value true. Therefore, each disjunct is satisfied by one guarded command of an alternative statement. The rules we developed for synthesizing the alternative statement for a disjunctive expression are summarized in Figure 3. See [4] for further details and examples.

For a postcondition $R$, where $R \equiv R_1 \vee \ldots \vee R_n$,
1. Find an expression $E_i$ that can be satisfied by an executable statement $S_i$ in disjunct $R_i$;
2. Let the guard $B_i$ be the wp of $S_i$ with respect to $R_i$;
3. Simplify and remove tautologies;
4. Repeat Steps 1-3 until all disjuncts have been processed.

Figure 3: Rules for synthesizing alternative statements

2.3 Iteratives

A quantified expression specifies one or more conditions existing over a range of values. Analogously, an iterative statement performs a set of operations over a range of values. The iterative statement in our target language is of the form

    do B1 -> S1
    || B2 -> S2
       ...
    || Bn -> Sn
    od
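The wp rules for assignment and alternative statements given in Sections 2.1 and 2.2 can be illustrated with a small executable sketch. It is not Seed's Prolog implementation: it assumes predicates are Python functions over a state dictionary, models each statement by its predicate transformer, and replaces textual substitution by its semantic counterpart (evaluating R in the updated state). The names `assign`, `alternative`, `abs_stmt`, and `partial_stmt` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Pred = Callable[[State], bool]
# Assumption: a statement is modelled by its predicate transformer R -> wp(S, R).
Stmt = Callable[[Pred], Pred]


def assign(var: str, expr: Callable[[State], int]) -> Stmt:
    """wp(var := expr, R): R evaluated with var replaced by the value of expr."""
    def transformer(R: Pred) -> Pred:
        return lambda s: R({**s, var: expr(s)})
    return transformer


def alternative(guarded: List[Tuple[Pred, Stmt]]) -> Stmt:
    """wp(if ... fi, R) = (E i :: B_i) and (A i :: B_i -> wp(S_i, R))."""
    def transformer(R: Pred) -> Pred:
        def pre(s: State) -> bool:
            some_guard_true = any(B(s) for B, _ in guarded)
            guards_imply_wp = all((not B(s)) or S(R)(s) for B, S in guarded)
            return some_guard_true and guards_imply_wp
        return pre
    return transformer


# Hypothetical postcondition R: y = |x|.
R: Pred = lambda s: s["y"] == abs(s["x"])

# if x >= 0 -> y := x || x <= 0 -> y := -x fi
abs_stmt = alternative([
    (lambda s: s["x"] >= 0, assign("y", lambda s: s["x"])),
    (lambda s: s["x"] <= 0, assign("y", lambda s: -s["x"])),
])

# The same statement with only the first guarded command.
partial_stmt = alternative([
    (lambda s: s["x"] >= 0, assign("y", lambda s: s["x"])),
])

states = [{"x": x, "y": 0} for x in range(-3, 4)]
wp_abs = [abs_stmt(R)(s) for s in states]
wp_partial = [partial_stmt(R)(s) for s in states]
print(wp_abs)
print(wp_partial)
```

In this sketch `wp_abs` holds in every sampled state, whereas `wp_partial` fails for x < 0 because no guard is true there, which corresponds to the alternative statement aborting.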

In the iterative statement, $B_i \rightarrow S_i$ again represents a guarded command. The iterative statement, like the alternative statement, uses guarded commands; at least one guard $B_i$ must be true in order to enter the loop. The synthesis process begins with the application of rules that we developed to find an invariant expression from the original specification. An invariant $P$ describes the conditions of an iterative statement that exist before entry and upon exit.

We can develop an invariant by deleting a conjunct $C_i$ from a postcondition $C_1 \wedge \ldots \wedge C_i \wedge \ldots \wedge C_n$; the corresponding guard is then the negation of the deleted conjunct ($\neg C_i$) and the invariant $P$ becomes $C_1 \wedge \ldots \wedge C_{i-1} \wedge C_{i+1} \wedge \ldots \wedge C_n$. The loop stops when $C_i$ is true. We can also develop an invariant by replacing a constant with a variable in the postcondition; the corresponding guard is then the inequality between the constant and the variable. Thus, for a postcondition of the form $(\forall i : 0 \leq i < n : C(i)) \wedge R_j$, we form the invariant by replacing the constant $n$ with some variable $k$ and define the range for the new variable. Then the invariant $P$ is of the form

$$(\forall i : 0 \leq i < k : C(i)) \wedge R_j \wedge (0 \leq k \leq n),$$

and the guard $B$ is the expression $k \neq n$. The loop stops when the guard is not true.

After developing the invariant and the guards for the loop, we must show that the invariant $P$ is initially true; this may require the synthesis of one or more statements. Next, we construct a bound function $t$, the value of which must decrease with each iteration of the loop structure, thus imposing a bound on the number of loop iterations. In order to decrease the bound function, we must synthesize a statement $S$ that changes a variable involved in the definition of the bound function. The wp of the statement $S$ may not imply the invariant; for those cases, more statements need to be synthesized to ensure the invariant is satisfied before and after execution of the loop. Termination occurs when none of the guards is true and the relationship $P \wedge \neg G \rightarrow R$ is satisfied, where $G$ is true when at least one guard $B_i$ is true.

As an example of the synthesis of iterative, alternative, and assignment statements, consider the following specification of a sort program:

$$R : (\forall i : 0 \leq i < n - 1 : (\forall j : 0 \leq j < n - i : c[j] \leq c[j + 1])) \wedge perm(c, c_0),$$

where $perm(c, c_0)$ indicates that the array $c$ is a permutation of the original array $c_0$. We will assume that the permutation property is satisfied for the entire development process and will not be explicitly included in subsequent expressions. Because the specification involves a universally quantified expression, the rules for synthesizing an iterative statement are applied. First, Seed develops the invariant from $R$:

$$(\forall i : 0 \leq i < n - 1 : (\forall j : 0 \leq j < n - i : c[j] \leq c[j + 1])).$$

The system replaces the constant expression $n - 1$ with variable $k$ to obtain the expression

$$(\forall i : 0 \leq i < k : (\forall j : 0 \leq j < i : c[j] \leq c[j + 1])).$$

Because the new variable $k$ was introduced, a range must be given for the new variable; the range is $(0 \leq k \leq n - 1)$. Next, the guard for the iterative statement is developed; the guard $k \neq n - 1$ represents the inequality of the newly introduced variable ($k$) and its upper bound ($n - 1$). The final expression for the invariant $P$ is

$$P : (\forall i : 0 \leq i < k : (\forall j : 0 \leq j < i : c[j] \leq c[j + 1])) \wedge (0 \leq k \leq n - 1).$$

Figure 4 contains the final results for the sort program. (The quantifiers $\forall$ and $\exists$ are denoted as A and E, respectively, in Seed's synthesis results.)
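To make the interplay of guard, invariant, and bound function concrete, the following is a minimal Python sketch of a conventional bubble sort written in the same shape: an outer loop whose guard is the inequality between a counter k and its upper bound, and an inner loop whose body is a two-way alternative. It is not Seed's output from Figure 4; the function names and the particular invariants asserted here are assumptions chosen so that they can be checked at run time.

```python
from typing import List


def is_sorted(c: List[int], lo: int, hi: int) -> bool:
    """True when the slice c[lo..hi-1] is in nondecreasing order."""
    return all(c[j] <= c[j + 1] for j in range(lo, hi - 1))


def bubble_sort(c0: List[int]) -> List[int]:
    c = list(c0)
    n = len(c)
    upper = max(n - 1, 0)
    k = 0  # establishes the outer invariant trivially
    while k != upper:  # guard: k differs from its upper bound
        # Assumed outer invariant: the last k elements are sorted and
        # no element of the prefix exceeds them.
        assert is_sorted(c, n - k, n)
        if k > 0:
            assert all(c[i] <= c[n - k] for i in range(n - k))
        kk = 0
        while kk != n - 1 - k:
            # Assumed inner invariant: c[kk] is the maximum of c[0..kk].
            assert all(c[j] <= c[kk] for j in range(kk + 1))
            if c[kk] <= c[kk + 1]:
                pass  # corresponds to skip
            elif c[kk] > c[kk + 1]:
                c[kk], c[kk + 1] = c[kk + 1], c[kk]
            kk += 1  # decreases the inner bound function (n - 1 - k) - kk
        k += 1  # decreases the outer bound function upper - k
    # Postcondition: c is sorted and is a permutation of c0.
    assert is_sorted(c, 0, n)
    assert sorted(c) == sorted(c0)
    return c


print(bubble_sort([3, 1, 2]))
```

The assertions play the role of the annotations in the synthesized program: each loop exits when its guard is false, and the invariant together with the negated guard implies the postcondition.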

3 Procedures and Functions

Until now, we have considered programs built from simple statements. For a program synthesis tool to handle more sophisticated problems, it must also provide support for procedural abstraction, which allows the programmer to extend the virtual machine defined by a programming language through the addition of new operations. Procedural abstractions can either be procedures or functions, the difference being that a procedure returns values through its parameters and a function returns a single value. The term routine is used when the discussion is applicable to both procedures and functions. A synthesized routine may be invoked to satisfy future specifications. This section describes two different aspects of routines. Sections 3.1 and 3.2 describe the synthesis of non-recursive and recursive routines, respectively. Section 3.3 discusses the verification of the use of previously synthesized routines for new specifications; just as selections of primitive program statements must be verified with respect to their specifications, so must the routines selected to satisfy new specifications be verified.

    {true}
    k := 0 ;
    {P: (A i: 0 =< i < k-1:(A j: 0 =< j [...]
        k := k + 1 ;
        kk := 0 ;
        {P2: (A j: 0 =< j < kk:(c[j] =< c[j+1])) & (0 =< kk =< n-k) }
        do kk [...] n-k ->
            kk := kk + 1 ;
            if c[kk] =< c[kk+1] -> skip
            || c[kk] > c[kk+1] -> c[kk],c[kk+1] := c[kk+1],c[kk]
            fi;
            {c[kk] =< c[kk+1]}
        od;
    od;
    {R: (A i: 0 =< i < n-1: (A j:0 =< j < n-i:c[j] =< c[j+1])) }

Figure 4: Sort program generated by Seed

3.1 Non-Recursive Routines

A specification for a routine consists of three parts: a precondition, a postcondition, and a header declaration. The precondition describes the conditions of the variables prior to the execution of the routine whose behavior is described by the postcondition. The header declaration of a routine contains the routine name, the names and types of the formal parameters of the routine and, in the case of a function, the type of the value returned. In our programming language, formal parameters may be of type value, value-result, or result. (We adopt the usual meanings for these terms; see [1] for more information.) A procedure is invoked by its name instantiated with a list of actual parameters that correspond to the formal parameters. The synthesis of non-recursive routines involves simply generating the body of the routine using rules discussed in Section 2, enclosing the results in begin and end, and attaching the header declaration.

3.2 Recursive Functions

Some functions are naturally expressed recursively. For instance, the factorial function is defined as

$$fact(0) = 1 \wedge (fact(x) = x * fact(x - 1) \wedge x > 0).$$

There is only one argument for fact. In order to specify the definition in terms of the parameter, we use the following definition to produce a canonical form for recursive function specifications.

Definition 1

The specification for the postcondition of a recursive function has the following format:
1. It must contain a disjunctive expression of two or more disjuncts.
2. Each disjunct must have two or more conjuncts.
3. One of the conjuncts in each disjunct must be an equality expression of the form
   $$f(x_1, \ldots, x_n) = expr, \quad (1)$$
   where $f$ is the name of the function being specified, and $expr$ defines the value of $f$ for the list of formal parameters $x_1, \ldots, x_n$, where there exists a well-founded ordering ($\succ_{wfo}$) between $f(x_1, \ldots, x_n)$ and $expr$.
4. There must exist at least one disjunct in which $expr$ is primitive, that is, containing no undefined function values. An undefined function value is a reference to a function value that has not been defined. The group of conjuncts within this disjunct is called the termination condition.
5. The remaining conjuncts of each disjunct must be predicates that pertain to the parameters $x_1, \ldots, x_n$.

For instance, the specification for the factorial function in canonical form is

$$Q : true$$
$$R : ((x = 0) \wedge \underline{fact(x) = 1}) \; \vee \quad (2)$$
$$\quad\; ((x > 0) \wedge \underline{fact(x) = x * fact(x - 1)}), \quad (3)$$

where $x$ is a value parameter, $Q$ and $R$ are the pre- and the postconditions, respectively, and the underlined conjuncts are equations involving the function.

The most important consideration when developing the rules for the synthesis of recursive functions is to guarantee termination of the program. Termination involves two problems. The first problem is to find the termination conditions, that is, those conditions that must hold when the routine does not make a recursive reference. Termination conditions establish a bound on the number of recursive invocations. For specifications in the form given in Definition 1, termination conditions are found by satisfying Item 4. In the factorial example, disjunct (2) contains an equation involving the fact function only on the LHS. The expression $x = 0$ is therefore the termination condition; an invocation of the function fact with $x$ equal to 0 will not involve recursion.

Once the termination conditions are found, the second part of guaranteeing termination is to ensure that the termination conditions are eventually reached. We use equational logic reasoning [5, 6] in proving termination properties for recursive definitions and their corresponding programs. Specifically, for recursive functions, proving progress towards termination requires examination of equations that contain references to the function being defined on both sides of an equation. In order to ensure that recursive references do not lead to infinite invocations, there must be a well-founded ordering (wfo) [5] between the LHS and the RHS of these equations.

In order to facilitate the determination of a wfo, we use the partial ordering on symbols given in Section 2.1 and user-supplied precedences for newly introduced symbols, such as new functions, that obey the wfo. Referring to the factorial function, disjunct (3) contains an equation that has references to the factorial function on both the LHS and the RHS. In order to guarantee progress towards termination, we must show that

$$fact(x) \succ_{wfo} x * fact(x - 1),$$

where $\succ_{wfo}$ represents a well-founded ordering. Thus, we must show that $fact(x)$ is greater than the expression $x * fact(x - 1)$. Because there is a recursive reference to fact, using only the partial ordering on symbols is not sufficient to determine a well-founded ordering; thus, we use the recursive path ordering (rpo) [5] to determine the existence of a wfo. Referring to the factorial example, we know $fact \succ *$ from the partial ordering on functions given above, so we must show that $fact(x) \succ_{rpo} x$ and $fact(x) \succ_{rpo} fact(x - 1)$. But from the partial ordering on functions, we also know that $fact \succ_{rpo} x$, and using the second item in the definition of rpo and axioms of natural numbers, which indicate that $x$ is greater than $x - 1$, we know that $fact(x) \succ_{wfo} fact(x - 1)$.

The specification for a recursive function has the same syntactic structure as a specification that is satisfied by an alternative statement, that is, a set of guarded commands where each guarded command implements one disjunct of the specification. Recall our discussion concerning array assignments in Section 2. We consider the name of the function to be a variable that can be assigned the value of an expression determined within the body of the function with respect to a specific set of parameters. See [4] for relevant theorems and proofs. The final results for the synthesis of the factorial function are given in Figure 5.
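The correspondence between the canonical specification and a guarded alternative can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code Seed emits: the `ValueError` models the abort that occurs when no guard is true, the progress assertion stands in for the well-founded ordering argument on natural numbers, and `math.factorial` is used only as an independent check of the postcondition.

```python
import math


def fact(x: int) -> int:
    """Guarded-command reading of the canonical factorial specification."""
    if x == 0:
        # Termination condition: the disjunct whose expr is primitive.
        result = 1
    elif x > 0:
        # Progress towards termination: the recursive argument is smaller
        # in the ordering on natural numbers and stays non-negative.
        assert 0 <= x - 1 < x
        result = x * fact(x - 1)
    else:
        # Assumption: no guard holds, so the alternative statement aborts.
        raise ValueError("no guard is true for x < 0")
    # Postcondition check against an independent definition.
    assert result == math.factorial(x)
    return result


print(fact(5))
```

Each branch implements one disjunct of R, with the non-equational conjunct of the disjunct serving as the guard.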

    { ((x = 0) | (x > 1)) &
      (x = 0) -> wp(fact(x):=1,R) &
      (x > 1) -> wp(fact(x):=x * fact(x-1),R) }
    function fact(x) :
      if x = 0 -> fact(x):= 1
      || x > 0 -> fact(x):= x * fact(x-1)
      fi
    { R: (fact(x) = 1) & (x = 0) | (fact(x) = x * fact(x-1)) & (x > 0) }

Figure 5: Factorial function synthesized by Seed

3.3 Verification of Invocations

Using the rules described thus far, Seed is able to synthesize primitive programming structures and routines to encapsulate these structures. If a previously synthesized routine is to be used to satisfy a new specification, we must first verify that the routine satisfies the new specification. Most stored specifications will not directly match a new specification; thus, there is a need for one or more predicates to relate the old and the new specifications. Martin [12] used an adaptation predicate $A$, which is independent of the value-result and result parameters, to relate the specification of a stored routine to a new specification. For instance, consider a swap routine and its specification given in Figure 6, where $x$ and $y$ are value-result parameters. Given a new specification with the precondition $\{(a = b_0) \wedge (b = a_0)\}$ and the postcondition $E$, $\{(a = a_0) \wedge (b = b_0)\}$, where $z_0$ represents the initial value for a variable $z$, we want to find the wp of a call to the swap procedure. Unless otherwise stated, we use the notational convention: $x$ represents a formal value parameter in the called procedure, $y$ represents a formal value-result parameter, $z$ represents a formal result parameter, $a$ is the actual value parameter, $b$ is the actual value-result parameter, and $c$ is the actual result parameter.

    { (x = y0) & (y = x0) }
    procedure swap((val-res, [x,y]))
    begin
      x,y:= y,x;
    end
    { (x = x0) & (y = y0) }

Figure 6: Program and specification for swap

Definition 2

Given a procedure $S$ with a precondition $U$, a postcondition $V$ that satisfies the relation $U \rightarrow wp(S, V)$, and a postcondition $E$ of a new specification, the adaptation predicate $A$ satisfies the expression

$$A \wedge V \rightarrow E^{b,c}_{y,z} \quad (4)$$

The RHS of expression (4) is the postcondition $E$ after the actual value-result and result parameters $b$ and $c$, respectively, have been textually substituted with the respective formal parameters $y$ and $z$. The value parameters are not substituted into $E$ in expression (4) because only value-result and result parameters can change in $E$. For our swap example, we have the expression

$$A \wedge (x = x_0) \wedge (y = y_0) \rightarrow ((a = a_0) \wedge (b = b_0))^{a,b}_{x,y},$$

where the adaptation predicate relates the variables $x_0$ and $y_0$ with the values $a_0$ and $b_0$, respectively. The wp of procedure call $S(a, b, c)$ for (4) gives

$$wp(S(a, b, c), A \wedge V)^{x}_{a} \rightarrow wp(S(a, b, c), E), \quad (5)$$

where $x$ represents the formal value parameter in the procedure $S$, and $a$ is the actual value parameter. Expression (5) is simplified to obtain expression (6).

Expression (6) shows that $(A \wedge U)^{x}_{a}$ implies the wp for the procedure call $S(a, b, c)$ with respect to $E$:

$$(A \wedge U)^{x}_{a} \rightarrow wp(S(a, b, c), E). \quad (6)$$

We developed an algorithm, amenable to automation, given in Figure 7 to find an adaptation predicate. The algorithm determines if a stored specification $V$ subsumes the current specification $E'$ (that is, $E^{b,c}_{y,z}$), where $V$ subsumes $E'$ if there exists a substitution $\theta$ such that $V\theta \subseteq E'$. First, following step (1) of Algorithm 1, we apply the unification algorithm [2] to find a most general unifier $\theta$ such that $V\theta \subseteq E'$. For the swap example, $\theta$ is determined to be the predicates $unify(x_0, a_0)$ and $unify(y_0, b_0)$, where $x_0, y_0$ will be replaced by $a_0, b_0$, respectively.

Next, following step (2) of Algorithm 1, we find a set $C$ of conjuncts that are in $E'$ but not in $V\theta$, that is, $C = E' - V\theta$. In the swap example, we find that $C$ is the empty set from the expression

$$C = \{x = a_0, y = b_0\} - \{x = x_0, y = y_0\}^{x_0, y_0}_{a_0, b_0}.$$

Continuing with the swap example, we instantiate (6) in order to obtain

$$(unify(x_0, a_0) \wedge unify(y_0, b_0) \wedge (x = y_0) \wedge (y = x_0))^{x,y}_{a,b} \rightarrow wp(swap(a, b), (a = a_0) \wedge (b = b_0)).$$

The value parameters are substituted into the LHS of the implication and the substitutions specified in the unify predicates are performed to obtain the precondition that implies the wp of the call $swap(a, b)$. The final expression is

$$(a = b_0) \wedge (b = a_0) \rightarrow wp(swap(a, b), (a = a_0) \wedge (b = b_0)).$$

Algorithm 1

Given a problem specification $E$, where $E \equiv E_1 \wedge E_2 \wedge \ldots \wedge E_n$, and the specification $V$ for a previously synthesized procedure, where $V \equiv V_1 \wedge \ldots \wedge V_m$, the algorithm returns a set of unifiers $\theta$ such that $V\theta \subseteq E$, where $\subseteq$ represents the factor relationship for clauses.
1. Using the unification algorithm [2], find a most general unifier $\theta$ such that $V\theta \subseteq E$ is true, and
2. $C$ is made up of predicates in $E$, but not in $V\theta$:
   $$(\exists C : C \subseteq E : C \not\subseteq V\theta \wedge (vars(C) \cap \{variables(\text{value-result}, \text{result})\} = \emptyset))$$

Figure 7: Algorithm to find Adaptation Predicate

4 Concluding Remarks

We have presented a system of rules that are used in the automated synthesis of primitive programming structures and procedural abstractions from specifications. The rules take as input specifications written in predicate logic and generate imperative programs. Our future investigations will continue the design and implementation of tools that will support the use of formal methods in software development.

References

[1] Alfred V. Aho and Jeffrey D. Ullman. Principles of Compiler Design. Addison-Wesley, Reading, MA, 1977.
[2] Chin-Liang Chang and Richard Char-Tung Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, 1973.
[3] Betty H.C. Cheng. Synthesis of Data Abstractions from Formal Specifications. In Proceedings of Irvine Software Symposium, 1991.
[4] Betty Hsiao-Chih Cheng. Synthesis of Procedural and Data Abstractions. PhD thesis, University of Illinois at Urbana-Champaign, 1304 West Springfield, Urbana, Illinois 61801, August 1990. Tech Report UIUCDCS-R-90-1631.
[5] Nachum Dershowitz. Orderings for term-rewriting systems. Theoretical Computer Science, 17(3):279–301, 1982.
[6] Nachum Dershowitz and Jean-Pierre Jouannaud. Rewrite systems. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, chapter 6. North-Holland, Amsterdam, 1990. In press; available as Rapport 478, LRI, Univ. Paris-Sud, France.
[7] Edsger W. Dijkstra. A Discipline of Programming. Prentice Hall, Englewood Cliffs, New Jersey, 1976.
[8] Susan L. Gerhart. Applications of formal methods: Developing virtuoso software. IEEE Software, pages 7–10, September 1990.
[9] David Gries. The Science of Programming. Springer-Verlag, 1981.
[10] Deepak Kapur, Paliath Narendran, and Hantao Zhang. Automating Inductionless Induction using Test Sets. J. Symbolic Computation, 11:83–111, 1991.
[11] Nancy G. Leveson. Formal Methods in Software Engineering. IEEE Transactions on Software Engineering, 16(9):929–930, September 1990.
[12] Alain J. Martin. A General Proof Rule for Procedures in Predicate Transformer Semantics. Acta Informatica, 20(Fasc. 4):301–313, 1983.
[13] Mark Moriconi, editor. International Workshop on Formal Methods in Software Development, Napa, California, May 1990. ACM SIGSOFT.
[14] Jeannette M. Wing. A Specifier's Introduction to Formal Methods. IEEE Computer, 23(9):8–24, September 1990.
