An Overview of the Extended Static Checking System

David L. Detlefs
Digital Equipment Corporation, Systems Research Center
130 Lytton Ave, Palo Alto CA 94301
[email protected]
http://www.src.dec.com/SRC/personal/David_Detlefs/home.html

November 28, 1995

1. Introduction

The Extended Static Checking system (henceforth ESC) is a checker aimed at statically detecting simple errors in programs; e.g., NIL dereferences, out-of-bounds array indices, or simple deadlocks or race conditions in concurrent programs. ESC attempts to achieve these fairly modest goals using a quite general program verification framework. The user annotates the program being checked with specifications; a verification condition generator transforms the program and specification into a logical formula whose validity ensures the absence of the errors being considered. This formula is passed to an automatic theorem prover (called Simplify) developed expressly for ESC. If the prover is unable to prove that the errors do not occur, it returns (roughly) an assignment of values to program variables that falsifies the formula. This information can be presented to the programmer, giving information about the error somewhat akin to what a debugger provides when examining a core file left by a runtime occurrence of the error.

We hope that this kind of limited verification will be seen in the future much as type-checking is viewed today. After all, a type represents a simple predicate that holds of all its instances; ESC specifications can be viewed as providing a richer language for expressing such predicates. This goal leads to several requirements on our system. The annotations must be simple, or programmers will not add them. The checking must be fast, or programmers will find the extra safety outweighed by the inconvenience of using the tool. The checking must also be automatic; type-checking would be much less popular if it required programmer interaction.

The rest of this paper is organized as follows. Section 2 presents some related work. Section 3 describes the organization of the system. Section 4 briefly describes the specification language, including some interesting issues that arise when multiple levels of abstraction are present in the system. Section 5 describes the theorem prover. Section 6 describes some of the uses to which ESC has been put. Finally, section 7 presents conclusions and future directions.

2. Related Work

There are several classes of work related to ESC. Program verification is one; research in this field has a long and distinguished history. The work of Floyd [6], Hoare [9, 10], and Dijkstra [4] (to name just a few) has provided theoretical underpinnings. Our approach is most closely related to the work of Dijkstra. Systems such as AFFIRM [19], the Stanford Pascal Verifier [24], the Ford Aerospace Pascal verification system [20], the Boyer-Moore prover [1], and more recently Penelope [7] have made important contributions. Our system generally contrasts with these in the following way: the others are intended to verify full functional correctness of programs written in simple languages invented for the system, or in tractable subsets of existing languages. Our system is attempting verifications of less ambitious properties, but for programs written in Modula-3 [21, 18], a modern, full-featured object-oriented language. We do subset the language, but only in that we restrict our attention to a safe subset already defined by the language.¹ This subset enforces, for example, disciplined use of pointers, and is the subset that most applications programmers should use.

¹ To be complete, we also disallow the use of nested procedures, and do not effectively support procedure arguments. The first restriction could easily be lifted, but the latter restriction is more basic: we do not understand how to precisely specify the effects of a higher-order procedure, especially what variables it might modify.

LCLint [5] and Aspect [11] are two different, and quite successful, attempts at using formal specifications to provide useful error checking. Each concentrates on specifications of what modifications are permitted by a procedure (using this information, however, in quite different ways). The emphasis in both cases is on tractability of checking, at the deliberate expense of soundness and completeness. That is, neither tool guarantees the absence of the errors it checks for, or that the errors reported are genuine.

One of the applications of ESC is locking-level verification, where it verifies that each thread acquires locks in ascending order according to some partial order on locks, and that variables declared as protected by one or more locks are read or written only while the appropriate locks are held. WARLOCK [25] and Sema [13] are two systems with similar goals, albeit with somewhat more ad hoc, less complete, techniques.

Finally, tools such as Purify [23] and Third Degree [17] instrument programs so that many runtime errors are caught and reported as soon as they occur, rather than causing confusing behavior millions of instructions later. These tools are obviously very useful, but can be used only for the fixed class of errors they catch. Also, like all testing tools, they only catch errors that occur with the inputs tested.

3. The ESC system

This section describes the organization of the Extended Static Checking system. In many ways, ESC is like a compiler: it accepts an annotated program as input, and produces error messages as output. The compiler part of ESC is provided by M3TK [12], a front-end that compiles Modula-3 into an abstract syntax tree representation. If the compilation succeeds, ESC then processes the specifications in the module, which appear as distinguished comments. If this succeeds, then ESC translates each procedure to be verified into a guarded command representation. This is similar to Dijkstra's language of guarded commands [4], augmented somewhat to model exceptions [16]. It is important to note that the guarded command translation represents the semantics of a procedure, and does not necessarily correspond to executable code. In particular, a call to procedure Q in procedure P is translated as a guarded command that "miraculously" achieves the postcondition given in Q's specification. If Q does not actually meet this specification, P's verification may succeed, but Q's will fail. Thus, our methodology is modular, so we need only consider how the system scales with the size of individual procedures, not with the size of an entire program.

Runtime errors are represented in the guarded command translation as a new command wrong; the translation asserts that error conditions such as dereferencing a null pointer lead to wrong, just as a compiler might generate runtime checks for the same conditions. ESC then uses weakest precondition equations to calculate a verification condition whose validity implies that the procedure, if it terminates, achieves its postcondition and does not execute wrong. ESC then records some information about the correspondence between names in the program and names used in the verification condition, and passes the verification condition to the Simplify theorem prover, which attempts to prove it. If the proof succeeds, the procedure met its specification; if the proof fails, Simplify produces a counterexample context, and ESC uses the saved naming information to print this context in a way the programmer can understand.
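The guarded-command translation and the weakest-precondition calculation are described above only informally. The following Python sketch shows the standard wp equations for a toy guarded-command language with assignment, assertion (a failed assertion plays the role of the wrong command), assumption, sequencing, and nondeterministic choice. It is an illustration under simplifying assumptions of our own, not ESC's translator: the command forms, names, and the representation of predicates as state functions are invented for the sketch. In ESC itself, wp produces a formula that is handed to Simplify rather than a predicate evaluated at concrete states.

    # A toy weakest-precondition calculator for a miniature guarded-command
    # language.  This sketches the standard wp equations the text alludes to;
    # it is not ESC's translator.  Predicates and expressions are represented
    # as Python functions from a program state (a dict) to values, so
    # wp(cmd, post) yields a predicate that can be evaluated at concrete
    # states instead of a formula handed to a prover.

    from dataclasses import dataclass
    from typing import Any, Callable, Dict

    State = Dict[str, Any]
    Pred = Callable[[State], bool]
    Expr = Callable[[State], Any]

    @dataclass
    class Assign:          # x := e
        var: str
        expr: Expr

    @dataclass
    class Assert:          # if the predicate fails, the program goes wrong
        pred: Pred

    @dataclass
    class Assume:          # restrict attention to states satisfying the predicate
        pred: Pred

    @dataclass
    class Seq:             # c1 ; c2
        first: Any
        second: Any

    @dataclass
    class Choice:          # nondeterministic choice between two commands
        left: Any
        right: Any

    def wp(cmd, post: Pred) -> Pred:
        """Weakest precondition under which cmd does not go wrong and ends in `post`."""
        if isinstance(cmd, Assign):
            return lambda s: post({**s, cmd.var: cmd.expr(s)})   # Q[x := e]
        if isinstance(cmd, Assert):
            return lambda s: cmd.pred(s) and post(s)             # P and Q
        if isinstance(cmd, Assume):
            return lambda s: (not cmd.pred(s)) or post(s)        # P implies Q
        if isinstance(cmd, Seq):
            return wp(cmd.first, wp(cmd.second, post))
        if isinstance(cmd, Choice):
            left, right = wp(cmd.left, post), wp(cmd.right, post)
            return lambda s: left(s) and right(s)                # both branches must be safe
        raise TypeError(f"unknown command: {cmd!r}")

    # x := a[i], guarded by the bounds check the translation would insert;
    # the "does not go wrong" verification condition is wp(prog, true).
    prog = Seq(Assert(lambda s: 0 <= s["i"] < len(s["a"])),
               Assign("x", lambda s: s["a"][s["i"]]))
    vc = wp(prog, lambda s: True)
    print(vc({"i": 2, "a": [1, 2, 3]}))   # True: the access is in bounds
    print(vc({"i": 5, "a": [1, 2, 3]}))   # False: this state would go wrong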
4. Specification Language

The ESC specification language is in many ways a fairly standard specification language, bearing many syntactic similarities to a Larch [8] interface language. A procedure specification includes a list of variables the procedure may modify, a precondition, and a postcondition; the postcondition relates the values of modified variables in the states before and after execution of the procedure, and may specify both normal and exceptional outcomes. The predicates used in these specifications are general first-order formulas over terms denoting Modula-3 values; the syntax for these formulas attempts to mimic Modula-3 syntax as much as possible.

Unlike Larch, ESC does not provide a facility for specifying underlying mathematical models in a separate, programming-language-independent form. Since our system is aimed at specifications of fairly simple properties, we expect that a few built-in mathematical concepts, especially functionally updatable maps, will suffice for most purposes. If they do not, the user can define and axiomatize new function symbols, but these appear in the same file as the user code, along with the rest of the specification. In the kinds of simple verifications we have attempted, we seldom use this facility.

The specification language also allows several kinds of invariants. Our methodology for generating verification conditions requires the programmer to annotate most loops with loop invariants, though we hope to infer many of these automatically and have made some progress toward this end. Users may enter assertions that a predicate must hold at a particular point in a program. (The programmer may also direct ESC to assume without proof that a predicate holds at some point; this can be used to turn off error messages that the programmer believes to be spurious.) Finally, an invariant expresses a predicate about some state variables that must be preserved by all procedures that modify those variables.

The more novel aspects of the specification language deal with abstraction. Modula-3 is a modular language; its interfaces are supposed to hide implementation details, allowing a range of implementation choices and protecting clients from implementation changes. But if an interface hides the state that is actually manipulated, in what terms can its procedures be specified? We have chosen the fairly standard approach of allowing the interface to declare abstract variables that represent the manipulated state, and to specify procedures in terms of these abstract variables. The implementation's specification will give an abstraction function to connect the abstract variables with the concrete implementation.

One thorny issue concerns the interpretation of the modifies clause of a procedure specification, the part of the specification that limits the set of variables that an implementation may modify. A specification appearing in an interface must necessarily describe permissible modifications in abstract terms, but the implementation will modify concrete variables. To do sound verifications, we must define a precise correspondence between abstract and concrete modifies clauses. Things get even more complicated when, as is often the case for real programs, an abstract variable and its concrete representation are visible in the same scope. Consider, for example, an implementation of some method m1 of an object type that invokes some other method m2 of the type; m2 is specified as changing abstract variables, but the corresponding concrete variables are visible in the implementation of m1. Leino's thesis [14] arose from such considerations and is the current basis for our approach; work on these issues continues.
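To make the abstract-variable idea concrete, the following Python sketch shows an interface whose abstract variable is a set, an implementation whose concrete representation is an unsorted list, and an abstraction function connecting the two. It is an illustration only; the class, the abstract variable, and the runtime assertion are invented for the sketch, and ESC's own annotation syntax is not shown in this paper.

    # A sketch of specifying a procedure against an abstract variable.  The
    # names and the runtime-assertion style are invented for illustration;
    # ESC's concrete annotation syntax is not shown here.

    class IntSet:
        """Interface view: the abstract variable `contents` is a mathematical set."""

        def __init__(self):
            self._rep = []                  # concrete representation: unsorted list

        def abstraction(self):
            # Abstraction function: maps the concrete representation to the
            # abstract variable the interface's specifications talk about.
            return frozenset(self._rep)

        def insert(self, x):
            # Abstract specification, informally:
            #   modifies contents
            #   ensures  contents' = contents + {x}
            pre = self.abstraction()
            if x not in self._rep:
                self._rep.append(x)
            post = self.abstraction()
            # ESC would discharge this obligation statically from annotations;
            # here it is merely asserted at runtime to show what the abstract
            # postcondition claims.
            assert post == pre | {x}

    s = IntSet()
    s.insert(3)
    s.insert(3)
    print(s.abstraction())                  # frozenset({3})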

Finally, the specification language contains constructs supporting locking-level verification. Axioms can declare a partial order on locks, and variables can be declared to be protected by a set of locks. The methodology supports single writers and multiple readers: holding any lock in a locking set allows reading of a protected variable, but all locks in the set must be held to write the variable.

5. Theorem Proving

Our goal of making ESC verification "feel" much like type checking to programmers has imposed stringent requirements on the underlying theorem prover; it must be both automatic and fast. In addition, we were somewhat surprised by the extent to which even simple properties require fairly powerful proving. In particular, the prover must sensibly handle quantified formulas: predicates such as

    ∀i: 0 ≤ i < n: a[i] ≠ NIL,

where a is an array of pointers, might reasonably appear even when one is only trying to prove the absence of NIL dereferencing. Thus, much of the actual effort in the ESC project has been devoted to theorem-proving research.

We knew of no extant prover that met our requirements. Many interactive provers are quite powerful, but require the user to develop proof scripts or lemma libraries. Making effective use of such systems is a delicate art, and we didn't want to make acquisition of such skill a prerequisite to the use of our system. Existing automatic provers are often resolution-based, and cannot effectively incorporate decision procedures for important domain theories, such as satisfiability of linear inequalities. Since an important goal of our system is array bounds checking, we feel that such specialized algorithms are crucial for adequate performance.

The ESC theorem prover is called Simplify; it is an interesting prover in its own right, and will be described fully in a separate document. We believe that it has been very useful to have a theorem prover that could be tailored to the ESC project. We would have been willing to adjust the interface of the prover to support functionality specific to ESC, but this turned out to be unnecessary; predicate calculus is a quite general interface. The more important advantage of having an in-house prover is that we could optimize its performance for the verification conditions ESC needed to prove. The results of such efforts have at times been striking — for example, we recently succeeded in reducing the time required for a particular proof from about 24 hours to about 5 minutes.²

² Using a 233 MHz Digital Alpha workstation with sufficient memory to prevent paging; the maximum process size is approximately 50 MBytes.

The prover may be considered to have two parts: a set of cooperating decision procedures [22] that encapsulates knowledge about important theories, and a search procedure that manages the search for a proof. Simplify contains decision procedures for several theories, including equality with uninterpreted function symbols, linear inequalities over integer variables, and partial orders. The internal interfaces have been designed to make this set extensible: each theory implements one or more subtypes of literal representing assertions relevant to the theory, and the rest of the system treats literals abstractly. The theory of partial orders was added after the others to speed up a particular application of the prover, providing some evidence that this extensibility works usefully.

To prove a formula F valid, the prover shows by exhaustive search that ¬F is unsatisfiable. In essence, the formula to be satisfied is transformed into conjunctive normal form (a conjunction of clauses, where a clause is a disjunction of literals), and every conjunction containing one literal from each clause is shown to be inconsistent. Of course, optimizations and heuristics are used to decrease the cost of this exponential search. If we have to satisfy (A1 ∨ A2) ∧ (B1 ∨ B2) ∧ (C1 ∨ C2) and have asserted literals A1 and B1 from the first two clauses and found their combination unsatisfiable, there is no need to assert a literal from the third clause. Instead, the prover can move on to another combination, perhaps A1 and B2. To make this backtracking effective, the decision procedures must provide efficient procedures to save and restore their state.

The search we have described would imply an exponential algorithm; in general, of course, the satisfiability of first-order formulas is undecidable. The undecidability arises from quantified formulas, since, for example, ∀x P(x) is satisfiable only if P is satisfiable for every possible instantiation of the quantified variable x. The prover treats each universally quantified formula as a literal of a matching theory. Asserting a matching literal for a quantified formula never yields a contradiction directly; instead, it adds one or more matching rules derived from the formula to a set of enabled rules. The search procedure periodically searches for instantiations of the enabled rules. These instances yield new literals or clauses that must be satisfiable if the original formula is satisfiable. Thus, the prover cannot be complete, since matching may produce an infinite set of instantiations (consider ∀x f(x) = f(f(x))). It stops after a heuristic limit on effort is reached, and cannot distinguish between an invalid formula and a valid one that would be proved if more effort were applied. In practice, the heuristic limit is quite effective, and we rarely see valid formulas that we cannot prove because of the limit on matching effort.

Finally, we note an important heuristic used in the prover. There is no reason that the clauses in the conjunctive normal form of a formula have to be examined in any particular order, and some orders might be much better than others. This observation is particularly important in the presence of matching, since some matching rules may produce non-unit clauses when instantiated, implying further case splits. It is not uncommon to find that matching produces a set of as many as 100 clauses, in which one clause is unsatisfiable. If that clause were examined first, a contradiction would be found quickly; if examined last in a naive manner, the proof could take an eternity. This problem is reminiscent of the way in which variable order can affect the efficiency of algorithms based on binary decision diagrams [2].
Simplify copes with clause ordering by adaptively discovering important clauses, reordering the clause list so that clauses that have produced contradictions on previously explored branches of the search tree are preferred. In practice, this often greatly improves the performance of the search, especially when compared to the worst-case exponential behavior.
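The case-splitting search and the clause-reordering heuristic just described can be sketched in a few lines of Python. The sketch below is a toy over propositional literals, with a trivial consistency check standing in for the cooperating decision procedures and with matching omitted entirely; the function name and the clause representation are invented for the illustration.

    # A toy model of the satisfiability search described above: choose one
    # literal from each clause of a CNF formula, backtrack on contradiction,
    # and adaptively prefer clauses that have produced contradictions on
    # previously explored branches.  Literals are integers (negation is the
    # sign); the only "decision procedure" is a check for p and not-p, a
    # stand-in for Simplify's theory modules.

    def satisfiable(clauses, assignment=frozenset(), scores=None):
        """Can one literal be chosen from every clause without contradiction?"""
        if scores is None:
            scores = {}                      # clause -> contradictions observed so far
        if not clauses:
            return True                      # a literal was chosen from every clause
        # Adaptive ordering: examine first the clause that has most often
        # produced contradictions on earlier branches of the search tree.
        ordered = sorted(clauses, key=lambda c: -scores.get(c, 0))
        clause, rest = ordered[0], list(ordered[1:])
        for lit in clause:
            if -lit in assignment:           # contradiction with literals already asserted
                continue
            if satisfiable(rest, assignment | {lit}, scores):
                return True
        scores[clause] = scores.get(clause, 0) + 1   # every literal of this clause failed
        return False

    # (A1 or A2) and (not A1 or B1) and (not A2 or B1) and (not B1) is
    # unsatisfiable, so the corresponding negated formula would be valid.
    cnf = [(1, 2), (-1, 3), (-2, 3), (-3,)]
    print(satisfiable(cnf))   # False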

6. Examples of Use

ESC has been used on a number of examples. This section will briefly describe a few of these. It is interesting to note that while our goal is to use program verification technology for simple checks, the framework is general enough to allow more complete verification, limited only by the power of the theorem prover. For some simple modules we have verified full functional correctness. It may be necessary to do full verification of some low-level types such as container types in order to do "ESC verification" of their clients. For example, a client of a stack of pointers may need to prove preservation of an invariant that all elements of the stack are non-NIL in order to prove the safety of a dereference of a pointer obtained by popping the stack (a sketch of this situation appears after subsection 6.2 below). The source code (unfortunately, not yet including the ESC specifications) for the sequence and streams packages discussed below is obtainable through the Modula-3 home page [18].

6.1 Sequence

Sequence is an interface of the standard Modula-3 library that implements a dynamically expandable array, much like the standard array type of CLU [15], allowing operations like "add or remove an element from the high or low end of the array." This example is notable because ESC discovered bugs in publicly released code. The author of the code had been quite conscientious, building a detailed test suite. Unfortunately, this suite used sequences in a stack-like manner, matching addhi's and remhi's. One of the detected bugs only arose with queue-like use of the data type. This is a reminder of the qualitative difference between testing and verification. Another point is that this bug was detected with a simple verification using fairly rudimentary specifications. This specification was later enhanced to a full specification, when theorem prover performance improvements made verification of the more detailed specification practical. This suggests that ESC can effectively support different levels of aspiration.

6.2 Simplex

We wished to attempt verification not only of library components, but also of applications. ESC and Simplify have been convenient and interesting sources of examples. The decision procedure for linear integer inequalities uses the simplex algorithm [3], and the module implementing it is therefore called Simplex. The basic operation of the simplex algorithm is a pivot, the exchange of a dependent and independent variable. We verified that the Pivot procedure implementing this operation does not dereference NIL or violate array bounds. This proof requires the assumption of a detailed validity condition on the data structures used by the simplex algorithm; we also prove that Pivot and many of the other procedures of the module maintain this condition, and that it is established by Simplex's explicit Init procedure. This proof is notable because it has stretched the scale of what we can do. Pivot is a moderate-sized procedure, perhaps 60 lines in length, and the validity condition has 13 conjuncts of varying complexity. (This is the example mentioned in Section 5, where the time for the verification has been decreased from 24 hours to about 5 minutes.)
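The stack-of-pointers scenario mentioned in the introduction to this section can be made concrete with a short sketch (in Python, with invented names; it is not the Modula-3 code or the ESC specification involved): the container's invariant that all elements are non-nil is what lets a client justify dereferencing a popped element.

    # Illustration of a container invariant enabling "ESC verification" of a
    # client: if every element pushed is non-nil (the invariant), a popped
    # element may safely be dereferenced.  All names here are invented.

    class NonNilStack:
        def __init__(self):
            self._items = []              # invariant: None never appears in _items

        def push(self, item):
            assert item is not None       # precondition that preserves the invariant
            self._items.append(item)

        def pop(self):
            # The invariant justifies the postcondition: result is not None.
            return self._items.pop()

    class Node:
        def __init__(self, label):
            self.label = label

    stack = NonNilStack()
    stack.push(Node("a"))
    node = stack.pop()
    print(node.label)                     # safe "dereference": node cannot be None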

6.3 ParseSpec

With the ParseSpec module we attempted a rather different flavor of verification. ParseSpec is the module that parses the ESC specification language. It accepts text strings containing specification language input and produces either syntax error messages or an S-expression-based representation of the abstract syntax of the input. The comments of the ParseSpec interface describe, in fairly formal English, the syntax of the specification language and the structure of the S-expressions that result from parsing them. We decided to completely formalize some of these comments, defining a predicate for the well-formedness of an S-expression representing specification language abstract syntax. This well-formedness condition closely parallels the grammar for the language. The specification for the main parsing procedure then ensured that it returned a well-formed output if it didn't raise an error exception.

Modula-3 provides S-expressions via a library package; the use of this package relies on runtime type discrimination to determine, for example, whether the car of a cons cell, whose static type is the generic pointer type REFANY, is actually an atom, an integer, or another cons cell. Thus, the well-formedness condition makes statements about the runtime types of objects. Since our translation captures the full semantics of the language, including type information, we can verify such assertions, despite the fact that they seem fairly removed from our original targets of NIL dereferences and array subscript errors.

Somewhat surprisingly to us, this verification detected some subtle bugs. The common form of one of these errors was that the grammar defined a non-terminal as a keyword followed by a nonempty list of instances of another non-terminal, but the implementation did not return an error in the case of an empty list. The formalization of the well-formedness of an abstract syntax tree S-expression included the requirement that the list was nonempty, and the prover detected that the implementation could return an empty list. This is another argument for verification over testing; it is easy to imagine that disciplined testing of a parser might exercise all productions on correct inputs, but it is harder to see a general testing strategy that would have detected a failure to produce a syntax error, as in this example.

6.4 Locking-level verification of readers and writers

The last verification we mention is that of the IO streams package of Modula-3. In this package, we attempted to verify only that the implementations obeyed a locking-level discipline: that each thread acquires locks in an ascending order, and that variables are accessed only when appropriate locks are held. These proofs are fairly easy, but the extensible object-oriented nature of the streams package, and the complexity of some of the stream implementations, have presented methodological and performance difficulties that we are only now coming to grips with. For example, this verification led us to work hard on the problem of dynamic dependencies. It is often the case that some abstract component of an object A cannot be represented directly by concrete fields of A, but must instead be represented indirectly by a pointer to some other object B. The abstract state of A may then be changed either by changing the state of B or by changing the pointer in A to point to a different object.
Dynamic dependencies provide a methodology for stating the relationship between A and B precisely enough for the theorem prover to interpret.
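The locking discipline verified here (and specified with the constructs of section 4) can be restated as a small dynamic checker. The following Python sketch is only an executable paraphrase of the rules, with invented names; ESC checks the same rules statically from the lock-order axioms and protected-variable declarations.

    # Executable restatement of the locking discipline: locks are acquired in
    # ascending order of a declared level (standing in for the partial order),
    # a protected variable may be read while holding any lock in its protecting
    # set, and written only while holding all of them.  ESC establishes these
    # properties statically; the names below are invented for the sketch.

    import threading

    class OrderedLock:
        def __init__(self, name, level):
            self.name, self.level = name, level
            self._lock = threading.Lock()

    _held = threading.local()                    # locks held by the current thread

    def acquire(lock):
        held = getattr(_held, "locks", [])
        if held and held[-1].level >= lock.level:
            raise RuntimeError(f"lock order violation: {lock.name} after {held[-1].name}")
        lock._lock.acquire()
        _held.locks = held + [lock]

    def release(lock):
        lock._lock.release()
        _held.locks = [l for l in getattr(_held, "locks", []) if l is not lock]

    class Protected:
        """A variable protected by a set of locks (single writer, multiple readers)."""

        def __init__(self, value, protecting_locks):
            self._value, self._locks = value, set(protecting_locks)

        def read(self):
            held = set(getattr(_held, "locks", []))
            assert held & self._locks, "read requires at least one protecting lock"
            return self._value

        def write(self, value):
            held = set(getattr(_held, "locks", []))
            assert self._locks <= held, "write requires all protecting locks"
            self._value = value

    a, b = OrderedLock("a", 1), OrderedLock("b", 2)
    buf = Protected(0, [a, b])
    acquire(a); print(buf.read())                # reading: one protecting lock suffices
    acquire(b); buf.write(42)                    # writing: all protecting locks are held
    release(b); release(a)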

7. Conclusions

We believe our progress with ESC shows great promise for the use of program verification techniques in proving the absence of simple "bookkeeping" errors in programs. Advances in machine speed and theorem-proving techniques are making automated approaches that would have been too costly ten years ago much more attractive today. We hope that such tools will become part of every working programmer's tool kit in the not-too-distant future.

Acknowledgments

This paper summarizes contributions by a number of people over several years. Greg Nelson has been the project leader; contributors other than myself include Damien Doligez, Rustan Leino, Jim Saxe, Steve Glassman, François Bourdoncle, and George Necula.

Apologia

Due to length considerations, this paper is necessarily an incomplete description of the ESC system. Since it is also the only paper we have published about the system at this time, we cannot yet direct the reader to other sources for more details. In-depth descriptions of various parts of the system are in preparation; we trust that the interested reader will bear with us until we can complete those longer treatments.

Bibliography

[1] Robert S. Boyer and J Moore. A Computational Logic. Academic Press, New York, NY, 1979.
[2] R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, C-35(8):677-691, August 1986.
[3] George Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey, 1963.
[4] Edsger W. Dijkstra. A Discipline of Programming. Prentice Hall, Englewood Cliffs, NJ, 1976.
[5] David Evans. Using specifications to check source code. MS thesis, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA 02139, June 1994.
[6] Robert Floyd. Assigning meanings to programs. Mathematical Aspects of Computer Science, XIX:19-32, 1967. In Proceedings of Symposia in Applied Mathematics, AMS, Providence, RI.
[7] David Guaspari, Carla Marceau, and Wolfgang Polak. Formal verification of Ada programs. Technical Report TR-90-0007, Odyssey Research Associates, 301 Dates Drive, Ithaca, NY 14850, April 1990.
[8] J. V. Guttag, J. J. Horning, and J. M. Wing. Larch in five easy pieces. Research Report 5, Digital Equipment Corporation Systems Research Center, July 1985.
[9] C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576-580, 583, October 1969.
[10] C. A. R. Hoare. Proof of correctness of data representations. Acta Informatica, 1(1):271-281, 1972.
[11] Daniel Jackson. Aspect: A Formal Specification Language for Detecting Bugs. PhD thesis, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA 02139, June 1992.
[12] Mick Jordan. An extensible programming environment for Modula-3. In Proceedings of the Fourth ACM SIGSOFT Symposium on Software Development Environments, ACM Press, 1990.
[13] Joseph A. Korty. Sema: a lint-like tool for analyzing semaphore usage in a multithreaded UNIX kernel. In Usenix Winter Technical Conference Proceedings, pages 113-123, Berkeley, CA, 1989. Usenix Association.
[14] K. Rustan M. Leino. Toward Reliable Modular Programs. PhD thesis, California Institute of Technology, Pasadena, California, January 1995.
[15] B. Liskov, R. Atkinson, T. Bloom, E. Moss, C. Schaffert, R. Scheifler, and A. Snyder. CLU Reference Manual, volume 114 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1981.
[16] M. S. Manasse and C. G. Nelson. Correct compilation of control structures. Technical report, AT&T Bell Laboratories, September 1984.
[17] Louis Monier and Jeremy Dion. Third Degree home page. http://www.research.digital.com/wrl/projects/om/third.html.
[18] Modula-3 home page. http://src-www.pa.dec.com/SRC/external/modula-3/html/home.html.
[19] D. R. Musser. Abstract data type specifications in the AFFIRM system. In Proceedings of the Specification of Reliable Software, pages 47-57, April 1979.
[20] John Nagle and Scott Johnson. Practical program verification: automatic program proving for embedded software. In Conference Record of the Tenth ACM Symposium on Principles of Programming Languages, pages 48-58, ACM, January 1983.
[21] Greg Nelson, editor. Systems Programming with Modula-3. Prentice-Hall, Englewood Cliffs, NJ, 1991.
[22] Greg Nelson. Techniques for Program Verification. PhD thesis, Stanford University, 1981. Available as Xerox Palo Alto Research Center Research Report CSL-81-10, June 1981.
[23] Pure Software, Sunnyvale, California. Purify Version 1.1 Beta A User Manual, 1992.
[24] Stanford Verification Group. Stanford Pascal Verifier user manual. Technical Report STAN-CS-79-731, Stanford University Computer Science Department, March 1979.
[25] Nicholas Sterling. WARLOCK -- a static data race analysis tool. In USENIX Technical Conference Proceedings, pages 97-106, San Diego, CA, Winter 1993.