A Hierarchial Approach to Formal Semantics With Application to the

0 downloads 0 Views 964KB Size Report
In any language, the meaning of some features depends only inessen- tially on the ... PL/CS by giving a detailed definition of the meaning of the value-producing expressions .... schemes. (For the moment, we ignore the possibilities of parameter passing; this ... with the following additions to allow procedure call and return:.
A Hierarchial Approach to Formal Semantics With Application to the Definition of P L / C S ROBERT L. CONSTABLE and JAMES E. DONAHUE Cornell University

We describe a means of presenting hierarchically organized formal definitions of programming languages using the denotational approach of D. Scott and C. Strachey. As an example of our approach, we give the semantics of PL/CS, an instructional variant of PL/I. We also discuss the implications of this approach to language design, pointing out some cases where the wrong choices may cause the hierarchy to collapse into chaotic rubble.

Key Words and Phrases: programming language semantics, denotational semantics, recursive functions,PL/I, PL/C, PL/CS C R Categories: 4.20,4.22,5.24

1. INTRODUCTION

One of the important goals of research in the formal semantics of programming languages is to develop techniques for improving the language design process. In particular, formal semantics may serve as a proscriptive tool, warning the language designer of unexpected complexity introduced by unfortunate compositions of language constructs. This paper describes one approach to structuring language definitions so that unnecessary complexity can be exposed, and it shows the application of this technique to the definition of an instructional variant of PL/I, P L / C S [4]. The programming language P L / C S was designed as a simple P L / I dialect that would be suitable for teaching introductory programming. Thus many features of P L / I or P L / C have been omitted and others restricted, particularly in those cases where syntactic restrictions would improve the chance of error repair. (For example, all variables in P L / C S programs must be declared at the beginning of each procedure.) A full list of P L / C S statements is given in Figure 1. We believe that a simple language like P L / C S should have a simple semantics. Using the denotational approac h of Scott and Strachey [13], we tried to produce Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee a n d / o r specific permission. This research was supported in part by the National Science Foundation under Grants DCR-74-14701 and MCS-76-14293. Author's address: Department of Computer Science, Cornel[ University, Upson Hall, Ithaca, NY 14853. © 1979 ACM 0164-0925/79/0700-0098 $00.75 ACM Transactions on Programming Languages and Systems, Vol. 1, No. 1. July 1979, Pages 98-114.

A Hierarchial Approach to Formal Semantics

99

numeric variable = arithmetic expression; [string variable -- string expression;] [SUBSTR (s, f, n) = string expression;] [array = array;] [array -- constant;] [bit variable = (condition);] [IF (condition) T H E N statement;] IF (condition) T H E N statement 1; ELSE statement 2; DO; body; END; [select-name: SELECT; WHEN (condition 1) statement 1; W H E N (condition 2) statement 2; OTHERWISE statement n; END select-name;] loop-name: DO WHILE (condition); body; END loop-name; [loop-name: DO U N T I L (condition); body; END loop-name;] [loop-name: DO index vat = expr 1 TO expr 2 BY expr 3;

body; E N D loop-name;] L E A V E loop-name; [GOTO target-label; oo.

target-label:;] [ON ENDFILE G O T O D A T A E N D ; °o.

D A T A E N D :;] CALL procedure-name; CALL procedure-name (arguments); RETURN; R E T U R N (expression); [GET LIST (variables); G E T D A T A (variables); G E T DATA; G E T EDIT (variables)(formatitems);] [PUT SKIP (expr)LIST (expressions); P U T SKIP (expr)D A T A (variables); P U T SKIP (expr)EDIT (values)(formatitems);] [ASSERT (condition); A S S E R T (cond)F O R ALL index var = expr 1 T O expr 2 B Y expr 3; A S S E R T (cond)F O R S O M E index vat = expr 1 T O expr 2 B Y expr 3;] Fig. 1.PL/CS statements a simple, but complete and rigorous, definition of P L / C S . (This semantics is given in our paper [3].) In this paper, w e describe the structure of the P L / C S semantics; space limitations prevent us from presenting the entire language. Those statements enclosed in brackets in Figure 1 are not discussed in this paper. M a n y of ACM

Transactiorts on P r o g r a m m i n g Languages and Systems, Vol. 1, No. 1, July 1979.

100

R.L. Constable and J. E. Donahue

them are variations of statements for which we do supply semantics (e.g. IF (Cond) T H E N Stmt; is a variant of IF (Cond) T H E N Stmt; ELSE Stmt;) and others require more detailed descriptions than we can provide in this paper; we refer the reader to [3] for more complete details. That we were able to define P L / C S in a formal manner is hardly surprising; others (Beki~ et al. [1], Walk et al. [16], ECMA [8]) have produced definitions of either full P L / I or P L / I subsets much larger than PL/CS. However, these definitions were written primarily for an audience with knowledge of both P L / I and formal methods. The intended audience of our P L / C S definition was assumed to be far less knowledgeable in both areas; thus we have been concerned not only with "getting the definition right," but with presenting it in a pedagogically sound manner. The organization of our definition was based on the following "hierarchy principle." In any language, the meaning of some features depends only inessentially on the meaning of others. For example, the meaning of procedures probably does not depend on the arithmetic operations provided in the language. In terms of a formal semantics, this allows us to give definitions which leave some details unspecified in much the same way that well-structured programs allow some parts to be written independently of others. If our language is "simple" enough (perhaps "well-structured" is more appropriate here), then these levels of detail can be arranged in levels, so that we can proceed by defining ever more detailed sublanguages, eventually producing the definition of the complete language. This is the approach we followed in the P L / C S definition and which we describe below. Our definition requires two basic sets of values (or domains): (1) The set of states manipulated by statements in PL/CS. The meaning of each P L / C S statement is defined as a "state transformation function." (2) The set of values produced by expressions in PL/CS. The meaning of each expression will be defined by a "value-producing" function {relative to the current state, of course). Our hierarchy of language constructs is based on the degree to which these domains need to be specified in order to define a particular statement. As we see below, the simplicity of the language is captured by the degree to which we have been able to separate levels of semantic detail and avoid a "spaghetti definition." Following the details of the definition presented in Section 2, we conclude by discussing some design alternatives that, had they been chosen, would have made this "hierarchical" approach to semantics impossible. 2. THE P L / C S SEMANTIC HIERARCHY 2.1 Organization

As noted above, P L / C S can be defined by isolating several important sublanguages and presenting, in turn, the semantics of each of these sublanguages. The order we use is the following. We begin with the language of while schemes. Essentially, while schemes include the control constructs basic to structured programming: sequencing, binary selection, and iteration. This subset has a particularly elegant and easy to ACM Transactions on Programming Languages and Systems, Vol. 1, No. 1, July 1979.

A Hierarchial Approach to Formal Semantics

101

understand semantics. The next subset is the P L / C S flowchart schemes, formed by extending the while schemes with the simple (forward) transfers of control allowed in PL/CS. The introduction of control transfers causes the first refinement in the underlying state. As we describe below the simple jumps of P L / C S have a much more intuitive semantics and require a less complicated state than the more complex transfers of control of full P L / I (or PL/C). We next introduce procedure declaration, call, and return to form the language of P L / C S recursion schemes. To define the semantics of recursion schemes, we are forced to give a still more detailed description of the underlying "state" manipulated by PL/CS programs and to define the "meanings" of identifiers in programs. In the next P L / C S sublanguage, we introduce the notion of program variables, causing a further refinement of the "state" to include a "machine store" component. In this sublanguage we define the meanings of variable declarations and parameter passing to PL/CS procedures. Finally, we complete our description of PL/CS by giving a detailed definition of the meaning of the value-producing expressions in the language, as well as the semantics of assignment and function call and return. 2.2 While Schemes

The simplest P L / C S sublanguage is the language of while schemes. P L / C S while schemes are defined by the following syntax1: Stmt = I I I I

SimpleStmt IF (Ct,~ =l) THEN Stmt; ELSE Stmt; DO WHILE (Cond); Stmt; END; Stmt; Stmt; DO; Stmt; END;

For the present, we leave the chss of simple statements unspecified. However, we note in passing that simple statements compose the basic actions of programs in the language, including assignment, procedure call, and input/output commands. The meaning of each of the statements in this language is defined in terms of a function Ms which maps each statement in the language into its associated state transformation, i.e. Ms has functionality

Ms: Stmt ~ [S --,, S]. At present, because of the simplicity of this sublanguage, none of the details of the eventual structure of S need be known to give the definition. As in [15], the definition of Ms is given by presenting a clause of the definition for each syntactic category of the class of statements. (The brackets ~and~ are used to bracket the syntactic arguments as an aid to the eye.) Additionally, because Boolean expressions appear as the conditions in while schemes, we will need some minimal

J We modify t h e p r e s e n t a t i o n of p r o d u c t i o n s in a g r a m m a r by u s i n g a n a m e with a n initial capital letter, s u c h as Cond, to denote t h e class of all conditions (Boolean expressions). W e t h e n write "cond," a nonterminal, to denote an e l e m e n t of t h e class. A C M Transactions on Programming Languages and Systems, Vol. 1, No. 1,July 1979.

102

R.L. Constable and J. E. Donahue

knowledge of Me: Exp --* S --* V,

the meaning function for expressions. At present, we simply need t h a t Boolean expressions can produce Boolean values, i.e. t h a t there exists a function Meco.d: Cond --* S --~ {true, false}. T h e clauses of M s follow: M s ~ simple_stmt; ~ = some particular f: S --* S associated with each simple statement M s ~ IF (cond) THEN strut1; ELSE stmt2;D (s) = ifMeco,d [cond~ (s) then M s [stmtl;] (s) else M s ~stmt2;] (s) M s ~ DO WHILE (cond); strut; END;~ (s) ffi ifMeco,d [cond] (s) then M s [DO WHILE (cond); stmt; END;~ (Ms [stmt;] (s)) else s M s ~ strut1; stmt2;~ (s) -M s [stmt2;] (Ms [stmtl;~ (s)) M s ~ DO stmt; END; ~ (s) = M s [stmt; ] (s)

This semantics relies only on simple notions of Boolean expression, function compositions, and recursive function definitions. T o ensure t h a t the recursive definition of the meaning of w h i l e defines a function, we assume t h a t w h a t e v e r internal structure S has it is a d o m a i n (as defined in [15]). T h e n we can appeal to the least fixed point construction of [15, sec. 4.3] to give a functional meaning to the w h i l e clause. (We could also view this recursion equation as having a computational meaning in t e r m s of r e p l a c e m e n t rules [12] or we could give an inductive definition of its graph as in [2].) 2.3 F l o w c h a r t S c h e m e s

T h e simple definition of M s ~ ~ in subsection 2.2 fails as soon as we add explicit transfers of control, like the P L / C S L E A V E or GOTO. T h e difficulty t h e n is t h a t s t a t e m e n t composition is not function composition. T h i s is immediately obvious for GOTO 1; stmt; b u t is more insidious in the case of DO WHILE (cond); stmt~; END; stmte in which the stmtl contains a GOTO. In P L / C S , all transfers of control (GOTOs and LEAVEs) are restricted to be forward only, m a y not effect a transfer into a composite s t a t e m e nI t (e.g. into the arms of a conditional), and m a y not involve crossing a declaration b o u n d a r y (e.g. jumping out of a procedure). (Indeed, P L / C S has no block structure, b u t for our purposes the previously stated restrictions are sufficient.) Because of this simple form of jumping, a minor r e f i n e m e n t of our earlier S is sufficient to allow a straightforward definition. T h e basic idea is to add to the state a c o m p o n e n t specifying the context in which the program is evaluated; this context gives the meaning of all the ACM Transactions on Programming Languages and Systems, Vol. 1, No. 1, July 1979.

A Hierarchial Approach to Formal Semantics

103

s t a t e m e n t s to which control could be transferred. In particular, it provides the meaning of the next s t a t e m e n t in sequence to be executed. T h e interested reader will note t h a t this m e t h o d is a simple form of continuations described by S t r a c h e y and W a d s w o r t h [14] and by T e n n e n t [15]. T h e syntax of the new P L / C S s t a t e m e n t s allowing transfers of control is Stmt = Id: DO WHILE (Cond); Stmt; END; I LEAVE Id; T h e first rule allows loops to be labeled, the second provides the syntax for specifying exits from labeled loops. T o define the language of P L / C S flowchart schemes, we extend the meaning function Ms defined above to take an additional argument, the context in which G O T O s t a t e m e n t s are to be evaluated. T h e new meaning function is of type 2

Ms: Stmt --* Context ~ S --* S where the second a r g u m e n t is of type Context = [Id + {rest} ] --* [S ~ S]. Given some c E Context, (1) c {[id] gives the meaning of'the r e m a i n d e r of the program beginning after the loop labeled id, i.e. the meaning of the program to which a L E A V E id would transfer control; (2) c ~rest] gives the meaning of the r e m a i n d e r of the program after the current statement. T h e clauses of the definition are given below. T h e meaning of conditional s t a t e m e n t s does not change, so it has been omitted. T h e major changes involve the meaning of s t a t e m e n t composition and the meaning of while loop (because of the need to r e c o r d the label).

Ms ~simple_stmt;] (c) (s) = (c ~rest~ (f(s)) where f: S --* S is the state transformation associated with simple_stmt. So the meaning of a simple statement, simple_stmt, in a context c on state s, is the meaning of the rest of the program applied (c ~rest]) to the state produced by executing the simple s t a t e m e n t on s. T h e way in which the context is provided will be explained by subsequent equations. In these equations we use the notation c [idl