Meta-Programming tools for ML

Revision: 1.8

Tim Sheard and James Hook
Pacific Software Research Center
February 25, 1994

Compile-time reflective ML (CRML) is an extension to Standard ML providing a meta-language for program manipulation[10]. The original motivation was the automatic generation of map and fold like combinators for arbitrary datatypes[4, 9]. The mechanisms developed to achieve this are far more general than we at first realized, providing the foundation for a rich class of language extensions expressed as source-to-source translations.

Any programming task where the construction of a program can be described by an algorithm can be captured directly and naturally in CRML. In such a paradigm the object language (the language of the constructed programs) and the meta-language (the language in which the algorithm is expressed) can be two distinct languages. In CRML both these languages are ML. The object language is encoded in data structures which are predefined by the system. CRML supplies a user level interface (syntactic sugar) so that the object language programs look like the programs they represent. It is the right language for expressing programs that manipulate programs.

We have used CRML to implement "just plain macros", ML language processors, and experimental ML language extensions[2, 3]. An example, described later, is the extension of ML to support attribute grammar computations over ML datatypes. The class of ML extensions that can be rapidly prototyped with CRML is limited only by your imagination.

The meta-programming paradigm supported by CRML supplies more than just macro-like transformations. CRML can be used to generate programs that require semantic as well as syntactic information, and to encode new abstractions not possible in ordinary functional languages. CRML is implemented as an extension to the Standard ML of New Jersey compiler[1].

1 This is CRML

Like the original ML and its French cousins, CRML uses an "object-language" facility to support reflection. Unlike these other tools, however, the object-language is just another instance of CRML (the meta-language). There is a distinguished set of datatypes, including pat, exp and typ, that encode the abstract syntax of the object-language. CRML uses "object brackets" (written << >>) to construct object-language expressions with an embedded parser. Object-language expressions can escape to the meta-language with an escape (written `), similar to the anti-quotation (,) in LISP. Most meta-programming can be accomplished using only the object bracket notation; detailed knowledge of the distinguished datatypes is often unnecessary.

An example use of the object-language escape mechanism is:

    let val five = <<5>> in <<f x `five>> end

This could be written without the escape as <<f x 5>>. It could also be written using the explicit constructors of the representation datatype as App(App(Var "f", Var "x"), IntC 5), where App, Var and IntC are constructors of exp. All these methods are equally valid.

Object brackets can also be used in patterns to match against ML object language instances. For example, the meta-language statement

    case e of <<if `x then `y else `z>> => y | other => other

tests whether e is an if expression and returns the then clause if it is; it returns e otherwise.

The syntactic sugar that allows the embedding of the object language of ML using the concrete syntax of ML is an important contribution of CRML because it allows the direct expression of source-to-source transformations.
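The explicit-constructor view of these two examples can be sketched in plain Standard ML. The datatype below is a hypothetical, cut-down stand-in for CRML's distinguished exp type: App, Var and IntC are the constructors named in the text, while If is an assumed constructor introduced only for illustration.

```sml
(* Hypothetical, cut-down stand-in for CRML's exp datatype.
   App, Var and IntC are named in the text; If is an assumption. *)
datatype exp
  = Var of string
  | IntC of int
  | App of exp * exp
  | If of exp * exp * exp

(* The object-language expression f x 5, built with explicit constructors.
   Binding five separately mirrors the escape example in the text. *)
val five = IntC 5
val fx5 = App (App (Var "f", Var "x"), five)

(* The pattern-matching example: return the then-branch of an
   if expression, and return the expression unchanged otherwise. *)
fun thenBranch e =
  case e of
      If (_, y, _) => y
    | other => other

val t1 = thenBranch (If (Var "b", IntC 1, IntC 2))  (* the then-branch *)
val t2 = thenBranch fx5                             (* not an if: unchanged *)
```

In CRML itself the object brackets would build and match these values from concrete ML syntax, so the constructors would rarely be written by hand.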

1.1 Macros

This mechanism is easily extended to allow for simple macros. This is done by annotating the object-bracket notation with a function of type exp -> exp. The annotation is made by placing the name of that function between the two opening left angle brackets. A macro expression is evaluated by applying the annotating function to the object-language expression inside the brackets and then evaluating the result. For example, if the function incr is defined as

    fun incr e = <<`e + 1>>

then the expression <incr< 7 >> expands to 7+1, and evaluates to 8.

Since the meta-language and object-language are really the same, it is possible to invoke the "meta-language" escape while in the meta-language (i.e. an escaped expression not surrounded by object brackets). This is interpreted as invoking a meta-meta-language processor in the top-level environment of the current declaration. If the top-level environment contains the incr function above, then

    <incr< 7 >>  =>  `(incr <<7>>)

This defines macros in CRML. As illustrated in the next section, top-level escapes are also useful for "splicing" meta-computation results into programs.
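The expand-then-evaluate reading of a macro can be imitated in plain Standard ML. This is a minimal sketch under stated assumptions: the exp datatype is a hypothetical subset, addition is assumed to be encoded as a curried application of Var "+", and eval is a toy evaluator written for this example, not part of CRML.

```sml
(* Hypothetical subset of the object-language representation. *)
datatype exp = Var of string | IntC of int | App of exp * exp

(* incr as a function of type exp -> exp. Encoding e + 1 as a curried
   application of Var "+" is an assumption made for this sketch. *)
fun incr e = App (App (Var "+", e), IntC 1)

exception Unsupported

(* Toy evaluator covering only integer constants and addition. *)
fun eval (IntC n) = n
  | eval (App (App (Var "+", a), b)) = eval a + eval b
  | eval _ = raise Unsupported

(* Step 1: apply the annotating function to the bracketed expression. *)
val expanded = incr (IntC 7)

(* Step 2: evaluate the expansion. *)
val result = eval expanded
```

Here expanded is the representation of 7+1 and result is 8, matching the two steps described in the text.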

1.2 Functions from datatypes

The original motivation of CRML was the systematic construction of functions from datatypes. Given the representation of a datatype we can compute the representation of a function that computes over that datatype (e.g. map, fold, equality). Using the escape mechanism, this computed function definition can then be "spliced" back into the program and used like any other top-level function. The only weak link in the chain is obtaining the datatype representation in the first place. This is done with a limited form of reification supported by CRML.

For example, in CRML we can define a meta-program computeMap that calculates map functions. The application (computeMap "list") yields the object-language representation of the map function for lists. Thus, the map function for lists can be declared:

    val maplist = `(computeMap "list")

Reification allows computeMap to obtain the datatype declaration from the string "list" (see footnote 1). The techniques for defining computeMap are given in earlier work[4].

CRML's ability to introduce new modes of abstraction into ML, not traditionally associated with either macros or parametric polymorphism, is CRML's most important characteristic. For example, the construction of control combinators is a new mode of abstraction: parametric not over a type or a value, but over the definition of the type constructors declared by datatype declarations[8].

Footnote 1: The use of strings here is ad hoc and we expect it to change.
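To give a feel for computing a function from a datatype description, here is a minimal sketch in plain Standard ML. It is not the computeMap of [4]: it emits source text rather than exp values, it handles only a single type parameter, and the description types (arg, conDesc, dtDesc) and the name computeMapText are all hypothetical.

```sml
(* Hypothetical description of a one-parameter datatype. *)
datatype arg
  = Param   (* an occurrence of the type parameter *)
  | Rec     (* a recursive occurrence of the datatype itself *)
  | Other   (* any other type, copied unchanged *)

type conDesc = string * arg list
type dtDesc = string * conDesc list

(* Hand-written description of list; CRML would obtain this by reification. *)
val listDesc : dtDesc = ("list", [("nil", []), ("op::", [Param, Rec])])

(* Build the source text of a map function, one clause per constructor. *)
fun computeMapText ((name, cons) : dtDesc) : string =
  let
    val fname = "map" ^ name
    fun var i = "x" ^ Int.toString i
    fun tuple [] = ""
      | tuple xs = " (" ^ String.concatWith ", " xs ^ ")"
    fun rhs (Param, i) = "f " ^ var i
      | rhs (Rec, i) = fname ^ " f " ^ var i
      | rhs (Other, i) = var i
    fun clause (con, args) =
      let
        val idx = List.tabulate (length args, fn i => i + 1)
        val pats = map var idx
        val exps = ListPair.map rhs (args, idx)
      in
        fname ^ " f (" ^ con ^ tuple pats ^ ") = " ^ con ^ tuple exps
      end
  in
    "fun " ^ String.concatWith "\n  | " (map clause cons)
  end

val generated = computeMapText listDesc

(* The generated text, transcribed by hand so that it can be used here. *)
fun maplist f (nil) = nil
  | maplist f (op:: (x1, x2)) = op:: (f x1, maplist f x2)

val doubled = maplist (fn n => 2 * n) [1, 2, 3]
```

In CRML the final transcription step is what the top-level escape performs: the computed definition is spliced directly into the program instead of being copied by hand.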

1.3 Experimental language extensions

Mutually recursive datatypes may be naturally regarded as grammars. Since Knuth first introduced them, the attribute grammar formalism has been one of the most natural notations for expressing relations in grammars. An experimental extension to ML in which computations are expressed over a datatype in terms of "inherited" and "synthesized" attributes has been developed. This is expressed in CRML by transforming a set of attribute calculation specifications into a monolithic, recursive, ML function. The attribute grammar specifications are clear, concise, and more easily maintained than a direct expression of the calculation in ML.

    datatype exp = Var of string
                 | App of exp * exp
                 | Abs of string * exp
                 | Int of int

\[
\begin{array}{rl}
\mathbf{attr}\ E\ \mathbf{of} & \mathit{Var}^{B}(x)_{(\mathbf{if}\ x \notin B\ \mathbf{then}\ [x]\ \mathbf{else}\ [\,],\ [\,])} \\
\mid & \mathit{App}^{B}\bigl(x^{B}_{(f_1,n_1)},\ y^{B}_{(f_2,n_2)}\bigr)_{(f_1 @ f_2,\ n_1 @ n_2)} \\
\mid & \mathit{Abs}^{B}\bigl(x,\ e^{(x :: B)}_{(f,n)}\bigr)_{(f,n)} \\
\mid & \mathit{Int}^{B}(n)_{([\,],\ [n])}
\end{array}
\]

For the simple datatype above, which represents a lambda calculus with integer constants, we encode the attribute calculation, also shown above, which given an initial inherited attribute (a list of bound variables B) computes a pair of synthesized attributes. The first attribute of the pair is the list of free variables in the expression and the second attribute is the list of all the integer constants in the expression. The specification consists of annotating each constructor and its arguments with patterns and expressions representing its inherited (superscript) and synthesized (subscript) attributes. For constructors the inherited attributes are patterns, and the synthesized attributes are expressions. For the arguments to the constructor the situation is reversed (see footnote 2). The meaning of the specification is a function which, when applied to an initial inherited attribute list, computes the pair for E.

To implement the extension, the parser for CRML (an sml-yacc/lex specification written in CRML which produces CRML object language) is extended to capture the new syntax. The extension encodes the new syntax into the object language, and this object language is piped downstream to the rest of the compiler like any other ML program parsed by CRML. This paradigm is very flexible, allowing the direct (unencoded) expression of source-to-source translation as a means of defining language extensions.
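The kind of monolithic recursive function that such a specification denotes can be written by hand as follows. This is a sketch of the intended meaning, not the output of the CRML translation: the names attrE and member are hypothetical, and the bound-variable test is read as "x is not in B".

```sml
(* The datatype from the text: a lambda calculus with integer constants. *)
datatype exp
  = Var of string
  | App of exp * exp
  | Abs of string * exp
  | Int of int

(* Hypothetical helper: list membership. *)
fun member x xs = List.exists (fn y => y = x) xs

(* attrE b e: b is the inherited attribute (the bound variables);
   the result is the synthesized pair (free variables, integer constants). *)
fun attrE b (Var x) = (if not (member x b) then [x] else [], [])
  | attrE b (App (x, y)) =
      let
        val (f1, n1) = attrE b x   (* left to right, as in footnote 2 *)
        val (f2, n2) = attrE b y
      in
        (f1 @ f2, n1 @ n2)
      end
  | attrE b (Abs (x, e)) = attrE (x :: b) e
  | attrE b (Int n) = ([], [n])

(* fn x => f x 3, analysed with no variables bound initially. *)
val term = Abs ("x", App (App (Var "f", Var "x"), Int 3))
val (free, ints) = attrE [] term
```

For this term the free variables are ["f"] and the integer constants are [3]. Each clause pairs one constructor with its attribute equations, which is the information the specification states more compactly.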

2 Macros-done-right

The traditional problem with macros is that their expansion is always interpreted in the environment of use, not the environment of definition. Since the selection of the environments of interpretation and evaluation is somewhat ad hoc, we refer to this technique as "macros-done-wrong". CRML currently has macros-done-wrong.

An analysis of the semantics of CRML[5] raised several interesting issues about the environments of evaluation of meta-computations. These results suggest making environments first-class entities, much as continuations are in SMLNJ. When the environment of definition of a meta-program can be captured, meta-computation evaluation can then be done in the appropriate environment. We are currently pursuing the integration of these ideas into CRML. There appear to be interesting interactions between these ideas and the module system.

Footnote 2: In addition, any variable introduced in a pattern to the left may be used in an expression to the right. This is a consequence of the left-to-right computation order of our implementation.
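The environment problem described in this section can be illustrated with a toy evaluator. Everything here is an assumption made for the sketch: the exp datatype, the association-list environments, and in particular the Closed constructor, which stands in for a captured first-class environment and is not a CRML feature.

```sml
(* Hypothetical object language with let-binding. Closed (env, e) means
   "evaluate e in env", a stand-in for a captured environment. *)
datatype exp
  = Var of string
  | IntC of int
  | Plus of exp * exp
  | Let of string * exp * exp
  | Closed of (string * int) list * exp

exception Unbound of string

fun lookup x [] = raise Unbound x
  | lookup x ((k, v) :: rest) = if k = x then v else lookup x rest

fun eval env (Var x) = lookup x env
  | eval env (IntC n) = n
  | eval env (Plus (a, b)) = eval env a + eval env b
  | eval env (Let (x, e1, e2)) = eval ((x, eval env e1) :: env) e2
  | eval _ (Closed (env', e)) = eval env' e

(* Environment in which the macros are defined: y is 1. *)
val defEnv = [("y", 1)]

(* Macro-done-wrong: the free y is resolved wherever the macro is used. *)
fun addY e = Plus (e, Var "y")

(* Macro with its free variable closed over the definition environment. *)
fun addYClosed e = Plus (e, Closed (defEnv, Var "y"))

(* Use site that happens to rebind y to 100 before invoking the macro. *)
fun useSite macro = Let ("y", IntC 100, macro (IntC 7))

val captured = eval defEnv (useSite addY)        (* sees the use-site y *)
val intended = eval defEnv (useSite addYClosed)  (* sees the definition y *)
```

With these definitions captured is 107 and intended is 8. The difference is the accidental capture that interpreting an expansion in the environment of use permits.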

3 Typing issues

Pfenning and Lee have shown that the reflective polymorphic lambda calculus is untypable[7], so why even try? CRML does not have a decidable type system; however, meta-computations are type correct (i.e., they always produce values of type exp) and all program fragments produced by meta-computations are type checked in context by the SMLNJ typechecker. Thus, when compilation terminates the resulting program is type correct; it will not exhibit dynamic type errors. The undecidability is no more a problem in practice than the complexity of ML type inference. We have restricted "reflection" to compile-time to achieve this notion of type security. A more traditional, no-holds-barred reflection could be achieved in an ML dialect extended with dynamic types[6].
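The two levels of checking can be mimicked in a small sketch. The datatypes and the check function below are hypothetical and far simpler than the SMLNJ typechecker; the point is only that a meta-program of type exp -> exp is itself well typed, while the fragment it produces is checked separately before it ever runs.

```sml
(* Hypothetical object language and its types. *)
datatype exp
  = IntC of int
  | BoolC of bool
  | Plus of exp * exp
  | If of exp * exp * exp

datatype ty = IntT | BoolT

(* Toy checker: SOME t for a well-typed fragment, NONE otherwise. *)
fun check (IntC _) = SOME IntT
  | check (BoolC _) = SOME BoolT
  | check (Plus (a, b)) =
      (case (check a, check b) of
           (SOME IntT, SOME IntT) => SOME IntT
         | _ => NONE)
  | check (If (c, t, e)) =
      (case (check c, check t, check e) of
           (SOME BoolT, SOME t1, SOME t2) =>
             if t1 = t2 then SOME t1 else NONE
         | _ => NONE)

(* A meta-program: type correct in the meta-language for any argument. *)
fun incr e = Plus (e, IntC 1)

(* The generated fragments are then checked in their own right. *)
val good = check (incr (IntC 7))      (* well typed *)
val bad = check (incr (BoolC true))   (* rejected at compile time *)
```

Here good is SOME IntT and bad is NONE: the ill-typed fragment is caught by checking the generated code, not by the type of incr.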

4 Conclusion

The open architecture of the SMLNJ compiler provides a rich environment for language experimentation. CRML takes this one step further, allowing direct expression of source-to-source translation in a natural meta-language. Even those who might find the idea of macros in ML repulsive may appreciate the additional levels of abstraction provided by compile-time reflection. These abstractions include the automatic generation of algebraically well-behaved control combinators, the rapid prototyping of language extensions, and an object-language for the definition of source-to-source transformations.

References

[1] Andrew W. Appel and David B. MacQueen. A Standard ML compiler, August 1987. Distributed as documentation with the compiler.
[2] Jeffrey M. Bell and James Hook. Defunctionalization of typed programs. Technical report, Department of Computer Science and Engineering, Oregon Graduate Institute, February 1994.
[3] L. Fegaras, D. Maier, and T. Sheard. Specifying rule-based query optimizers in a reflective framework. In Proc. Third Intl. Conf. on Deductive and Object-Oriented Databases, December 1993.
[4] James Hook, Richard Kieburtz, and Tim Sheard. Generating programs by reflection. Technical Report 92-015, Department of Computer Science and Engineering, Oregon Graduate Institute, July 1992.
[5] James Hook and Tim Sheard. A semantics of compile-time reflection. Technical Report 93-019, Department of Computer Science and Engineering, Oregon Graduate Institute, November 1993.
[6] Xavier Leroy and Michel Mauny. Dynamics in ML, March 1991.
[7] Frank Pfenning and Peter Lee. Metacircularity in the polymorphic λ-calculus. Theoretical Computer Science, 89:137-159, 1991.
[8] Tim Sheard. Type parametric programming. Technical Report 93-018, Department of Computer Science and Engineering, Oregon Graduate Institute, November 1993.
[9] Tim Sheard and Leonidas Fegaras. A fold for all seasons. In Proceedings of the conference on Functional Programming and Computer Architecture, Copenhagen, June 1993.
[10] Timothy Sheard. Guide to using CRML: Compile-time Reflective ML. Technical report, Department of Computer Science and Engineering, Oregon Graduate Institute, November 1992.