Compiler Support for Extensible Languages

Jan Bosch
University of Karlskrona/Ronneby
Department of Computer Science and Business Administration
S-372 25 Ronneby, Sweden
[email protected]
http://www.pt.hk-r.se/~bosch

Abstract

The use of a rigid general purpose language is, by an increasing part of the software engineering community, no longer considered to be the optimal solution. The requirements on a programming language are not static, but change over time and depend on the application domain in which the language is used. To address this, one can recognise two main approaches, i.e. application domain languages and extensible language models. In this paper, we describe our experiences with our extensible object-oriented language, i.e. the layered object model (LayOM). As part of an ongoing research project, we investigate language extensions required for the expression of object-oriented frameworks. In this paper, we present two such language extensions, i.e. a design pattern and a concept from the process control system domain. We have identified that compiler support for extensible languages should fulfil requirements of managing complexity, maintainability, extensibility and reusability. Our LayOM compiler is based on the delegating compiler object approach, providing a structural decomposition dimension in addition to the traditional functional decomposition. This approach has proven to be very beneficial and fulfils the aforementioned requirements rather well.

1 Introduction

Although the way software systems are constructed is changing constantly, many underlying principles have remained constant. One example of such a principle is the use of a single general purpose programming language for programming the complete software system. Software engineers considered it most productive to work within the context of a single language model for all subsystems, independent of the particular characteristics of the subsystems. Independent of the domain for which the software system was constructed, the same programming language is used.

Lately, one can recognise a development in which the use of a rigid general purpose language is no longer necessarily considered to be the optimal solution. The use of application domain languages is increasing in domains where general purpose languages clearly lack expressiveness, such as graphical user interfaces and robot languages. The application domain language is designed such that the main concepts in the application domain have equivalent concepts in the programming language.

Each application domain, or even each scientific domain, has an associated paradigm that is used by the experts working in that domain. A paradigm [Kuhn 62] can be defined as a set of related concepts with underlying semantics [Bosch 95c]. For the domain experts, the paradigm provides the ‘language’ to talk about phenomena in the domain. The concepts represent relevant abstractions in the domain. A paradigm is not a static entity, but a dynamic complex of related concepts that is changing constantly, both in the number of concepts and in the semantics of existing concepts.

When constructing executable specifications of applications in an application domain, the software engineer has to convert the concepts of the application domain into the concepts (or constructs) supported by the programming language.
The difficulty of the translation process depends on the ‘conceptual distance’ between the domain and programming language concepts, i.e. the semantic gap. An important lesson learned by the software engineering community is that minimising the semantic gap is highly beneficial and improves the understandability and maintainability of software. Application domain languages are a logical consequence of this conclusion, since these languages aim at a one-to-one relation between the concepts in the application domain and the programming language, but other approaches exist as well.

When one tries to decrease the semantic gap between the domain and programming language, two fundamental approaches are available: the revolutionary and the evolutionary approach. The revolutionary approach discards the concepts in the general purpose programming language and starts from scratch, basing the language design solely on the concepts in the application domain, i.e. application domain languages. The evolutionary approach starts with a general purpose language model that is extended with application domain concepts. Thus, the language model is extensible with constructs that, among others, may represent application domain concepts. In [Bosch 95c], we introduced the notion of paradigm extensibility to refer to this principle.

Both approaches have advantages and disadvantages. An important advantage of the evolutionary approach is that software developed using the extensible language model can relatively easily be integrated with other software developed using the same basic language model, but with different extensions. Since virtually all software systems cover multiple application domains, integration is an important property. A disadvantage, however, is that extensions of the language model are required to uniformly extend the semantics of the existing language constructs. This limits the possible extensions of the language model.

Our experience with extensible language models is concentrated around the layered object model, an extensible object model that uniformly extends the traditional object model with new components. We have defined extensions of the object model to address several types of lacking expressiveness. In this paper, two object model extensions related to object-oriented framework specification are described. One extension is application domain independent and provides support for the representation of design patterns, whereas the second extension is concerned with the representation of the control relation between a controller and a process in the domain of process control, i.e. an application domain specific extension.

The remainder of this paper is organised as follows. In the next section, we define the requirements on compilation techniques used for application domain languages and extensible language models that we have experienced to be important.
In section 3, the layered object model and the aforementioned extensions to the layered object model are described. Section 4 discusses the implementation of these and other extensions using the delegating compiler objects approach. The paper is concluded in section 5.

2 Requirements on Compilation Techniques

Compilation techniques that are used for the specification of application domain languages and extensible language models have to meet stronger requirements concerning the ‘software engineering’ aspects of the techniques. The specifications for these types of languages tend to change more frequently, and in an environment where one application domain language is defined, more are likely to follow. From our experience with compiler techniques for extensible language models, we have recognised the following requirements to be important.

• Managing complexity: In our experience, it is primarily the perceived, conceptual complexity of the compiler specification that has to be managed. The layered object model (LayOM) consists of six primary components and several secondary components. We have experienced that it is potentially complex to organise the specifications of the various components in such a way that the lexical, grammatical and code generation specifications can be found for each component and that the relation between the different specifications is clear.

• Maintainability: Maintaining components of LayOM draws on the same issues as the previous point. When maintaining, for example, the code generation of a LayOM method, the software engineer is concerned with the grammar specification for the method, the parse graph node classes potentially instantiated when parsing a method and the output stream(s) to which the output has to be written. All these specifications have to be organised in such a way that it is easy to access and distinguish them from specifications for different parts of the compiler specification.

• Extensibility & reusability: New language extensions or application domain languages very often contain parts that have already been defined for existing extensions or languages. It is highly beneficial to be able to reuse these existing parts in the new specification and the compilation techniques should provide support for that.

3 Layered Object Model

In this section, we present the layered object model (LayOM), an extensible object model, meaning that the object model itself can be extended with new components. The extensibility of the object model can be used for the representation of new concepts, such as inter-object relations, design patterns and object communication patterns. In this paper we present some extensions for object-oriented frameworks. These extensions exist in two categories: general extensions for the representation of object-oriented frameworks and application domain specific extensions for the domains for which frameworks are developed. In the next section, the general properties of the layered object model are briefly described. In section 3.2, some example extensions are described.

3.1 Layered Object Model

The layered object model is an extended object model, i.e. it defines additional components, such as layers, states and categories, next to the traditional object model components. In figure 1, an example LayOM object is presented. The layers encapsulate the object, so that messages sent to or by the object have to pass the layers. Each layer, when it intercepts a message, converts the message into a passive message object and evaluates the contents to determine the appropriate course of action. Layers can be used for various types of functionality. Layer classes have been defined for the representation of relations between objects [Bosch 95a], design patterns and object communication patterns. In this paper, the use of layers for presenting multiple client interfaces is discussed.

Figure 1. A LayOM object

A LayOM object contains, as in any object model, instance variables and methods. The semantics of these components is very similar to the conventional object model. The only difference is that instance variables, being objects, can have encapsulating layers adding functionality to the instance variable.

A state in LayOM is an abstraction of the internal state of the object. In LayOM, the internal state of an object is referred to as the concrete state. Based on the object’s concrete state, the software engineer can define an externally visible abstraction of the concrete state, referred to as the abstract state of an object. The abstract object state is generally simpler, both in the number of dimensions and in the domains of the state dimensions. We refer to [Bosch 96a] for a more detailed description of abstract object state.

A category is an expression that defines a client category. A client category describes the discriminating characteristics of a subset of the possible clients that should be treated equally by the class. The client categories are, among others, used by the layers that present the interfaces to different clients. If the sender is a member of the client category, the message is subject to the semantics of the specification of the layer.

The methods, states and categories defined by an object $o$, either directly or through inheritance and delegation, form the interface $I_o = \{ M_o \cup S_o \cup C_o \}$, where $M_o$ is the set of methods of object $o$, $S_o$ the set of abstract states and $C_o$ the set of client categories defined by $o$. The object interface forms the externally visible part of the object. Depending on the object, the interface may consist of several elements.

A layer, as mentioned, encapsulates an object and intercepts messages that are sent to and sent by the object. Depending on its type, a layer can perform several kinds of behaviour, either in response to a message or pro-actively.
For example, a number of layer types have been defined for the representation of structural relations between objects. Structural relation types define the structure of an application. A class uses the structural relations to extend its behaviour and the class can be seen as the client, i.e. the class that obtains functionality provided by other classes. Generally, three types of structural relations are used in object-oriented systems development: inheritance, delegation and part-of. Orthogonal to the relation types, one can recognise two additional dimensions of describing the extended behaviour of an object, i.e. conditionality and partiality. Conditionality indicates that the reusing object limits the reuse to only occur when it is in certain states. Partiality indicates that the reusing object reuses only part of the reused object.

Several structural relation layer types have been defined, among others, the partial inheritance layer. In figure 2, the implementation of this semantics is illustrated using a part of an industrial control system containing a valve. The Valve class inherits from class Actuator. In LayOM this is represented by the partial inheritance layer, which is the second layer of class Valve in figure 2. In the figure, the outermost layer is drawn around layer pin, which in turn is drawn around Valve. All inheritance layers create an instance of the inherited super-class. In figure 2, an instance of class Actuator is shown.

The layer contains a message handler that, for each received message, determines whether the message is passed on inwards or outwards, or whether it is redirected to the instance of class Actuator. In the figure, an incoming message is shown. The message is reified and handed to the message handler. The message handler reads the selector field of the message and compares it with the (partially) inherited interface of class Actuator. If the selector is part of the set of interface elements, the message is redirected to the instance of class Actuator (situation (b) in figure 2). If the selector does not match the interface of class Actuator, the message is not redirected but forwarded to the next layer (situation (a) in figure 2).

Figure 2. The PartialInherit layer

For a more extensive description of the semantics of the other relation types, we refer to [Bosch 95a, Bosch 95b].
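The message handling of the partial inheritance layer can be illustrated with a minimal Python sketch. It is not the LayOM implementation (the LayOM compiler generates C++), and all names in it (Message, deliver, PartialInheritLayer, ValveCore and the Actuator methods) are hypothetical and chosen for illustration only. The sketch assumes that a reified message carries just a selector and arguments.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """A reified message: a passive object that a layer can inspect."""
    selector: str
    args: tuple = ()


def deliver(target, message):
    """Hand the message to a layer if the target is one, else invoke the method."""
    if hasattr(target, "receive"):
        return target.receive(message)
    return getattr(target, message.selector)(*message.args)


class Actuator:
    """Stand-in for the inherited super-class (method names are assumptions)."""

    def open(self):
        return "Actuator.open"

    def close(self):
        return "Actuator.close"

    def calibrate(self):
        return "Actuator.calibrate"


class ValveCore:
    """Stand-in for the methods that class Valve defines itself."""

    def flow_rate(self):
        return "Valve.flow_rate"


class PartialInheritLayer:
    """Redirects messages whose selector is in the partially inherited interface."""

    def __init__(self, superclass, inherited, next_layer):
        self.super_instance = superclass()  # inheritance layers create an instance
        self.inherited = set(inherited)     # the part of the interface that is reused
        self.next_layer = next_layer        # next layer inwards, or the object itself

    def receive(self, message):
        if message.selector in self.inherited:
            return deliver(self.super_instance, message)  # situation (b): redirect
        return deliver(self.next_layer, message)          # situation (a): forward


if __name__ == "__main__":
    pin = PartialInheritLayer(Actuator, {"open", "close"}, ValveCore())
    print(pin.receive(Message("open")))
    print(pin.receive(Message("flow_rate")))
```

Because only open and close are listed as inherited in this sketch, a calibrate message is not redirected to the Actuator instance, which corresponds to the partiality dimension described above.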

3.2 Example: Extensions for Object-Oriented Frameworks

Currently, we are involved in a research project in cooperation with two industrial partners, i.e. EC-Gruppen and Europolitan, to investigate the requirements on programming languages representing object-oriented frameworks. The project is based on the observation that traditional object-oriented languages are unable to express several important aspects of object-oriented frameworks. The required expressiveness can be divided into two categories:

• Framework concepts: An object-oriented framework consists of a set of cooperating classes and can be reused and specialised by the user of the framework. However, a framework cannot be used arbitrarily. The user has to comply with certain rules in order not to violate the assumptions on which the framework design is based. However, these restrictions (sometimes referred to as conventions) cannot be expressed in traditional object-oriented languages. A second issue is the increasing use of design patterns to describe frameworks. Although design patterns are very expressive and useful during analysis and design, they generally cannot be expressed using a traditional object-oriented language. Both the language extensions for framework conventions and the design patterns are general in the sense that frameworks for various domains can make use of them.

• Application domain concepts: An application domain contains the concepts used to discuss and model systems existing in that domain. Most of those concepts can be expressed using traditional object-oriented concepts such as classes, inheritance, objects, etc. However, certain concepts cannot be expressed and require application domain specific language extensions. This leads to a description of the application domain that consists of a set of cooperating classes and a set of language extensions that are used to describe the semantics of the classes in the framework.

In the following sections, we present one example from each of the aforementioned categories. In the first section, we describe the representation of the Facade design pattern using a layer type, whereas in the second section, we present the representation of the controls relation using LayOM.

3.2.1 The Facade Design Pattern

The Facade design pattern is used to provide a single, integrated interface to a set of interfaces in a subsystem. Facade defines a higher-level interface that simplifies the use of the subsystem. The structure of a subsystem incorporating the Facade design pattern often looks as in figure 3. The subsystem is defined as a class containing the classes that are part of the subsystem. The function of the subsystem class is basically twofold. The first function is the coordination between the classes in the subsystem, whereas the second function is to provide an integrated interface to clients of the subsystem. By defining the subsystem as a class, the first function can be dealt with in an acceptable manner. However, the second function, which often is the forwarding of messages to objects within the subsystem, is problematic.

The traditional approach is to define a method in the subsystem class that forwards a message sent by a client to the appropriate object inside the subsystem. The disadvantage of this approach is that the subsystem class has to define a method for every message sent by a client. The number of methods might easily grow very large, and defining these methods is a lot of work for just forwarding a message. A second disadvantage is that the traceability of the design pattern in the implementation is lost.

Figure 3. Structure of the Facade design pattern

As a solution to the identified problems associated with the traditional implementation approach, one can, within the layered object model, make use of the Facade layer type. The Facade layer type provides the functionality of forwarding messages to objects that are part of the subsystem. A layer of type Facade is defined as follows:

    <layer-name> : Facade(forward <message-selector>+ to <object>, forward <message-selector>+ to <object>, ...);

The behaviour of a Facade layer is the following. It matches the selector of the message with the message selectors defined in <message-selector>+. If the selector matches, the message will be forwarded to the object <object>. The Facade layer can contain several forwarding elements, allowing the subsystem class to forward to several objects using a single Facade layer. A subsystem class using a Facade layer can be defined as follows:

    class facade
      layers
        face : Facade(forward mess1, mess2 to PartO1, forward mess3 to PartO2);
        PartO1 : PartOf(ClassOfO1);
        PartO2 : PartOf(ClassOfO2);
      ...
    end; // class facade

As described in section 3.1, the layered object model models relations between objects as layers. These relations include the part-of relation, which is implemented as the PartOf layer type. The Facade layer forwards messages matching mess1 and mess2 to PartO1, which is an object contained in the subsystem. Messages matching mess3 are forwarded to object PartO2.

The use of the Facade layer type has two main advantages over the traditional implementation techniques. The first advantage is that the software engineer does not have to define a, possibly large, number of trivial methods that just pass on a message to one of the objects in the subsystem. The second advantage is that there is a direct correspondence between the design pattern that is used at the design level and the Facade layer defined in the implementation.
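The contrast with hand-written forwarding methods can be sketched in Python. This is only an analogy under stated assumptions, not how LayOM implements the Facade layer: the names FacadeLayer, Subsystem, handles and forward are hypothetical, and Python's `__getattr__` hook stands in for the message interception that a LayOM layer performs.

```python
class FacadeLayer:
    """Forwards messages by selector to part objects of a subsystem."""

    def __init__(self, *rules):
        # Each rule is (selectors, target), mirroring one "forward ... to ..." element.
        self.routes = {}
        for selectors, target in rules:
            for selector in selectors:
                self.routes[selector] = target

    def handles(self, selector):
        return selector in self.routes

    def forward(self, selector, *args):
        return getattr(self.routes[selector], selector)(*args)


class ClassOfO1:
    def mess1(self):
        return "O1.mess1"

    def mess2(self, value):
        return f"O1.mess2({value})"


class ClassOfO2:
    def mess3(self):
        return "O2.mess3"


class Subsystem:
    """Subsystem class without any hand-written forwarding methods."""

    def __init__(self):
        self.part_o1 = ClassOfO1()
        self.part_o2 = ClassOfO2()
        self.face = FacadeLayer(
            (("mess1", "mess2"), self.part_o1),
            (("mess3",), self.part_o2),
        )

    def __getattr__(self, selector):
        # Only called when normal attribute lookup fails, i.e. for messages
        # that the subsystem class does not define itself.
        face = self.__dict__.get("face")
        if face is not None and face.handles(selector):
            return lambda *args: face.forward(selector, *args)
        raise AttributeError(selector)


if __name__ == "__main__":
    subsystem = Subsystem()
    print(subsystem.mess1())
    print(subsystem.mess3())
```

Adding a further forwarded message in this sketch means adding one selector to a rule rather than writing another trivial method, and the single FacadeLayer declaration keeps the pattern visible in the code.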

3.2.2 Control Systems Domain: Controls Relation

In process control, the system, i.e. the controlled and the controlling part, is built by using two types of components: process and controller. The process has one input, the desired value x, and one output, the actual value y. The controller has two inputs, the desired value r and the measured value y, and one output, the set value x. The set value of the controller is connected to the desired value of the process.

In figure 4, a design model of a part of a control system framework is shown. The most important classes in this application are the controller and the process. Both inherit from class transformer. Between the process and the controller, a relation of type controls exists. Class transformer encapsulates functionality for transforming an input to an output. The process contains a sensor object to interact with the real-world process it represents, whereas the controller has an actuator object as a part to control the process.

Figure 4. Control system design model

A controls relation can be of several types that all behave differently, i.e. proportional control (P), integrating proportional control (PI), differentiating proportional control (PD) and integrating and differentiating proportional control (PID). Here, we only discuss the PID control relation type. PID control is used when the advantages of both integrator control, i.e. zero steady-state deviation, and differentiator control, i.e. quick response, are required to obtain satisfactory process control. This PID control relation is mathematically represented by

    $H_c = K_c \cdot \left( 1 + \frac{1}{j\omega\tau_i} \right) \cdot \left( j\omega\tau_d + 1 \right)$

The PID control relation is expressed in the language model using the layer type PID-control. The syntax for this layer type is the following:

: PID-control( , , , , , )

In this definition, the last three parameters represent the factor $K_c$, $\tau_i$ and $\tau_d$. The class Controller may use a layer of type PID-control as shown below:

    class Controller
      layers
        c1 : PID-control(setInputValue, pp.setInputValue, pp.outputValue, 100, 5, 0.74);
        /* Structural relations */
        inh : Inherit(Transformer);
        anA : PartOf(Actuator);
      ...
    end; /* Controller */

The use of the PID-control layer type has several advantages over the conventional implementation. If the control behaviour were implemented as part of class Controller, the control specific behaviour would be mixed with the other functionality of the class. Also, the functionality of the relation would have become part of the object, and the explicit modelling relation between the analysis and design model and the implementation model would have been lost.
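To make the PID control relation more concrete, the following Python sketch evaluates the frequency response $H_c(j\omega)$ and checks it against the multiplied-out form $K_c \cdot ((1 + \tau_d/\tau_i) + j\omega\tau_d + 1/(j\omega\tau_i))$. The function names are hypothetical, and reading the values 100, 5 and 0.74 from the Controller example as $K_c$, $\tau_i$ and $\tau_d$ is an assumption made for illustration.

```python
import cmath


def pid_response(omega, k_c, tau_i, tau_d):
    """Evaluate H_c = K_c * (1 + 1/(j*omega*tau_i)) * (j*omega*tau_d + 1)."""
    jw = 1j * omega
    return k_c * (1 + 1 / (jw * tau_i)) * (jw * tau_d + 1)


def pid_response_expanded(omega, k_c, tau_i, tau_d):
    """The same transfer function with the product multiplied out."""
    jw = 1j * omega
    return k_c * ((1 + tau_d / tau_i) + jw * tau_d + 1 / (jw * tau_i))


if __name__ == "__main__":
    # Assumed parameter values, taken from the Controller example above.
    k_c, tau_i, tau_d = 100, 5, 0.74
    for omega in (0.01, 1.0, 100.0):
        h = pid_response(omega, k_c, tau_i, tau_d)
        print(f"omega={omega:g}  |H_c|={abs(h):.1f}  phase={cmath.phase(h):.2f} rad")
```

In the expanded form, the term $1/(j\omega\tau_i)$ dominates at low frequencies (the integrating part) and the term $j\omega\tau_d$ dominates at high frequencies (the differentiating part).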

4 Implementing LayOM using Delegating Compiler Objects

The implementation of the layered object model is based on the notion of delegating compiler objects (DCOs) [Bosch 96b]. A DCO is an object that compiles a part of the syntax and, as shown in figure 5, consists of one or more lexers, one or more parsers and a parse graph. The nodes in the parse graph have the ability to generate code for themselves. The LayOM class compiler (see also figure 6) consists of a class DCO, method DCO, state DCO, category DCO and a DCO for each layer type. Each DCO definition results in a class, and a DCO object can instantiate another DCO and delegate control to it. The delegated DCO will perform its task and return control to the delegating DCO when it is finished.

The LayOM compiler DCOs generate C++ output code. LayOM code is either a class or an application. A LayOM class is compiled into a C++ class and a LayOM application is compiled into a C++ main program. The generated C++ class can be incorporated in any C++ program or compiled independently to an executable application.

Figure 5. Overview of a DCO-based compiler

An advantage of using the DCO approach is that it supports maintainability and extensibility of the language rather well. When the language should be extended with a new concept, a DCO defining the syntax and code generation information of that particular concept is defined. The new DCO is added to the set of DCOs in the existing compiler and it can be used. Except for some minor modifications in the DCOs that should instantiate the new DCO, no changes are required.
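The delegation structure described above can be sketched in a few lines of Python. This is a strongly simplified illustration, not the LETOS-generated compiler: real DCOs contain lexers, parsers and a parse graph, whereas this sketch starts from already-parsed layer declarations. The class and method names (LayerDCO, ClassDCO, register, compile) and the emitted text are hypothetical.

```python
class LayerDCO:
    """Compiles one layer declaration; one subclass per layer type."""

    layer_type = None

    def compile(self, name, args):
        # Placeholder output; the actual LayOM compiler generates C++ code.
        return f"// layer {name} : {self.layer_type}({', '.join(args)})"


class PartOfDCO(LayerDCO):
    layer_type = "PartOf"


class FacadeDCO(LayerDCO):
    layer_type = "Facade"


class ClassDCO:
    """Compiles a class and delegates each layer to the DCO of its layer type."""

    def __init__(self):
        self.layer_dcos = {}

    def register(self, dco_class):
        # Extending the language: add the new DCO to the set of known DCOs.
        self.layer_dcos[dco_class.layer_type] = dco_class

    def compile(self, class_name, layers):
        lines = [f"class {class_name} {{"]
        for name, layer_type, args in layers:
            delegate = self.layer_dcos[layer_type]()  # instantiate the delegated DCO
            lines.append("  " + delegate.compile(name, args))  # control returns here
        lines.append("};")
        return "\n".join(lines)


if __name__ == "__main__":
    compiler = ClassDCO()
    compiler.register(PartOfDCO)
    compiler.register(FacadeDCO)  # the only change needed for the new layer type
    print(compiler.compile("facade", [
        ("face", "Facade", ["forward mess1, mess2 to PartO1"]),
        ("PartO1", "PartOf", ["ClassOfO1"]),
    ]))
```

In this sketch the "minor modification" in the delegating DCO is the single register call; the code of ClassDCO and of the existing layer DCOs stays untouched.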

Figure 6. DCO-based LayOM compiler

The DCO approach is supported by a tool set (LETOS) that provides automated support for generating compilers based on DCOs. A DCO, as mentioned, consists of one or more lexers, one or more parsers and a parse graph. Lexical and grammatical specifications can be reused and modularised through the use of lexer delegation and parser delegation, respectively. For more information on DCOs and the LayOM compilers we constructed using DCOs, we refer to [Bosch 95b, Bosch 96b].

When evaluating the DCO approach based on the requirements described in section 2, we can conclude that the requirements are, to a large extent, met. Practical experiences with the DCO approach in student projects have shown that the complexity of compiler construction is reduced by the use of DCOs, due to the improved modularisation. Maintainability is also improved, partially due to the modularisation but also because most changes affect a single DCO, restricting the changes to the specifications in that particular DCO. Reusability and extensibility are improved for lexical and grammatical specifications as a result of the lexer delegation and parser delegation.

5 Conclusion

One can recognise a development within software engineering in which the use of a rigid general purpose language is no longer considered to be the optimal solution. It is becoming generally recognised that the requirements on programming languages are not static, but change over time and depend on the application domain for which the software system is developed. One can recognise two approaches to address this problem, i.e. the revolutionary and the evolutionary approach. The use of application domain languages can be considered as revolutionary, since it discards the concepts used in general purpose languages and starts from the concepts in the application domain. The evolutionary approach starts from a general purpose language and extends that language model with application domain specific concepts; this is also defined as paradigm extensibility. A disadvantage of application domain languages is that integrating the specifications in that language with specifications from other application domains (and associated languages) tends to be rather complex. A disadvantage of the extensible language model approach is that the application domain extensions have to uniformly extend the semantics of the existing language model.

In our research, we have focused on an extensible object-oriented approach, i.e. the layered object model (LayOM). LayOM extends the conventional object model with components like states, categories and layers. Currently, we are involved in a research project involving industrial partners to search for language extensions required to deal with expressing object-oriented frameworks. Two types of extensions can be identified, i.e. extensions concerning the framework itself, such as design patterns and framework conventions, and application domain specific extensions. We have presented an example from each type of extension, i.e. the Facade design pattern and the controls relation from the process control domain.

While constructing compilers for LayOM, we have recognised some requirements on compiler techniques used for implementing extensible language models and application domain languages, such as managing complexity, maintainability, extensibility and reusability. The LayOM compiler is constructed using the delegating compiler object (DCO) approach. The DCO approach allows for structural decomposition of the compiler in addition to the traditional functional decomposition. The LayOM compiler consists of a DCO for each main language construct, and extending the language can be done by adding a DCO.
In our experience, the DCO approach and its supporting tool set LETOS fulfil the aforementioned requirements rather well.

References

[Bosch 95a]. J. Bosch, ‘Relations as Object Model Components,’ accepted for publication in Journal of Programming Languages, 1995.
[Bosch 95b]. J. Bosch, ‘Layered Object Model - Investigating Paradigm Extensibility,’ Ph.D. dissertation, Department of Computer Science, Lund University, November 1995.
[Bosch 96a]. J. Bosch, ‘Abstracting Object State,’ accepted under revision for publication in Object-Oriented Systems, 1996.
[Bosch 96b]. J. Bosch, ‘Delegating Compiler Objects - An Object-Oriented Approach to Crafting Compilers,’ in proceedings of Compiler Construction ‘96, 1996.
