P1: STR/MVG/ASH

P2: ICA

Automated Software Engineering

KL426-Mylopoulos

April 7, 1997

14:16

Automated Software Engineering 4, 291–317 (1997) © 1997 Kluwer Academic Publishers. Manufactured in The Netherlands.

Representing Software Engineering Knowledge

JOHN MYLOPOULOS
Department of Computer Science, University of Toronto, Toronto, Canada M5S 3H5

ALEX BORGIDA
Department of Computer Science, Rutgers University, New Brunswick, NJ 08903, USA

ERIC YU
Faculty of Information Studies, University of Toronto, Toronto, Canada M5S 3G6

Abstract. We argue that one important role that Artificial Intelligence can play in Software Engineering is to act as a source of ideas about representing knowledge that can improve the state-of-the-art in software information management, rather than just building intelligent computer assistants. Among others, such techniques can lead to new approaches for capturing, recording, organizing, and retrieving knowledge about a software system. Moreover, this knowledge can be stored in a software knowledge base, which serves as “corporate memory”, facilitating the work of developers, maintainers and users alike. We pursue this central theme by focusing on requirements engineering knowledge, illustrating it with ideas originally reported in (Greenspan et al., 1982; Borgida et al., 1993; Yu, 1993) and (Chung, 1993b). The first example concerns the language RML, designed on a foundation of ideas from frame- and logic-based knowledge representation schemes, to offer a novel (at least for its time) formal requirements modeling language. The second contribution adapts solutions of the frame problem originally proposed in the context of AI planning in order to offer a better formulation of the notion of state change caused by an activity, which appears in most formal requirements modeling languages. The final contribution imports ideas from multi-agent planning systems to propose a novel ontology for capturing organizational intentions in requirements modeling. In each case we examine alterations that have been made to knowledge representation ideas in order to adapt them for Software Engineering use.

Keywords: knowledge representation, software knowledge bases, languages

1. Introduction

“...The ultimate goal of artificial intelligence applied to software engineering is automatic programming...” (Rich and Waters, 1986)

The role of Artificial Intelligence (hereafter AI) in Software Engineering (SE) has traditionally been one of offering concepts, tools and techniques for building intelligent systems which can perform, or assist in the performance of, software engineering tasks. Examples of such systems can be found as far back as Cordell Green’s seminal contributions on the application of theorem proving to automatic program generation (Green, 1969), the long-term and influential Programmer’s Apprentice project at MIT (Rich and Waters, 1988), or Douglas Smith’s impressive work on the synthesis of divide-and-conquer algorithms (Smith, 1985). This type of research has seen applications of theorem proving, natural language and knowledge-based systems techniques, among others, to tasks such as programming and program verification, transformations of informal program specifications to formal ones, or the development of “intelligent assistants”. An influential statement on this direction of research can be found in the Report on a Knowledge-Based Software Assistant (Green et al., 1983), and the area has been surveyed several times over the years (notably, in (Green et al., 1983; Mostow, 1985; Rich and Waters, 1986; Barstow, 1987) and (Lowry and Duran, 1989)). In short, this line of research has had impressive successes, has influenced the SE community at large and characterizes much of today’s research on AI&SE, as exemplified by the programmes of KBSE conferences through the years.

This paper focuses on another tangible and important contribution AI can make to SE, even if it promises no “intelligent” systems/tools whatsoever. This contribution lies in the adoption/adaptation of knowledge representation techniques in order to capture, record, organize and retrieve software engineering knowledge. A knowledge base built on that basis can be used as “corporate memory” (Lowry and Duran, 1989) for a software system, facilitating the work of developers, maintainers and users. After all, software engineers spend considerable time trying to understand software systems (Soloway et al., 1988; Corbi, 1989; Devanbu, 1994). Moreover, much of this effort is dedicated to the process of recovering unrecorded knowledge. Such knowledge includes (but is not limited to):

Domain knowledge: e.g., patients, nurses, treatments, admissions for a hospital registration system;
Requirements knowledge: what is the system intended for? What functions will it perform? What information will it handle?
Design knowledge: system architecture, conventions, what does each component do?
Implementation knowledge: implementation details;
Programming knowledge: about the programming language, data structures and algorithms used;
Quality factors: what expectations did the customer have with regard to performance, reliability, portability, ...? How have these influenced the design and implementation of the system?
Design rationale: why were decisions made? How do they relate implementation with design, design with requirements?
Historical knowledge: who built and who maintained the system? What were their software engineering habits, strengths and weaknesses?

Of course, software knowledge capture and representation is necessary for building intelligent tools as well. The difference between this “software knowledge management” perspective and the “intelligent assistants” one that prevails in AI&SE research today is that the knowledge captured is to be used primarily by people in the former case, rather than by a (knowledge-based) system in the latter case. This distinction has profound consequences on the nature of the representations used and the scope and coverage of the knowledge bases built. In particular, AI’s contribution to the capture and representation of software knowledge can be in areas such as:


New concepts (ontologies): for representing some of this knowledge;
New notations/languages: for specifying software at some level of abstraction (e.g., requirements or design), or for representing some of the other types of software knowledge listed above;
Better semantics: for existing requirements or design languages;
Software knowledge repositories: which capture useful knowledge about a particular software system throughout its lifetime, organized for easy human access and supporting basic retrieval mechanisms to facilitate its use.

For the development of intelligent assistants, on the other hand, emphasis rests with the characterization of software engineering tasks (e.g., requirements acquisition, algorithm design), the identification of relevant formal and heuristic knowledge, and the construction of systems that can adequately perform these tasks.

The rest of the paper expands on this central theme of using knowledge representation ideas to improve software knowledge management. The next three sections review three contributions of knowledge representation research to SE originally reported in (Greenspan et al., 1982; Borgida et al., 1993; Yu, 1993) and (Chung, 1993b). The first involves the language RML, designed on a foundation of ideas from frame- and logic-based knowledge representation schemes to offer a novel (at least for its time) formal requirements modeling language. The second contribution adapts solutions of the frame problem originally proposed in the context of AI planning in order to offer a better formulation of the notion of state change caused by an activity, which appears in most formal requirements modeling languages in one form or another. The final contribution imports ideas from multi-agent planning systems to propose a novel ontology for capturing organizational intentions in requirements modeling.
It is important to note that, although this paper provides an overview of three lines of research in some detail, its purpose is not to present or review these works per se. Rather, the presentations in Sections 2 to 4 are offered as concrete examples to support the thesis of this paper, namely, the distinguishing features and potential benefits of the “software knowledge management” approach to bringing AI techniques to bear on software engineering.

2. Requirements modeling in RML

Language design has been a central theme in Software Engineering research throughout its history. Programming languages, for example, and their associated methodologies have contributed greatly to increased programmer productivity. The emergence of formal requirements modeling languages was the logical next step in providing linguistic, methodological and tool support for the early phases of the software lifecycle, for very much the reasons first articulated in (Bell, 1975). However, the subject matter of programming and specification languages is software systems, objects that are man-made, bounded and objectively known, while this is not the case with requirements. A corollary of this is that designers of requirements modeling languages need to turn to research in areas other than core computer systems and programming languages in search of ideas and research results that offer an intellectual foundation for their designs. To put it another way, it is unwise to try to design requirements modeling languages by merely adopting programming language ideas. This section reviews some of the premises and features of the Requirements Modeling Language (hereafter RML) first proposed in (Greenspan et al., 1982).

While developing the requirements for a software system, the requirements engineer needs to understand the application domain, including the organizational environment within which the proposed system will eventually function. It has been argued as far back as the mid-’70s (Ross and Schoman, 1977b) that it is imperative to capture explicitly as much of this understanding as possible, in order to support communication between the various “stakeholders” (customers, users, developers, testers) for a software development project; this is, after all, the primary function of the requirements document. An explicit model is also useful in supporting continuity in the face of inevitable staff turnover and other organizational change. However, a model of a social organization or of the natural world is not likely to be prescriptive: natural kinds and notions like ‘consumer satisfaction’ have no definitions in terms of necessary and sufficient conditions. Hence we view requirements not as specifications¹ but as models of the application domain. Moreover, these models have to be structured in ways which are consistent with cognitive principles of mental models and memory structure (e.g., (Norman, 1988)), since they are to be used (and hence understood) by people. Requirements engineering activities are defined as model construction, management and analysis tasks. The case for world modeling is articulated eloquently by Jackson (1978, 1983), whose methodology starts with the development of a “model of reality with which [the system] is concerned”, prior to system design.
The logical conclusion of these observations is that we should be developing conceptual models, expressed in terms of symbols which denote (concrete or abstract) entities, activities, and other phenomena in the world. Moreover, symbols are structured and organized according to principles of conceptual organization, such as “classes and instances,” “parts and wholes,” and “specializations and generalizations.” Others, including (Bubenko, 1980) and (Solvberg, 1979), also advocated conceptual modeling for requirements modeling, or built requirements languages on top of knowledge representation substrata (e.g., GIST (Balzer et al., 1982)). The issue of conceptual modeling was considered by a number of participants at the 1980 Pingree Park Workshop (Brodlie, 1981, 1984), particularly by researchers working on data modeling for databases. Objects (with intrinsic identity) form the pearl-seeds around which knowledge about the domain is grouped. Of course, the field of Knowledge Representation has a long-standing involvement with this subject matter, and has served for us as a primary source of ideas.

The above principles defined a foundation for the RML proposal presented originally in (Greenspan et al., 1982) and subsequently in more detail in (Greenspan, 1984; Greenspan et al., 1986). Implicit in this was the notion that some kind of formal language would be used to express requirements models. The advantage of any such formalism is that descriptions which adopt it can be assigned a well-defined semantics using formal logic. The advantages of clear semantics include adjudicating among different interpretations of a given model, and offering a basis for various ways of reasoning with models, either through consistency checking (the foundation of useful tools) or by supporting question-answering. Of course, the appeal and usability of some techniques may be largely due to their relative simplicity and flexibility derived from informality. We note that the use of a formal requirements modeling language does not preclude the concurrent use of informal notations. In fact, the original RML proposal envisioned early use of an informal notation, such as SADT, and a transformation process from an informal SADT model into a formal RML one.²

RML views a model as consisting of objects of various kinds: individuals, or tokens, grouped into classes, which are in turn instances of metaclasses. Classes and metaclasses can have definitional properties, which specify what kinds of information can be associated with their instances through factual properties. For example, if the class Person has name as a definitional property, then each instance of Person can have a factual property associating a specific name with it. The requirement that every factual property must be induced by a corresponding definitional property is called the Property Induction Constraint, and offers a form of type checking. A subclass relationship between classes (and between metaclasses) asserts that every instance of the subclass is an instance of the superclass, and moreover, every definitional property of a class is a definitional property of its subclasses (i.e., inheritance).

Figure 2.1.

The class descriptions of figure 2.1 define the activity class named AdmitPatient and the entity class Patient. The former is intended to convey the idea that the activity of admitting a new patient (to a hospital) involves three sub-activities which, respectively, obtain standard information about the patient, including blood pressure (document), assign her to a bed (AssignBed) and record the admission (recordAdmission) in some computer- or paper-based file. The first three properties of AdmitPatient identify other objects that must be present for every instance of the class (participants properties); these may be thought of as analogues of procedural “formal parameters” and “locals”. The next three properties (document, checkIn, record) are classified under parts, and specify sub-activities of AdmitPatient. The following three properties (canAdmit?, isThereRoom?, patientAlready?) define preconditions, which must be true every time AdmitPatient is instantiated. The activity also has two postconditions, which specify respectively that the effects of the activity include making the person pt a Patient, and incrementing the count of how many people reside on the ward.

Likewise, the Patient entity class describes instance entities in terms of a number of properties. First, patients have an associated medical record which is necessary (i.e., must be there for every Patient instance), unique and a part (i.e., if a patient is removed from an RML model, so is her medical record). Second, patients have three association properties, which associate respectively a location, a room and a physician. Moreover, patients are produced by AdmitPatient activities, are modified by AssessPatient activities and are “consumed” by (i.e., cease to be patients because of) the DischargePatient activity. Finally, instantiation of the Patient class is only possible if the patient does not have unpaid bills (startClean?).

According to RML’s view of the world (what we shall call its ontology), there are three types of things to be talked about: entities, activities and assertions (i.e., every individual token in the model is an instance of exactly one of the classes Entity, Activity, or Assertion).
The notions of entity and activity were chosen because they are ubiquitous in modeling aspects of a real world, and match well corresponding notions in other requirements languages; assertions were introduced in order to help structure the requirements model itself. Each object category is specified by listing (using metaclasses) the property categories (kinds of definitional properties) that can be associated with those kinds of classes. For example, as we saw in figure 2.1, activity classes have, among others, participants, parts and pre/post-condition properties. Entity classes, on the other hand, admit unique, necessary, parts, associations, etc., properties. Note that in the RML framework, an object can be an instance of multiple classes and, likewise, a property can belong to multiple property categories (see the record property). Each object category is formalized in the semantics of RML in terms of axioms that capture its essence; for example, activities have axioms which state that their start time must precede their end time, or that all precondition properties must be true at the start of a new activity instance, while postconditions will be true at the end. The definition of “instance” is construed so that an activity token’s instancehood in an activity class corresponds to the occurrence of the activity according to the formal properties associated with the class.

Just as for entities, RML activity classes are organized into specialization/generalization hierarchies. Organizing activities in this way is a step beyond classical object-oriented software engineering approaches, in which objects (corresponding to RML entities) have attached procedures, but the procedures are themselves not subject to organization by hierarchies of classes.
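The relationship between definitional properties, factual properties and inheritance described above can be illustrated with a small sketch. This is not RML syntax or any RML implementation; the Python names (RMLClass, Token, set_property) are hypothetical, and treating Patient as a subclass of Person is an assumption made only for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RMLClass:
    """A class whose definitional properties constrain its instances."""
    name: str
    definitional: dict = field(default_factory=dict)   # property name -> value class name
    superclasses: list = field(default_factory=list)

    def all_definitional(self):
        """Own definitional properties plus those inherited from superclasses."""
        props = {}
        for sup in self.superclasses:
            props.update(sup.all_definitional())
        props.update(self.definitional)
        return props

    def is_subclass_of(self, other):
        return self is other or any(s.is_subclass_of(other) for s in self.superclasses)


@dataclass
class Token:
    """An individual; it may be an instance of several classes."""
    name: str
    classes: list
    factual: dict = field(default_factory=dict)

    def is_instance_of(self, cls):
        # Every instance of a subclass is also an instance of the superclass.
        return any(c.is_subclass_of(cls) for c in self.classes)

    def set_property(self, prop, value):
        # Property Induction Constraint: a factual property is accepted only
        # if some class of the token has a definitional property of that name.
        if not any(prop in c.all_definitional() for c in self.classes):
            raise ValueError(f"{prop!r} is not induced by any class of {self.name}")
        self.factual[prop] = value


# Hypothetical mini-model (the subclass link is assumed, not from the paper).
person = RMLClass("Person", {"name": "String"})
patient = RMLClass("Patient", {"record": "MedicalRecord"}, [person])
pt = Token("pt1", [patient])
pt.set_property("name", "Alice")      # induced by Person, inherited by Patient
pt.set_property("record", "rec-17")   # induced by Patient itself
```

Attempting `pt.set_property("salary", 100)` raises an error in this sketch, which is the "form of type checking" the constraint provides.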


Figure 2.2.

Assertion (formula) “objects” are the most novel part of RML. They provide a formal language for specifying otherwise informal information. Among their roles, they are associated as preconditions and postconditions on activities, and as invariants on entities. Treating assertions as objects makes them subject to the same structuring/organizing principles as other objects, but the meaning is specific to the logical nature of the assertions. For example, the assertion class IsTreatedWith has properties which define its arguments (p, a patient, and t, a treatment) and ones which define its component sub-assertions (parts). This assertion class indicates that a patient p receives treatment t if and only if the treatment is available (Available is another assertion class) and has been Recommended. In general, an assertion class’s properties classified under the arguments category are taken to be free variables of an open formula, while the induced factual argument properties are taken to be the values bound to those variables to close the formula. Thus, assertion class instances represent closed formulas, which are also stipulated by the semantics of the language to be true. Other types of properties of assertions allow the structuring of assertions in terms of their parts, as for other object categories, but in this case parts are interpreted as logical conjuncts. The resulting representation is somewhat akin to, but semantically richer than, decision tree representations of complex formulas. A formal semantics is given for RML by defining a mapping from RML descriptions into a set of assertions in FOPC (Greenspan et al., 1986). These include all RML framework axioms as well as predicates and axioms associated with the specific classes defined by the modeler. Assertions translate into corresponding expressions in FOPC. 
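As a rough illustration of the reading of assertion classes given above (arguments as free variables, parts as conjuncts), consider the following sketch. It is not RML and not the FOPC translation of (Greenspan et al., 1986). The names AssertionClass and holds, the fact-set encoding, and the argument lists chosen for Available and Recommended are all assumptions. The sketch also evaluates a closed formula against a set of facts, whereas RML stipulates that assertion instances are true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertionClass:
    """An open formula: argument slots plus sub-assertions (parts)."""
    name: str
    arguments: tuple        # free variables, e.g. ("p", "t")
    parts: tuple = ()       # sub-assertion classes, read as logical conjuncts

    def holds(self, facts, **binding):
        """Evaluate the closed formula obtained by binding every argument."""
        if set(binding) != set(self.arguments):
            raise ValueError("every argument must be bound to close the formula")
        if not self.parts:
            # Primitive assertion: look the closed formula up in the fact set.
            key = (self.name,) + tuple(binding[a] for a in self.arguments)
            return key in facts
        # Structured assertion: the conjunction of its parts, each part
        # receiving the bindings for the arguments it shares with the whole.
        return all(
            part.holds(facts, **{a: binding[a] for a in part.arguments})
            for part in self.parts
        )


# Assumed argument lists: Available(t) and Recommended(p, t).
available = AssertionClass("Available", ("t",))
recommended = AssertionClass("Recommended", ("p", "t"))
is_treated_with = AssertionClass("IsTreatedWith", ("p", "t"), (available, recommended))

facts = {("Available", "physio"), ("Recommended", "ann", "physio")}
ann_treated = is_treated_with.holds(facts, p="ann", t="physio")
bob_treated = is_treated_with.holds(facts, p="bob", t="physio")
```

Here `ann_treated` is true because both conjuncts hold for the bound values, while `bob_treated` is false because no Recommended fact exists for that binding.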
However, the notation of FOPC provides no structuring/organization principles or other support for building and maintaining large theories (the essence of Software Engineering), a defect intended to be addressed by RML and its data modeling cousins.

The representation of time is essential for languages intended to model dynamic applications, if one is to prevent an implementation bias toward imperative programming style. RML assumes a linear model of time points and encourages history-oriented modeling of an application, which consists of describing possible histories for an entity or activity (or assertion, for that matter). Accordingly, there is a time argument in every predicate appearing in an RML assertion.

To summarize, RML is an object-centered modeling language, in the sense that a model is built by repeatedly describing classes and individuals, related by binary relationships. The objects have intrinsic identity and act as anchors around which information is grouped. From its ancestors in knowledge representation, RML preserves the belief in the significance of notions such as “class membership” and “subclass” (with concomitant forms of “inheritance”) in their more general form: an individual may belong to multiple classes, a property may belong to multiple property categories, a class may have many superclasses. What has been omitted from semantic network and frame representations are ideas such as various kinds of defaults, procedural attachments, and support for representing indefinite/incomplete/partial information about individuals. The result is a greatly simplified and, we believe, a more easily learnable and usable language; the price paid is the inability of the model to automatically perform inferences of various kinds, especially about individuals. This trade-off seems to be the right one to make, since the requirements model is mostly about the generic concepts in the domain, and much less about the individuals occurring in it at any particular moment or our knowledge of them. Similarly, the time model is quite simple, in contrast with alternatives involving branching time, or modal operators. Also from Knowledge Representation, albeit the logicist camp, comes the firm belief in the importance of a well-defined, formal semantics, preferably based on a well-understood formalism such as First Order Predicate Calculus. The “value-added” to the ideas in knowledge representation is the emphasis on the specific notions of activity and assertion, as well as the property categories motivated by their empirically frequent occurrence in requirements, and other requirements languages.

The modeling framework described above lends itself to a methodology for building requirements models according to a technique that can be characterized as “stepwise refinement by specialization” (Borgida et al., 1984), that is, the idea of building descriptions by developing class hierarchies in a systematic and incremental manner. RML should be seen as an early example of languages catering to the requirements engineer.
Another early example is the Conceptual Information Model, CIM (Bubenko, 1980), perhaps the first comprehensive proposal for a formal requirements modeling language. Its features include an ontology of entities and events, and an assertional sublanguage for specifying constraints, including complex temporal ones. The GIST specification language (Balzer et al., 1982), developed at ISI over the same period as RML, was also based on ideas from knowledge representation and supported modeling the environment; it was influenced by the notion of making the specification executable, and by the desire to support transformational implementation. It has formed the basis of an active research group on the problems of requirements description and elicitation (e.g., (Johnson et al., 1992)). ERAE (Dubois et al., 1986, 1992) is another early effort that explicitly shared with RML the view that requirements modeling is a knowledge representation activity, and had a basis in semantic networks and logic. Finally, the KAOS project constitutes another significant research effort which strives to develop a comprehensive framework for requirements modeling and requirements acquisition methodologies (Dardenne et al., 1993). The language offered for requirements modeling provides facilities for modeling goals, agents, alternatives, events, actions, existence modalities, agent responsibility and other concepts. Moreover, KAOS relies on a meta-model to provide a self-descriptive and extensible modeling framework.

RML was conceived at an early stage in the development of the object-oriented paradigm, and has been recognized as having been influential in the development of other requirements modeling languages. Since their original development more than ten years ago, the ideas in RML have gone through several incarnations, and are still evolving. The Telos language (which replaces the fixed ontology of RML with an extensible one) has been adopted for use by a number of research groups worldwide, and has been implemented by at least three independent groups. The ConceptBase implementation (Jarke et al., 1995) is the most complete and most widely used. Experiences in the use of RML and Telos have been reported in the literature, and are cited in (Greenspan et al., 1994). RML has also served as the starting point for developing more specialized ontologies, such as the i* framework described in Section 4. A new language (to be called Tropos) is being developed by extending Telos to incorporate some of these ontologies (Yu et al., 1996).

3. Semantics of activities and the frame problem

There are two principal arguments in favour of formal requirements modeling languages. Firstly, a formal account of such a language could be used to resolve differences in interpretation of a requirements model, at least sometimes. Secondly, such a formal account could be used as a basis for formally analyzing the model, to establish consistency or otherwise prove interesting properties about it, thereby helping people understand it. We focus in this section on the semantics of activities, and point out that the semantics offered by RML and other languages in the same family have (hidden) difficulties in the way they deal with the frame problem (McCarthy and Hayes, 1969). In addition, we describe an adaptation to formal specifications of a solution to the frame problem proposed in (Reiter, 1991), which has been reported in (Borgida et al., 1993).

To bring this issue into focus, let us consider a particular kind of consistency check that we might wish to perform on requirements models. In RML, and many of the above-mentioned requirements modeling languages, one can associate with classes of entities “integrity constraints”, that is, assertions of invariance that are supposed to hold at all times in (the model of) the world. In the earlier example involving hospitals, typical assertions would be “The number of patients assigned to a ward cannot exceed the capacity of that ward” and “The number of nurses on duty on a ward must be more than one”. In RML, these could be expressed as invariant assertion properties associated with wards, for example. Then one kind of consistency check of our model would be to verify that the descriptions of activities are consonant with these invariants, in the sense that if a condition held at the time the activity started, then it also held at the end of the activity.³

Consider a simplified description of the AdmitPatient activity, using more standard logical formulae for assertions, where properties are treated as functions while classes are treated as unary predicates. (This representation is closer in form to that of many other requirements modeling languages, such as ERAE (Dubois et al., 1986).)
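To make this kind of consistency check concrete, the following sketch enumerates the states of a single ward over a small finite domain and searches for transitions that satisfy the precondition and postcondition of AdmitPatient yet violate one of the two invariants quoted above. It is only an illustration under stated assumptions: the tuple encoding of a state, the bounded value ranges, the omission of the CanAdmit condition, and the helper names (invariants, pre, post, frame, counterexamples) are not from the paper.

```python
from itertools import product

# Assumed encoding of one ward's state:
# (admitted, capacity, nurses_on_duty, pt_is_patient), with small bounded ranges.
STATES = list(product(range(4), range(4), range(4), (False, True)))


def invariants(s):
    admitted, capacity, nurses, _ = s
    # The two invariants quoted in the text.
    return admitted <= capacity and nurses > 1


def pre(s):
    admitted, capacity, _, is_patient = s
    return admitted < capacity and not is_patient


def post(s, t):
    # Only the stated effects: the count goes up by one and pt becomes a
    # patient. Nothing is said about what remains unchanged.
    return t[0] == s[0] + 1 and t[3]


def frame(s, t):
    # Assumed frame condition: capacity and nurses on duty do not change.
    return t[1] == s[1] and t[2] == s[2]


def counterexamples(with_frame):
    """Transitions allowed by pre/post that break an invariant."""
    return [
        (s, t)
        for s, t in product(STATES, STATES)
        if invariants(s) and pre(s) and post(s, t)
        and (frame(s, t) if with_frame else True)
        and not invariants(t)
    ]


unconstrained = counterexamples(with_frame=False)
constrained = counterexamples(with_frame=True)
```

In this sketch `unconstrained` is non-empty, because the postcondition alone permits, for instance, an end state in which the number of nurses on duty has dropped; `constrained` is empty once the assumed frame condition is added. This is the difficulty that the rest of the section addresses.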



… 1) ∧ [admitted#(wrd) < capacity(wrd) ∧ ¬Patient(pt) ∧ CanAdmit(d,wrd)] ∧ [admitted#'(wrd) = admitted#(wrd) + 1 ∧ Patient'(pt)] ==> ∀x. (Ward'(x) ==> nursesOnDuty#'(x) > 1)]

… - admitted#'(x) …

∀α∀p[¬Patient(p) … ∃w. α = AdmitPatient(p, w)]

indicating that only AdmitPatient can turn a thing into a patient. Likewise, the axiom for admitted# is

… 1) … x=y)].

We emphasize that here it is the responsibility of the modeler to write down the change axioms, and that these may need to be modified as new activities are added or old ones are altered. For example, if we now were to add an activity DischargePatient, with its own description DischargePatient(pt,w)