Combining Case-Based Reasoning and ... - Semantic Scholar

1 downloads 303 Views 49KB Size Report
Reasoning in Software Design. Paulo Gomes, Francisco C. Pereira, Nuno Seco, Paulo Paiva, Paulo Carreiro,. José L. Ferreira, Carlos Bento. CISUC – Centro ...
Combining Case-Based Reasoning and Analogical Reasoning in Software Design Paulo Gomes, Francisco C. Pereira, Nuno Seco, Paulo Paiva, Paulo Carreiro, José L. Ferreira, Carlos Bento CISUC – Centro de Informática e Sistemas da Universidade de Coimbra. Departamento de Engenharia Informática, Polo II, Universidade de Coimbra. 3030 Coimbra {pgomes, camara}@dei.uc.pt, {nseco, paiva, carreiro}@student.dei.uc.pt, {zeluis, bento}@dei.uc.pt http://rebuilder.dei.uc.pt

Abstract. Designers use several types of knowledge and reasoning mechanisms during the creation of new artefacts. In order to cope with this cognitive characteristic of design, an intelligent design tool able to help a designer must integrate several reasoning mechanisms and knowledge formats. Case-based reasoning and analogical reasoning are usually considered as two distinct mechanisms, though they are also considered to be in the same cognitive axis, casebased reasoning being in one extreme, and analogy in the other. Both are important reasoning mechanisms in the generation of new designs, but they both reflect different ways of exploring the design space. In this paper we present a way of combining both techniques, showing how it was integrated in an intelligent software design tool. Experimental results are presented and discussed, showing the advantages and limitations of this approach.

1

Motivation and Goals

Design is a complex activity that is generally ill-structured and requires a great amount of different types of knowledge. The design process generates a structural description that complies with a set of functional specifications and constraints. The design outcome is a set of components and relations between them, making the design structure. The design process involves three main types of knowledge about the domain: functional, behavioural and structural. The mapping between these three types of knowledge is central to the design process. In order to provide a design system with several reasoning mechanisms, we have developed a system that integrates case-based reasoning [1] and analogy reasoning [2]. These two reasoning mechanisms provide support for the inexperienced designer, showing and allowing the reuse of several relevant past designs, and it can also help the more experienced designer to explore new solutions far away from the problem domain. The Holographic Reduced Representations approach [3] combines semantic and structural similarity. The semantic similarity is based on the common semantic primitives of objects, while the structural similarity is accessed using the various roles of

each object. An approach using constraint satisfaction (ARCS – Analogical Retrieval by Constraint Satisfaction [4]) applies three similarity types: lexical similarity, structural similarity, and pragmatic similarity. RADAR developed by Crean and O’Donoghue [5], was created to retrieve semantically distant analogies in a computationally tractable manner, and is based only in structure mapping. Déjà vu [6, 7] is a CBR systems for code generation and reuse using hierarchical CBR. Déjà Vu uses a hierarchical case representation, indexing cases using functional features. Maiden and Sutcliffe [8] research work is centred on the analogy in software specification reuse. Their approach is based on the reasoning with critical problem features, retrieving analogous specifications of the same category. In our approach we have combined Case-Based Reasoning (CBR) with analogy using the retrieval phase of both reasoning methods as a common point. Retrieval of relevant and useful knowledge from a design repository is a crucial phase in both CBR and analogical reasoning. While CBR centers in semantic similarity, analogy approaches commonly use structural similarity. Another usual cited difference is that the goal of analogy is to establish a mapping between different domains, while CBR maps concepts between the same domain. We use a general ontology to establish the connection between both mechanisms, and specifically object categorization as a mean of retrieving semantic similar concepts for CBR, and distant semantic concepts for analogy. We have develop a software design system named REBUILDER based on this theory. Our goals are to develop an intelligent Computer Aided Software Engineering (CASE) tool for software design capable of being a development tool at the corporate level, thus interacting with different types of designers. In the next section we describe ReBuilder, showing its architecture, ontology used, CBR module, and analogy module, and then the experimental work is presented.

2

ReBuilder

REBUILDER is a software design CASE tool, with the goal of providing intelligent help for software developers. There are two types of users interacting with the system. The software designers, using REBUILDER as an UML editor and design support tool, and the system administrator with the main task of keeping the knowledge base (KB) updated. In our approach we use one of the most worldwide used software design languages: the Unified Modelling Language, best known as UML. Figure 1 shows the architecture of ReBuilder. It comprises four main modules: UML editor, KB management, KB, and CBR engine. The UML editor is the system front-end for the software designer, comprising the working space for design. The KB management is the interface between the KB and the system administrator. It allows the administrator to add, delete, or change knowledge from the KB. The KB comprises four sub modules: data type taxonomy, case library, case indexes, and WordNet knowledge. The data type taxonomy provides is-a relations between data types used in UML. This structure is used when the system has to compute the similarity between data types. The case library stores the design cases, each one

representing a software design. They are stored in UML files created by the editor. The case indexes are used for the case retrieval, allowing a more efficient retrieval. WordNet is a lexical reference system [9], used in REBUILDER as a general ontology that categorizes case objects. The next subsection describes in more detail the knowledge used from WordNet. Retrieval

CBR Engine Adaptation

Verification

Case Library WordNet

Analogy

UML Editor

Learning

Knowlegde Base Case Indexes

Data Type Taxonomy

Knowledge Base Manager

Software Designer

KB Administrator

Figure 1 - The architecture of ReBuilder. The CBR engine performs all the inference work in ReBuilder. There are five sub modules: retrieval, analogy, adaptation, verification, and learning. The retrieval module searches the case library for designs or objects similar to the query. Retrieved designs are presented to the user, allowing the user to reuse these designs (or pieces of them), or they can also suggest new ideas to the designer, helping him to explore the design space. Analogy uses the retrieval module to select the most similar designs from the case library, and then maps them to the query design. The resulting mapping establishes how the knowledge transfer from the old design to the query design is done. The adaptation module can be used to adapt a past design (or part of it) to the query design. The verification module checks the current design for inconsistencies. The learning module acquires new knowledge from the user interaction, or from the system reasoning. This paper focus on the retrieval and analogy modules. 2.1 WordNet WordNet is used in REBUILDER as a common sense ontology. It uses a differential theory where concept meanings are represented by symbols that enable a theorist to distinguish among them. Symbols are words, and concept meanings are called synsets. A synset is a concept represented by one or more words. If more than one word can be used to represent a synset, then they are called synonyms. But there is also another aspect important for WordNet: the same word can have more than one different meaning (polysemy). WordNet is built around the concept of synset. Basically it comprises a list of synsets, and different semantic relations between synsets. The first part is a list of words, each one with a list of synsets that the word represents. The second part, is a set of semantic relations between synsets. In REBUILDER, we use the word synset list and four semantic relation: is-a, part-of, substance-of, and member-of. REBUILDER uses synsets for categorization of software objects. Each object has a context synset which

represents the object meaning. In order to find the correct synset, REBUILDER uses the object name, and the names of the objects related with it, which define the object context. The object’s context synset can then be used for computing object similarity, or it can be used as a case index, allowing the rapid access to objects with the same classification. 2.2 Case Retrieval REBUILDER uses a CBR framework to manage the different reasoning modules. Cases are represented in UML, which can comprise several diagrams, describing different aspects of a software system. REBUILDER allows the creation and manipulation of all kinds of diagrams, though only class diagrams are used for reasoning purposes. A class diagram comprises three types of objects (packages, classes and interfaces) and different relations between objects (associations, generalizations, dependencies, and realizations). A case is an UML class diagram, which has a main package named root package (since a package can contain sub packages). REBUILDER can retrieve cases or pieces of cases, depending on the user query. In the first situation the retrieval module returns packages only, while in the second one, it can retrieve classes or interfaces. The retrieval module treats both situations the same way, since it goes to the case library searching for software objects that satisfy the query. The retrieval algorithm has two distinct phases: first uses the context synsets of the query objects to get N objects from the case library. Where N is the number of objects to be retrieved. This search is done using the WordNet semantic relations that work like a conceptual graph, and with the case indexes that relate the case objects with WordNet synsets. The second phase ranks the set of retrieved objects using object similarity metrics. In the first phase the algorithm uses the context synset of the query object as entry points in WordNet graph. Then it gets the objects that are indexed by this synset using the case indexes. Only objects of the same type as the query are retrieved. If the objects found do not reach N, then the search is expanded to the neighbour synsets using only the is-a relations. Then, the algorithm gets the new set of objects indexed by these synsets. If there is still not enough objects, the system keeps expanding until it reaches the desired number of objects, or there is no where to expand. The result of the previous phase is a set of N objects. The second phase ranks these objects by similarity with the query. Ranking is based on object similarity, and there are three types of object similarities: package similarity, class similarity, and interface similarity. Package similarity is based on four aspects: type similarity, sub-package similarity, diagram similarity, and dependencies similarity. Type similarity evaluates the distance between the query synset and the retrieved object synset, measured in number of semantic relations between synsets. The second aspect returns the similarity between the query sub-packages and the retrieved sub-packages, which is a recursive call to the package similarity metric. Diagram similarity is based on the similarity between the UML objects of both packages. As both diagrams are graphs, this metric computes the

graph similarity between diagrams, taking into account the types of nodes (classes or interfaces). The last part of the package similarity is the external similarity computed using the package dependencies, which are UML relations between packages. Class similarity is based on three features: type similarity, intra-class similarity, and inter-class similarity. The type similarity is the same as in package similarity, evaluating the objects’ synset distance. The intra-class similarity is based on the attributes and methods of the classes. The inter-class similarity is based on the classes’ relations. Interface similarity is equal to class similarity, except that the intra-interface similarity is based only on the interface methods, since interfaces do not have attributes. 2.3 Analogical Reasoning Analogical reasoning is used in REBUILDER to suggest class diagrams to the designer, based on a query diagram. The analogy module has three phases: identifying diagrams candidate for analogy, mapping candidate diagrams, and creation of new diagrams. Candidate selection is the first phase, and the most important one, because if the selected candidates are not appropriate, the whole mapping phase can be at risk. Most of the analogies that are found in software design are functional analogies, that is, the analogy mapping is performed using the functional similarity between objects. Since functional similar objects are categorized in the same branch (or near) in the WordNet is-a trees, the retrieval algorithm uses these semantic relations to retrieve objects. The analogy module uses the retrieval module for this first phase. Thus, the analogy module benefits from a retrieval filtering based on functional similarity. The second phase of analogy is the mapping of each candidate to the query diagram, yielding an object list correspondence for each candidate. This phase relies on a relation-based mapping algorithm, which uses the UML relations to establish the object mappings. It starts the mapping selecting a query relation based on an UML heuristic that selects the relation connecting the two most important diagram objects. This heuristic is specific to UML and evaluates the independence of each object in the class diagram. Then it tries to find a matching relation on the candidate diagram. After it finds a match, it starts the mapping by the neighbour relations, spreading the mapping using the relations. This algorithm maps objects in pairs corresponding to the relation’s objects, satisfying the structural constraints defined by the UML diagram relations. The final phase is the generation of new diagrams using the established mappings. For each set of mappings the analogy module creates a new diagram, which is a copy of the query diagram. Then using the mappings between the query objects and the candidate objects, the algorithm transfers knowledge from the candidate diagram to the new diagram. This transference has two steps: first there is an internal object transference, and then an external object transference. In the internal object transference, the mapped query object gets all the attributes and methods from the candidate object that were not in the query object. This way, the query object is completed by the internal knowledge of the candidate object. The

external object transference, transfers neighbour objects and relations from the mapped candidate objects to the query objects of the new diagram, expanding it.

3

Experimental Results

The Knowledge Base used comprises a case library with 60 cases, distributed in five domains. Each case comprises a package, with 5 to 20 objects. Each object has up to 20 attributes, and up to 20 methods. Three sets of problems were specified, based on stored cases. Problem set P20, comprises 25 problems, each problem is a case copy with 80% of its objects deleted, attributes and methods are also reduced by 80%. The other sets comprise the same problems but with 50% (P50) and 20% (P80) deletions. The goal of the experiments is to compare the candidate selection criteria. For each problem we applied the analogy mapping to all cases of the KB, thus serving as a base line for evaluation. The other candidate strategies are: CBR+Analogy using type similarity, CBR+Analogy using structural similarity, and CBR+Analogy using both type similarity and structural similarity. For each problem four runs were made: C1 - Analogy Mapping C2 – CBR + Analogy C3 - CBR + Analogy C4 - CBR + Analogy

Similarity type used Structural Structural (0.5) + Type (0.5) Structural (1) + Type (0) Structural (0) + Type (1)

For each problem run the best five new cases were analysed, based on the mappings performed by the analogy algorithm. The data gathered was: percentage of mappings in the new case considered as correct, number of mappings, distance of mapped objects, depth of the Most Specific Common Abstraction (MSCA) between mapped objects, and computation time. The results obtained for the first five solutions generated by the analogy module are presented in Table 1. The values presented are the average results of the three problem sets (P20, P50 and P80). These results show a clear trade-off between computation time and quality of results. Analogy by its own generates better solutions, which can be seen by the percentage of correct mappings. This reflects also in the shorter semantic distance between objects, and the greater depth of the MSCA. Configurations which use case retrieval are faster then C1. Average of Computation Time (milliseconds) Average of Percentage of Correct Mappings (%)

C1

C2

C3

C4

2903

2497

2479

2279

64

52

53

45

Average of Number of Mappings

3.15

3.02

3.02

2.97

Average Distance of Mapped Objects (is-a links)

2.94

3.48

3.49

3.85

Average Depth of MSCA (is-a links) 3.56 Average Number of Mappings x Correct Percentage 2.01

3.32

3.31

3.11

1.57

1.59

1.33

Table 1. Experimental results obtained for the first five solutions presented by REBUILDER, and for the three problem sets (average of values for P20, P50 and P80).

If we analyse only the results for the first generated solution (see Table 2), it can be inferred that the trade-off difference is smaller. For instance, the computation time for the first solution (the best one) is almost the same between C1 and C2, and the difference in the percentage of correct mappings is also small, only 5.13%. Probably the best alternative is choosing configuration C3, which only loses 5.52% of accuracy and is more or less 300 milliseconds faster (these are average values for the 75 problems). C1

C2

C3

C4

Average of Computation Time (milliseconds)

4502

4433

4239

4287

Average of Percentage of Correct Mappings (%)

96.22 91.09

90.63 86.18

Average of Number of Mappings

5.01

5.01

5.01

4.90

Average Distance of Mapped Objects (is-a links)

0.47

0.81

0.90

1.47

Average Depth of MSCA (is-a links) 4.79 Average Number of Mappings x Correct Percentage 4.82

4.60

4.56

4.32

4.57

4.54

4.23

Table 2. Experimental results obtained for the first solution presented by REBUILDER, and for the three problem sets (average of values for P20, P50 and P80).

Analysing the difference between the problem sets used (see Table 3), results show that the gain in computation time over C1 can range from 8% to 33%. The higher gains are shown in the P80 set, which is explained by the high number of objects in relation to P20 or P50. So, the difference of computation time is proportional to the number of objects in the problem diagram. Percentage of gain in computation time over C1

C2

C3

C4

P20 (%)

8

6

16

P50 (%)

11

9

26

P80 (%)

23

27

33

Table 3. Percentage of gain in computation time of configurations C2, C3 and C4 over C1 (CBR+Analogy vs. Analogy).

Doing the analysis of the percentage of correct mappings by problem set (see Table 4), we reach the conclusion that C3 is the configuration which presents the best results. Results get worst with the increase of objects in the problem. The percentage of correct mappings is proportional to the number of objects in the problem. Percentage of loss in the % of Correct Mappings over C1

C2

C3

C4

P20 (%)

-20

-17

-37

P50 (%)

-21

-19

-44

P80 (%)

-26

-26

-44

Table 4. Percentage of gain in the item percentage of correct mappings of configurations C2, C3 and C4 over C1 (CBR+Analogy vs. Analogy).

4

Discussion

This paper presents an approach for combining CBR retrieval with analogical retrieval using a general ontology for object categorization. This method has been implemented in ReBuilder, an intelligent CASE tool for software design and reuse. One of the advantages of combining CBR and Analogy in a general CASE tool is that different types of designers can choose the reasoning most fit to their purposes. For instance, novice designers will certainly profit from CBR suggestions that help the exploration of semantic similar designs. On the other hand, experienced designers want more bold solutions, giving preference to optimising ones. This can be obtained by analogical reasoning, which helps the exploration of distant domains. From the method point of view, using case retrieval to get the analogue candidates is a way of optimising the analogy retrieval. Experiments have shown that selection of analogue candidates based on a case retrieval algorithm with type and structural similarity is a good compromise solution between computing time and mapping quality.

Acknowledgments This work was supported by POSI - Programa Operacional Sociedade de Informação of Portuguese Fundação para a Ciência e Tecnologia and European Union FEDER, under contract POSI/33399/SRI/2000, and by program PRAXIS XXI.

References 1. Kolodner, J., Case-Based Reasoning. 1993: Morgan Kaufman. 2. Gentner, D., Structure Mapping: A Theoretical Framework for Analogy. Cognitive Science, 1983. 7(2): p. 155-170. 3. Plate, T., Distributed Representations and Nested Compositional Structure, in Graduate Department of Computer Science. 1994, University of Toronto: Toronto. 4. Thagard, P., et al., Analog Retrieval by Constraint Satisfaction. Artificial Intelligent, 1990. 46: p. 259-310. 5. Crean, B.P. and D. O'Donoghue. Features of Structural Retrieval. in IASTED - International Symposia, Applied Informatics. 2001. Innsbruck, Austria. 6. Smyth, B. and P. Cunningham. Déjà Vu: A Hierarchical Case-Based Reasoning System for Software Design. in 10th European Conference on Artificial Intelligence (ECAI'92). 1992. Vienna, Austria: John Wiley & Sons. 7. Smyth, B. and M. Keane. Experiments on Adaptation-Guided Retrieval in Case-Based Reasoning. in International Conference on Case-Based Reasoning (ICCBR'95). 1995. Sesimbra, Portugal: Springer-Verlag. 8. Maiden, N. and A. Sutcliffe, Exploiting Reusable Specifications Through Analogy. Communications of the ACM, 1992. 35(4): p. 55-64. 9. Miller, G., et al., Introduction to WordNet: an on-line lexical database. International Journal of Lexicography, 1990. 3(4): p. 235 - 244.