Using Ontologies to Resolve Semantic Heterogeneity for Integrating Spatial Database Schemata

Dissertation for the degree of Doctor of Natural Sciences (Dr. sc. nat.)

submitted to the Faculty of Mathematics and Natural Sciences of the University of Zurich

by Farshad Hakimpour from Iran

Reviewed by Dr. sc. techn. Sabine Timpf, Prof. Dr. K. Brassel and Prof. Dr. K. Dittrich

Zürich, 2003

This thesis was accepted as a dissertation by the Faculty of Mathematics and Natural Sciences of the University of Zurich at the request of Prof. Dr. Kurt Brassel and Prof. Dr. Klaus R. Dittrich.

Acknowledgments

I wish to express my gratitude to Dr. Sabine Timpf, Prof. Kurt Brassel and Prof. Klaus Dittrich for their support, guidance and patience during my work, particularly for giving me the opportunity to experience working in two different research groups. I am greatly indebted to Dr. Andreas Geppert for his guidance. I would also like to acknowledge the support of the Swiss National Science Foundation (SNF) for my research work. I wish to express my appreciation to many colleagues in both the Informatics and Geography departments at the University of Zurich for their support, including Xiangru Yuan, Bernhard Schneider, Dirk Jonscher, Anca Vaduva, Martin Schönhof, Athanasios Vavouras, Ross Purves and Patrick Ziegler. In particular, I wish to thank two exceptional ladies, not only for their significant help during my research work, but also because Ruxandra Domenig was always ready to listen patiently to my complaints and because of the great time I had sharing an office with Daniela Damm. I am (and those who may read this thesis are) thankful to Alistair Edwardes for proofreading the thesis. I would also like to thank Hans Chalupsky for the help and support I received from him in my work with the PowerLoom system. Finally, I am very grateful to my parents and my brother for their moral support during the last four years.

Abstract

System interoperability and data integration are becoming ever more important issues as both the amount of available data and the number of data producers grow. Reusing data produced by other sources is the main motivation for integration. Enterprises tend to reduce their investment in producing data by integrating outsourced data. However, integration, as a precondition of data reuse, has its own costs. Data integration has to resolve differences in data structures as well as semantic heterogeneities. Semantics refers to the meaning of data, in contrast to syntax, which solely defines the structure of schema elements (e.g., classes and attributes) in databases. This thesis contributes to handling semantic heterogeneity during database schema integration. The approach aims at reducing the cost of generating or regenerating global schemata for tightly coupled federated databases. The focus of the work is on the semantics related to the terms used as identifiers in schema definitions. The solution does not rely on the names of schema elements or on the structure of the schemata. Instead, we use ontologies consisting of intensional definitions of terms expressed in a logical language. The presented approach integrates schemata from different communities, where each community uses its own ontology. The approach is based on similarity relations among intensional definitions in different ontologies, and these similarity relations are themselves formally defined in terms of those intensional definitions. The thesis shows how similarity relations are discovered by a reasoning system using higher-level ontologies. To this end, criteria for evaluating the suitability of reasoning systems for processing formalized ontologies are introduced. Issues related to building ontologies and formalizing them in a logical language are also discussed. The similarity relations are used to derive an integrated schema in two stages. First, we show how to use similarity relations to generate the class hierarchy of the global schema. Second, we explain how to enhance the class definitions with attributes. We propose a semiautomatic method and clarify the cases where the integration process requires supervision. The resulting integrated schema can be used as the global schema in a federated database system.


Zusammenfassung (Summary)

As the amount of available data and the number of organizations providing data grow, problems of system interoperability and data integration are becoming ever more important. The reuse of data from external sources is the main motivation for data integration. In this context, enterprises tend to reduce their expenses and investments by integrating external data holdings with their own. Integration, however, is a precondition for the reuse of data and in turn incurs costs. During the integration of data, differences in data structures as well as semantic heterogeneity have to be resolved. In contrast to syntax, which solely defines the structure of schema elements (e.g., classes and attributes) in database systems, semantics refers to the meaning of the data. This doctoral thesis contributes to the handling of semantic heterogeneity during the integration of database schemata. The presented approach aims to reduce the effort of generating and regenerating global schemata for tightly coupled federated database systems. The focus of this work is on the semantics of the terms used as identifiers in schema definitions. The chosen solution depends neither on the concrete names of the schema elements nor on the structure of the schemata. Formal ontologies are used, consisting of intensional definitions of terms expressed in a logical language. The presented approach integrates schemata from different user communities, each of which uses its own ontology. The approach is based on similarity relations between definitions from different ontologies. The similarity relations are formally defined on the basis of intensional definitions of terms from formal ontologies. This thesis shows how similarity relations can be found by a system that is able to draw logical inferences and that uses more comprehensive, higher-level ontologies. In addition, criteria are introduced for assessing the suitability of such systems for processing formalized ontologies. Problems related to building ontologies and formalizing them in logical languages are also discussed. The similarity relations are used to derive an integrated schema in two steps. The first step shows how similarity relations can be used to generate a class hierarchy for a global schema. The second step explains how the generated classes can be supplemented with attributes. The presented method is semiautomatic, and the text identifies the cases in which the integration process must be supervised. The resulting integrated schema can be used as the global schema in a federated database system.


Contents

CHAPTER 1 Introduction
1.1. Background
1.2. Problem Definition
  1.2.1. Heterogeneity
  1.2.2. Semantics
1.3. Ontologies as a Solution
1.4. Research Objectives
1.5. Research Methodology
1.6. Thesis Structure

CHAPTER 2 Related Work
2.1. Introduction
2.2. Semantic Issues in GI Domain
2.3. Heterogeneity and Integration
2.4. Ontology-Based Integration Projects
2.5. Building Ontologies
  2.5.1. Methodologies for Building Ontologies
  2.5.2. OntoClean
  2.5.3. ONTOLINGUA
  2.5.4. (KA)²
  2.5.5. Affordance
2.6. Semantic Issues on the Internet
  2.6.1. SHOE
  2.6.2. On2Broker
  2.6.3. RDF
  2.6.4. OIL
2.7. Discussion

CHAPTER 3 Ontologies
3.1. Introduction
3.2. Syntax and Semantics in Schema Definitions
  3.2.1. Semantics of Symbols
  3.2.2. Role of Schema Definitions in Organizing Databases
3.3. Conceptual Schemata
3.4. Ontologies
  3.4.1. What is Ontology
  3.4.2. Communities
  3.4.3. Difficulties of Applying Ontologies
  3.4.4. Why Ontologies
  3.4.5. Conceptual schemata vs. Ontologies
3.5. Approaches towards Ontologies
3.6. Conclusion

CHAPTER 4 Logics for Formalizing Ontologies
4.1. Introduction
4.2. Formalisms Requirements
4.3. Description Logic
  4.3.1. Concept Definition
  4.3.2. Relation Definition
  4.3.3. Discussion
4.4. Frame-based Logic
  4.4.1. Concept Definition
  4.4.2. Relation Definition
  4.4.3. Discussion
4.5. Comparison

CHAPTER 5 Ontology-based Integration
5.1. Introduction
5.2. Overview of the Architecture and the Solution
5.3. Semantic Similarities
5.4. Finding Similarity Relations
5.5. Integration of schemata
  5.5.1. Class Integration
  5.5.2. Filling Classes with Attributes
5.6. Data Mapping
5.7. Conclusion

CHAPTER 6 The Solution in Practice and the Prototype
6.1. Introduction
6.2. Building and Formalizing Sample Ontologies
  6.2.1. Methodology for Building Ontologies
  6.2.2. Importance of Extracted Statements
  6.2.3. Using GDF and ATKIS Ontologies
  6.2.4. Inter-Ontology Relations
6.3. Overall Functionalities of the Prototype System
6.4. Technical Specification of the Prototype
  6.4.1. Integration Process
  6.4.2. Detecting Similarity Relations by PowerLoom
6.5. Discussion
6.6. Conclusion

CHAPTER 7 Discussion and Conclusion
7.1. Introduction
7.2. Summary and discussion
7.3. Results and Contributions
7.4. Position of the Work with Respect to State of the Art
7.5. Directions for Further Research

APPENDIX A Formalized Ontologies
APPENDIX B Implementation
References
INDEX

List of Figures

Fig. 3.1 Classification of symbols or terms in a grammar.
Fig. 3.2 Part of INTERLIS 2 grammar shown by Syntax Diagram.
Fig. 3.3 This example shows how schema definition in Table 3.1 can be considered as grammar for a language that states the propositions of a model.
Fig. 3.4 Interpretation of schemata as a grammar that specifies the language in which the models of the mini-world are expressed.
Fig. 3.5 Mapping (H) from a state of mini-world to its model.
Fig. 3.6 Methods should comply with their counterparts in the world.
Fig. 3.7 Role of intensional relations in a conceptualization.
Fig. 3.8 A thesaurus introduces the relations between terms at the linguistic level.
Fig. 5.1 Models are projections of world structures built by some constraints according to the requirements.
Fig. 5.2 One common ontology shared between two databases.
Fig. 5.3 Database schemata based on different ontologies.
Fig. 5.4 Global schema generation based on a common ontology produced by integration of domain ontologies.
Fig. 5.5 On the fly integration, with local queries committing to a domain ontology and no global schema or global query.
Fig. 5.6 Levels of similarity among intensional definitions [Egenhofer and Herring, 1991].
Fig. 5.7 Schema p1 and the taxonomy tree of the ontology p.
Fig. 5.8 Schema q1 and taxonomy tree of ontology Q. Note that definition of Transportation_Path is adopted from transportation ontology.
Fig. 5.9 Result of merging ontologies by finding specialization similarity.
Fig. 5.10 Occurrence of a redundant subclass relation while establishing a new subclass relation.
Fig. 5.11 Occurrence of a redundant subclass relation while maintaining existing subclass relation.
Fig. 5.12 Final global schema generated by the proposed approach.
Fig. 5.13 Example of a relation between a concept (street) and subconcepts of another concept (pavement) rather than its instances.
Fig. 6.1 Actions to be taken for the integration by the proposed approach.
Fig. 6.2 Kinds of ontology from [Guarino, 1998a].
Fig. 6.3 Geographic features classified in three main classes.
Fig. 6.4 Geographic features are represented by either point or line or area.
Fig. 6.5 Organization of ontologies P and Q and their high-level ontologies (arcs illustrate the include relations). As explained in Section 6.2, we also build a set of axioms extracted from GDF and ATKIS standards. Some of the axioms are used during our experience with the prototype.
Fig. 6.6 A use case diagram to show the functionalities of the prototype.
Fig. 6.7 The class diagram of the system (See APPENDIX B for details).
Fig. 6.8 An illustration of the result of integration. Dashed lines show the correspondence link, according to Figure 6.7.
Fig. 6.9 Detecting of similarity relations between two terms by PowerLoom.

CHAPTER 1

Introduction

1.1. Background

This thesis contributes to clarifying and resolving problems that arise in the semantic integration of geographic data. The main goal is to reduce semantic conflicts when Geographic Information System (GIS) users integrate data from different sources into their systems.

Providing a user with huge amounts of geodata from different sources across networks or internetworks in a short time is a common occurrence today. Information systems work on such networks to provide users with appropriate searching and analyzing abilities. Providing these abilities requires systems to communicate and work with each other. This is known as interoperability. Interoperability helps ensure the use and reuse of geodata across a wide range of activities. This not only helps avoid wasting assets in the public domain, but also helps pay off part of the investment in capturing the data.

In the domain of GIS interoperability, any differences in data sources, disciplines, tools and repositories can cause heterogeneity [Alonso and Abbadi, 1994]. Interoperability has to overcome the complexity of the data conversion and integration process. It is a long way from data transfer and data format conversion to full system interoperability. Interoperability issues concern not only the different structures and models of data sets, but also the different methods and operations applied to the data. Nowadays, interoperability at the hardware level (e.g., signaling) and the software level (e.g., operating system and network protocol) is no longer a major issue. However, data needs to be provided to users in such a way that they can use large amounts of it quickly and simply. Users of spatial data expect information systems to help them search and process data by supplying the necessary information and knowledge about the data. They also expect homogeneous interfaces for managing, modeling and processing data from different sources.

A standard or uniform model for the representation of spatial data (such as the canonical model suggested in [Worboys and Deen, 1991]) was long considered a step towards a solution to the problem of heterogeneity in data modeling. Nevertheless, its low flexibility and high administrative complexity may have caused more problems than it solved. Besides, it is unlikely that all software developers can be forced to use one specific modeling approach, no matter how good that approach may be [Goodchild et al., 1997]. The Open GIS Consortium (OGC) abstract specifications (such as [OGC, 1999a]) aim to solve the problem of heterogeneity at the spatial modeling level.

Besides modeling homogeneity, presenting users with a uniform set of interfaces also requires a common view of the interpretation of data (semantics). Along with its other initiatives, the OGC started an initiative for specifications aimed at resolving semantic problems [OGC, 1999b]. Slow progress on this topic, however, shows its difficulty (see also [Goodchild et al., 1997]). As with modeling heterogeneity, where the idea of a standard and uniform model did not provide a successful solution, a standard and uniform semantics for spatial features and properties is also unlikely to turn out to be a proper solution to the problem of semantic heterogeneity. Forcing a community to adopt a new vocabulary and its semantics is not feasible. On the contrary, giving communities the freedom to have their own semantics (interpretation of data) is the approach suggested in this thesis. Such an approach must, in turn, provide tools and methodologies for handling semantic heterogeneity.

The following issues are discussed in this chapter. Section 1.2 explains the relevance and importance of the semantic heterogeneity problem to the GI community. Section 1.3 explains why ontologies are suitable as a solution to semantic heterogeneity problems (ontologies are discussed in full detail in Chapter 3). Section 1.4 presents the main goals of the thesis. Section 1.5 describes the methodology followed during the work, and finally Section 1.6 briefly introduces the chapters of the thesis.


1.2. Problem Definition

Institutions and companies that produce all the geodata needed for their applications are rare. Many GIS applications use geodata from outside sources. As a consequence, there has been major investment in the production of public domain geodata by state organizations:

“In every EU country, the largest single component of the PSI [Public Sector Information] investment total is the geographical sector. … This sector takes over 37% of the total investment in PSI in France, 41% in Sweden and over 57% in the United Kingdom.” [European Commission, 2000b, Page eight]

A vast and growing amount of geodata is available to institutions and companies. As a result, integration of geodata from outside sources (mainly public domain) is inevitable. Integration of externally sourced data has raised many technical problems, including modeling heterogeneity and semantic heterogeneity. The main concern in a solution to semantic heterogeneity is that a geodata provider and the users of its geodata share the same understanding of the data. The central obstacle in semantic heterogeneity, and the main concern of this thesis, is the lack of sufficient specifications, which leads to misinterpretation of geodata when users have only their common sense to rely on.

Information communities [OGC, 1999b, Bishr et al., 1999] of geodata users and providers with their own data sets already exist. Each of them uses its own vocabulary and semantics.

“Geographic information (GI) is an example of publicly held information with high potential. …, a number of practical issues make the exploitation of public sector information in Europe problematic. … The need to translate and the absence of a common terminology puts an extra burden on the European content firms.” [European Commission, 2000a, Page eight]

An example clarifies the different types of heterogeneity problems encountered in communication between interoperating systems. Two systems sharing data representing streets may confront any of the following potential heterogeneity problems:


1. Heterogeneity in the conceptual modeling: One system represents a street as an object class and the other as a relation.
2. Heterogeneity in the spatial modeling: Streets can be represented by polygons (or a segment of pixels) in one system, while being represented by lines in the second system.
3. Structure or schema heterogeneity: Both systems hold the name of a street, whilst one keeps information about the sidewalk and the other about the width. Or, lines representing the street use the DXF format in one system and the IGES format in the other.
4. Semantic heterogeneity: One system may consider only the paved part as a street while the other considers the paved part and the sidewalk as a street. One may define a street as a paved way with a sidewalk used by automobiles, and the other as any type of way used by automobiles inside a residential area.

In the above list, only the last item shows examples of semantic conflicts. This thesis concentrates on the conflicts caused by semantic differences rather than representation or modeling differences. (Section 3.2 specifies different issues in semantic heterogeneity in detail.) [Hammer and McLeod, 1993] consider all the differences listed above as part of semantic heterogeneity, together with differences in tools and low-level data formats. They refer to the topic of this work as object comparability.

The problem of semantic heterogeneity arises in different contexts. The aim of this work is to give a solution for semantic heterogeneity during schema integration. Schema integration and its problems have already been explored in several works such as [Hammer and McLeod, 1993, Kim et al., 1993, Garcia-Solaco et al., 1996, Sheth, 1998]. This thesis presents an approach for semiautomated schema integration that relies on the definitions of terms in an ontology. The advantage is not only a reduced cost of the integration process but also improved reliability of the result, owing to the use of explicit and detailed definitions of terms in the ontologies.

1.2.1. Heterogeneity

As the number of data providers and the amount of data increase, integration and interoperability techniques are attracting attention. Data integration refers to combining data, and interoperability refers to interaction between information systems and databases from different sources. A common and important issue in both approaches is logical consistency. A data set is considered consistent if all user-defined consistency constraints are satisfied by all the data items in the data set [Grefen, 1992]. A data set can be the result of a user combining data from several different sources, or of a user imposing a single integrated database schema on top of various underlying data sources.

Two types of conflicts that are often mixed up should be clarified here: logical inconsistency and semantic heterogeneity. Many consistency constraints are derived from the semantics (e.g., [Uitermark et al., 1999a]), which is why logical inconsistency is often confused with semantic heterogeneity. Consistency constraints are used to validate the data in terms of the data items in the data set and the relations between them. An example of a consistency constraint in a spatial database is: “railroads do not cross houses”. If all the railroad and building objects in the data set obey the constraint, the data set can be considered consistent with respect to this constraint. However, the data provider may still consider railroads that pass through underground tunnels as railroads, while a user of the data might classify such a railroad as a “subway”. This is a case of semantic heterogeneity caused by the different interpretations of the data producer and the user, even though the data set qualifies as consistent according to the constraint above. This thesis addresses semantic heterogeneity rather than logical consistency.

1.2.2. Semantics

Semantics is people’s interpretation of data (i.e., relating data to what it represents) according to their understanding of the world. We refer to different interpretations of data as semantic heterogeneity. Therefore, any data exchange requires a unique interpretation of data. That calls for an exchange of knowledge which expresses semantics.

We define semantics as what determines how the constants and the variables are associated with things in the application domain. It refers to the meaning of schema elements (e.g., classes, attributes and methods) and is often used in contrast with syntax. Syntax refers to the definition of the structure of schema elements. We consider syntax to be context independent. Semantics, on the other hand, is people’s interpretation (of the computer representation) according to their understanding of the world and is therefore context dependent. Users always abstract the real world according to their own needs (i.e., they neither observe nor represent all the details of the real world); therefore, every user or application has its own semantics (or way of interpreting the computer representation of the real world). Different interpretations of data cause semantic heterogeneity. Relying on common sense is a critical source of semantic heterogeneity, and the explicit definition of how the data should be interpreted is a solution to this problem.

In the database domain, schemata are the definitions of logical structures (or patterns) that convey the data and are the result of the database design phase. Schemata are expressed in a language known as the Data Definition Language (DDL). Part of the semantics is based on the interpretation of the DDL syntax, i.e., keywords, operators and their order. That is, when encountering such keywords or operators, a computer program takes a standard action and a human has a standard interpretation. Another part of the semantics is related to the names (or terms) one uses for identifiers in the DDL. Items in schema definitions such as attributes, classes, methods, data types and relations are declared by specified names (or terms), possibly with some descriptions as metadata. Such verbal descriptions have been the usual way to specify the semantics of identifiers in schemata.

Semantic heterogeneity is distinct from differences in the modeling approaches used to design or represent schemata (e.g., relational and object oriented, or topological and spaghetti spatial modeling approaches). The part of semantics related to terms (terminological semantics) is the focus of this thesis, while the other part (e.g., heterogeneity between OODB schemata and RDB schemata) is outside the scope of this work.
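The distinction drawn in Section 1.2.1 between logical consistency and semantic heterogeneity can be made concrete with a small sketch. The Python fragment below is illustrative only and is not part of the prototype described in this thesis: the Feature record, the toy grid-cell geometry and the two term functions are assumptions introduced for the example. It checks the constraint “railroads do not cross houses” and then compares the terms that a provider and a user would assign to the same objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical geodata record; geometry is a set of grid cells."""
    fid: str
    kind: str                 # term assigned by the data provider
    cells: frozenset          # (x, y) cells the feature occupies
    underground: bool = False


def is_consistent(features):
    """Constraint from the text: railroads do not cross houses."""
    rails = [f for f in features if f.kind == "railroad"]
    houses = [f for f in features if f.kind == "house"]
    return all(r.cells.isdisjoint(h.cells) for r in rails for h in houses)


def provider_term(f):
    """The provider calls every railway line a railroad."""
    return f.kind


def user_term(f):
    """The user (assumed here) calls an underground railroad a subway."""
    if f.kind == "railroad" and f.underground:
        return "subway"
    return f.kind


data = [
    Feature("r1", "railroad", frozenset({(0, 0), (1, 0)})),
    Feature("r2", "railroad", frozenset({(0, 2), (1, 2)}), underground=True),
    Feature("h1", "house", frozenset({(3, 3)})),
]

# The data set satisfies the constraint ...
consistent = is_consistent(data)
# ... yet provider and user disagree on what one object is.
mismatches = [f.fid for f in data if provider_term(f) != user_term(f)]
print(consistent, mismatches)
```

In this sketch the data set passes the constraint check, while the object r2 is interpreted differently by the two parties. No constraint on the data items detects that difference, which is why an explicit definition of the terms themselves is needed.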

1.3. Ontologies as a Solution

Explicit representation of semantics in metadata can help in automatically detecting and resolving problems of semantic heterogeneity. The key issue is a formalism that can convey the detailed specification in a way that leaves as little as possible to the users' common sense. It requires an understanding of the terms and definitions in the metadata (e.g., schema definitions) by both the data set provider (i.e., source) and the user (i.e., target) information communities. Such understanding can be achieved by specifying the understandings of the different communities.

An important potential solution to semantic heterogeneity problems, which has been attracting attention, is the ontology. This has led many researchers to use ontologies in their work [Guarino, 1998b, Welty and Smith, 2001], including work by the author [Hakimpour and Geppert, 2001, Hakimpour and Geppert, 2002]. The formalization of the semantics of terms used in database schema definitions is a way to deal with semantic heterogeneity. Ontologies are considered more than schema definitions in the database domain; this thesis adopts the definition by Guarino [Guarino, 1998a].

In the domain of artificial intelligence, an ontology is an explicit specification of a conceptualization [Gruber, 1993]. In this domain, ontologies have been used for reusing and sharing knowledge between agents, with the emphasis on formalizing the specification of the concepts and relations they use. In the domain of philosophy, ontology explains the nature, essential properties and relations of all beings (Webster's Unabridged Dictionary) and is based on the truth and the nature of beings, independent of one’s present perspective of the world (present knowledge). As the primary property of all beings is their existence, ontology refers to the philosophical investigation of existence or being. It can concern questions such as “What exists” and “What general sort of things are there.”

Database schemata are more concerned with the structural aspects of data representation, and many researchers have focused on the syntactical aspects of schema integration. While schemata are used for organizing data in databases, they are based on the ontology definition. Ontologies are also concerned with the understandings of the members of a community. Further discussion of this issue is presented in Chapter 3. This thesis also discusses existing approaches and tools for preparing ontologies and using them during the integration of data.

Applying ontologies to resolve semantic heterogeneity does not imply defining a global, unique and robust definition of terms for all communities and compelling them to adopt exactly that interpretation. Forcing communities to adopt a uniform vocabulary and its semantics is not feasible, since it would impose one single conceptualization on all communities. By using ontologies, our goal is to give communities the freedom to communicate based on their own defined ontologies. The main advantage of ontologies is that they help us to be independent of the implicit background knowledge (or common sense) of a community, or at least to depend on such knowledge as little as possible. To that end, the notion of a higher-level ontology is introduced to capture the common ground between the ontologies of different communities. Higher-level ontologies are used as a basis for relating terms in the different ontologies. This solution offers a high degree of flexibility for future use, as the number of geodata providers in different contexts increases while some also disappear. This topic is discussed in detail in Section 5.1.


1.4. Research Objectives

The important questions addressed by this thesis are as follows:

1. How can semantics be expressed and used?

The need to specify the interpretation of data entails the need for a type of formalism to convey the semantics. Such a formalism should provide the capability to express semantics, and a reasoning system should be available to analyze semantics presented in it. The formalism should convey those aspects of knowledge needed for the common understanding of data shared between Geographic Information Systems.

2. What types of conflicts are caused by semantics?

Semantic heterogeneity and related terms, such as semantic mismatch or semantic similarity, need clear definitions and understanding. Semantic heterogeneity has been analyzed based on the differences in the schema and approached by mapping between schemata. However, what provides the first incentive to compare two schema components is similarity between them; i.e., if there is absolutely no similarity between two entities, then comparing their schema definitions (if possible) cannot be meaningful. It is important to know what types of conflicts are caused by semantic heterogeneity. Classification of semantic heterogeneity based on resolution methods is a contribution of this work.

3. How can ontologies be used to resolve semantic heterogeneity?

There is a need to specify the kind of tools provided or services supported by systems taking part in semantically interoperable systems. We need to explore system components and features that play an important role in semantically interoperable systems in the domain of GIS. This work aims to propose an architecture for semantically interoperable systems.

The main focus of this thesis is to provide tools for applying ontologies to resolve semantic problems, not to build a thorough and complete ontology for the GI community. However, we use example ontologies mainly extracted from the ATKIS [ATKIS 1998] and GDF [GDF 1995] standards.

1.5. Research Methodology

1. Analyzing the Capabilities of Existing Formalisms. Two logical formalizations, Description Logic [Baader et al., 2003] and Frame-based Logic [Kifer et al., 1995], are evaluated in terms of their expressive abilities and also in terms of the reasoning abilities of systems implementing them. Other alternatives such as ER or UML diagrams, Conceptual Graphs [Sowa, 2000] or RDF schemata [Decker et al., 2000] are not considered (irrespective of their expressiveness) because, at this time, there are no reasoning systems to analyze the knowledge represented by them.
• Objectives: This study approaches the first question in Section 1.4. A formalism and an implementation of a suitable reasoning system are selected to be used in further implementation work.

2. Working on Sources to Extract and Build Ontologies. The study proposed here is to specify a sample set of common concepts used in one geoinformation community or a geographic standard and to attempt to represent that set of concepts as an ontology described with a formalism. A study on existing approaches to building ontologies is performed, and an approach to build and evaluate ontologies is selected and established. A set of ontological definitions is extracted from a geodata standard. A next step is to find the minimal definition of concepts and relations that two (or more) geoinformation communities (extracted from their standards) can agree upon, i.e., the minimum restrictions for the information communities committed to a common ontology.
• Objectives: By this study, we can approach questions 2 and 3 posed in Section 1.4. Since an ontology is an important means to represent semantics, ontologies can help to formalize the part of semantics related to the terms used in a community. Types of semantic similarity or heterogeneity are addressed in this study.

3. Study on Data. A study on integrating data from different sources (from different disciplines) is performed to understand the nature of the conflicts caused by semantic differences. This is a preliminary step to investigating the problems in practice. An appropriate application area (within the GIS domain) is defined. This step verifies the suitability of the solution for integration based on the semantics of data.
• Objectives: This case study approaches questions 1 and 2 posed in Section 1.4.

4. Proposing an Approach for Semantic Integration. Another issue to consider is how systems can cooperate, and where and how the ontologies and semantics defined in the previous case studies can help in a semantically interoperable system. Therefore, a study on existing architectures was performed and one of these was adopted as the basis for developing a prototype. A methodology for integration is proposed and the prototype is implemented.
• Objectives: This part of the work approaches questions 1 and 3 in Section 1.4. Classification of semantic heterogeneity can be made based on different criteria, but following this case study we approach the classification based on the resolution method.

5. Evaluation of the Solution. The prototype developed in the last phase is used to evaluate the capabilities of the proposed solution. The result is evaluated in terms of the problem definition, and guidelines for future work are provided.
• Objectives: This phase relates to all three questions and shows the strengths and weaknesses of the solution. We explore the functionalities and services needed for a total solution. The result of this phase can also be considered as guidelines for any future work.

1.6. Thesis Structure

The structure of the thesis is as follows:

• CHAPTER 2 Related Work. This chapter reviews projects in the domain of semantic interoperability with a focus on work based on ontologies. The types of problems they address and their approaches to solving them are discussed. The review especially focuses on how ontologies are applied, while other approaches are also discussed.
• CHAPTER 3 Ontologies. The main goal of this chapter is to introduce the basic notions that are used and the ideas that are pursued in the rest of the thesis. The chapter starts by clarifying the definition of semantics and elaborating on issues such as syntax, semantics and conceptual schema. Afterwards, the chapter suggests applying ontologies as a major solution for the detection and resolution of semantic heterogeneity. A definition of what is meant by ontology in the thesis is given. A discussion on the potential of ontologies is presented.
• CHAPTER 4 Logics for Formalizing Ontologies. This chapter focuses on logic as a formalism for ontology representation. Two main options, Description Logic [Baader et al., 2003] and Frame-based Logic [Kifer et al., 1995], are discussed. Both formalisms are briefly described (a basic knowledge of logic is required) and then compared in terms of their expressive abilities. The available reasoning systems implementing these formalisms, PowerLoom [MacGregor et al., 1997] and FLORID [May, 2000], are discussed in terms of their reasoning abilities. The comparison is made based on a set of introduced criteria, and a reasoning system is selected for further work.
• CHAPTER 5 Ontology-based Integration. This chapter explains how ontologies help to resolve semantic conflicts in spatial data sets and what problems can be addressed by this approach. Examples are then used to discuss how ontologies can help to resolve the problems. Types of similarities are shown here. Integration based on similarities in a federated database system is presented in detail. The relation between ontologies and schemata derived from conceptual models is discussed.
• CHAPTER 6 The Solution in Practice and the Prototype. Characteristics of standards and their potential problems are studied and shown by extracting ontologies from two major standards (GDF and ATKIS, also discussed in this chapter). Technical details of the architecture of a prototype system are also illustrated. The logical model for the solution is introduced and justified. Details of how a reasoning system cooperates in the integration of the data based on the logical axioms in the ontology are presented.
• CHAPTER 7 Discussion and Conclusion. An evaluation of the achievements is presented here. This chapter summarizes the abilities the proposed solution can offer and presents its weaknesses. It also presents possible future improvements along with conclusions.


CHAPTER 2

Related Work

2.1. Introduction

This chapter introduces research work related to database integration as well as geodata integration. Research work related to semantic integration of heterogeneous data sources is the main focus. We pay special attention to work using ontologies, since this thesis introduces an approach based on ontologies. We also point out some work in the domain of spatial data. The next section (Section 2.2) introduces a few works concerned with semantic issues in the domain of spatial data. Section 2.3 presents a general view of work that addresses schema heterogeneity and schema integration. Afterwards, Section 2.4 briefly introduces projects addressing semantic problems in system and data integration, emphasizing those using ontologies. Section 2.5 presents work contributing to the process of building ontologies. In Section 2.6, we discuss research projects addressing semantic issues on the Internet. Finally, Section 2.7 depicts four trends in research work that are related to this thesis.

2.2. Semantic Issues in GI Domain

The OGC (Open GIS Consortium) is an organization with members from different application domains. Its main goal is to enable GIS users to use geodata and services over networks and inter-networks. Therefore, the OGC provides specifications for interchanging information and geoprocessing services between systems. Such specifications are produced under the consensus of all OGC members.

13

Semantic Issues in GI Domain

The OGC has paid attention to semantic problems of interoperability between GISs [Buehler and McKee, 1998]. In [Buehler and McKee, 1998], the OGC suggests the use of a tool called a Semantic Translator to transfer a data set from one information community to another, though the tool is not elaborated on in any further detail. There is a semantic interest group within the OGC which has provided a draft abstract specification [OGC, 1999b] on semantic issues. Topics such as information community and community metadata are introduced and discussed in this specification. The concept of an information community plays an important role in the approach presented in this thesis, since our definition of ontology is bound to this notion (see Section 3.4.2).

[Uitermark et al., 1999b] introduces an approach for spatial data integration between two data sets, in which the older data set is updated according to the newer one. The authors define two sets of semantic relations: one set between an application ontology and the domain ontology (equal and aggregate relations) and a second set between application ontologies (equal, relate and relevant relations). The similarity relations are taken as known and used to relate the objects in the two data sets. The spatial locations of the objects are also used to refine the result. The final result is used to update the older data set [Uitermark et al., 1999a].

[Bishr, 1997] presented the Semantic Formal Data Structure (SFDS) based on the Formal Data Structure (FDS) [Molenaar et al., 1994]. He illustrates an architecture for semantically interoperable systems and is mostly concerned with the schematic aspects of heterogeneity, similar to [Kim et al., 1993]. Ontologies were also mentioned in this work as a solution for semantic heterogeneity. An important heterogeneity issue arises from different approaches to the modeling of space. [Kuijpers et al., 1995] discusses the semantics of two spatial models (linear and topological) in detail.
Four categories of conflicts, namely classification conflicts, descriptive conflicts, structural conflicts and fragmentation conflicts, are introduced in [Parent et al., 1996]. Descriptive conflicts are close to the topic discussed in [Kuijpers et al., 1995] and address representation issues of geographic features. This work addresses mainly classification conflicts but also contributes to structural conflicts. [Casati et al., 1998] depicts the basis of geographic representation. Their concern is the semantics of space and a framework for representing objects in space, in contrast to the semantics of the objects being represented in a geographic data set. It is important to note that this thesis is concerned with the semantics related to the meaning of concepts such as class street and attribute width, not the semantics of the spatial models or the semantics of space and time. However, works such as [Casati et al., 1998] and [Smith and Mark, 1998] are essential for building a higher level (or top level) ontology (see Section 3.4.2 for the definition) for spatial objects.

INTERLIS is a Swiss standard mainly used for data format conversion. At the schema level, data definition languages such as INTERLIS [Keller, 2000] can facilitate data integration. It has been developed in such a way that one can define the schema of the data set in a Data Definition Language (DDL). There are tools that can convert a data set existing under one schema to another. In the new INTERLIS standard (version 2) the data is kept in the XML format.

2.3. Heterogeneity and Integration

In this section, we start by discussing work on schema integration in general and pay particular attention to work using thesauri. We present our rationale for using ontologies as an alternative to using thesauri. Afterwards, research work related to ontologies is presented in Section 2.4.

[Sheth, 1998] presents an overview of interoperability issues. He divides the course of development in interoperability into three eras. The first generation is occupied with concerns such as structural heterogeneity, differences in query languages and differences in DBMSs. The second era is concerned with presenting users with uniform representations of, and access to, heterogeneous data sources by means of metadata. The key issue in the third era is resolving semantic heterogeneity. He considers ontologies, contexts, and semantic correlation to be important issues in the future of interoperable systems. [Kashyap and Sheth, 1998] state that database schemata do not convey sufficient information for the resolution of semantic heterogeneity. They discuss the need for domain-specific ontologies, and they present an example ontology formalized in Description Logic [Baader et al., 2003].

[Hammer and McLeod, 1993] define a broad range of conflicts as semantic conflicts and classify them into five categories. We do not share their view and consider object comparability, in their terminology, as a notion close to semantic heterogeneity (our view of semantics is built up in Section 3.2). [Kim et al., 1993, Kim and Seo, 1991] present a comprehensive study on the classification of schema heterogeneities. Solutions for several types of schema heterogeneity in RDBs and OODBs are also presented. They address problems of schema heterogeneity but do not distinguish between the structural and semantic issues. UniSQL is a commercial multi-database system based on the results of the same work and concentrates mainly on schema conflicts. [Sheth, 1998] classifies this work with that of the first generation. Another comprehensive study in this area is presented in [Garcia-Solaco et al., 1996]. The work of both Kim et al. and Garcia-Solaco et al. addresses problems of schema heterogeneity, but neither distinguishes between schematic and semantic issues. This thesis focuses on the semantic problems only.

[Bergamaschi et al., 1998], [Palopoli et al., 1999] and [Madhavan et al., 2001] propose schema integration approaches using thesauri. [Bergamaschi et al., 1998] introduces an approach for extracting similarity relations (synonym, hypernym and hyponym) from the schema structure of component databases. The relation extraction is semi-automatic and needs the supervision of a domain expert. They introduce an algorithm to integrate schema definitions into a global homogeneous schema based on the extracted relations. Cupid [Madhavan et al., 2001] proposes an approach for schema matching. This approach takes into account both the similarity of the terms in the schema definitions (language similarity) and the structure of the schema (structural similarity). Cupid improves the thesaurus with a coefficient for every entry in it. It also categorizes schema elements into clusters, which in turn is similar to the approach described in [Bergamaschi et al., 1998]. [Palopoli et al., 1999] relies on the schema definitions of the component databases enhanced by the knowledge of domain experts. The approach requires two dictionaries: a synonymy dictionary, like a thesaurus, and an inclusion dictionary extracted from schemata or domain experts. Similar to Cupid [Madhavan et al., 2001], domain experts customize both dictionaries with fuzzy coefficients. [Domenig and Dittrich, 2000] also uses a thesaurus-based approach for query translation, which is suitable for Internet search engines searching both structured and semi-structured data.
In the above approaches, domain experts customize a thesaurus for an application domain. Such thesauri neither help communication across application domains nor foster better intercommunity communication. To allow communication between two different application domains, the coefficients of the thesaurus entries must be set by an expert in both application domains. This is an important shortcoming in comparison to our approach utilizing ontologies. By using ontologies we establish the similarity relations (defined in Section 5.3), while with a thesaurus the synonym and hyponym relations (the latter called inclusion thesauri in [Palopoli et al., 1999]) must be provided by domain experts.
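The dependence of thesaurus-based matching on expert-set coefficients can be illustrated with a minimal Python sketch. It is not the algorithm of Cupid or of [Palopoli et al., 1999]; the thesaurus entries, their coefficients, the averaging rule and the weight `w_ling` are all hypothetical and chosen only to show how a linguistic score and a structural score might be combined.

```python
# Hypothetical thesaurus: unordered term pairs with an expert-assigned
# coefficient in [0, 1]. These entries are illustrative assumptions.
THESAURUS = {
    frozenset({"street", "road"}): 0.9,
    frozenset({"width", "breadth"}): 0.8,
    frozenset({"name", "label"}): 0.6,
}


def linguistic_similarity(a, b):
    """Similarity of two terms: 1.0 if identical, else the thesaurus coefficient."""
    if a == b:
        return 1.0
    return THESAURUS.get(frozenset({a, b}), 0.0)


def structural_similarity(attrs_a, attrs_b):
    """Average, over the attributes of A, of the best linguistic match in B."""
    if not attrs_a or not attrs_b:
        return 0.0
    best = [max(linguistic_similarity(x, y) for y in attrs_b) for x in attrs_a]
    return sum(best) / len(best)


def element_similarity(name_a, attrs_a, name_b, attrs_b, w_ling=0.5):
    """Weighted combination of name similarity and attribute (structural) similarity."""
    ling = linguistic_similarity(name_a, name_b)
    struct = structural_similarity(attrs_a, attrs_b)
    return w_ling * ling + (1.0 - w_ling) * struct
```

For instance, `element_similarity("street", ["width", "name"], "road", ["breadth", "label"])` combines the name coefficient with the averaged attribute coefficients. Every number that enters the result comes from the hand-built thesaurus, which is the shortcoming discussed above.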


The work of Larson et al. in [Larson et al., 1989] is close to the approach of this thesis. However, we distinguish and emphasize the difference between the ontological characteristics of attributes and their representational and implementational characteristics. For example, characteristics such as domain, uniqueness and cardinality are present in the relation definitions in an ontology, while security constraints or scale are representational characteristics. We also propose an ontology-based methodology for resolving the semantic conflicts.

[Miller et al., 2000] distinguishes schema integration from schema mapping. They focus on the mapping of data, taking the integrated schema as given. Likewise, we also distinguish the phases of schema integration and data mapping. However, this work focuses mainly on schema integration, and we only explore the problems of data mapping and suggest potential solutions. Another effort in the area of data mapping is introduced in [Rosenthal and Sciore, 1995]. It presents an architecture for semantic interoperability. The paper explores different kinds of interoperability problems in a distributed object management environment. The proposed architecture is based on four main constituents: argument-describers (functions to determine the assumptions about the meaning and representation of arguments), conversion functions (a library of converters), a planner (which determines a strategy for converting one property value to another by considering the argument-describers and the available conversion functions) and a request broker. They present an architecture for data mapping and conversion that complements our work (presented in Chapter 5) very well in the data mapping phase.
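The role of a planner working over a library of conversion functions can be sketched as follows. This is only an illustration of the general idea, under assumptions of our own: the representation names, the conversion table and the breadth-first search strategy are hypothetical and are not taken from [Rosenthal and Sciore, 1995].

```python
from collections import deque

# Hypothetical library of conversion functions, keyed by
# (source representation, target representation).
CONVERSIONS = {
    ("feet", "meters"): lambda v: v * 0.3048,
    ("meters", "centimeters"): lambda v: v * 100.0,
    ("centimeters", "millimeters"): lambda v: v * 10.0,
}


def plan(source, target):
    """Breadth-first search for the shortest chain of conversions.

    Returns a list of (source, target) steps, or None if no chain exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return path
        for (src, dst) in CONVERSIONS:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(src, dst)]))
    return None


def convert(value, source, target):
    """Apply the planned chain of conversion functions to a value."""
    steps = plan(source, target)
    if steps is None:
        raise ValueError(f"no conversion path from {source} to {target}")
    for step in steps:
        value = CONVERSIONS[step](value)
    return value
```

In this sketch the source and target representations are passed in explicitly; in the architecture described above, that information would come from the argument-describers.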

2.4. Ontology-Based Integration Projects

This section introduces four of the major projects using ontologies for database integration that focus on semantic issues. The work presented in this thesis is in part based on the results of these projects.

1. InfoSleuth Project. The InfoSleuth project is based on the results of Carnot, extending its capabilities [Bayardo et al., 1997, Nodine et al., 1999]. Carnot [Singh et al., 1997] is one of the pioneering projects in the domain of integration and addresses semantic integrity. Carnot uses the Cyc ontology [Lenat, 1995] (called a common-sense knowledge base in [Woelk et al., 1996]) in addition to knowledge extracted from schema definitions for semantic integration. InfoSleuth is a multiagent-based solution for information and service retrieval from autonomous and changing sources such as those on the Internet. Its architecture relies on different types of agents such as user agents, ontology agents, query agents and so on. A user can select an ontology (via a user agent) from a list of ontologies offered by an ontology agent. The ontologies are built for every user group [Fowler et al., 1999]. InfoSleuth offers the ability to build queries based on a selected ontology and, consequently, matches information or services with the user queries.

2. KRAFT Project. KRAFT [Visser et al., 1999] is a project for the integration of heterogeneous information, using ontologies to resolve semantic problems. They extract the vocabulary of the community and the definitions of terms from documents existing in an application domain. KRAFT uses a shared ontology [Jones, 1998] as a basis for mapping between ontology definitions and for communication between agents. In [Visser et al., 1999], shared ontology is “chosen to make shared ontology as expressive as the ‘union’ of the ontologies”. However, the definition of the union of ontologies and its similarities to or differences from the shared ontology are not stated. KRAFT detects a set of ontology mismatches (as described in [Visser et al., 1998]) and establishes mappings between the shared ontology and local ontologies. An important outcome of this project for our work is the methodology to build ontologies presented in [Jones, 1998].

3. COIN Project. The COIN [Goh et al., 1999] project presents a suitable architecture for semantic interoperability and inspired parts of the work presented in this thesis. The role of the Domain Model in the COIN architecture can be compared to that of an ontology. The components of the architecture suit an ontology-based approach. Articulation of data based on the Domain Model (ontology) and relating the data to the Domain Model are important aspects considered in their architecture. However, a Domain Model is closer to a conceptual schema than to an ontology. In an example in [Goh et al., 1999], one can see that “money amount” is considered a subtype of “semantic number”, while number is only a primitive type for representing the value; likewise, “currency type” is a subtype of “semantic string”. According to our definition, however, an ontology is based on the conceptualizations of people in a community. Therefore, “money amount” is an amount or a quantity. Treating “semantic number” as a super-type is the result of the influence of application development, whereas “money amount” and “currency type” are related to a value of type number or string, respectively, only for representation purposes.

4. OBSERVER. OBSERVER [Mena et al., 1998] uses ontologies to allow queries against heterogeneous sources. It replaces terms in user queries with suitable terms in target ontologies by means of Inter-Ontology relations. OBSERVER uses Description Logic (see Section 4.3) as both the ontology definition language and the query language, and Classic [Brachman et al., 1991] performs the integration task. An interesting module in the OBSERVER architecture is the Inter-Ontology Relationships Manager. It keeps the relations between ontological definitions of terms in different ontologies. By means of such inter-ontology relations, OBSERVER replaces terms in user queries with suitable terms in target ontologies. This thesis also proposes the use of similarity relations between ontologies for schema integration. The Inter-Ontology relations are synonym, hypernym and hyponym, the same three relations used in [Bergamaschi et al., 1998] (see Section 2.3). The same relations are also used to build every ontology in OBSERVER. [Mena et al., 1998] refers to the translation of terms in a user ontology by the use of inter-ontology relations as ontology integration.
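The replacement of query terms through synonym, hyponym and hypernym relations can be sketched in Python as follows. This is a simplified illustration and not the OBSERVER implementation: the relation table, the example terms and the preference order (synonym first, then hyponyms, then hypernyms) are assumptions made for the example.

```python
# Hypothetical inter-ontology relations. Each entry reads:
# "the target term is a <relation> of the user term".
INTER_ONTOLOGY = {
    "road": [("synonym", "street")],
    "waterway": [("hyponym", "river"), ("hyponym", "canal")],
    "motorway": [("hypernym", "street")],
}


def translate_term(term):
    """Return (target terms, exact) for a term of the user ontology.

    A synonym is an exact replacement. Hyponyms (narrower terms) or a
    hypernym (broader term) are used only as approximate replacements.
    """
    relations = INTER_ONTOLOGY.get(term, [])
    for kind in ("synonym", "hyponym", "hypernym"):
        targets = [target for rel, target in relations if rel == kind]
        if targets:
            return targets, kind == "synonym"
    return [], False


def translate_query(terms):
    """Translate every term of a query expressed in the user ontology."""
    return {term: translate_term(term) for term in terms}
```

A term with no relation in the table stays untranslated, which mirrors the situation where no similarity between two ontologies is known.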

2.5. Building Ontologies

Since ontologies are the basis of the approach presented in this thesis, this section briefly introduces the major contributions in this domain for building ontologies. Building a proper ontology is an important prerequisite in the approach we present in this thesis.

2.5.1. Methodologies for Building Ontologies

The following works suggest methodologies for building ontologies, some of which we used as guidelines to build our sample ontologies (see Section 6.2).

[Visser et al., 1999] illustrates a three-phase approach that was used to extract and build ontologies from an application domain in the KRAFT project. It provides a methodology to build ontologies as well as the integration approach. [Jones, 1998] presents a three-phase approach to building ontologies from the same project. They build ontologies based on an analysis of the technical texts available in the domain.

The Enterprise Ontology Project [Uschold, 1996, Uschold and King, 1995, Uschold and Grüninger, 1996] also produced a methodology for building ontologies. [Uschold and Grüninger, 1996] presents one of the most detailed methodologies for building ontologies.

The goal of the TOVE (TOronto Virtual Enterprise) [Grüninger and Fox, 1995] project is to create a generic, reusable enterprise model by providing a shared terminology for the enterprise that each agent can describe and use. TOVE defines the meaning (or semantics) of each term in as precise and unambiguous a manner as possible. It prepares the semantics in a set of axioms that enable TOVE to automatically deduce the answers to many common-sense questions about an enterprise. TOVE also provides a symbology for depicting a term, or the concept constructed thereof, in a graphical context.

METHONTOLOGY [Fernandez et al., 1997, Gomez-Perez et al., 1996] is a methodology for building ontologies that goes from the planning phase to the formalization. It also considers the maintenance and the life cycle of ontologies. A survey of the methodologies for building ontologies is presented in [Lopez, 1999]. The survey covers most of the above methodologies.

2.5.2. OntoClean

A research group on the ontological foundations of knowledge engineering and conceptual modeling is exploring the role of ontology in different fields. The work of Guarino and Welty in this group is one of the major contributions to building and evaluating ontologies. They define notions such as identity, unity, individuality, and rigidity [Welty and Guarino, 2001, Guarino and Welty, 2002]. These notions play an important role in the qualification of taxonomy hierarchies. By applying such notions, one can evaluate an ontology in terms of its explication (i.e., how an ontology reveals implicit assumptions) and its accordance with the conceptualization of a community.

2.5.3. ONTOLINGUA

Ontolingua [Farquhar et al., 1997] presents a language to formalize ontologies. The Ontolingua language is originally based on KIF and also provides the Frame-Ontology¹ for frame-based (object-centered) knowledge representation. Ontolingua is highly expressive. However, as a consequence of this expressiveness, no reasoning system yet supports the Ontolingua language. The Ontolingua Web site provides users with tools to define their ontologies. Users can translate their ontologies into different languages (e.g., KIF, Loom or Prolog) to be used with existing knowledge-based systems. The library of ontologies provided by Ontolingua is a useful reference for finding and adopting existing ontologies as higher-level ontologies. For instance, we adopted the ontology of Physical-Quantity in our work.

1. See the documentation of the Frame-Ontology in the Ontolingua ontology library at http://ontolingua.stanford.edu.


2.5.4. (KA)2

(KA)2 [Benjamins and Fensel, 1998, Fensel et al., 1999] is an initiative for building ontologies in the knowledge acquisition community. The goal of the project is to develop an ontology for the participants. The ontology built by this initiative is mainly used in On2Broker (see Section 2.6.2) for searching on the Web. The (KA)2 ontology is developed using Ontolingua tools (see Section 2.5.3). Participants in (KA)2 need to manually annotate their Web pages based on a specified annotation to facilitate the search process by On2Broker. The annotations relate information on Web pages to the (KA)2 ontology.

2.5.5. Affordance

[Kuhn, 2001, Frank and Kuhn, 1998] present a new approach towards defining semantics. Unlike conventional approaches for defining ontologies, which are mainly based on logical phrases, they define semantics based on activities. That is, they define categories of objects based on the actions they can afford (affordance [Gibson, 1986]). An immediate advantage of defining categories of objects according to activities is a reduction in the magnitude of the problem of detecting similarities between the categories. This is due to the fact that the number of possible activities is much smaller than the number of categories in a domain. This idea is used in our logical definitions to some extent. However, [Kuhn, 2001, Frank and Kuhn, 1998] use a functional language (Haskell [Hudak, 2000]) for formalizing the activities rather than logical formalisms. Such a formalism presents the advantage that one can execute the formalized semantics on an object.
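The idea of comparing categories through the activities they afford can be sketched with a few lines of Python. The sketch is our own illustration and does not reproduce the Haskell formalization of [Kuhn, 2001, Frank and Kuhn, 1998]; the categories, the activities and the use of a Jaccard coefficient as similarity measure are hypothetical assumptions.

```python
# Hypothetical categories described by the activities they afford.
AFFORDANCES = {
    "street": {"drive", "walk", "cycle"},
    "footpath": {"walk", "cycle"},
    "river": {"sail", "swim"},
}


def affordance_similarity(category_a, category_b):
    """Jaccard coefficient of the two sets of afforded activities."""
    a = AFFORDANCES[category_a]
    b = AFFORDANCES[category_b]
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

Because the comparison runs over a small vocabulary of activities rather than over all pairs of category definitions, the sketch reflects the reduction in problem size mentioned above.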

2.6. Semantic Issues on the Internet

Semantic problems have attracted attention on the Internet [Fensel, 2001]. Considering that the Internet is a rich and widespread repository of data, searching data using the semantics of documents has become an important issue. The Semantic Web as described in [Berners-Lee, 1998] is a goal for the W3 Consortium. Search engines on the Semantic Web will be able to discover documents by their formal description, their relations with each other and the description of their contents. The formal descriptions should be made in a way that can be processed by reasoning systems. Projects in the framework of the Semantic Web such as SHOE [Heflin and Hendler, 2000] and On2Broker [Fensel et al., 1999] use ontologies to improve the ability to search the World Wide Web. In the following, both On2Broker and SHOE are briefly introduced. A comparison of SHOE and On2Broker as well as a comparison to our approach can be found in Section 5.1. RDF and RDF Schema, as a major result of the Semantic Web, are then introduced. Finally, OIL, which is a standard for ontology exchange on the Web, is discussed.

2.6.1. SHOE

SHOE [Heflin and Hendler, 2000] is an example of a search engine using ontologies that is based on Description Logic (see Section 4.3). It uses a set of extra tags which must be added manually by the authors of HTML Web pages to annotate their content. In SHOE, ontologies are built as a taxonomy hierarchy, and a user queries the system by traversing the hierarchy. SHOE ontologies have the following elements in their definitions:
• Categories (concepts)
• Relationships (relations)
• Constants, which are particular instances generally needed for concept definition
• Inferences (constraints or rules)
• Definitions, which are verbal documents

The author of a Web page can specify any available ontology and, by adding tags defined in the specified ontology, refer to the definitions in that ontology. This contrasts with On2Broker (Section 2.6.2), where specific ontologies such as (KA)2 are available as a basis for search (see Section 5.1 for a comparison). A Web crawler looks for Web pages with SHOE annotation tags and keeps the extracted information in a knowledge base. This knowledge base is used in turn to reply to a search query. A user queries the system by traversing the taxonomy hierarchy of an ontology.

2.6.2. On2Broker

On2Broker [Fensel et al., 1999] (a new release of Ontobroker [Decker et al., 1999]) uses ontologies represented in a language based on Frame-based Logic (see Section 4.4). Reasoning in On2Broker is based on the closed-world assumption (unlike SHOE, which is based on the open-world assumption) and deals with a domain-specific ontology for every query. Like SHOE, On2Broker uses its own extension to HTML tags, which should be added by the document’s author.
On2Broker uses an ontology produced by the (KA)2 initiative [Benjamins and Fensel, 1998]. It maintains a taxonomy hierarchy by means of IS-A relations


Related Work

(as in SHOE) and represents attributes in their definitions. It uses a formalism similar to Frame-based Logic for reasoning and querying the system. As a result, relations between concepts in On2Broker are defined by means of rules. The main features for the definition of terms in On2Broker are:
• an is-a hierarchy of inheritance for terms (similar to a hierarchy of concept definitions);
• attribute definitions (similar to a role definition or a relation with only a type constraint);
• rules, which can not only play the role of constraints but can also be used to establish relations.

2.6.3. RDF

RDF (Resource Description Framework) is a standard that can be used for representing ontologies. It was introduced by the World Wide Web Consortium (W3C) to describe the relationships between resources (Web documents) on the Internet [Decker et al., 2000]. All knowledge represented in RDF is based on a triple: subject, property and object. A property establishes a directed relation between two resources (and/or literals). RDF itself offers few features and contains little predefined semantics. RDF Schema offers a set of RDF resources, such as class, subclass-of, attribute-of and subproperty-of, to define properties, subjects or objects. Suitable reasoning systems need to be developed to process the knowledge represented in RDF. An example of such a system is SiLRI [Decker et al., 1998], an inference engine based on Frame-based Logic (see Section 4.4).

2.6.4. OIL

OIL (Ontology Inference Layer) [Horroks et al., 2000, Fensel, 2001] is a standard language to support the exchange of ontologies on the Internet. It mainly extends the capabilities of XOL [Karp et al., 1999]. OIL is based on both Description Logic and Frame-based Logic (see Chapter 4). It inherits positive aspects from both formalisms by supporting the modeling primitives of Frame-based Logic and the formal semantics of Description Logic. [Horroks et al., 2000] presents OIL in both XML and RDF syntax.
Consequently, OIL is potentially a powerful tool in the framework of the Semantic Web.
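The triple model of Section 2.6.3 can be illustrated with a minimal Python sketch. It is not taken from SHOE, SiLRI or any RDF toolkit: the resource names (ex:Building, ex:BAU27 and so on) and the two helper functions are hypothetical, and only rdf:type and rdfs:subClassOf are borrowed from the RDF and RDF Schema vocabularies. The sketch suggests how little meaning the bare triples carry until a (here very small) reasoning step interprets the subclass-of property.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a directed, named relation from subject to object."""
    subject: str
    prop: str
    obj: str


# Hypothetical knowledge base; the ex: names are assumptions for illustration.
triples = [
    Triple("ex:Building", "rdfs:subClassOf", "ex:Construction"),
    Triple("ex:BAU27", "rdf:type", "ex:Building"),
    Triple("ex:BAU27", "ex:locatedOn", "ex:Irchel"),
]


def objects(triples, subject, prop):
    """All objects related to `subject` through `prop`."""
    return [t.obj for t in triples if t.subject == subject and t.prop == prop]


def types_of(triples, resource):
    """Direct types of a resource plus the superclasses reached by subclass-of."""
    found = []
    pending = objects(triples, resource, "rdf:type")
    while pending:
        cls = pending.pop()
        if cls not in found:
            found.append(cls)
            pending.extend(objects(triples, cls, "rdfs:subClassOf"))
    return found


print(types_of(triples, "ex:BAU27"))
```

Under these assumptions, the query returns ex:Building and, through the subclass-of triple, ex:Construction. The second answer is not stored anywhere as a triple; it exists only because the helper function gives the subclass-of property an interpretation.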


Discussion

2.7. Discussion

Since the concern of this thesis is semantic integration by means of ontologies, this chapter focuses on work addressing questions such as:
• how we find semantic relations between schema elements of local databases;
• how ontologies can help to solve heterogeneity problems;
• what kinds of heterogeneity problems ontologies can find or prevent;
• how we relate ontologies with schemata; and
• how ontologies can interact with existing system architectures.

The projects introduced in Section 2.4 are major research work directly related to such questions. The research work in Section 2.3, by comparison, shows general integration approaches, especially those performed using thesauri rather than ontologies. In contrast to work based on thesauri, we prefer to establish the similarity relations (introduced later in Section 5.3) based on ontologies. When a thesaurus is used, synonym and hyponym relations are either provided by domain experts or extracted from schema definitions with the help of domain experts. At first glance, using human experts to find the mapping between schemata appears easier and cheaper, as is done in [Bergamaschi et al., 1998, Madhavan et al., 2001, Palopoli et al., 1999]. However, such a solution requires an expert in both domains to find the relevant relations. Otherwise (i.e., where expertise is lacking), we risk producing error-prone mappings. Furthermore, an automated approach has known advantages (e.g., consistency, endurance) over relying on a human expert.

In addition to the above-mentioned area of research (which is directly related to database integration), further areas of research are related to this thesis from two other points of view, namely building ontologies, and formalizing and processing ontologies. Each of the three topics is the subject of other research related to this thesis.

Building ontologies refers to the extraction of specifications from a community’s conceptualization. It concerns questions such as:
• how ontologies explicate implicit assumptions;
• how ontologies are extracted from existing knowledge sources and documents;
• how a taxonomy tree is built and evaluated;
• how an ontology is evaluated in accordance with the conceptualization of a community; and
• how ontologies are organized.

The answers to these questions also have direct consequences for managing the ontologies. In this thesis, we do not use the term design for ontologies, in contrast to conceptual models. This is because we do not impose any new term or definition of a term on the communities; rather, we explicitly state the specifications of the existing concepts in a community.

The last perspective is formalizing and processing ontologies. Formalization defines the way in which we express ontologies instead of using natural languages. Processing ontologies refers to the process of reasoning with the formalized definitions in ontologies. Problems addressed here include:
• how to formulate or represent a definition in an ontology;
• what mechanisms are important in formalizing ontologies;
• how to recognize inconsistencies among intensional definitions; and
• how to classify instance data according to the definitions.

A variety of formalisms can be used, such as logical languages, RDF [Decker et al., 2000], ER-diagrams, UML diagrams and Conceptual Graphs [Sowa, 2000], which can be distinguished with respect to their degree of expressiveness. Another factor that plays an important role in this work, however, is the ability of existing systems to process these formalisms.

This chapter mainly paid attention to the research work contributing to the first perspective, since it is the closest to our aims. Each of the other two perspectives is the subject of research work that is not directly related to the scope of this research. Therefore, we only briefly introduced work related to the other perspectives. Chapter 4 discusses issues related to formalizing and reasoning with ontologies in more detail and shows a comparison between two reasoning systems.


CHAPTER 3

Ontologies

3.1. Introduction

Global database schema generation is a critical task necessary for information integration. Semantic heterogeneity, which refers to misinterpretations of data, is a major obstacle to performing this task. The term “semantics” has been used in a variety of scientific disciplines with slight differences in perspective. In the field of information system technology, some consider semantic and schematic heterogeneity the same [Bishr, 1997, Garcia-Solaco et al., 1996], while others consider it only a part of schema heterogeneity [Kim et al., 1993]. Consequently, approaches to resolving semantic heterogeneity depend on how the term semantics has been interpreted. This thesis suggests applying ontologies to resolve semantic heterogeneity problems.

This chapter takes up the idea of semantics by adopting the definitions from logics and knowledge representation. First, in Section 3.2, a few basic terms such as syntax, semantics and symbols are discussed. This section recognizes the difference between two types of semantic problems in the interpretation of terms in logical schema¹ definitions. It determines the type of semantic problem the thesis addresses. The role of schema definitions in a database is shown at the end of this section. Section 3.3 introduces conceptual schemata and highlights their existing role in organizing data in databases. This section is intended to provide background for the following sections, in which ontologies are compared or related to conceptual schemata. Section 3.4 defines ontologies, describes their characteristics in more detail and explains their potential advantages and difficulties. Section 3.5 discusses issues related to ontology in philosophy and psychology. It clarifies how the perspectives of the two disciplines differ from the application of ontology in this thesis. Finally, Section 3.6 draws conclusions.

1. In this chapter, we use the term schema to refer to a database schema, also known as a logical schema.

3.2. Syntax and Semantics in Schema Definitions

This section emphasizes the contrast between syntax and semantics. It defines semantics as the interpretation of symbols. Section 3.2.1 divides problems related to the semantics of symbols into two parts and specifies which of the two is discussed further in the rest of this chapter. Section 3.2.2 depicts the role of schemata and symbols in organizing data in databases.

[Sowa, 1984] presents a view of both terms, syntax and semantics, as follows: “In general, however, both syntax and semantics are important: syntax determines what slots the words fill in the sentence, and semantics determines what slots they fill conceptually.” (Page 255) He also mentions that: “Phonology, syntax and semantics are independent but interacting strata of language.” (Page 263)

In the domain of database technology, semantics refers to people’s interpretation of data stored in a database. [Sowa, 1984] again states: “The common aspect that unifies all the groups [of specialists] is a knowledge of the meaning of the data and the constraints necessary to keep it a faithful model of the real world. The study of the meaning and constraints on the data is called database semantics.” (Page 303)

The American Heritage® Dictionary defines semantics as follows: “The study of relationship between sign and symbols and what they represent.” This is a general definition, used mainly in the domain of language and logics. It considers the relation between symbols and whatever (whether concrete or abstract) they represent. The organization of data in databases is always presented in a formal language² (i.e., orders of particular digital symbols). Difference in interpretation of the symbols in a database is the source

[Figure 3.1: tree diagram. Grammar divides into Order and Symbols; Symbols divide into Non-Terminal Symbols and Terminal Symbols; Terminal Symbols divide into Constants and Identifiers.]

Figure 3.1. Classification of symbols or terms in a grammar.

of semantic problems during communication. It is important to note that language is basically a means of communication —between people and/or computers. We adopt the definition from American Heritage® Dictionary and define semantics in this thesis as what determines how the constants and the variables are associated with things in the application domain. The semantics of a language is a notion often used together with the syntax or the grammar of a language. Grammar is used to validate sentences of a language. Grammar is also viewed as a set of rules for building valid sentences. In other words, a language is the set of all sentences that conform to its grammar. In turn, grammar has two main components: symbols and order [Mosses, 1990]. Order is represented by relations among symbols. Both order and symbols can be recognized in notations such as the Syntax Diagram or BNF (Backus Naur Form) [Marcotty and Ledgard, 1987]. A part of the grammar of INTERLIS 2 (see Section 2.2 [Keller, 2000]) is shown in Figure 3.2 in a Syntax Diagram. Arrows show the order and nodes show the symbols. In this context, terms appearing in the diagram are symbols. There are two types of symbol: terminal and non-terminal (Figure 3.1), as defined in [Mosses, 1990]. Terminal symbols appear in the sentences of a language. For instance, consider the statement in Table 3.1, which is a valid sentence according to the grammar described in Figure 3.2. All the terms (or symbols) in the sentence of Table 3.1 are terminal symbols. Non-terminal symbols do not appear in the sentences of a language and have their own definition by another symbol or an order of a set of symbols —e.g., “class-name” or

2. In this chapter, the word language refers to formal languages used in computer systems.

[Figure 3.2: syntax diagram with rules for Class definition, AttributeDef, LocalAttribute, AttrDomainDecl, TypeOrDomRef, Type, BaseType and TextType. Rectangles denote non-terminal symbols (e.g., class-name, properties, AttributeDef); ovals denote constant-symbols (e.g., CLASS, EXTENDS, ATTRIBUTE, MANDATORY, TEXT, END).]

Figure 3.2. Part of INTERLIS 2 grammar shown by Syntax Diagram.

“AttributeDef” in Figure 3.2. Non-terminal symbols are shown by rectangles in the Syntax Diagram in Figure 3.2.

3.2.1. Semantics of Symbols

As stated previously, semantics is the interpretation of symbols. Here we focus on the terminal symbols and distinguish between two different types of terminal symbol. All nodes shown as ovals in Figure 3.2 are terminal symbols (they appear in sentences of the language, e.g., Table 3.1) and are called constant-symbols. This type of symbol appears unchanged both in the sentences of the language and in the grammar. Nodes shown as rectangles are non-terminal symbols and do not appear in the sentences of a language. Non-terminal symbols are replaced either by a terminal symbol (such as “class name” and “attribute name”) or by a sequence of symbols specified in the grammar (such as “properties” and “AttributeDef”). Eventually, terminal symbols replace all the non-terminal symbols in a grammar to build a valid sentence of a language. The terminal symbols that are used to replace these non-terminal symbols are called identifiers. Non-terminal symbols are replaced with identifiers according to a user’s preference. Differences in the interpretation of identifiers cause the semantic heterogeneity problem. That is why identifier symbols (such as “building” and “story” in Table 3.1) are of special importance. Users of a language (i.e., database designers or programmers) are free to select their own terms (or symbols) for naming classes, attributes and so on, with no obligation to formally define their intension. Explicitly formalizing the interpretation, instead of relying on common sense to interpret identifiers’ names, can help to deal with the semantic heterogeneity problem. The interpretation of identifier symbols in schemata is the main concern of this thesis and is discussed in the rest of the thesis.
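The distinction between constant-symbols and identifiers can be made concrete with a small Python sketch applied to the statement of Table 3.1. The sketch is only an illustration under stated assumptions: the set CONSTANT_SYMBOLS lists just the constant-symbols that occur in that one statement rather than the full INTERLIS 2 vocabulary, and the regular expression and the function classify are hypothetical helpers, not part of any INTERLIS tool.

```python
import re

# Constant-symbols that occur in Table 3.1. This is an assumed subset chosen
# for the illustration, not the complete INTERLIS 2 vocabulary.
CONSTANT_SYMBOLS = {"CLASS", "ATTRIBUTE", "MANDATORY", "TEXT", "UNIQUE",
                    "END", "=", ":", ";", "*", ".."}

# Words, numbers, the range sign "..", and single-character signs.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|\d+|\.\.|[=:;*]")

STATEMENT = """
CLASS building =
  ATTRIBUTE
    buildingCode: MANDATORY TEXT*12;
    campus: MANDATORY TEXT*20;
    story: 1..999;
  UNIQUE buildingCode
END building;
"""


def classify(statement):
    """Split the terminal symbols of a statement into constants and identifiers."""
    constants, identifiers = [], []
    for token in TOKEN.findall(statement):
        if token in CONSTANT_SYMBOLS:
            constants.append(token)
        elif token[0].isalpha():
            identifiers.append(token)
        # Numeric literals such as 12 or 999 are left out of both lists.
    return constants, identifiers


constants, identifiers = classify(STATEMENT)
print(sorted(set(identifiers)))
```

The identifiers that remain are building, buildingCode, campus and story. The language fixes the meaning of every constant-symbol, whereas nothing in the statement constrains what the designer meant by these four names, which is the part of the problem this thesis addresses.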
In contrast to identifiers, constant-symbols defined in a grammar have predefined (or standard) semantics, e.g., “CLASS”, “=” and “TEXT” in

CLASS building =
  ATTRIBUTE
    buildingCode: MANDATORY TEXT*12;
    campus: MANDATORY TEXT*20;
    story: 1..999;
  UNIQUE buildingCode
END building;

TABLE 3.1. Example of a schema definition in INTERLIS.

Statement for defining building: BUILDING ( building code , campus name , number of stories )

Figure 3.3. This example shows how the schema definition in Table 3.1 can be considered as a grammar for a language that states the propositions of a model.

Figure 3.2 and Table 3.1. For instance, the semantics of the constant-symbol “CLASS” states that the sentence in Table 3.1 is defining a class; and a term appearing after it is the name of the class and so on. The semantics of the term TEXT determines that the identifier before the “:” sign is a variable name that can accept a string of letters and specific operations can be applied to it. A programmer writing code in a language (e.g., INTERLIS) should know (or understand) the semantics of constant-symbols prior to using that language. Understanding of semantics of constant-symbols is necessary in order to communicate through a language and receive expected reactions from the corresponding software system. When we receive the expected reaction, we say one knows a language (or the computer knows or understands the language), otherwise, it is considered either a misunderstanding of the semantics of the language by the programmer or a failure of the software system. Different sets of constant-symbols with different interpretations can cause part of the semantic heterogeneity in communication between systems. As an example, consider terms “Class”, “Attribute” and “Table” in different DDLs used by different database systems. Semantics of such terms is part of the semantic heterogeneity problem at a different level and has already been subject to research (such as: [Behm, 2001]). However, it is not in the scope of this thesis. 3.2.2. Role of Schema Definitions in Organizing Databases The language based on the grammar of Figure 3.2 (INTERLIS) is mainly used to describe the organization of data in spatial datasets. The example in Table 3.1 shows a sentence of the language defined by the grammar in Figure 3.2. This sentence expresses the following propositions in natural language: “Class Building has an ID and (is located on) a campus; both ID and campus are shown by combinations of characters. 
A Building also has a number of floors shown by an integer number.” In turn this statement describes the structure of data. That is, one can derive a grammar from the example in

[Figure 3.4: diagram with four elements: the grammar of the data definition language (DDL); database schema definitions; the grammar (or structure) of the language to describe states of the world; and models of the states of the mini-world (or data in a database).]

Figure 3.4. Interpretation of schemata as a grammar that specifies the language in which the models of the mini-world are expressed.

Table 3.1 (mainly by dropping the constant terminals). Such a grammar is shown in Figure 3.3. This is a grammar for a language to state facts in the database. In other words, statements of this language are stored as data in the database. As an example:

BUILDING(BAU27,Irchel,4)

is a statement in a formal language according to the grammar of Figure 3.3. It states that “BAU27 is the code of a building which is located on Irchel campus and has 4 floors.” A set of such expressions creates a model of a state of the mini-world. The mini-world³ is the part of the world that is the subject of modeling. It includes only those aspects of the world that are of interest to a particular application. For example, all the possible states of the world in which “BAU27” is inside “Irchel campus” are considered one state of the mini-world, independent of where it is located inside the campus. That is because the exact location of “BAU27” is not of interest. A model⁴ is a set of sentences (propositions) that hold in a state of the mini-world, as in this example. The content of a database state is a model of a state of a mini-world. This model is expressed in a language that is defined by the schema definition (Figure 3.4).

In short, we distinguish two languages with two grammars at two different levels. One language (the DDL) is used to present database schema definitions (e.g., Figure 3.2). The schema definitions in turn contain the grammar (or structure) of data stored in the database (e.g., Figure 3.3). The stored data describe models of the states of the mini-world.

3. Note that we distinguish between the states of the mini-world and states of the world in the philosophical or logical sense.
4. Here, “model” refers to a representation of a state of the mini-world. It is close to what is called a database state in the database community. To avoid any confusion, note that it refers neither to a conceptual model (e.g., represented by a UML or ER diagram), nor to database modeling approaches (e.g., relational or hierarchical), nor to database descriptions (or data models).
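The two levels described in Section 3.2.2 can be sketched in Python. The sketch assumes a hand-written rendering of the schema of Table 3.1 as a list of attribute checks (BUILDING_SCHEMA), in which TEXT*12 is read as a string of at most 12 characters and 1..999 as an integer range. The names BUILDING_SCHEMA, conforms and is_model are hypothetical and stand in for what a database system would derive from the DDL.

```python
# Hypothetical rendering of the schema in Table 3.1: one (name, check) pair
# per attribute, in the order used by the statement of Figure 3.3.
BUILDING_SCHEMA = [
    ("buildingCode", lambda v: isinstance(v, str) and len(v) <= 12),
    ("campus", lambda v: isinstance(v, str) and len(v) <= 20),
    ("story", lambda v: isinstance(v, int) and 1 <= v <= 999),
]


def conforms(fact, schema=BUILDING_SCHEMA):
    """True if the tuple is a valid sentence of the language the schema defines."""
    if len(fact) != len(schema):
        return False
    return all(check(value) for (_, check), value in zip(schema, fact))


def is_model(state):
    """True if every fact conforms and the UNIQUE buildingCode constraint holds."""
    if not all(conforms(fact) for fact in state):
        return False
    codes = [fact[0] for fact in state]
    return len(codes) == len(set(codes))


# A database state: a set of facts such as BUILDING(BAU27, Irchel, 4).
state = {("BAU27", "Irchel", 4)}
print(is_model(state))
```

In this reading, the schema plays the role of a grammar: it decides which tuples are well-formed sentences, while the set of stored tuples is the model of one state of the mini-world. A tuple such as ("BAU27", "Irchel", 0) would be rejected by the schema, independently of whether any building exists.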

3.3. Conceptual Schemata

In this section we show how conceptual schemata are used to model the mini-world in a database. The role of conceptual schemata as a basis for developing database schemata is the main concern of this section. We also show how database schemata contribute to the organization of data. An important motivation for the discussion in this section is to highlight the differences between conceptual schemata and ontologies (introduced in the next section), which have often been a source of confusion.

The literature defines conceptual schema in slightly different ways. In some of the literature, the term conceptual schema refers to a way of representing the mini-world in an application domain, and conceptual model refers to the language or the approach (UML, ER modeling, etc.) used for describing the conceptual schemata. We share our view of conceptual schema with [Loucopoulos, 1992], as follows: “The important functional part of a requirements specification usually takes the form of a conceptual schema, defined according to some conceptual model, incorporating static as well as dynamic properties and rules of the application domain. The primary use of a conceptual schema is in understanding a specific application domain...” [Loucopoulos, 1992]

Some literature refers to conceptual schema by the term conceptual data model (such as [Elmasri and Navathe, 2000]) or the term conceptual model (such as [Atzeni et al., 1999]). [Sowa, 2000, section 7.2] illustrates conceptual schema as application knowledge shared between a database, the applications and the user interface. [Elmasri and Navathe, 2000] describes conceptual data models, in contrast to physical data models, as follows: “Conceptual data models provide concepts that are close to the way many users perceive data …” [Elmasri and Navathe, 2000] also introduces representational data models (e.g., a relational data model) as a level between the two. This level is a pragmatic level and is closer to our notion of conceptual schema.


Batini et al. [Batini et al., 1992] indicate that conceptual design is based on “specification of requirements and results in conceptual schema of a database”. It is independent of storage structure and implementation software. Another definition of conceptual schemata is as follows: “... conceptual models allow the description of the organization of data at a higher level of abstraction, without taking into account the implementation aspects.” [Atzeni et al., 1999, page 160].

The common points in the above literature are vital for a clear perception of conceptual schemata. The major points are as follows:
• Conceptual schemata are based on user needs and application requirements specifications (expressing characteristics of an application domain) [Batini et al., 1992, Elmasri and Navathe, 2000, Loucopoulos, 1992];
• Conceptual schemata describe a high-level organization of data while hiding details of their physical structure in a database (database schemata are based on the conceptual schemata) [Atzeni et al., 1999, Batini et al., 1992, Elmasri and Navathe, 2000];
• Conceptual schemata are independent of implementation and software systems (e.g., DBMSs) [Atzeni et al., 1999, Batini et al., 1992].

A conceptual schema is the result of a conceptual design process and is used to implement systems. The purpose of conceptual schemata is to satisfy application requirements and to represent a mini-world. Efficiency of querying and analysis, implementation cost, flexibility for further development and so on are important factors in assessing the quality of a conceptual schema.

A conceptual schema also provides a set of constraints that guarantee the following four conditions for a mapping H, shown in Figure 3.5. H maps a set of states of the mini-world to their respective models of the mini-world (or states of the database). In other words, the mapping H relates each state of the mini-world to the true statements of a model that describes it. The following conditions are often implicitly assumed while designing a conceptual schema.

1. H maps every state of the mini-world to one model (i.e., it is a function). This condition guarantees that for every possible state of interest we have one and only one true model. There is no state of the mini-world that we fail to describe by a model, and we have only one model for describing it. Otherwise, the representation of a state of the mini-world would be ambiguous.

[Figure 3.5: diagram of a state of the mini-world (wi), with the labels Building and Street, mapped by H to a model of the state of the mini-world (mi).]

Figure 3.5. Mapping (H) from a state of mini-world to its model.

2. This mapping should be an injective function (or one to one function)

[Weisstein, 1999]. That means every model represents one particular state of the mini-world. Different states of the mini-world have distinct models to describe them. In other words, no two states are mapped to the same model of the mini-world. Otherwise, the interpretation of a model could cause confusion. The logical interpretation of a model does not comply with this condition. This is because we consider a finite and simplified set of states of the world for modeling (as described in Section 3.2.2). In the logical interpretation, the states of the world are infinite and all the details of the world are considered. That is, many states of the world can be mapped to the same model.

3. The mapping H should also be a surjective function (or onto function) [Weisstein, 1999]. This guarantees that every model represents a state of the mini-world. If a model cannot be related (inversely mapped) to a possible state of the mini-world, it is a false or inconsistent model. In practice, we avoid such models by appropriate consistency (or integrity) constraints.⁵

4. The last property of such a mapping takes operations into account. If there exist operations ($o_k$, $k = 1 \ldots n$) over the states of the mini-world (these operations map one state of the mini-world to another, $O: W \to W$), then corresponding methods ($p_k$, $k = 1 \ldots n$) should also exist on the models ($P: M \to M$); see Figure 3.6. These methods provide us with the ability to update our models to represent the intended state of the mini-world. Note that the methods (P) corresponding to the operations (O) are those that update the models. Methods in a general sense may also analyze or process the data presented in the model without changing the facts presented by the model. For example, a method that analyzes a model and produces some statistics does not change the model itself. It only reshapes the facts (or data) already presented in the model. The methods (P) should comply with their counterparts on the states of the mini-world (O) by satisfying

$H(o_k(w_i)) = p_k(H(w_i))$

(Figure 3.6). This condition assures that the result of method $p_k$ complies with the mapping H. That is, if we can follow all the operations applied to the states of the mini-world and apply the corresponding methods to the models, then our models represent the corresponding states of the mini-world with no need to apply the mapping H for every state.

[Figure 3.6: diagram showing the states of the mini-world (W), wi to wi+4, with an operation ok between states; the models of the mini-world (M), mi to mi+4, with a method pk between models; and the mapping H from W to M.]

Figure 3.6. Methods should comply with their counterparts in the world.

A mapping with all four conditions is similar to algebraic isomorphism or homeomorphism between topological spaces [Kuhn 1994].

Conceptual schemata are a result of the application analysis and conceptual design process, and a major aid for database design. An important benefit of conceptual schemata is that they are free from implementation considerations. Schema definitions in databases are mainly derived from conceptual schemata. Database schema definitions are a logical description of conceptual schemata in a formal language (called Data Definition Language or DDL) suitable for a database to express the organization of data.

5. A mapping that is both onto and one to one is called a bijection or one-to-one correspondence [Weisstein, 1999].
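The four conditions on the mapping H can be checked mechanically on a finite toy example. The following Python sketch is an assumption-laden illustration, not part of the approach of this thesis: the mini-world has only three states (BAU27 with 4, 5 or 6 floors), the state names w4, w5, w6 and all function names are hypothetical, and the single operation "a floor is added" is defined only where the toy world has a successor state.

```python
# States of a toy mini-world: only the number of floors of BAU27 matters.
W = {"w4", "w5", "w6"}

# Models: one fact each, in the language of Figure 3.3.
m4 = ("BAU27", "Irchel", 4)
m5 = ("BAU27", "Irchel", 5)
m6 = ("BAU27", "Irchel", 6)
M = {m4, m5, m6}

# The mapping H from states of the mini-world to models.
H = {"w4": m4, "w5": m5, "w6": m6}

# Operation o on states ("a floor is added") and its counterpart method p
# on models. Both are partial here, a simplification of the toy example.
add_floor_world = {"w4": "w5", "w5": "w6"}
add_floor_model = {m4: m5, m5: m6}


def is_function(H, W):
    """Condition 1: every state has a model (a dict gives at most one)."""
    return set(H) == set(W)


def is_injective(H):
    """Condition 2: no two states share a model."""
    return len(set(H.values())) == len(H)


def is_surjective(H, M):
    """Condition 3: every model describes some state."""
    return set(H.values()) == set(M)


def commutes(H, o, p):
    """Condition 4: H(o(w)) == p(H(w)) wherever the operation o is defined."""
    return all(H[o[w]] == p.get(H[w]) for w in o)


print(is_function(H, W), is_injective(H), is_surjective(H, M),
      commutes(H, add_floor_world, add_floor_model))
```

All four checks succeed for this toy example. If the method on models were implemented wrongly, for instance by mapping the 4-floor model directly to the 6-floor model, the last check would fail: the database would then drift away from the mini-world even though every single model is still well-formed.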


3.4. Ontologies

This section introduces ontologies together with the notion of a community. We also describe the difficulties of building and using ontologies. We motivate the application of ontologies by presenting their potential advantages. Finally, we introduce the notion of ontology in other disciplines and briefly state the impact of work in cognitive science on the approach introduced in this thesis.

The interpretation of terms (symbols) in schemata, and their relation to members of the domain, is usually taken for granted during conceptual schema design. The role of ontologies introduced in this section is to guarantee that symbols (such as “Street” in a mini-world as shown in Figure 3.5) are consistently attached to the members of the domain. An ontology can improve the consistency of communities’ interpretations of a mini-world and their models. By using ontologies, communities are able to communicate based on their defined ontologies.

In the domain of philosophy, ontology is a branch of metaphysics that explains the nature and essential properties of beings or things, and is based on the nature of beings independent of one’s understanding. As the primary property of all beings is their existence, ontology refers to philosophical investigations of existence. It may concern questions such as “What exists?” and “What general sorts of things are there?”. From the perspective of cognitive science, every branch of science has its own ontology. It is a collection of categories of things that can or do exist in a domain and relations among these categories. In the domain of artificial intelligence, ontology is defined by Gruber as an “explicit specification of conceptualization” [Gruber, 1993]. In this domain, ontology has been used for knowledge representation and for sharing or reusing knowledge between agents, with emphasis on formalizing the specification of concepts and relations used by the agents. Ontologies have also attracted attention in the integration of information systems and databases [Guarino, 1998b, Welty and Smith, 2001].

3.4.1. What is Ontology

An ontology consists of logical axioms that express the meaning of terms for a particular community. Logical axioms are the means to introduce concepts, relations and their taxonomic hierarchies and also to express constraints. An ontology exists only under a consensus amongst the members of a community [Bishr et al., 1999, OGC, 1999b], e.g., users of one information system or people in one discipline. We add the notion of community to emphasize the fact that there should be people who agree with the meaning expressed by the logical axioms.

The logical axioms in ontologies define explicit specifications of conceptualization [Gruber, 1993]. The definition of conceptualization by Guarino in [Guarino, 1998a] is used here. A conceptualization is defined by a domain (D), a set of states of the world (W), and a set of intensional relations⁶ (ℜ) (also termed conceptual relations). The set of intensional relations introduced by Guarino [Guarino, 1998a] is a key issue in his definition of conceptualization. ℜ maps every state of the world in W to a world structure (R), i.e., ℜ(W)→R. A world structure consists of extensional relations⁷ representing the world. A simple example of a conceptualization is described in the following. The domain and the intensional relations in this example are as follows:

D = {A, B, C}   (Domain of the world)
ℜ = {ρs, ρwider}   (Intensional relations)

Members of the domain D are individuals (or things) that one can distinguish in the world. The states of the world (W) for this example are illustrated in Figure 3.7. They are the states that one recognizes in the world and each has a particular state of affairs between members of the domain. The set of intensional relations (ℜ) (mentioned above) map every state of the world to the following extensional relations: R= {Rs, Rwider}

(Extensional relations)

By looking at Figure 3.7, one can guess (or learn) that the relation ρs is called “street”.8 By explicitly stating that such relation intends the meaning of “street”, the relations will be more than only a Cartesian product of sets (e.g., DxD).
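As an illustration only, the conceptualization (W, D, ℜ) above can be sketched in Python. The encoding of a state of the world as a dictionary, the attributes `kind` and `width`, and the two sample states are assumptions made for this sketch; they are not the states shown in Figure 3.7. Each intensional relation is modelled as a function from a state of the world to an extensional relation.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical encoding: a state of the world assigns observable facts
# to each individual of the domain D = {A, B, C}.
State = Dict[str, dict]
Extension = FrozenSet[Tuple[str, ...]]

DOMAIN = ("A", "B", "C")


def rho_street(state: State) -> Extension:
    """Intensional relation 'street': its unary extension in one state."""
    return frozenset((x,) for x in DOMAIN
                     if state[x].get("kind") == "street")


def rho_wider(state: State) -> Extension:
    """Intensional relation 'wider': pairs (x, y) of streets, x wider than y."""
    streets = [x for (x,) in rho_street(state)]
    return frozenset((x, y) for x in streets for y in streets
                     if state[x]["width"] > state[y]["width"])


# The set of intensional relations, keyed by the extensional relation
# each one produces.
INTENSIONAL: Dict[str, Callable[[State], Extension]] = {
    "Rs": rho_street,
    "Rwider": rho_wider,
}


def world_structure(state: State) -> Dict[str, Extension]:
    """Apply every intensional relation to one state of the world."""
    return {name: rho(state) for name, rho in INTENSIONAL.items()}


# Two made-up states of the world (assumptions for this sketch).
w1: State = {"A": {"kind": "street", "width": 12.0},
             "B": {"kind": "street", "width": 8.0},
             "C": {"kind": "building"}}
w2: State = {"A": {"kind": "street", "width": 6.0},
             "B": {"kind": "building"},
             "C": {"kind": "street", "width": 9.0}}

for w in (w1, w2):
    print(world_structure(w))
```

The same two functions yield different extensions for different states, which is the sense in which an intensional relation is more than any single set of tuples. The `kind` attribute stands in for the "black box" judgement that the text says can only be approximated by axioms.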

6. The term “intension” is defined in WordNet [Fellbaum, 1998] as follows: “What you must know in order to determine the reference of an expression.” Sowa defines intension in [Sowa, 1984] as follows: “The intension of a word is that part of meaning that follows from general principles in semantic memory.” He also gives examples of principles stored in semantic memory: “... All cats are animals and wallets are designed for carrying money.”
7. [Sowa, 1984] defines extension as: “The extension of a word is the set of all existing things to which the word applies.”
8. It is more like trying to answer the question “what is a street?” by showing some instances of streets. Such an approach to representing semantics is similar to the prototype theory of semantics discussed in Section 3.5.

[Figure 3.7 (image not recoverable): each illustrated state of the world in W, over the individuals A, B and C, is mapped by the intensional relations (ℜ) to a world structure consisting of the extensional relations Rs and Rwider.]
Figure 3.7. Role of intensional relations in a conceptualization.

In simple words, one can think of intensional relations (in ℜ) as black boxes that take states of the world (W) as input and have members of the domain classified by extensional relations (R) as the output. We can only approximate these intensional relations by logical axioms. These approximate definitions by logical axioms are called intensional definitions for terms. With such a definition of conceptualization (W,D,ℜ), we are able to overcome the lack of the extensional relations to represent the semantics of the relation Rs. By means of explicit definition, it is possible to define a mapping from each state of the world to one particular subset of all possible tuples in the relation Rs (see [Guarino, 1998a]). A set of extensional relations is a representation of the state of the world, and is called a world structure (Figure 3.7). It is the result of mappings from states of the world by means of intensional relations. What is usually referred to as “the real world” is similar to the world structure, in Figure 3.7. When people refer to “the real world” they already use terms such as building or street in the real world to refer to the individuals in the domain. As shown in Figure 3.7, a property like “being a street” is attributed to an individual not only based on a state of the world, but also, on the conceptual relation ρs. It is in the world structure that the property of being a street is attributed to the individual “A”. As an example, the following axioms for the intensional relation “street” (ρs) are defined:


street(x) if and only if,
• x transports automobiles, and
• x is surfaced, and
• x has sidewalks.

In fact, all the new terms (such as “transport” or “automobile”) introduced in the definition have to be defined as well. One can assign terms such as “street”, “road” or “strasse” to this intensional relation (Figure 3.8). The mapping from the vocabulary used by a community to the intensional relations (ℜ) is the commitment of the community to the definitions in the ontology [Guarino, 1998a].

Finally, it is important to state the difference between a thesaurus and an ontology. A thesaurus applies a limited number of known relations among terms (such as synonym and antonym); in contrast, an ontology describes the relations among the concepts referred to by terms (Figure 3.8). Ontologies define concepts and their relations.

3.4.2. Communities

The notion of community plays an important role in the definition of ontologies. By adding communities to the definition of ontology we emphasize the importance of agreement. Amongst all the factors for defining a community, possessing a common norm and culture is an essential one [Hillery, 1955]. [Erickson, 1997] notes that a community suggests shared values between its members: “Community members may share a common set of concerns, values, goals, practices, procedures and symbols. Communities typically

[Figure 3.8 (image not recoverable): three levels are shown, namely the world (states of the world), the conceptual level (concepts) and the linguistic level (terms).]

Figure 3.8. A thesaurus introduces the relations between terms at the linguistic level.


have a shared history, and shared artifacts and places.” [Erickson, 1997]

From a pragmatic point of view (as it is treated in this thesis), a community is defined as a set of people who agree upon and commit to an ontology. One can also define a community as a set of users in an application domain. However, such a definition ties ontologies to application domains. In that case, ontologies can help in communication between databases in the same application domain, which can basically be achieved by conceptual schemata as well. The definition proposed here instead aims to release ontologies from a dependency on application domains. This is an important difference between ontologies and conceptual schemata: while conceptual schemata are application-domain dependent, ontologies are based on people’s understanding. Consequently, ontologies can help in finding relations between terms in the application domains and conceptual schemata.

A community can commit to ontologies of other communities. A group of communities can accept and commit to an ontology with more general terms and fewer constraints. These communities can use such general ontologies to develop their own specialized ontologies. One can find examples of such ontologies in the library of ontologies in Ontolingua [Farquhar et al., 1997]. Ontologies with general terms and minimum constraints, which are used by many communities to develop specialized ontologies, are called higher-level ontologies.

The more detailed the definitions in an ontology, the more difficult it becomes to reach a consensus within the community. This also holds for building a higher-level ontology for several communities. For instance, one may try to give a more precise definition of “street” by adding the following condition to the definition in Section 3.4.1:
• x is located inside a residential area.
This axiom specializes the definition of “street”, so not all members may agree with adding it.
Such details can create an invalid definition for some members of a community. Therefore, an ontology for a large number of communities cannot be easily specialized, nor can it represent the complete conceptualization of a community. Adding constraints can specialize an ontology for a smaller community within a larger community, or among several communities committing to a more general community. That is, a community can adopt a general ontology and specialize it by adding more terms. A definition may only be added to an ontology under the consensus of all members. That is, no member can alter or override the definitions in an ontology according to his/her preferences; otherwise, hyponyms will occur. Yet, one can add new terms along with their definitions to specialize an ontology for a subcommunity [Visser et al., 1998].

We should move towards building higher-level ontologies based on ontologies from different systems or communities. Such a process is referred to as integrating ontologies in this thesis and aims to resolve the conflicts between the definitions of terms in the ontologies. The result of such a process is a higher-level ontology that is accepted by the subcommunities.

3.4.3. Difficulties of Applying Ontologies

The definitions of concepts in an ontology will not exactly coincide with the concepts in the conceptualization. One can only try to roughly express the intensional relations. This inexactness is due to several reasons.

A concept as it is known in our thought cannot be fully expressed by axioms. As an example, giving a complete definition of “book” is not as easy as understanding the concept “book”. One may define “book” by common sense using the following axioms:
• A book is made of paper.
• It is printed.
• It has a rectangular shape.
• It is bound along one edge.
All these axioms are plausible. Yet, a book may not be printed, such as an e-book, but is still considered a book in our conceptualizations. A book can also be handwritten, or have a hexagonal shape. Even if one separates all the pages of a book, we still consider it a book. One can also distinguish between the concepts of a “book” and a “copy of a book”. While the above definition can be considered the definition of a “copy of a book”, a “book” may involve properties such as having an author or presenting a collection of coherent thoughts. Many concepts are even more complicated, e.g., time or space. Reaching an agreement on the definition of “book” is a difficult task.
This difficulty increases as the number of people or communities involved increases. What makes something truly a book has been investigated by philosophers and psychologists. Complication and ambiguity in intensional relations can also cause difficulties during formalization of the definitions.

It is very difficult, if not impossible, to be fully independent from background knowledge or common sense. As shown in the above example, when defining a term, one needs to use other terms. This makes every concept definition dependent on other concepts. Considering that one cannot define all the terms used in the axioms, we have to rely on our common understanding of some basic terms; consider, e.g., defining the is_a relation (specialization relation). That is, relying on common sense is practically inevitable.

Weaknesses in the formalisms used to represent ontologies are another factor to consider when using ontologies. Representation formalisms impose their weaknesses on the axiomatization. As we will see in Chapter 4, every formalism has its own capacity to formalize intensional definitions. The selection of a formalism depends on the purpose the ontology is used for.

3.4.4. Why Ontologies

In spite of all the difficulties in expressing intensional relations such as “book” or “street”, the above-mentioned axioms can still be considered part of an ontology’s definition of these intensional relations within a community. Such definitions are also called intensional definitions. Intensional definitions are definitions of terms by logical axioms, denoted by ι (iota). These logical axioms estimate every intensional relation, i.e., intensional definitions in an ontology are approximations of intensional relations. Every term in a vocabulary is a symbol used to uniquely denote an intensional relation. For instance, “Main Street” and “Road” are terms for two intensional relations and their possible intensional definitions are:

ι[MainStreet(x)] = TransportationNetwork(x) ∧ (carPerDay(x) > 2000) ∧ (∃y: City(y) ∧ inside(x,y)).

ι[Road(x)] = Transportation_Path(x) ∧ (∃y: automobile(y) ∧ transports(x,y)) ∧ (∀z: City(z) ∧ outside(x,z)).
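The two intensional definitions can be read as executable predicates over a small set of facts. The following Python sketch is only an illustration: the fact base and all individual names (`s1`, `r1`, `zurich`, `bern`, `car1`) are invented, and the universally quantified part of ι[Road(x)] is read here as "x is outside every z that is a city", which is an assumption about the intended meaning.

```python
# Hypothetical fact base; predicate names follow the two definitions,
# while the individuals are made up for this sketch.
FACTS = {
    "TransportationNetwork": {"s1"},
    "Transportation_Path": {"r1"},
    "City": {"zurich", "bern"},
    "automobile": {"car1"},
    "carPerDay": {"s1": 3500},
    "inside": {("s1", "zurich")},
    "outside": {("r1", "zurich"), ("r1", "bern")},
    "transports": {("r1", "car1")},
}


def is_main_street(x, f=FACTS):
    """MainStreet(x): a transportation network with more than 2000 cars
    per day that lies inside some city."""
    return (x in f["TransportationNetwork"]
            and f["carPerDay"].get(x, 0) > 2000
            and any((x, y) in f["inside"] for y in f["City"]))


def is_road(x, f=FACTS):
    """Road(x): a transportation path that transports some automobile
    and lies outside every city (assumed reading of the quantifier)."""
    return (x in f["Transportation_Path"]
            and any((x, y) in f["transports"] for y in f["automobile"])
            and all((x, z) in f["outside"] for z in f["City"]))


for individual in ("s1", "r1"):
    print(individual, is_main_street(individual), is_road(individual))
```

Each conjunct of a definition becomes one condition in the corresponding function, so changing a community's intensional definition amounts to changing the conditions, not the stored facts.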

An advantage of applying ontologies is that they help applications to be independent of the implicit background knowledge of the community, or at least reduce the dependency on such knowledge. One has to state explicitly what one intends by referring to a term. This reduces the chance of semantic heterogeneity in communication amongst the communities, i.e., their respective information systems. Afterwards, the main concern would be finding the discrepancies in the extension of the concepts and avoiding misinterpretation. (If one limits a community to the users of a specific application domain, then the specifications in the conceptual schema can be considered as part of the ontology.)

Intensional definitions can be used in application domains such as artificial intelligence and database integration. Ontologies help where two communities are willing to communicate based on a common understanding, i.e., higher-level ontologies. Applying ontologies to resolve semantic heterogeneity does not mean deciding on a unique robust definition of terms and obliging communities to adopt that exact interpretation. However, a minimum agreement between communities in the form of a higher-level ontology is necessary to find relations between ontologies (more discussion in Section 5.1). Using ontologies gives the communities the freedom to define their own conceptualization, rather than forcing an ontology on a community. By using ontologies, communities are supposed to communicate through their defined ontologies, and the added complication of building the ontologies is the cost of resolving ambiguity and semantic heterogeneity.

3.4.5. Conceptual schemata vs. Ontologies

In the domain of geoinformatics, the two topics of conceptual schema and ontology have been investigated closely. The distinction between the two helps in solving the semantic heterogeneity problem. Ontologies help to classify objects under a relevant class in the taxonomy tree of a community and thereby guarantee a consistent interpretation. In contrast, conceptual schemata help to represent the data, considering relevant methods for processing the data, by capturing different aspects of the application requirements. We emphasize the difference between the two in the GI community, since the solution presented in this work relies on this distinction.

In an ontology, one states those properties that make an individual be of some type. Any other property may or may not be stated.
For instance, the fact that a road is called by a name does not make an individual of type road. Conversely, an individual being of type road does not imply that the individual is called by a name. However, such a property (i.e., having a name) may be stated in a conceptual schema due to the application requirements. As another example, consider the fact that roads transport automobiles. This is an important property of a road to be stated in an ontology, because if an individual is of type road, it transports automobiles. This property of roads may not be stated in a conceptual schema since it is not required by the application.
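The contrast between the two kinds of definition can be sketched in Python. This is a hedged illustration, not part of the approach itself: the record layout `RoadRecord`, its field names and the key `transports_automobiles` are hypothetical. The ontology side keeps only the defining property, while the schema side keeps only what a (made-up) application needs.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


def is_road(individual: dict) -> bool:
    """Ontology side: an individual is a road if it transports automobiles.
    Having a name plays no part in this definition."""
    return bool(individual.get("transports_automobiles", False))


@dataclass
class RoadRecord:
    """Conceptual-schema side: attributes required by a hypothetical
    application; the defining property is not stored at all."""
    name: str
    speed_limit_kmh: int
    centreline: List[Tuple[float, float]]


named_thing = {"name": "Main Street"}                # named, not thereby a road
unnamed_road = {"transports_automobiles": True}      # a road without a name

print(is_road(named_thing), is_road(unnamed_road))
print([f.name for f in fields(RoadRecord)])
```

A second application could store the same road as an area instead of a centreline without any change to `is_road`, which is the separation the text argues for.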


While designing a conceptual schema, the answer to the question “what is a road?” concerns computer representation issues. Possible answers are “road is a linear feature”, “road has a speed limit”, or “it has a transit code”. The answers at this level are extracted from the specifications of the application requirements. While building an ontology, the answer to the same question may be “road is where the automobiles pass”, “it has surfacing” or “it has a speed limit”.

Yet, a question arises here: if all members of a community agree that “roads are represented by line features”, can this fact appear in their ontology? The representation of objects may indeed appear in an ontology and may also be considered as a criterion to classify objects (as in the case of road and building in the above example). That means, for example, that a community agrees that a characteristic feature of a road is its linearity (in other words, a road is a line). In such a case, relating this definition of road to those of other communities is a major difficulty. If a road is represented by a line in one application domain and by an area in another, it is still desirable to establish a correspondence between the two objects. [Sowa, 2000] discusses this issue as follows: “Different applications may classify the same objects in very different ways, and an ontology that is ideally optimized for one application may make knowledge sharing and reuse difficult or impossible.” [Sowa, 2000, Page 53]

Keeping representational facts solely in the conceptual schema can help to relate the two roads defined in two application domains (i.e., multiple representation [Buttenfield, 1993]) by referring to their ontologies. Therefore, it is important to distinguish between representation (the concern of conceptual schemata) and communication (the concern of ontologies). The approach to resolving semantic heterogeneity presented in this thesis focuses on this distinction, so that the representation does not affect the data communication between systems.

3.5. Approaches towards Ontologies

It is important to distinguish between the subject of this thesis and ontology in philosophy or similar topics in psychology (cognitive science). This thesis is concerned with finding relations among intensional relations, that is, expressing intensional relations in axioms in terms of other intensional relations. The axioms are built by using minimum sets of simple, well-known operators such as logical “and”, “greater than” and mathematical “addition”. Our definitions in an ontology are created by expressing every intensional relation ρi by its relation to other intensional relations.

As we showed in Section 3.4.1, states of the world are subject to intensional relations. The study of the states of the world, their existence and nature is the subject of philosophy. Investigations into the nature of intensional relations, or into how different people conceptualize states of the world in the same or different ways and the validity of these conceptualizations, are also the subject of philosophy and psychology. Questions such as “Do things have special characteristics that make us categorize them in classes or is it a property of mind that creates these classes?” or “Do mountains have something intrinsically more than being a mass of matter that makes them a class or is it only our mind that creates such class?” are not the concern of this thesis. In short, the nature of the states of the world and of their intensional relations is outside its scope.

The idea of ontologies discussed in this chapter is close to the componential theory of cognitive science (psychology) [Lakoff, 1987, Sowa, 2000], in which semantics is understood based on defining properties, also called characteristic features. According to this theory, each concept has a specific set of defining properties. With these, one can classify members of the domain under different concepts by verifying the truth value of a set of defining properties (also known as component analysis). The important question that this approach faces is “Are we able to determine all the defining properties?”.

An alternative to this theory is prototype theory [Lakoff, 1987], in which semantics is perceived through a typical representative of the type. According to this theory, some members are better qualified as instances of a concept than others. The ability to handle uncertainty in matching to a prototype representative may be considered a positive aspect of this theory.
What makes this thesis lean towards componential theory is the difficulty of introducing the representative prototype to an automated system. The only way to do so is again by using a set of defining properties, which leads us back to the question raised above. In essence, then, the ontologies are built based on the defining properties of componential theory.
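Component analysis, as described above, can be sketched in a few lines of Python. The defining properties of “street” are those of Section 3.4.1; the concept “footpath”, its property and the property names themselves are hypothetical additions for this sketch. An individual is classified under every concept whose defining properties it all satisfies.

```python
# Defining properties per concept. "street" follows Section 3.4.1;
# "footpath" is an invented second concept for illustration.
DEFINING_PROPERTIES = {
    "street": {"transports_automobiles", "surfaced", "has_sidewalks"},
    "footpath": {"transports_pedestrians"},
}


def classify(properties):
    """Return the concepts whose defining properties are all present.

    Extra, non-defining properties of the individual are ignored."""
    return sorted(concept
                  for concept, required in DEFINING_PROPERTIES.items()
                  if required <= set(properties))


individual = {"transports_automobiles", "surfaced", "has_sidewalks", "named"}
print(classify(individual))
print(classify({"surfaced"}))
```

The sketch also makes the open question of the theory visible: the classification is only as good as the set of defining properties one has managed to write down.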

3.6. Conclusion

This chapter introduces the interpretation of terms used in logical schema definitions as a source of semantic heterogeneity amongst databases. It defines the problem of interpretation of the identifier symbols in the schema definitions as the focus of this work. The semantics of the identifier symbols depends on the interpretation of the people who formulate or perceive them. By assigning relevant terms, people can try to show their intention. The relevance of a term to what people intend to represent is subjective. The semantics of constant symbols is not considered here, as it is predefined by the language and varies from one language to another.

We then suggest ontologies as a basis for the solution. This chapter provides a definition of ontologies and shows why they are considered a solution to the problem. Knowledge represented in ontologies is based on the conceptualization of communities. The implicit assumptions in a conceptualization are explicitly defined in an ontology in a way that reasoning systems can use them. As presented in the rest of this thesis, reasoning systems can help us to reduce ambiguity in communication with other communities by means of committing to a common higher-level ontology (see Section 3.4.2).

We also discuss issues related to conceptual schemata and highlight the difference between the two. That is, while conceptual schemata are based on application requirement specifications and are independent of implementation, ontologies are built based on conceptualization, with a minimum of application requirements being considered. Finally, one should notice that although the two mappings in Figure 3.6 and Figure 3.7 look similar, the four conditions of homeomorphy are not necessary for intensional relations.

Ontologies complement the knowledge (footnote 9) already represented in conceptual schemata, which is mainly extracted from the application domain. An ontology conveys more knowledge than schema definitions in databases. Schemata are more concerned with the structural aspects of data representation, and researchers have been working on the syntactic aspects of schema integration. While schemata are used for organizing data in databases, ontologies are concerned with the understanding of the members of communities.
It is important to note that schema definitions are closely related to the definitions in an ontology. Schema definitions contain knowledge about the ontology of a community. These definitions may also help to estimate and extract the ontology from the schemata. However, schema definitions should commit to a domain ontology.

9. In this thesis, knowledge refers to true statements about a category of individuals or their relations, in contrast to data as facts or measures about individuals. There is another interpretation of knowledge in which it is considered as the hard-coded part of a computer system, in contrast to data as the dynamic part of a computer system which may change during run time (see [Greiner et al., 2001]). Such an interpretation of knowledge can cause confusion in the context of this thesis.


This thesis advocates an approach of separating representation issues from semantics. The former deals with representation specifications, which are basically extracted from, and imposed by, the application domain. These specifications aim at improving efficiency, flexibility, etc. Data represented in one way by one system can be converted and represented in another way in another system, if its semantics or interpretation remains the same. Therefore, Chapter 5 offers an approach to enhance such conversion by a semantic matching based on ontologies.

Applying ontologies guarantees consistency in communities’ understanding of statements made during a communication. However, ontologies will not necessarily represent the complete conceptualization of every community committed to them. By using ontologies, communities are able to communicate based on their defined ontologies, and the complication of building ontologies is the cost of resolving ambiguity and semantic heterogeneity in communication. It is important to note that both ontologies and conceptual schemata can be represented by the same variety of tools (e.g., UML or ER diagrams or different logical languages), though their sources and objectives are different.


CHAPTER 4

Logics for Formalizing Ontologies

4.1. Introduction

Formalizing ontologies requires special attention in this thesis. Ontologies are built by extracting implicit knowledge from communities and representing it explicitly. The explicit representation may be made in a formal or informal language. Formalized ontologies are less ambiguous than ontologies represented in an informal language. In addition, we can use computer systems to reason with and analyze formalized ontologies.

Depending on the purpose for which ontologies are built, one can use different formalization approaches. An ontology may be formalized by a hierarchy of concepts for some purposes, while other purposes may need more expressive mechanisms to formalize the same vocabulary. Formalizing the intensional definitions (see Section 3.4.4) by means of specialization (or subsumption) relations may be all that is needed for some purposes. For example, some approaches provide typed links for representing relations between concepts while others offer more constraints. More requirements (footnote 1) imply more expressive formalisms and consequently more complication.

1. Note that requirements play a role in the technique for representation or formalization of ontologies, but they play no role in the definitions of terms in an ontology. Intensional definitions should be extracted from conceptualizations with the minimum influence of application requirements (see Section 3.4.4).

Section 4.2 of this chapter introduces important mechanisms of logical formalisms that make them suitable for representing ontologies. Eventually, such features should be considered and applied according to our application, that is, schema integration.

Different systems have used logic to formalize and handle semantics and ontologies. Description Logic (DL) [Baader et al., 2003] and Frame-based Logic (FLogic) [Kifer et al., 1995] are the two main logical formalisms within existing implemented reasoning systems. For instance, the SHOE [Heflin and Hendler, 2000] and OBSERVER [Mena et al., 1998] projects use DL, and On2Broker [Fensel et al., 1999] and SiLRI [Decker et al., 1998] use FLogic. (These projects are briefly introduced in Chapter 2.) Ontolingua also offers a frame-based language to formalize ontologies. Since Ontolingua does not offer a reasoning system for its language, we did not use it for our evaluation, although we did use ontologies defined in the Ontolingua library of ontologies.

The two logical formalisms (DL and FLogic) are briefly introduced in this chapter and their capabilities for formalizing ontologies are compared. We present a narrow and restricted view here and do not apply the conventional criteria of expressiveness and computational tractability to the formalisms. The comparison is done only in the framework of this thesis and our purpose of defining and processing ontologies.

Section 4.3 briefly discusses important features of DL for formalizing ontologies. Three reasoning systems, NeoClassic, Loom and PowerLoom, were used to evaluate DL. While the main focus is on PowerLoom, other implemented systems are discussed wherever relevant. Section 4.4 evaluates FLogic by formalizing the same examples as in Section 4.3. We used the available reasoning system FLORID (footnote 2), based on Frame-based Logic. The intensional definitions (definitions of concepts and relations) that are used as examples for formalization are extracted from the GDF [GDF 1995] and ATKIS (in German) [ATKIS 1998] standards.
Conclusions are presented in Section 4.5, where we compare the two logical formalisms and show why we use DL and PowerLoom in the following chapters of this thesis.

4.2. Formalisms Requirements

A formalism to support the representation of ontologies should provide a set of suitable mechanisms. The mechanisms introduced here are not a complete set. They are selected from the set of available mechanisms provided by the existing reasoning systems. We point out here those mechanisms that are required for intensional definitions in our ontologies. There might be further demands that can be useful in building and reasoning with ontologies (such as supporting modal logic), but here we introduce only those mechanisms that are offered by one or more existing reasoning systems.

2. These sections do not intend to introduce these formalisms. However, one can follow the discussion with a basic knowledge of Frame Logic and Description Logic.

Here are the general features required for representing ontologies with formalisms:

a. Concept definition: Concept definition is a means to define intensional relations of arity one (unary predicates). Concepts can be compared to classes of objects in the object-oriented paradigm. A concept is defined by a set of constraints that an instance should satisfy to belong to that concept. Similar to class definitions, in concept definitions we need subconcept (is-a-type-of) relations to define hyponyms (and hypernyms) and specialization. By these, one can establish a taxonomy hierarchy of concepts. Concepts in an ontology may be defined partly by aggregation of relations to other concepts. It is important to note that the comparison to the object-oriented paradigm is made here only for better understanding; otherwise, concepts are different from object classes. A concept definition in an ontology is concerned with properties that distinguish its instances from those of other concepts, whereas a class definition is concerned with those attributes needed for our application and representation of data.

b. Relation definition: Relation definition is a means to define intensional relations of arity two or higher. Note that relations are not merely defined by typed attributes carrying referential keys. Relations in ontologies may be defined independent of concept definitions. This gives an identity to relations independent of concept definitions (in contrast to roles, which are defined for a particular domain). This facilitates support for the specialization of relations.
For instance, in the definition of the start-at relation, we can state that the relation is established only with a Point_Element whose representation-dimension is zero. This can help to have a hierarchical taxonomy of relations like a taxonomy of concepts. That is, start-at will be classified as a subrelation of is-bounded. Furthermore, this helps to deduce new relations (between concepts or instances of concepts) which are not explicitly stated. That is, an instance of is-bounded can be classified under starts-at if it satisfies the constraints in the definition of starts-at. Another mechanism offered by some systems is called a Role. A Role is a mechanism through which we define or redefine a relation in a concept definition. The advantage of using roles is that one can easily specialize a general relation according to a particular concept in the domain of the relation.


For example, while a relation spouse maps persons to persons, the role spouse defined for man can have a constraint specifying that the spouse of a (heterosexual) man should be a woman; and the same relation may be defined for a woman with the opposite constraint. Note that, in practice, one can specialize relations according to their domain by a more complicated relation definition without using the role mechanism. That is, one can express such constraints in the definition of the spouse relation instead of specializing it in the definition of the concept man.

c. Instance definition: Instances represent members of the domain or their relations. They are defined with respect to concepts or relations by is-an-instance-of relations. Every instance of a concept has a collection of facts. Facts related to every instance of a concept are represented by means of their relations (or roles). For example, an instance of the concept person can be defined by its roles such as its social security number, its name and/or its brother. An instance of a concept can play (or fill in) a role for another instance. As an example, a person can play (or fill in) the role of parent for another person or of owner for a car. The two persons in the last example take part in an instance of a parenthood relation. That means that relations have instances, just as concepts do. This instance of the relation parenthood can be classified as a motherhood or fatherhood.

d. Constraints (or restrictions) definition: Constraints are conditional phrases mainly used for classifying concepts, relations or individuals. Reasoning systems are able to classify instances under concept or relation definitions by using constraints. In some formalisms, they may be represented as separate rules, but for the same purpose. As an example, a motherhood relation is a parenthood with the constraint on its range that it should be a woman; or a woman is a person with the constraint on its gender to be female.
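The four mechanisms in the list above can be made concrete with a small Python sketch based on the parenthood example. This is only an illustration under stated assumptions: the individuals (`anna`, `ben`, `carl`) are invented, parenthood tuples are taken to be ordered as (child, parent) so that the parent is the range of the relation, and plain Python functions stand in for what a reasoning system would do with concept and relation definitions.

```python
# Instances: hypothetical persons with one fact (gender) each.
PERSONS = {
    "anna": {"gender": "female"},
    "ben": {"gender": "male"},
    "carl": {"gender": "male"},
}

# Instances of the relation parenthood, assumed to be (child, parent) pairs.
PARENTHOOD = {("carl", "anna"), ("carl", "ben")}


def is_woman(p):
    """Concept with a constraint: a woman is a person whose gender is female."""
    return p in PERSONS and PERSONS[p]["gender"] == "female"


def is_man(p):
    """Concept with a constraint: a man is a person whose gender is male."""
    return p in PERSONS and PERSONS[p]["gender"] == "male"


def classify_parenthood(pair):
    """Classify one parenthood instance under its subrelations by testing
    the constraint on the range (the parent)."""
    _child, parent = pair
    if is_woman(parent):
        return "motherhood"
    if is_man(parent):
        return "fatherhood"
    return "parenthood"


# Subrelation derived from its definition rather than stated explicitly.
MOTHERHOOD = {pair for pair in PARENTHOOD
              if classify_parenthood(pair) == "motherhood"}

for pair in sorted(PARENTHOOD):
    print(pair, classify_parenthood(pair))
```

The point of the sketch is that motherhood is never asserted as a fact; it is deduced from the parenthood instances and the constraint, which is the kind of classification the text expects from a reasoning system.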
As long as a formalism provides the above-mentioned features, it can be used to formalize intensional definitions. Formalisms such as ER diagrams, UML diagrams and RDF Schema are alternative formalisms for representing ontologies. However, a reasoning system capable of processing such formalisms is also required for our purpose here. At the time of writing this chapter, no reasoning system capable of processing such formalisms exists. Therefore, this chapter focuses on Description Logic and Frame-based Logic.


4.3. Description Logic

Description Logic (DL) [Baader et al., 2003] is a potential means of formalizing intensional definitions in ontologies. DL is a successor of KL-ONE [Woods and Schmolz, 1992] and has been developed for knowledge representation. Projects such as OBSERVER [Mena et al., 1998] and SHOE [Heflin and Hendler, 2000] used DL for representation and reasoning with ontologies. There are a few reasoning and knowledge representation systems that work with the DL formalism. Here, we selected the PowerLoom [MacGregor et al., 1997] implementation of DL for our discussion on the suitability of DL to formalize ontologies. We initially used NeoClassic [Patel-Schneider et al., 1996] and Loom [Brill, 1993] for our evaluation. Later, when PowerLoom (the successor of Loom) was released, we evaluated it too. Therefore, we discuss a few issues related to NeoClassic and Loom where they are relevant or important.

The two main formalizing elements in DL are concept definitions and relation definitions. Definitions (of concepts or relations) in DL are of two types: primitive and non-primitive (non-primitive definitions are also known as defined concepts or relations). With primitive definitions, one expresses necessary constraints to be satisfied by individuals³ in their extension. Non-primitive definitions are described by necessary and sufficient conditions. In other words, non-primitive definitions are used when one is able to give a thorough and clear definition of a concept or relation. Primitive and non-primitive definitions are important features of a formalism for building ontologies, because one expresses the defining properties (also called the characteristic features, see Section 3.5) of a concept or relation in one of the two forms. In a concept or relation definition, constraints appearing after :<=> are necessary and sufficient conditions for a non-primitive concept, while constraints appearing after :=> are necessary conditions.

3. An individual is a member of the domain in a conceptualization (see Section 3.4.1 for the definition). The term instance is used slightly differently, in the sense that it refers to a set of facts (or data) which represents an individual in a computer system.

4.3.1. Concept Definition

Concepts are defined by subconcept relations to their super-concepts and by constraints on their relations to other concepts. A DL reasoning system can implicitly recognize (or classify) individuals of non-primitive concepts. On the other hand, a DL system cannot recognize an individual of a primitive concept unless it is declared explicitly, because the definition of a primitive concept is partial. An example of a primitive intensional definition taken from GDF [GDF 1995] is as follows:

    (defconcept Road_Element (?r Spatial_Feature)
      :=> (and (exists (?j) (and (Junction ?j) (= (starts-at ?r) ?j)))
               (= (representation-dimension ?r) one)))

(Ints. Def. 1)

Here Road_Element is defined as a primitive concept which is a subconcept of Spatial_Feature. Every Road_Element must have at least one Junction with which it has a starts-at relation, i.e., every Road_Element starts at a Junction. However, this does not imply that everything with at least one Junction in a starts-at relation is a Road_Element. Another example of a primitive definition is the definition of Strasse extracted from the ATKIS [ATKIS 1998] standard (in German):

    (defconcept Strasse (?s Komplex_Object)
      :=> (and (exists (?sk Strassenkörper) (bestehend-aus ?s ?sk))
               (exists (?fb Fahrbahn) (bestehend-aus ?s ?fb))))

(Ints. Def. 2)

Strasse is a primitive subconcept of Komplex_Object. Every Strasse must have at least one Fahrbahn and one Strassenkörper, but not vice versa.

The following are examples of non-primitive concepts. They formalize the definitions of Point_Feature and Line_Feature from GDF:

    (defconcept Point_Feature (?p Simple_Feature)
      :<=> (and (representation-dimension ?p zero)))

(Ints. Def. 3)

    (defconcept Line_Feature (?l Simple_Feature)
      :<=> (and (representation-dimension ?l one)))

(Ints. Def. 4)

Ints. Def. 3 states that every Simple_Feature with a representation-dimension of zero is a Point_Feature and vice versa. That means the conditions here are necessary and sufficient. One can also enhance the definition in Ints. Def. 2 to a non-primitive concept:

    (defconcept Strasse (?s Komplex_Object)
      :<=> (exists (?a automobil) (laufen ?s ?a))
      :=> (and (exists (?sk Strassenkörper) (bestehend-aus ?s ?sk))
               (exists (?fb Fahrbahn) (bestehend-aus ?s ?fb))))

(Ints. Def. 5)
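The practical difference between primitive and non-primitive definitions can be sketched as follows. This is a simplified Python illustration, not PowerLoom's mechanism: the tables `DEFINED` and `PRIMITIVE` and the functions `recognize` and `necessary_facts` are hypothetical, each concept is reduced to a super-concept plus a required representation dimension, and the Junction constraint of Road_Element is left out.

```python
# Defined (non-primitive) concepts: necessary AND sufficient conditions,
# written here as (required super-concept, required representation dimension).
DEFINED = {
    "Point_Feature": ("Simple_Feature", "zero"),
    "Line_Feature": ("Simple_Feature", "one"),
}

# Primitive concepts: necessary conditions only.
PRIMITIVE = {
    "Road_Element": ("Spatial_Feature", "one"),
}


def recognize(asserted, dimension):
    """Concepts an individual belongs to, given its asserted concepts and dimension."""
    concepts = set(asserted)
    for name, (parent, required) in DEFINED.items():
        if parent in concepts and dimension == required:
            concepts.add(name)  # sufficient conditions met: membership is inferred
    # Primitive concepts are never added here: their conditions are not sufficient.
    return concepts


def necessary_facts(concept):
    """Facts that follow once membership in a concept is known."""
    parent, required = {**DEFINED, **PRIMITIVE}[concept]
    return {"super_concept": parent, "representation_dimension": required}


print(sorted(recognize({"Simple_Feature"}, "zero")))
print(sorted(recognize({"Spatial_Feature"}, "one")))
print(necessary_facts("Road_Element"))
```

An individual is recognized as a Point_Feature from its properties alone, whereas a one-dimensional Spatial_Feature is not recognized as a Road_Element; membership in the primitive concept has to be declared, after which its necessary facts follow.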


Another example of a defined concept from ATKIS is as follows:

    (defconcept Verkehrsweg_Klass6 (?x Verkehrsweg)
      :<=> (< (breite ?x) (quantity* 6 Meter)))

(Ints. Def. 6)

The concept Verkehrsweg_Klass6 is a subconcept of Verkehrsweg and its breite relation is filled by a quantity less than 6 meters. Therefore, the DL reasoning system will classify every instance of Verkehrsweg whose breite role is filled by a quantity less than 6 meters under the Verkehrsweg_Klass6 concept. On the other hand, if an individual is introduced as a Verkehrsweg_Klass6, the system keeps the constraint of being less than 6 meters on the breite relation.

4.3.2. Relation Definition

Relations can also be defined as being primitive or non-primitive. Here is an example of a primitive relation definition from GDF:

    (deffunction representation-dimension ((?se Spatial_Feature))
      :-> (?sd Spatial_Dimension))

(Ints. Def. 7)

representation-dimension maps Spatial_Feature to Spatial_Dimension. deffunction specifies that representation-dimension is a single-valued relation. In other words, every Spatial_Feature can be mapped to only one Spatial_Dimension. An example of a multivalued relation is as follows:

    (defrelation is-bound-by ((?x Spatial_Feature) (?y Spatial_Feature))
      :=> (and (representation-dimension ?x one)
               (representation-dimension ?y zero)))

(Ints. Def. 8)

is-bound-by relates two Spatial_Features. The Spatial_Feature in the domain of is-bound-by has a representation-dimension of one and the Spatial_Feature in the range of is-bound-by has a representation-dimension of zero. The following example shows the definition of the starts-at relation:

    (deffunction starts-at ((?x Spatial_Feature))
      :-> (?y Spatial_Feature)
      :=> (and (representation-dimension ?x one)
               (representation-dimension ?y zero)
               (is-bound-by ?x ?y)))⁴

(Ints. Def. 9)


starts-at has a primitive definition and is a subrelation of is-bound-by. The following is the non-primitive definition of the bounds relation:

    (defrelation bounds ((?x Spatial_Feature) (?y Spatial_Feature))
      :<=> (and (is-bound-by ?y ?x)
                (representation-dimension ?x zero)
                (representation-dimension ?y one)))

(Ints. Def. 10)

bounds is defined as the inverse relation of is-bound-by.
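The difference between a single-valued function and a multivalued relation, and the derivation of an inverse relation such as bounds, can be sketched in Python. The helper names `assert_function`, `assert_relation` and `inverse` are hypothetical and the tables are plain dictionaries; the individual J21 is invented for the illustration.

```python
def assert_function(table, subject, value):
    """Record a single-valued (deffunction-like) fact; reject a second, different value."""
    if subject in table and table[subject] != value:
        raise ValueError(f"{subject} already maps to {table[subject]}")
    table[subject] = value


def assert_relation(table, subject, value):
    """Record a multivalued (defrelation-like) fact."""
    table.setdefault(subject, set()).add(value)


def inverse(table):
    """Derive the inverse of a multivalued relation."""
    result = {}
    for subject, values in table.items():
        for value in values:
            result.setdefault(value, set()).add(subject)
    return result


dimension = {}
assert_function(dimension, "RD12", "one")

is_bound_by = {}
assert_relation(is_bound_by, "RD12", "J20")
assert_relation(is_bound_by, "RD12", "J21")  # J21 is a made-up second junction
bounds = inverse(is_bound_by)
print(bounds)
```

A second call such as `assert_function(dimension, "RD12", "zero")` raises an error in this sketch, which mirrors the single-valued reading of deffunction.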

4.3.3. Discussion

Originally, DL systems consisted of two modules: the Terminological Box (TBOX) and the Assertion⁵ Box (ABOX). The TBOX refers to the part that reasons with knowledge presented in the form of concept and relation definitions. The ABOX is where the individuals and the facts describing them are defined and classified. All commands for introducing facts about individuals start with assert. PowerLoom, unlike its predecessor Loom and other conventional DL systems, does not have a TBOX and an ABOX; therefore, it does not distinguish between concept definitions and assertions.⁶

DL reasoning systems possess two main capabilities. The first is the capability of inferring subsumption relations between concepts and relations in the TBOX. That is, the reasoning system can determine where a concept can be located in a taxonomy hierarchy (a hierarchy built by means of the subconcept relation). For instance, consider the following two definitions of Strasse and Street:

    (defconcept Strasse (?s)
      :documentation “Anything that transports an automobile is a Strasse.”
      :<=> (exists (?a automobile) (transports ?s ?a)))

(Ints. Def. 11)

    (defconcept Street (?s)
      :documentation “Anything that transports an automobile inside a residential area is a Street.”
      :<=> (and (exists (?a automobile) (transports ?s ?a))
                (exists (?r residential_area) (inside ?s ?r))))

(Ints. Def. 12)

4. One can use different ways to state that a relation is a subrelation of another: (assert (subset-of starts-at is-bound-by))
5. Note that assertion here is different from the notion of assertion in SQL.
6. From this point of view PowerLoom is similar to FLORID, which is a system for Frame-based Logic and is introduced in Section 4.4.


The DL reasoning system (in this case PowerLoom) is able to recognize that Street is a subconcept of Strasse without it being explicitly declared. The following query results in true, which shows that Street is a subconcept of Strasse:

    (ask (subset-of street strasse))
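Why Street is recognized as a subconcept of Strasse can be illustrated with a very small structural check. The sketch below is an assumption-laden simplification and not PowerLoom's inference procedure: each defined concept is reduced to a set of conjuncts, written as hypothetical `(quantifier, relation, filler)` tuples, and one concept subsumes another when all of its conjuncts also appear in the other definition.

```python
# Each defined concept is a conjunction of necessary-and-sufficient conditions.
STRASSE = frozenset({
    ("exists", "transports", "automobile"),
})
STREET = frozenset({
    ("exists", "transports", "automobile"),
    ("exists", "inside", "residential_area"),
})


def subsumes(general, specific):
    """True if every condition of `general` is also required by `specific`."""
    return general <= specific


print(subsumes(STRASSE, STREET))  # True: every Street is a Strasse
print(subsumes(STREET, STRASSE))  # False: not every Strasse is a Street
```

The comparison only works for conjunctions of syntactically identical conditions; an actual DL reasoner also has to handle nested definitions, relation hierarchies and other constructors.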

Second, a DL reasoning system is capable of implicitly recognizing individuals defined in an ABOX as members of non-primitive concepts. As an example, consider the following fact assertion in PowerLoom:

    (assert (= (starts-at RD12) J20))

The DL reasoning system concludes:
• RD12 and J20 are instances of Spatial_Feature (based on Ints. Def. 9);
• RD12 has a representation-dimension of one (based on Ints. Def. 9) and consequently, it is an instance of Line_Feature (based on Ints. Def. 4);
• also, J20 is an instance of Point_Feature (based on Ints. Def. 9 and Ints. Def. 3);
• RD12 is-bound-by J20 (based on Ints. Def. 8); and
• J20 bounds RD12 (based on Ints. Def. 10).

Consistency checking is a crucial issue in checking the validity of intensional definitions (in the TBOX). In addition, we need to evaluate whether the assertions comply with the ontological definitions. The following are assertions that cause inconsistency (also termed incoherence) according to the intensional definitions of Point_Feature and starts-at:

    (assert (Point_Feature U))
    (assert (= (starts-at U) J20))
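The conclusions listed above, and the inconsistency caused by the two assertions about U, can be reproduced with a small hand-written derivation. The following Python sketch hard-codes the consequences of Ints. Def. 3, 4, 8, 9 and 10 and mirrors the conclusions given in the text; the function `infer` and its data layout are hypothetical, and an inconsistency is taken to be an individual with more than one value for the single-valued representation-dimension.

```python
def infer(starts_at, asserted=()):
    """starts_at: dict mapping a feature to the feature it starts at.
    asserted: iterable of (individual, concept) pairs stated explicitly."""
    dims, types, relations = {}, {}, set()

    def add_dim(ind, dim):
        dims.setdefault(ind, set()).add(dim)

    for ind, concept in asserted:
        types.setdefault(ind, set()).add(concept)
        if concept == "Point_Feature":
            add_dim(ind, "zero")
        elif concept == "Line_Feature":
            add_dim(ind, "one")

    for x, y in starts_at.items():
        for ind in (x, y):
            types.setdefault(ind, set()).add("Spatial_Feature")
        add_dim(x, "one")   # domain of starts-at is one-dimensional
        add_dim(y, "zero")  # range of starts-at is zero-dimensional
        relations |= {("starts-at", x, y), ("is-bound-by", x, y), ("bounds", y, x)}

    # A single-valued relation with two different values signals an inconsistency.
    conflicts = sorted(ind for ind, ds in dims.items() if len(ds) > 1)
    for ind, ds in dims.items():
        if ds == {"zero"}:
            types[ind].add("Point_Feature")
        elif ds == {"one"}:
            types[ind].add("Line_Feature")
    return types, relations, conflicts


types, relations, conflicts = infer({"RD12": "J20"})
print(sorted(types["RD12"]), sorted(types["J20"]), conflicts)

_, _, conflicts_u = infer({"U": "J20"}, [("U", "Point_Feature")])
print(conflicts_u)
```

The second call reports U, because U would need a representation-dimension of both zero and one.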

Theoretically, DL systems have the ability to find inconsistencies, but in practice this depends on the system implementation. As an example, Loom [Brill, 1993], with an expressive language, offers functions to detect inconsistency in concept (TBOX) and individual (ABOX) definitions, although it does not detect all examples of inconsistencies. PowerLoom, also with an expressive language, showed a better ability to detect inconsistencies; however, it only reacts by issuing a warning message and, unlike its ancestor Loom, does not support any special functions for this purpose. Since PowerLoom does not distinguish between terminological and assertion definitions, it can only detect an inconsistent state. In the case of an inconsistency, determining whether the terminological definition or the assertion is incorrect would be a difficult task.

It is important to differentiate between the notions of correctness and consistency here. Consistency refers to internal agreement with the logical definitions, while correctness refers to agreement of definitions and assertions with the intension. Consistency checking and detection is an important feature of reasoning systems that are to be used for defining ontologies. Inconsistency resolution is not required here, since an inconsistent case can be resolved in different ways (mapped to different consistent cases), not all of which are necessarily correct solutions.

NeoClassic [Patel-Schneider et al., 1996] is another implementation of a DL reasoning system. It can detect inconsistency in both concept and individual definitions. The NeoClassic implementation of DL is not as expressive as PowerLoom. As an example, the relation (or role) definition in NeoClassic is weak in comparison to PowerLoom. The role definition in NeoClassic is attributed to concepts, and one cannot define a relation with its constraints independently of a concept definition. As a consequence, it does not offer the possibility of defining specialization hierarchies of relations. Role definition in NeoClassic offers only the type checking constraint. Unlike PowerLoom, NeoClassic offers stronger features for detecting inconsistencies and determining the definitions causing such inconsistencies. However, expressiveness has a higher priority in our work than determining inconsistent definitions.

4.4. Frame-based Logic

The main purpose of Frame-based Logic (also called FLogic) is to formalize various aspects of the object-oriented paradigm [Kifer et al., 1995]. However, it has been used for knowledge representation as well. Its main focus is on classes (or frames) and methods (and attributes, as non-parametric methods). Classes and methods in FLogic can be used to represent concepts and relations, respectively. In this section, the concepts formalized in Section 4.3 are formalized in FLogic using the FLORID system [May, 2000] to allow comparison between the two approaches.

4.4.1. Concept Definition

FLogic does not explicitly distinguish between primitive and non-primitive definitions. However, both can be defined by means of rule definitions. Concepts can be defined by their super-concepts, declared by ::. The definition of the primitive concept road_element (in Ints. Def. 1) is as follows:


    road_element::spatial_feature.
    road_element[starts_at=>junction].
    road_element[representation_dimension*->one].

(Ints. Def. 13)

The above statements define road_element as a subconcept of spatial_feature. Expressions inside brackets (“[]”) declare relations or methods in FLogic. For example, the second line in Ints. Def. 13 defines the relation starts_at with a type constraint. => declares starts_at as a function method, in contrast to =>> which defines a type constraint for a multivalued method. Similarly, -> and ->> set values for methods. Unlike in PowerLoom, we cannot express the fact that a road_element must have at least one starts_at relation with a junction. That is due to the lack of existential quantifiers in FLogic. The last line in the above definition specifies that every instance of road_element should have a representation_dimension of one. We define point_feature (Ints. Def. 3) as follows:

    point_feature::simple_feature[representation_dimension*->zero].

If an individual is defined as an instance of the concept point_feature, the system concludes that it is a spatial_feature and that it has a role representation_dimension filled by the value zero. Yet, this is a primitive definition. To change point_feature to a non-primitive concept, one must extend the definition by adding the following rule:

    point_feature::simple_feature[representation_dimension*->zero].
    X:point_feature :- X:simple_feature[representation_dimension->zero].

(Ints. Def. 14)

In the above definition, :- declares a rule in which the consequent appears before the sign and the condition after the sign. According to this definition, if an object of type simple_feature has a representation_dimension of zero, the system concludes that the object is an instance of the concept point_feature. Correspondingly, the definition of Line_Feature (in Ints. Def. 4) is formalized as follows:

    line_feature::simple_feature[representation_dimension*->one].
    X:line_feature :- X:simple_feature[representation_dimension->one].

(Ints. Def. 15)


The only way to define non-primitive concepts in FLogic is by applying such rules.

4.4.2. Relation Definition

One can define the relation in Ints. Def. 7 by a method definition in FLogic as follows:

    spatial_feature[representation_dimension=>spatial_dimension].

(Ints. Def. 16)

The only constraint expressed above is the type checking of the range of a relation. The following shows a translation of Ints. Def. 8 to FLogic, in which rules define more constraints than just type checking:

    spatial_feature[is_bound_by=>>spatial_feature].
    X:simple_feature, Y:simple_feature :- X[is_bound_by->>Y].
    X[representation_dimension->one] :- X[is_bound_by->>Y].
    Y[representation_dimension->zero] :- X[is_bound_by->>Y].

(Ints. Def. 17)

Consequently, the starts_at relation (defined in Ints. Def. 9) as a subrelation of is_bound_by is formalized as follows:

    spatial_feature[starts_at=>spatial_feature].
    X:simple_feature, Y:simple_feature :- X[starts_at->Y].
    X[representation_dimension->one] :- X[starts_at->Y].
    Y[representation_dimension->zero] :- X[starts_at->Y].
    X[is_bound_by->>Y] :- X[starts_at->Y].

(Ints. Def. 18)

The following presents the non-primitive definition of the relation bounds:

    spatial_feature[bounds=>>spatial_feature].
    X:simple_feature, Y:simple_feature :- X[bounds->>Y].
    X[representation_dimension->zero] :- X[bounds->>Y].
    Y[representation_dimension->one] :- X[bounds->>Y].
    Y[is_bound_by->>X] :- X[bounds->>Y].
    Y[bounds->>X] :- X[is_bound_by->>Y].

(Ints. Def. 19)
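The rules in Ints. Def. 14, 15, 17, 18 and 19 can be read as forward-chaining rules that are applied until nothing new can be derived. The following Python sketch imitates this with a naive fixpoint loop. It is a simplification and not how FLORID is implemented: facts are hypothetical `(predicate, subject, object)` triples, `isa` stands for class membership, and the dimension rules of Ints. Def. 18 are obtained indirectly through is_bound_by.

```python
def saturate(facts):
    """Apply the rules repeatedly until no new fact is produced (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for pred, x, y in facts:
            if pred == "starts_at":
                new.add(("is_bound_by", x, y))          # subrelation rule (Def. 18)
            elif pred == "is_bound_by":
                new |= {
                    ("isa", x, "simple_feature"),        # Def. 17
                    ("isa", y, "simple_feature"),
                    ("representation_dimension", x, "one"),
                    ("representation_dimension", y, "zero"),
                    ("bounds", y, x),                    # inverse rule (Def. 19)
                }
            elif pred == "bounds":
                new.add(("is_bound_by", y, x))           # inverse rule (Def. 19)
            elif pred == "representation_dimension" and ("isa", x, "simple_feature") in facts:
                # Classification rules of Def. 14 and Def. 15.
                concept = {"zero": "point_feature", "one": "line_feature"}.get(y)
                if concept:
                    new.add(("isa", x, concept))
        if new <= facts:
            return facts
        facts |= new


closure = saturate({("starts_at", "rd12", "j20")})
for fact in sorted(closure):
    print(fact)
```

Starting from the single fact that rd12 starts at j20, the loop derives the is_bound_by and bounds facts as well as the classification of rd12 and j20.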

As in the definition of bounds in Ints. Def. 10, the inverse of bounds is is_bound_by.

4.4.3. Discussion

The FLogic reasoning system classifies individuals under classes, as in DL. This is the main task expected and performed by FLORID. Given the following fact:

    rd12[starts_at->j20].

the FLORID reasoning system concludes:
• rd12 and j20 are instances of spatial_feature (based on Ints. Def. 18);
• rd12 has a representation_dimension of one (based on Ints. Def. 18) and consequently, it is an instance of line_feature (based on Ints. Def. 15);
• j20 is an instance of point_feature (based on Ints. Def. 18 and Ints. Def. 14);
• rd12 is_bound_by j20 (based on Ints. Def. 17); and
• j20 bounds rd12 (based on Ints. Def. 19).

However, FLORID lacks the ability to deduce implicit knowledge about subsumption of classes and relations. The definition of relations as methods and their dependency on class definitions makes subsumption of relations (or methods) impractical. Subsumption of concepts is not recognized by FLORID.

FLogic is not capable of detecting inconsistent definitions as one may expect. In fact, due to the lack of negation in the consequent part of rule definitions, the FLogic reasoning system does not encounter inconsistency. The following definitions:

    u[starts_at->j20].
    u:point_feature.

do not cause inconsistency in FLORID, unlike in PowerLoom. FLORID concludes that one and zero are equal, and consequently that point_feature and line_feature are equal as well. Due to the lack of negation in the consequent part of the rules in FLORID, there is no way to state that one is not equal to zero. In other words, there is no way to express the fact that point_feature and line_feature are disjoint concepts. However, one can define relevant predicates to find inconsistency in definitions. As an example, one can define a predicate disjoint for FLORID and define point_feature and line_feature as disjoint concepts:

    disjoint(point_feature, line_feature).

Then, by adding the following rule:

    instance_of_disjoint_concepts(O,C1,C2) :- disjoint(C1,C2), O:C1, O:C2.

FLORID will be able to find an instance that is classified under two disjoint concepts and report it. Another important rule to detect inconsistencies in type definitions is the following:


type_incoherent(O) :- O[M->>R], O:C, C[M=>>T], not R:T.
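The two detection rules (for instances of disjoint concepts and for type incoherence) amount to simple checks over class membership, method values and method signatures. The sketch below expresses them in Python under stated assumptions: `membership`, `values` and `signatures` are hypothetical data structures, and class inheritance is not modelled, so every class of an object has to be listed explicitly.

```python
def instances_of_disjoint_concepts(membership, disjoint_pairs):
    """Objects classified under both concepts of a declared disjoint pair."""
    return sorted(
        (obj, c1, c2)
        for obj, classes in membership.items()
        for c1, c2 in disjoint_pairs
        if c1 in classes and c2 in classes
    )


def type_incoherent(values, membership, signatures):
    """Objects with a method result that violates the signature of one of their classes.

    values: set of (object, method, result) triples.
    signatures: dict mapping (class, method) to the required class of the result.
    """
    incoherent = set()
    for obj, method, result in values:
        for cls in membership.get(obj, ()):
            required = signatures.get((cls, method))
            if required is not None and required not in membership.get(result, ()):
                incoherent.add(obj)
    return sorted(incoherent)


membership = {
    "u": {"point_feature", "line_feature"},
    "rd12": {"line_feature"},
    "j20": {"point_feature"},
}
print(instances_of_disjoint_concepts(membership, [("point_feature", "line_feature")]))

signatures = {("line_feature", "starts_at"): "point_feature"}
values = {("rd12", "starts_at", "j20"), ("u", "starts_at", "rd12")}
print(type_incoherent(values, membership, signatures))
```

In this example u is reported by both checks: it belongs to two disjoint concepts, and its starts_at value rd12 is not a point_feature.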

The type_incoherent rule finds and reports an object (O) whose method (M) yields a result (R) that does not comply with the type definitions of the class (C) of object O. These objects are reported as type-incoherent. Defining such predicates gives more flexibility for defining ontologies, but at the cost of much more complexity.

A characteristic of FLogic is that it does not distinguish between individuals and concepts. That is, unlike DL, it has nothing like the TBOX and the ABOX. An instance of a concept can be considered a concept itself and can have its own instances.

    book_title[title=>name].

book_title is a concept with the relation title to the concept name.

    dictionary::book_title.

dictionary is a subconcept of book_title.

    oxford_advanced_learner:name.
    oal_dictionary:dictionary[title*->oxford_advanced_learner].

oal_dictionary is an instance of dictionary and its title is oxford_advanced_learner.

    my_dictionary:oal_dictionary.

my_dictionary is an instance of oal_dictionary.

Such a definition does not suit the TBOX/ABOX scheme in DL. However, PowerLoom does support such definitions. One should also remark that, in spite of this ability, neither of the two reasoning systems fully supports higher-order logic.

4.5. Comparison

This chapter introduced a set of mechanisms needed for formalizing ontologies. Two major formalisms, Description Logic and Frame-based Logic, are the focus of this chapter. The following comparison does not take all aspects of knowledge representation into consideration, but only those aspects we found critical for formalizing and processing ontologies in our approach. The result of the comparison between the two is as follows:


1. Concept Definition and Subsumption

Description Logic: + DL explicitly offers the primitive and non-primitive definition of concepts and relations.
Frame-based Logic: + One should use rule definitions as a work-around to define primitive and non-primitive concepts in FLogic and to classify instances under relevant concepts.

Description Logic: + PowerLoom is capable of deducing subsumption (the specialization relation) between concepts.
Frame-based Logic: - FLORID does not deduce any subsumption between classes (subsumption of relations is impossible due to a lack in the FLogic formalism).

2. Relation Definition and Subsumption

Description Logic: + Relation definitions in DL are independent of class definitions and PowerLoom offers the possibility of defining a taxonomy tree of relations.
Frame-based Logic: ± Any reference to a method (relation) in FLogic should be made based on its class; e.g., a query such as "is brother a subrelation of sibling?" is practically impossible to make.

Description Logic: + PowerLoom is capable of deducing subsumption (the specialization relation) of relation definitions.
Frame-based Logic: - Subsumption of relations is impossible due to the lack of a taxonomy of relations in the FLogic formalism.

3. Instance Definition and Classification

Description Logic: + PowerLoom classifies individuals as well as instances of relations in the extension of concepts and relations.
Frame-based Logic: ± FLORID classifies individuals (instances of concepts) in the extension of classes. Classification of relation instances is not possible due to a lack in the formalism.

Description Logic: ± DL has a TBOX/ABOX scheme. However, PowerLoom offers the same capability, though it does not use the TBOX/ABOX scheme.
Frame-based Logic: + FLogic offers a more expressive capability of instance definition, i.e., defining an instance of an instance.

4. Constraint Definition

Description Logic: + DL has expressive means to formalize constraints in relation definitions.
Frame-based Logic: ± FLogic offers type checking constraints for methods; any other constraints should be expressed by means of rule definitions.

Description Logic: + DL supports negation and consequently it can define disjoint concepts.
Frame-based Logic: - Defining disjoint concepts is not possible in FLogic due to the lack of negation in the consequent part of rules.

Description Logic: + DL allows the use of existential constraints in its definitions.
Frame-based Logic: - One cannot state an existential constraint in FLogic; e.g., "there exists a mother for every person" cannot be stated.

5. Building and Managing Ontologies

Description Logic: + PowerLoom provides a mechanism useful for preparing and managing ontologies, by its modular arrangement.
Frame-based Logic: - FLORID does not provide any mechanism for managing ontologies.

Description Logic: ± PowerLoom offers a limited capability to detect inconsistency in definitions.
Frame-based Logic: ± Rules to detect certain types of inconsistency can be added to the definitions.

According to the above comparison, we select DL for our experiments on formalizing ontologies in this thesis. DL offers as much expressiveness in formalizing relations as in formalizing concepts, while FLogic mainly focuses on defining concepts (or classes), rather than relations.⁷ Considering PowerLoom’s reasoning capabilities, its implementation of DL is used as the reasoning system for the implementation of our approach. All the examples in the next two chapters are also tested with PowerLoom.

7. This evaluation was done at a certain time and the decision was made then; however, the comparison criteria presented here were updated as the work proceeded and new requirements were recognized in later stages of the work.

CHAPTER 5

Ontology-based Integration

5.1. Introduction

Schema integration is an important task as a preparatory step to the integration of databases. A major bottleneck is the problem of semantic conflicts. There have already been proposals for helping schema integration by detecting and resolving semantic heterogeneity [Bergamaschi et al., 1998, Palopoli et al., 1999, Madhavan et al., 2001, Visser et al., 1999, Goh et al., 1999]. Such approaches use thesauri, schema definitions, ontologies or a combination of these. When using ontologies, the main focus is the knowledge extracted from the communities. In this chapter, we present the idea of using ontologies formalized in a logical language to integrate database schemata.

Models of the mini-world (see Section 3.3) are based on conceptualizations. These models can be considered a projection of the world structures (see Section 3.4.1). The modeling process performs this projection and is done according to an application’s interest (Figure 5.1). As discussed in Section 3.5, the nature of the intensional relations is not the concern of this thesis. We use axioms in our ontologies to express relations among them. These axioms help us to relate the conceptualizations of communities. For sharing data between models based on different conceptualizations, we need to guarantee the closest possible interpretations, in other words, semantic consistency. This is done by making sure that the portion of the data integrated into the models of different conceptualizations is not different from what the original projection to that model would produce. Such consistent integration requires the type of higher-level knowledge that ontologies can provide through the axioms defining intensional relations.

Figure 5.1. Models are projections of world structures built by some constraints according to the requirements. (Diagram labels: World Structures; Illustration of possible states of the worlds (W); Intensional Relations; Modeling.)

An ontology is a base for sharing parts of the conceptualizations and in turn it helps to have the same interpretation of data. One should note the following two cases in data integration:

1. Database schemata are based on the same common ontology and this ontology facilitates their communication (Figure 5.2). In this case our problem is limited to schema integration or finding synonyms and homonyms in the names of schema elements. An example of this approach is taken by On2Broker [Fensel et al., 1999] using the (KA)2 ontology. Note that this solution is applicable only if the two schemata have already been designed based on the same ontology. Otherwise, building such a common ontology requires resolving conflicts, which can be a difficult task. In fact, this solution denies the schemata the flexibility to commit to different ontologies.

2. Database schemata are based on different ontologies. In this scenario two communities using different ontologies aim to interchange or integrate data (Figure 5.3). In this case the problem is finding the similarities or the differences between concepts defined in the two ontologies. Finding similarities in turn requires a minimum of agreement, which we expect from any higher-level ontology. This solution gives more flexibility to communities to define their ontologies. This approach can be compared to that of SHOE [Heflin and Hendler, 2000], where every Internet page can introduce the ontology it is based on. In our case, we used geographic standards as a basis to build our higher-level ontology of Transportation. Top-level ontologies would not be directly helpful for our approach, since they mainly address general issues related to space such as position, shape and/or spatial relations.

Figure 5.2. One common ontology shared between two databases. (Diagram labels: Schema p1; Schema p2; Underlying Common Ontology.)

Figure 5.3. Database schemata based on different ontologies. (Diagram labels: Schema p1; Schema q1; Underlying Ontology; Relating Ontologies; Higher-level Ontology.)

This chapter introduces a solution based on the second scenario, by means of a merged-ontology, and shows how it can help in solving heterogeneity problems. The next section presents an overall perspective of the solution and illustrates two main threads of research applying ontologies. Section 5.3 introduces a set of relations called semantic similarities which are used for merging ontologies. Section 5.4 explains how a merged-ontology is produced by taking the ontologies underlying schema definitions and finding the similarities between them. Schema integration using the merged-ontology is described in Section 5.5. Section 5.6 shows how data mapping between the systems is done. The last section draws conclusions.

5.2. Overview of the Architecture and the Solution

The approach presented here is appropriate for the generation of an integrated global schema, which then defines the structure of data in a federated database system. A valid integration relies on a sound understanding of the meaning of the schema elements. That is, the integration method must find out whether schema elements from different schemata refer to the same set of entities or, if they are different, to what degree they differ. To this end, the solution proposed here relies on ontologies available for local schemata. Schema integration is then based on the merging of these ontologies. The approach is discussed in this chapter.

In this approach, terms (identifier symbols defined in Section 3.2) in the schema definitions (mainly class and attribute names) must be based on (or commit to) the terms defined in the ontology of a community. This is done by linking class and attribute names in database schemata to the terms defined in the ontology. One can establish this link by using the same terms in schema definitions as they are defined in the ontology, or by storing the links between the schema elements and the terms in ontologies separately. As illustrated in Figure 5.4, terms used in schemata p1 and p2 are based on (or commit to) definitions in the ontology p. Similarity relations (defined in Section 5.3) are used to find out whether and how elements from different schemata are semantically related. Detection of similarity relations is based on intensional definitions of terms in ontologies. A reasoning system is used to merge ontologies based on the detected similarity relations (see Section 5.4). The result of merging is used by a schema integrator to build an integrated global schema from local schemata. Human supervisors and a semiautomatic method cooperate in deriving the integrated schema.

Finally, the possible meaningful mappings in the generated global schema are found, and by that the mapping of data between the (global and local) databases is established. For instance, the schema integrator suggests a class in the global schema as well as the mapping of this class and its attributes to one or more classes in the underlying local schemata. Thus, the approach not only suggests a global schema, but also tries to find all the possible meaningful mappings between the generated global schema and the component schemata.
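The linking of schema elements to ontology terms and the use of detected similarity relations to suggest mappings can be sketched as follows. Everything in this Python fragment is an illustrative assumption: the schema element names, the term prefixes `p:` and `q:`, the relation name `subconcept-of` and the function `suggest_mappings` are hypothetical, and the actual similarity relations are those defined in Section 5.3.

```python
# Hypothetical commitments: (schema, element) -> ontology term it commits to.
commits = {
    ("p1", "Street"): "p:Street",
    ("p2", "ResidentialRoad"): "p:Street",
    ("q1", "Strasse"): "q:Strasse",
    ("q1", "Gebaeude"): "q:Gebaeude",
}

# Hypothetical output of the reasoning system: similarity relations between terms.
similarities = [("p:Street", "subconcept-of", "q:Strasse")]


def suggest_mappings(commits, similarities):
    """Propose schema-level mappings from term-level similarity relations."""
    elements_by_term = {}
    for element, term in commits.items():
        elements_by_term.setdefault(term, []).append(element)

    suggestions = []
    for left, relation, right in similarities:
        for a in elements_by_term.get(left, []):
            for b in elements_by_term.get(right, []):
                suggestions.append((a, relation, b))
    return sorted(suggestions)


for suggestion in suggest_mappings(commits, similarities):
    print(suggestion)
```

Each suggestion would still be reviewed by a human supervisor, in line with the semiautomatic character of the approach.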

Ontology-based Integration

Figure 5.4. Global schema generation based on a common ontology produced by integration of domain ontologies. [Diagram labels: Community P with Ontology p, Schema p1, Schema p2, DBp1 and DBp2; Community Q with Ontology q, Schema q1 and DBq1; ontology-based integration tool comprising a Reasoning System for Merging Ontologies, Similarity Relations and Ontology-based Schema Integration; Global Integrated Schema.]

This approach (shown in Figure 5.4) is suitable when the schemata are not subject to frequent changes. As the number of underlying databases and communities increases, the number of derived mappings increases as well, even though many of them may not be used by the applications. A statistical analysis or human supervision to maintain only the valid mappings and schema items can help to overcome this disadvantage.

In contrast to the approach suggested above, another popular approach uses ontologies for translating queries or their results (as in SHOE, On2Broker or OBSERVER). In this approach, a reasoning system finds the similarities between concepts in two ontologies and the mediator maps the corresponding items in two schemata. A query must be based on (or commit to) the ontology of its community and may introduce its ontology to the system. The mediator uses a reasoning system to find the mapping between schemata. This approach (shown in Figure 5.5) is suitable whenever schemata are subject to frequent changes (such as DTDs in XML data), when many data sources are involved, or when the number of involved data sources changes frequently (such as data sources on the Internet). In these cases, integration of schemata is not beneficial due to the frequent changes. Therefore, queries are translated for the component data sources and the response to the query, in turn, is translated on-the-fly.

Figure 5.5. On-the-fly integration, with local queries committing to a domain ontology and no global schema or global query. [Diagram labels: Community P with Ontology p, Schema p1, Schema p2, DBp1 and DBp2; Community Q with Ontology q, Schema q1 and DBq1; Reasoning System; Mediator; Semantic Similarities.]

One drawback of this approach is the high processing cost, since for every query the ontologies must be processed to derive the required mappings, unless the number of ontologies is small and a set of similarity relations can be extracted and stored. In addition, due to the nature of an interactive (on-the-fly) process, human supervision to validate the extracted similarity relations is impossible. Consequently, the lack of human supervision makes this approach less reliable.

5.3. Semantic Similarities

As mentioned in Section 5.2, an important task during geodata integration is relating schema elements from different local databases. To that end, the essential task is to match the intensional definitions from different ontologies and create a merged ontology. Semantic similarities are introduced and used in this section as the means to merge ontologies. Two intensional definitions with partially identical characteristics are said to be similar. Four levels of similarity between two coherent intensional definitions¹ are identified here. By semantic similarity, we refer to the particular relations equality, specialization, overlapping and disjoint between intensional definitions in two different ontologies. [Elmasri and Navathe, 2000]² introduces the relations between classes in a database schema. The relations defined here differ from the similarity measures presented in other research work such as [Sato and Fujimoto, 2000, Rodríguez and Egenhofer, 2003, Lord et al. 2003]; similarity measures are essentially applied to quantify similarity. One can also compare the similarity relations introduced here to the spatial relations introduced in [Egenhofer and Herring, 1991].

¹ Coherence of the intensional definitions is a major topic but not related to this work. The coherence of an intensional definition implies that it is possible to have an individual in the extension of a concept or an instance of a relation. By contrast, an incoherent intensional definition does not have any instance in its extension in any possible world.
² Sections 4.2 and 4.3, pages 76-86.

Detection of the similarity relations is based on the axioms specifying the intensional definitions (defined in Section 3.4.4) of concepts or relations (represented by ι in the following elaborations). The implications of these similarity relations for the extensions (shown by ε) of intensional relations are also determined in the following. We use the term concept to refer to intensional relations with arity one and relation to refer to intensional relations of arity greater than one.

1. Disjoint definitions: This level has the lowest degree of similarity. Two concepts or relations are disjoint if the conjunction of their intensional definitions implies false (necessarily false, i.e., false for all possible worlds). It implies that the extensions of the two concepts or two relations are disjoint, e.g., narrow street and highway, truck and employee, or sisterhood and fatherhood.

$$(\iota[{}^{p}T_i] \wedge \iota[{}^{q}T_j]) = \blacksquare\mathrm{False} \tag{EQ 1}$$

Disjoint is shown by “≠” as in ${}^{p}T_i \neq {}^{q}T_j$.

$$({}^{p}T_i \neq {}^{q}T_j) \Rightarrow (\forall x)\, \neg (x \in \varepsilon[{}^{p}T_i] \wedge x \in \varepsilon[{}^{q}T_j]) \tag{EQ 2}$$

2. Overlapping definitions: If the conjunction of two intensional definitions cannot be proven to be false (i.e., it is not necessarily false), then they overlap. That means it is possible that an instance of the definition Ti in ontology p is also an instance of the definition Tj in ontology q. This depends on the facts stated about the instances and on the intensional definition of Tj. It implies that the extensions of the definitions intersect, e.g., wide-main-street and primary-highway, employee and student, or colleague and sister. In practice, all concepts overlap unless proven otherwise by their intensional definitions. This fact makes overlapping the most common level of similarity.

$$((\iota[{}^{p}T_i] \wedge \iota[{}^{q}T_j]) = \iota[T_k]) \wedge (\iota[T_k] = \blacklozenge\mathrm{True}) \tag{EQ 3}$$


Tk is called a conjunction concept or conjunction relation here. If Tk can be proven to be false in all possible worlds, then the two intensional definitions are disjoint. Overlapping is shown by “≈” as in ${}^{p}T_i \approx {}^{q}T_j$.

$$({}^{p}T_i \approx {}^{q}T_j) \Rightarrow (\exists x)\,(x \in \varepsilon[{}^{p}T_i] \wedge x \in \varepsilon[{}^{q}T_j]) \tag{EQ 4}$$

3. Specialized definitions (subconcepts or subrelations): If the intensional definition of Tj is an implication of the intensional definition of Ti, then Ti is a specialization of Tj. Hence, if a definition Ti in ontology p is a specialization (or hyponym) of Tj in ontology q, then every instance of the definition Ti is an instance of Tj. This implies that the extensions are in a subset relation. For instance, “man” is a subconcept of “person” and “wife” is a subrelation of “spouse”. The specialization similarity is a partial order relation.

$$(\iota[{}^{p}T_i] \wedge \iota[{}^{q}T_j]) = \iota[{}^{p}T_i] \tag{EQ 5}$$

Specialization is shown by “≤” as in ${}^{p}T_i \leq {}^{q}T_j$.

$$({}^{p}T_i \leq {}^{q}T_j) \Rightarrow (\forall x)\,(x \in \varepsilon[{}^{p}T_i] \Rightarrow x \in \varepsilon[{}^{q}T_j]) \tag{EQ 6}$$

4. Equal definitions: This level has the highest degree of similarity. If the two intensional definitions are equivalent, then the defined concepts are equal. Therefore, every instance of Ti in ontology p would be an instance of Tj under ontology q and vice versa. According to the above definition, if two concepts or relations are equal, each of them specializes the other. Furthermore, the corresponding extensions are also equal. For instance, “vehicle” and “transportation-facility” are equal if they have the same intensional definition.

$$\iota[{}^{p}T_i] = \iota[{}^{q}T_j] \tag{EQ 7}$$

Equality is shown by “≡” as in ${}^{p}T_i \equiv {}^{q}T_j$.

$$({}^{p}T_i \equiv {}^{q}T_j) \Rightarrow (\forall x)\,(x \in \varepsilon[{}^{p}T_i] \Leftrightarrow x \in \varepsilon[{}^{q}T_j]) \tag{EQ 8}$$
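The extensional implications EQ 2, EQ 4, EQ 6 and EQ 8 can serve as a sanity check on instance data. The following is a minimal Python sketch under the assumption that the extensions of two terms are available as finite sets of instance identifiers; the function name `satisfies`, the relation labels and the sample identifiers are illustrative only and not part of the approach. The check runs only in the direction of the implications: data that pass a check do not prove the corresponding intensional relation.

```python
def satisfies(relation: str, ext_a: set, ext_b: set) -> bool:
    """Check whether two extensions are consistent with a claimed relation.

    relation is one of "disjoint", "overlap", "specializes" (A <= B), "equal".
    """
    if relation == "disjoint":      # EQ 2: no individual is in both extensions
        return ext_a.isdisjoint(ext_b)
    if relation == "overlap":       # EQ 4: some individual is in both extensions
        return not ext_a.isdisjoint(ext_b)
    if relation == "specializes":   # EQ 6: every instance of A is an instance of B
        return ext_a <= ext_b
    if relation == "equal":         # EQ 8: the extensions coincide
        return ext_a == ext_b
    raise ValueError(f"unknown relation: {relation}")


# Hypothetical instance identifiers, for illustration only.
men = {"adam", "bob"}
persons = {"adam", "bob", "carol"}
trucks = {"truck-17"}

print(satisfies("specializes", men, persons))  # True
print(satisfies("equal", men, persons))        # False
print(satisfies("disjoint", trucks, persons))  # True
```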

The above similarity relations are a result of adopting the 4-intersection approach presented in [Egenhofer and Herring, 1991], where relations between spatial regions have been introduced. As presented in Table 5.1, we use four combinations of conjunctions of the intensional definitions of the two terms and their negations. Logical and is used instead of intersection; true and not-true instead of empty and non-empty sets; and intensional definitions and their negations instead of interior and boundary [Egenhofer and Herring, 1991]. Modal logic and crisp set theory result in the above levels of similarity; other approaches, such as multivalued logic, might give rise to more levels of similarity between intensional definitions.

TABLE 5.1. Concluding similarity relations based on the consistency of conjunctions of the intensional definitions of terms A and B. (■F, necessarily false: false for all possible states of the world. ◆T, possibly true: true for at least one possible state of the world; ¬■F = ◆T.)

Row  ι[A]∧ι[B]  ι[A]∧¬ι[B]  ¬ι[A]∧ι[B]  ¬ι[A]∧¬ι[B]  Conclusion
1    ■F         ■F          ■F          ■F           Impossible
2    ■F         ■F          ■F          ◆T           ι[A] = ι[B] = ■F
3    ■F         ■F          ◆T          ■F           ι[A] = ■F, ι[B] = ■T
4    ■F         ■F          ◆T          ◆T           ι[A] = ■F
5    ■F         ◆T          ■F          ■F           ι[A] = ■T, ι[B] = ■F
6    ■F         ◆T          ■F          ◆T           ι[B] = ■F
7    ■F         ◆T          ◆T          ■F           ι[A] = ¬ι[B] (A ⊥ B)
8    ■F         ◆T          ◆T          ◆T           A ⊥ B
9    ◆T         ■F          ■F          ■F           ι[A] = ι[B] = ■T (A ≅ B)
10   ◆T         ■F          ■F          ◆T           A ≅ B
11   ◆T         ■F          ◆T          ■F           ι[B] = ■T (A ≤ B)
12   ◆T         ■F          ◆T          ◆T           A ≤ B
13   ◆T         ◆T          ■F          ■F           ι[A] = ■T (B ≤ A)
14   ◆T         ◆T          ■F          ◆T           B ≤ A
15   ◆T         ◆T          ◆T          ■F           B ≠ A and ι[A] ∨ ι[B] = ■T
16   ◆T         ◆T          ◆T          ◆T           B ≠ A

This approach is applied to explore the possible relations between intensional definitions. The sixteen combinations result in five major relations. Specialization and generalization are considered to be at the same level of similarity, since one is just the inverse of the other. The extensional implication of the similarity relations is shown in Figure 5.6. One may recognize finer degrees of similarity between disjoint definitions. For instance, the similarity between “Railway” and “Highway” is greater than the similarity between “Railway” and “Building”. However, such fine degrees of similarity are not relevant to the integration process.

Figure 5.6. Levels of similarity among intensional definitions [Egenhofer and Herring, 1991]. [Diagram ordering the relations between pTi and qTj by degree of similarity, from highest (+) to lowest (-): Equal (EQ 7); Specializes and Generalizes (EQ 5); Overlap (EQ 3); Disjoint (EQ 1).]
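The decision logic of Table 5.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `possibility_flags` and `classify` are hypothetical, “possibly true” is approximated by enumerating a finite list of candidate individuals instead of querying a reasoning system, and the two road definitions are invented for the example. The fourth column of the table (¬ι[A] ∧ ¬ι[B]) is omitted because it only separates the odd and even rows within each major relation, and rows 15 and 16 are read here as the overlapping case.

```python
from itertools import product


def possibility_flags(def_a, def_b, candidates):
    """Return flags for A∧B, A∧¬B and ¬A∧B (True means possibly true)."""
    ab = any(def_a(c) and def_b(c) for c in candidates)
    a_not_b = any(def_a(c) and not def_b(c) for c in candidates)
    not_a_b = any(not def_a(c) and def_b(c) for c in candidates)
    return ab, a_not_b, not_a_b


def classify(ab: bool, a_not_b: bool, not_a_b: bool) -> str:
    """Map the first three columns of Table 5.1 to a major relation."""
    if not ab and not (a_not_b and not_a_b):
        return "degenerate"        # rows 1-6: a definition is incoherent
    if not ab:
        return "disjoint"          # rows 7-8
    if not a_not_b and not not_a_b:
        return "equal"             # rows 9-10
    if not a_not_b:
        return "A specializes B"   # rows 11-12
    if not not_a_b:
        return "B specializes A"   # rows 13-14
    return "overlap"               # rows 15-16


# Hypothetical candidate individuals and definitions, for illustration only.
candidates = [{"lanes": n, "speed": s} for n, s in product(range(1, 7), (50, 100))]
highway = lambda r: r["lanes"] > 4 and r["speed"] > 80
main_road = lambda r: r["lanes"] >= 2
narrow_street = lambda r: r["lanes"] == 1

print(classify(*possibility_flags(highway, main_road, candidates)))      # A specializes B
print(classify(*possibility_flags(highway, narrow_street, candidates)))  # disjoint
```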

The case of homonyms is not considered here, because all the intensional definitions for a particular community must be identified by their respective terms and be known to and agreed on by all members of that community; otherwise, they can cause confusion in the community. In the intercommunity case, all the intensional definitions and their respective terms ported from outside the community should be uniquely identified according to their respective ontology [Visser and Cui, 1998], i.e., terms coming from different communities are considered as referring to different concepts unless proven otherwise. Mechanisms to adopt and use terms defined in other ontologies are discussed in more detail in Section 6.2.4.

Deriving similarities between ontologies requires common references in the two ontologies and a reasoning system (heuristic) for matching. The common references can be provided by a higher-level ontology such as Ontolingua [Farquhar et al., 1997] or by a thesaurus such as WordNet [Fellbaum, 1998] (as suggested in the KRAFT project). As an example, we adopted the ontology of PhysicalQuantity from Ontolingua. Finding similarities can also be done by experts familiar with both communities or by a hybrid semiautomatic method. Another approach could be to keep part of the similarity relations in a repository (as in OBSERVER) and then to try to infer new relations from the relations stored in the repository.


5.4. Finding Similarity Relations

A major task of the reasoning system in this approach (Figure 5.4) is detecting similarity relations. Similarity relations help to produce an integrated global schema and, afterwards, to map the underlying databases. Finding similarity relations between intensional definitions requires both ontologies to commit to higher-level ontologies (see Section 3.4.2 or [Visser and Cui, 1998]): the reasoning system uses the higher-level ontology as a common reference in the two ontologies for matching. We refer to the combination of two ontologies and the similarity relations between them as a merged-ontology.

The examples in Appendix A.1 and Appendix A.2 show the logical expressions defined in Description Logic [Baader et al., 2003] for PowerLoom [MacGregor et al., 1997].³ These logical expressions define the application domain ontologies p and q, whose taxonomy trees are illustrated in Figure 5.7 and Figure 5.8. Both ontologies commit to a higher-level ontology Transportation, shown in Appendix A.3. The Transportation ontology in turn commits to other ontologies (as shown in Figure 6.5). When given the above intensional definitions, a reasoning system (in this example PowerLoom) is able to detect similarity relations such as:

• “Street” defined in ontology p is a specialization of “Strasse”⁴ in ontology q;
• “Highway” in ontology p is a specialization of “Strasse” in ontology q;
• “Road” is a specialization of “Strasse”;
• “Highway” is a specialization of “Schnellstrasse”;
• “Railway” is equal to “Schienenbahn”;
• “Railway” is disjoint from “Strasse”.
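Since specialization is a partial order, some of the relations listed above follow from others by transitivity. The sketch below illustrates this with a plain transitive closure in Python; it is not how PowerLoom derives the relations. The function name `specialization_closure`, the `p:` and `q:` prefixes and the assumed taxonomy edges of ontology q (taken from a reading of Figure 5.8) are illustrative assumptions.

```python
from itertools import product


def specialization_closure(pairs):
    """Return the transitive closure of a set of (sub, super) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that the set can grow inside the loop.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Cross-ontology relation taken from the list above.
detected = {("p:Highway", "q:Schnellstrasse")}
# Assumed taxonomy of ontology q.
taxonomy_q = {
    ("q:Schnellstrasse", "q:Hauptstrasse"),
    ("q:Hauptstrasse", "q:Strasse"),
    ("q:Nebenstrasse", "q:Strasse"),
}

closure = specialization_closure(detected | taxonomy_q)
print(("p:Highway", "q:Strasse") in closure)       # True
print(("p:Highway", "q:Nebenstrasse") in closure)  # False
```

Under these assumptions, the relation between “Highway” and “Strasse” need not be stored separately, which is in the spirit of a repository holding only part of the similarity relations.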

The disjoint definitions are not discussed at this point and no action is taken about them in the merging process. The result of the merging process is depicted in Figure 5.9.

The term integrating ontologies is not used here. This avoids giving the false impression that any of the communities should agree with or commit to the result of ontology merging. This approach does not aim at detecting or resolving conflicts or mismatches between ontologies as in [Visser et al., 1998]. This is because members of one community will not necessarily agree with the terms defined in the ontologies of the other communities (except those common terms adopted from higher-level ontologies) for the data integration process. The result of merging is mainly used for database schema integration. As an example, heteronymous terms are not treated here due to the practical fact that all the terms and their definitions are only valid within a community whose members agree on them. Consider the term “Faculty”: in one context it refers to a teaching staff member, and in another it is interpreted as an administrative body in a university. Our concern is to ensure data communication with minimum semantic conflicts, rather than to ensure an agreement on a single definition of the term “Faculty” and to resolve the problem at the ontology level. The latter activity is of major concern when building higher-level ontologies.

³ Readers who are not familiar with Description Logic need only read the :documentation lines in the definitions. These lines can also help those with no knowledge of German.
⁴ In this chapter, only the names of concepts and classes are capitalized, to distinguish them from relations and attributes.

5.5. Integration of schemata

This section shows how two database schemata (Sp1 and Sq1) based on ontologies (P and Q) can be integrated into a global schema (SG). Schema integration is done in two main phases: global class derivation and global attribute derivation. In the first phase, the classes and their hierarchies are generated; in the second phase, the classes are enhanced by attributes.

5.5.1. Class Integration

All the classes in the local schemata must be based on concept definitions in the community’s ontology. In other words, the names of all schema elements used in schema definitions uniquely refer to definitions in the ontology of the community⁵. As an example, class Pr_Road in schema Sp1 (Figure 5.7) is based on the term “Primary_Road” defined in ontology P. We show this link to a term in ontology P with τp (and, correspondingly, a link to a term in ontology Q with τq), e.g.,

τp: (Sp1.Pr_Road) → “Primary_Road”, or    (EQ 9)

τq: (Sq1.Nebenstr) → “Neben_Strasse”.    (EQ 10)

⁵ Whether the term definition already exists in the ontology or is added during database design, and whether the links are established just before the integration process or earlier, are not in the scope of this work. We take them for granted here and focus on the integration process.

Figure 5.7. Schema p1 and the taxonomy tree of the ontology p. [Diagram labels. Concepts in ontology P: Road, Primary Road, Secondary Road, Highway, Street, Wide, Narrow, Railway. Classes in schema p1 with their attributes: Road (name, pavement); Pr_Road (name, pavement, speed_lmt); Highway (lane_num, speed_lmt, pavement); Sec_Road (name, pavement, width); Railway (cross).]

“τ” returns exactly one term in the respective ontology. “τ” is introduced to provide flexibility in naming schema elements. If the database designer does not link a schema element to a term in the ontology, the integration process will not be able to relate it to schema elements in the other local schema.

Figure 5.8. Schema q1 and taxonomy tree of ontology Q. Note that the definition of Transportation_Path is adopted from the transportation ontology. [Diagram labels. Concepts in ontology q: Transportation Path, Schienenbahn, Strasse, Nebenstrasse, Hauptstrasse, Schnellstrasse. Classes in schema q1 with their attributes: Eisenbahn (start, end); Strasse (breite); Hauptstr (breite, kreuzung); Nebenstr (breite, verkehr); Schnellstr (breite, kreuzung, tempo-lmt).]

In this approach, for every class in the local schema we generate a class in the global schema. The goal is that every class in the local schema is represented by (or can be mapped to) a class in the global schema, which is important for data mapping. We start by initializing the global schema SG with the class hierarchy of Sp1. The classes of the schema Sq1 are then inserted into the global schema SG in a stepwise, top-down manner, starting from the superclasses in the class hierarchy of the local schema Sq1. For the insertion of each class, the following conditions are checked:

• A class (c) is only added if no other class already exists in the schema whose concept is equal (synonym) to the one c represents. That is, to add a new class such as “Strasse” to the global schema, the following condition should hold:

∀ c∈SG: ¬[τq(Sq1.Eisenbahn) ≡ τp(SG.c)].    (EQ 11)

For example, the class Sq1.Eisenbahn is not inserted, since a class SG.Railway based on the equal term (“Schienenbahn” ≡ “Railway”) is already present in the global schema. In this case, only an alias name Eisenbahn is stored for the existing class (this is needed during both global attribute generation and data mapping).

• The specialization similarity of the concepts in the merged-ontology should be reflected as a subclass relation in the global schema as well. Therefore, we establish a subclass (or superclass) relation with every class based on a generalized (or specialized) concept of the current class. As an example, the class Sq1.Strasse is defined as a superclass of SG.Road and SG.Highway because the following holds according to the merged-ontology:

τp(SG.Road) ≤ τq(Sq1.Strasse),    (EQ 12)

τp(SG.Highway) ≤ τq(Sq1.Strasse).    (EQ 13)

Figure 5.9. Result of merging ontologies by finding specialization similarity. [Diagram labels of the merged-ontology: Transportation Path, Strasse, Street, Road, Primary Road, Secondary Road, Wide, Narrow, Schienenbahn, Hauptstrasse, Nebenstrasse, Highway, Schnellstrasse.]

This step can generate redundant subclass relations. For example, after Sq1.Strasse has been inserted into the global schema, the subclass relation between SG.Strasse and SG.Highway becomes redundant during the insertion of class Sq1.Hauptstr (see Figure 5.10). After the generation of a subclass (or superclass) relation, such redundant relations are detected and eliminated.

• While inserting a subclass from the local schema (such as Sq1.Nebenstr), we maintain its subclass relation with the existing superclasses in the global schema (SG.Strasse). However, maintaining such relations can cause duplicate relations, just as explained in the previous paragraph. For an example of such a case, see the insertion of Sq1.Schnellstr in Figure 5.11: Sq1.Schnellstr has an original subclass relation to SG.Hauptstr, and we detected the specialization relation between Sq1.Schnellstr and SG.Highway.

Figure 5.10. Occurrence of a redundant subclass relation while establishing a new subclass relation. [Diagram of global schema G with the classes Strasse, Road, Secondary, Hauptstr, Primary and Highway.]

There are also situations in which new classes are created in the global schema. Both cases need supervision by the database integration administrator:

• New subclass insertion: New classes may be added to the global schema if the base concepts of classes are overlapping. A class based on the conjunction concept of the two overlapping classes is added. As an example, when two classes in the global schema are based on two overlapping concepts:

τp(SG.Nebenstr) ≈ τp(SG.Sec_Road),    (EQ 14)

then a class (say “safe_street”) based on their conjunction concept can be added to the global schema. This class semantically represents roads with low traffic and low speed limits. Subclass relations are established with both classes (a case of multiple inheritance). Although such cases often occur during the merging process, many of them are not relevant to applications. In our example, a road with low traffic and a low speed limit may be of no interest. Therefore, there is a need for supervision at this point: the database integration administrator should decide on the necessity of generating such classes. One may use the results of [Rodríguez and Egenhofer, 2003] rather than interaction with the administrator to decide on inserting a new class. (In [Hakimpour and Geppert, 2001] we suggest the creation of the conjunction concepts during the merging process. However, we did not find this advantageous.)

• New superclass insertion: If two classes refer to two overlapping or disjoint concepts, while the corresponding concepts have a common superconcept, a class based on the common superconcept may also be generated in the global schema. As an example, the classes SG.Strasse and SG.Eisenbahn are disjoint. We may add a superclass based on their superconcept “Transportation_Path”. As with the previous case, this action also needs justification from the application point of view. That is, the generation of the new class based on “Transportation_Path” should be verified by the database integration administrator. (To further develop the approach, one may consider the logical disjunction of the intensional definitions as a new superclass of the two classes.)

Figure 5.11. Occurrence of a redundant subclass relation while maintaining existing subclass relation. [Diagram of global schema G with the classes Strasse, Road, Secondary, Hauptstr, Primary, Nebenstr and Highway.]

The final class hierarchy produced by this approach is shown in Figure 5.12. The algorithm is presented in Section 6.4.1 (Table 6.2 and Table 6.3).

5.5.2. Filling Classes with Attributes

All attributes in the database schemata represent binary relations, either by pointing to another class or by taking a primitive type such as string or integer. As we assume all classes are based on concepts, we also assume attributes to be based on binary relation definitions in their respective ontologies. For example:

τq: (Sq1.Hauptstr.kreuzung) → “strassenkreuzung”,    (EQ 15)

states that an attribute kreuzung in class Hauptstr of local schema q1 is based on the relation “strassenkreuzung” in ontology Q (Appendix A.2). There is a major constraint imposed while establishing links for attributes: the class of the attribute should not be based on a concept definition which is disjoint from the concept in the domain of its binary relation. For example, in the following:

τp: (Sp1.Highway.pavement) → “surfaced-by”,    (EQ 16)

the domain of the “surfaced-by” relation must not be disjoint from τp(Sp1.Highway). This constraint is, for instance, violated when defining an attribute based on the “surfaced-by” definition for class Sp1.Railway. Since “Railway” is disjoint from the domain of “surfaced-by”, such an attribute does not comply with the semantics defined in the ontology. This constraint ensures that the schema definitions in Sp1 (and the schema mapping τp) agree with the ontology P.

Figure 5.12. Final global schema generated by the proposed approach. [Diagram of global schema G with the classes Strasse, Road, Pr_Road, Sec_Road, Hauptstr, Highway, Schnellstr, Nebenstr and Railway (alias Eisenbahn), each listed with its attributes.]

During the generation of attributes, for each attribute in a class of a local schema, we define an attribute in the respective class in the global schema. This ensures that each attribute in a class of a local schema has a counterpart in the global schema. However, an equal relation might have already been represented by another attribute in the same global class. For example, before we add a new attribute “surfacing” to the class “Highway” in the global schema, we check the following:

∀ a∈SG.Highway: ¬[τq(Sq1.Highway.pavement) ≅ τp(SG.Highway.a)].    (EQ 17)

If this constraint is violated, a semantically equal (synonym) attribute has already been inserted into the class SG.Highway in the global schema. In that case, we only keep information about the equality of the attributes for the data mapping. Unlike the case of synonym classes, where we only keep an alias name, here we maintain both attribute definitions in the class. The reason is that the equality link is based on the semantics of the attribute but does not indicate similarity in the representation and data type (such as unit or structure) of the value of the attribute. We consider a data conversion during data mapping.

The case of attributes being based on relations in specialization similarity is detected and stored to be used during the data mapping. As an example, while defining attribute Sp1.Highway.crossing for class SG.Highway, the following similarity is detected, and we keep the information about the type of relation between the two attributes for the data mapping phase:

τq(Sq1.Strasse.kreuzung) ≤ τp(Sp1.Highway.crossing),    (EQ 18)

considering that:

τp: (Sp1.Highway.crossing) → “intersect”.    (EQ 19)

Figure 5.13. Example of a relation between a concept (street) and subconcepts of another concept (pavement) rather than its instances. [Diagram labels: Street, surfaced-by, Pavement, with the subconcepts unpaved, tiled and asphalt.]
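The attribute-filling rules above (keep both attributes, and record equality and specialization links for the data mapping) can be sketched as follows. This is a simplified Python illustration, not the algorithm of Section 6.4.1: the names `GlobalClass`, `add_attribute` and `relation_between`, and the schema-qualified attribute names, are hypothetical, and the merged ontology is reduced to a single specialization pair taken from EQ 18 and EQ 19.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalClass:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> relation term
    links: list = field(default_factory=list)       # (kind, new attribute, existing attribute)


# Assumed content of the merged ontology: (specialized term, generalized term).
SPECIALIZES = {("strassenkreuzung", "intersect")}


def relation_between(term_a, term_b):
    """Hypothetical lookup of the similarity relation between two relation terms."""
    if term_a == term_b:
        return "equal"
    if (term_a, term_b) in SPECIALIZES:
        return "specializes"
    if (term_b, term_a) in SPECIALIZES:
        return "generalizes"
    return None


def add_attribute(cls, attr, term):
    """Add a local attribute to a global class and record semantic links."""
    for existing, existing_term in cls.attributes.items():
        kind = relation_between(term, existing_term)
        if kind is not None:
            cls.links.append((kind, attr, existing))
    # Both definitions are kept, since representation and units may differ.
    cls.attributes[attr] = term


highway = GlobalClass("Highway")
add_attribute(highway, "Sp1.crossing", "intersect")
add_attribute(highway, "Sq1.kreuzung", "strassenkreuzung")
print(highway.links)  # [('specializes', 'Sq1.kreuzung', 'Sp1.crossing')]
```

The recorded link kind is what later decides in which direction attribute values may be mapped.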

We did not find the case of attributes based on overlapping relations relevant to this work. The detection of such cases by the reasoning system is not problematic, though.

The relation “surfaced-by” in our example shows a special case (Figure 5.13). In general, a relation between two concept definitions specifies how instances of the two concepts are related. However, “surfaced-by” relates instances of a concept (e.g., street) with subconcepts of “Surfacing” rather than with an instance of “Surfacing”. One can define “surfaced-by” as a relation between “Street” and instances of “Surfacing Type”, as we did. However, we then cannot define them as concepts and assign intensional definitions to them. The necessary reasoning with such a relation definition requires a reasoning system that supports higher-order logic. Therefore, in our intensional definition of “Surfacing Type” in Appendix A.1, “Asphalt” is defined as an instance of “Surfacing Type”. In such a case, an attribute based on the relation is not filled in with a primitive value (e.g., a number or a character string) or an object, but with an enumeration or code that represents a set of individuals or a range of values (e.g., “Asphalt” in Figure 5.13; one can also imagine the values “wide” or “narrow” as ranges of values for an attribute width).
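Returning to the class integration phase of Section 5.5.1, the insertion conditions (alias for an equal concept, subclass relations from specialization, elimination of redundant relations) can be sketched in Python as below. This is a simplified illustration and not the algorithm of Section 6.4.1: all function names are hypothetical, the supervised insertion of new subclasses and superclasses is left out, and the class hierarchy of Sp1 and the specialization pairs of the merged ontology are assumptions reduced to the classes mentioned in the text.

```python
def make_leq(direct_pairs):
    """Build a strict 'specializes' test from direct (sub, super) term pairs."""
    def leq(a, b):
        stack, seen = [a], set()
        while stack:
            t = stack.pop()
            for sub, sup in direct_pairs:
                if sub == t and sup not in seen:
                    if sup == b:
                        return True
                    seen.add(sup)
                    stack.append(sup)
        return False
    return leq


def ancestors(supers, cls):
    """All direct and indirect superclasses of cls."""
    seen, stack = set(), list(supers[cls])
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(supers[s])
    return seen


def remove_redundant_edges(supers):
    """Drop a direct superclass that is reachable through another one."""
    for cls in supers:
        direct = set(supers[cls])
        for s in direct:
            if any(s in ancestors(supers, o) for o in direct if o != s):
                supers[cls].discard(s)


def integrate_classes(global_supers, term_of, local_classes, leq, equal):
    """Insert local classes (given top-down) into the global schema."""
    aliases = {}
    for cls, term, local_supers in local_classes:
        twin = next((g for g in global_supers if equal(term, term_of[g])), None)
        if twin is not None:          # equal concept: keep an alias only
            aliases[cls] = twin
            continue
        supers = {aliases.get(s, s) for s in local_supers}
        for g in list(global_supers):
            if leq(term, term_of[g]):
                supers.add(g)
            elif leq(term_of[g], term):
                global_supers[g].add(cls)
        global_supers[cls] = supers
        term_of[cls] = term
    remove_redundant_edges(global_supers)
    return aliases


# Assumed inputs, for illustration only.
schema = {"Road": set(), "Pr_Road": {"Road"}, "Sec_Road": {"Road"},
          "Highway": set(), "Railway": set()}
terms = {"Road": "Road", "Pr_Road": "Primary_Road", "Sec_Road": "Secondary_Road",
         "Highway": "Highway", "Railway": "Railway"}
leq = make_leq({("Primary_Road", "Road"), ("Secondary_Road", "Road"),
                ("Road", "Strasse"), ("Highway", "Strasse"),
                ("Highway", "Hauptstrasse"), ("Hauptstrasse", "Strasse"),
                ("Nebenstrasse", "Strasse")})
equal = lambda a, b: a == b or {a, b} == {"Railway", "Schienenbahn"}
sq1 = [("Eisenbahn", "Schienenbahn", set()), ("Strasse", "Strasse", set()),
       ("Hauptstr", "Hauptstrasse", {"Strasse"}), ("Nebenstr", "Nebenstrasse", {"Strasse"})]

aliases = integrate_classes(schema, terms, sq1, leq, equal)
print(aliases)            # {'Eisenbahn': 'Railway'}
print(schema["Highway"])  # {'Hauptstr'}
```

In this run, the direct relation between Highway and Strasse is first created and then removed as redundant once Hauptstr has been inserted, which mirrors the situation of Figure 5.10.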

5.6. Data Mapping

This section discusses potential problems arising during data mapping using ontologies, together with possible solutions. The generated global schema can be used for the integration of two databases instantiating the local schemata Sp1 and Sq1. The instances of classes in the local databases are mapped to those of the global schema and vice versa. This mapping of instances is straightforward and relies on the information acquired during the schema integration process. Afterwards, a set of operations performs the mapping of the data.

Classes in two local schemata referring to the same definition are mapped to the same global class by means of alias names. A potential problem may occur during the data mapping whenever both databases provide instances that represent the same individual in the domain. In our example, one railway route may be stored in both databases DBp1 and DBq1. For a query against the global schema such as “Give me all Railways.”, we must be able to detect those railways that are present in both databases. To deal with this problem, we need an identification criterion to recognize whether two objects in the underlying databases represent different individuals.⁶ This criterion must be present in both local class definitions. For instance, candidate identification criteria for a railway route would be its location or its start and end points. Note that the identification criterion need not be the primary key in one or both systems (but it certainly should be a unique property). If the identification criterion of a global class evaluates to true for two instances, such inconsistencies can be prevented or at least detected. If an identification criterion cannot be found, inconsistencies can occur and we risk retrieving two objects representing the same individual without being able to detect the redundancy.

⁶ See [Guarino and Welty, 2000a, section 2.3] for related discussion.

In the case where classes are in a specialization relation in the global schema, all instances of a subclass can be mapped to its superclass, but not in the other direction. As an example, consider the query “Give me all Highways” (or highways from Germany), which results in the retrieval of data from database DBq1. One solution is to retrieve the instances of SG.Schnellstr (also termed substitutability), which is an incomplete result for the query. For the mapping from a superclass to its subclass we need a classification criterion, which offers a better result in this case. That is, in order to map instances of the superclass SG.Hauptstr to the subclass SG.Highway, the instances should satisfy a classification criterion. By referring to the intensional definitions in Appendix A.1 and Appendix A.2, one can see that if an instance of SG.Hauptstr has a speed limit of more than 80 and more than 4 lanes, it can be mapped to class SG.Highway. Finding the classification criteria and implementing the necessary mapping needs human interaction.⁷ Most current reasoning systems offer the capability of classifying instances on the fly. However, the reasoning system (in our case PowerLoom) additionally requires a powerful interface to the database(s). Furthermore, the necessary data must be available in the database to render classification possible. In our example, the information on the number of lanes and the speed limit of the highway should be present in the database. In the case of mapping Hauptstr to Highway, however, this information is not available. The elaborations on the identification criterion apply here as well. For example, one road may be classified under both SG.Hauptstr and SG.Highway. While mapping SG.Hauptstr object instances to SG.Highway, the identification criterion must be checked in order to ensure that the mapping is legitimate.
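The two criteria discussed above can be illustrated with a small Python sketch. It assumes that objects are available as dictionaries; the function names, the attribute names `start`, `end`, `speed_limit` and `lanes`, and the sample records are hypothetical. The identification criterion used for railways is the pair of start and end points, and the classification criterion for highways paraphrases the thresholds quoted in the text.

```python
def railway_identity(route):
    """Assumed identification criterion: the start and end points of a route."""
    return (route["start"], route["end"])


def merge_by_identity(objects_a, objects_b, identity):
    """Union two object lists, keeping one object per identification key."""
    merged = {}
    for obj in list(objects_a) + list(objects_b):
        merged.setdefault(identity(obj), obj)
    return list(merged.values())


def classify_as_highway(road):
    """Classification criterion: speed limit above 80 and more than 4 lanes.

    Returns None when the data needed for the classification are missing.
    """
    if "speed_limit" not in road or "lanes" not in road:
        return None
    return road["speed_limit"] > 80 and road["lanes"] > 4


# Hypothetical records from the two local databases.
db_p1 = [{"name": "R1", "start": "A", "end": "B"}]
db_q1 = [{"start": "A", "end": "B"}, {"start": "B", "end": "C"}]

railways = merge_by_identity(db_p1, db_q1, railway_identity)
print(len(railways))                                        # 2
print(classify_as_highway({"speed_limit": 120, "lanes": 6}))  # True
print(classify_as_highway({"breite": 12}))                    # None
```

Returning None rather than False for incomplete records keeps apart the cases "not a highway" and "cannot be classified", which matters when the necessary data are not available in the database.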
Attributes of a class in a local schema are mapped directly to their counterparts in the global schema. A set of rules maps attributes in the global schema. In the case where two attributes are linked by equal similarity, the attribute values are mapped mutually. In the case where two attributes are related by a specialization similarity, the value of the specialized attribute can be mapped to the generalized one, but not the other way round. By looking at the intensional definitions in Appendix A.2, one can see that every “strassenkreuzung” relation is an “intersect” relation, but not the other way round. In general, there are cases

7. Finding classification criteria seems a trivial activity for people, but we could not pose such a query to the reasoning system (PowerLoom), although PowerLoom is perfectly capable of classifying instances according to the intensional definitions.


in which attribute values should be mapped by considering a classification criterion. This means that those crossings between instances of “Strasse” can be mapped to “strassenkreuzung”. An attribute mapping often requires a data conversion process (e.g., integer to real or vector to raster). This is because during the integration we did not utilize any knowledge about the data types. There is often a need for further processing steps during data mapping, such as conversion of units. If the changes are not due to structural differences, a detailed ontology can help in some cases (such as unit conversion). However, work such as [Rosenthal and Sciore, 1995] suits this stage well and satisfies the need for data conversions. We intentionally avoid using any representation knowledge to guarantee that the similarity relations are established independently of their representation. Finally, it is worth mentioning that unit conversion is also an issue at the ontology level, e.g., in the definition of “width” or “speed limit”. We approached unit conversion problems by adopting and simplifying the ontology of Physical-Quantities present in the Ontolingua [Farquhar et al., 1997] library of ontologies. The only numbers explicitly present are those used as cardinality or ratio. All numbers used as quantities are defined as instances of Constant_Quantity, which facilitates unit conversion. The same approach can be applied to the unit conversion issue at the data level; however, we do not consider that an efficient solution compared to existing ones.
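The idea of treating every quantity as an instance of Constant_Quantity, rather than as a bare number, can be sketched in Python as follows. This is an illustration only and not the Ontolingua or PowerLoom formalization: the class `ConstantQuantity`, the `UNITS` table and the chosen units are assumptions. It shows how carrying the unit with the magnitude makes conversion explicit.

```python
from dataclasses import dataclass

# Assumed conversion table: unit -> (dimension, factor to the base unit of that dimension).
UNITS = {
    "m": ("length", 1.0),
    "km": ("length", 1000.0),
    "m/s": ("speed", 1.0),
    "km/h": ("speed", 1000.0 / 3600.0),
}


@dataclass(frozen=True)
class ConstantQuantity:
    """A magnitude paired with its unit (hypothetical stand-in for Constant_Quantity)."""
    magnitude: float
    unit: str

    def to(self, target_unit: str) -> "ConstantQuantity":
        """Convert to another unit of the same dimension."""
        source_dim, source_factor = UNITS[self.unit]
        target_dim, target_factor = UNITS[target_unit]
        if source_dim != target_dim:
            raise ValueError(f"cannot convert {self.unit} to {target_unit}")
        return ConstantQuantity(self.magnitude * source_factor / target_factor, target_unit)


# Illustrative values only.
width = ConstantQuantity(2.0, "km")
speed_limit = ConstantQuantity(80.0, "km/h")
width_in_m = width.to("m")
speed_in_ms = speed_limit.to("m/s")
```

Comparing a “speed limit” defined in one ontology with one defined in another then reduces to converting both to a common unit first.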

5.7. Conclusion

The chapter presents a methodology for generating global schemata for tightly coupled federated database systems. The approach facilitates the integration process, and when the local schemata change, repeating the integration process is less expensive. This solution uses ontologies as a basis for the integration and for resolving heterogeneity problems during the integration of local schemata. Semantic problems avoided by our approach for the creation of global schemata are as follows:

• The approach does not consider the class definitions (e.g., attributes and methods) and their structure defined in the schemata. Many approaches rely on knowledge extracted from schema definitions because such knowledge is already available and reduces the cost of their integration approach. However, taking such knowledge into account can be misleading, since it relies on representational facts which are also application dependent. Ontologies provide the advantage of being independent of the knowledge in the application domain. Our approach can be complemented by work such as [Rosenthal and Sciore, 1995] in the data mapping phase.

• We use only the intensional definitions in ontologies; that is, we avoid matching the terms used to name schema elements. This is because relying on common sense to interpret names is the main source of semantic heterogeneity. Applying explicit definitions of terms in schema definitions makes the approach independent of the naming of the schema elements and dependent only on their intensional definitions. Consequently, it helps to resolve synonymy and homonymy problems.

Ontologies are long-term assets that will remain independent of application systems and the presence of domain experts. As ontologies are becoming more popular, they can be used for purposes other than database integration. Building a higher-level ontology that many communities agree on is a difficult task. Major tasks here are extracting all detailed specifications from members of the community and formalizing these specifications. This makes ontologies expensive to build. However, it is a price worth paying to avoid semantic conflicts that can be even more expensive. A solution could be building ontologies for smaller communities and integrating them (i.e., removing conflicts between them) in a reliable way so as to build higher-level ontologies for a larger community.

The quality of the ontologies plays an important role here. Two important quality measures for the success of this approach are completeness and accuracy.

• Completeness: Explicit specification of implicit assumptions in the community facilitates the reasoning process. If the ontologies do not offer detailed specifications, the reasoning system will not be able to detect similarity relations except overlap. If a term is only added to an ontology with no definition (that is, no axiom is stated in the definition of the term), the result of similarity detection with every other term will be overlap.
The overlap relation in turn results in a query to the integration administrator. The higher the number of other similarity relations, the less interaction with the integration administrator is required. If the number of non-overlapping similarity relations detected is small, our approach will result in a low-quality integration, as it will be a union of schema definitions rather than an integration.

• Accuracy: Accordance of an ontology with the conceptualization (defined in Section 3.4.1) of the community guarantees that the result of integration meets the expectations of users. If specifications in ontologies do not comply with the conceptualization, then the similarity relations will not be accurate. In turn, this will again result in a low-quality and imprecise integration. The accuracy of the formalized ontologies depends heavily on the capabilities of the formalism being used. Apart from the limited support for higher-order logic by reasoning systems, shown in Figure 5.13, we faced another limitation during the ontology formalization phase. As one can see in Appendix A.3, a phrase in the definition of “Transportation_Path” states that it transports at least one “Wheeled_Vehicle”. The original phrase in the definition of the “transportation_path” should state that it can transport “Wheeled_Vehicles”. That is, “Transportation_Path” possesses the ability to transport a “Wheeled_Vehicle”. In other words, there is at least one possible world in which a “Transportation_Path” transports at least one “Wheeled_Vehicle”. Such a statement can be expressed in modal logic [Blackburn et al., 2001]. Although the lack of accuracy in this case did not have a negative impact on our example, support for modal logic is an essential characteristic of a formalism for the accuracy of formalized ontologies.

Another factor that plays an important role in the success of this approach is the commitment of the schema definitions to the community’s ontology. There are constraints that should apply to schema definitions to guarantee the agreement of schema definitions with intensional definitions. Obviously, we need a formal and clear definition of commitment, and the consequent constraints should be clearly specified. One such constraint is discussed in the second paragraph of Section 5.5.2.

This chapter covered the integration of classes and attributes; methods, however, are not discussed in this work. Methods are often considered as parametric attributes based on relation definitions of arity higher than two in an ontology.
However, this approach can only support those methods that represent an action (actions change the states of the world, in other words, map one state of the world to another) as the basis of a definition in the ontology. That is, methods that are the result of implementation (such as triggers) may not have a counterpart in the communities’ ontology. This is a general discussion that applies to rare attributes that are used only for implementational reasons (such as object id) and do not have a counterpart in the ontology. Furthermore, semantics of methods can be supported separately by approaches such as: denotational semantics [Mosses, 1990], action semantics [Mosses, 1992], etc.


Ontologies are mainly based on taxonomy trees that are represented by is-a relations. The semantics of the is-a relation is clearly defined and supported by many reasoning systems (see Section 4.2). The similarity relations presented in Section 5.3 are associated with these relations. Using aggregation (part-of and consists-of) may further improve this approach. However, this requires clear semantics for the aggregation relation, that is, defining its properties and possibly different types of aggregation. The aggregation relations used in our experiments are defined in Table A.6.


CHAPTER 6

The Solution in Practice and the Prototype

6.1. Introduction

This chapter presents issues related to using the approach introduced in the previous chapter. We evaluate the capabilities of the solution in practice. The capabilities are evaluated in terms of the problem definitions in Section 1.2. This means that the prototype is not designed and implemented with concerns such as efficiency, flexibility, maintainability and so on in mind. We concentrated on semantic heterogeneity problems during integration. A detailed view of the tasks for this approach is shown in Figure 6.1. There are three manual tasks in the approach: building ontologies, relating schemata to the ontologies and responding to the schema integration module (shown by full boxes in Figure 6.1). The most critical and expensive task in the approach is building and formalizing ontologies. This task produces a set of detailed definitions of terms and also guarantees the agreement between members of the community (or at least users of the database). Building ontologies also requires understanding and adopting definitions from higher-level ontologies. The second task is to relate schema items to the existing ontologies. This task is similar to that of annotating on the Web in the framework of the Semantic Web. However, the structured nature of the data in databases makes the task less complex. The result of this task is an index function (shown by τ, called commitment here) that returns a term in the ontology for the name of every schema item. By using this index function the name of the schema item does not have to be the same as the term in the ontology. Such flexibility can

[Figure 6.1: diagram not reproduced. Recoverable labels: higher-level ontologies (e.g. Transportation or Vehicle); Ontology P and Ontology Q, each produced by “Building and formalizing ontologies” (e.g. (defconcept primary_road ...) and (defconcept Hauptstrasse ...)); “Relating schema elements to terms in ontologies”, producing the commitment τ for the local schemata of databases DBp1 and DBq1 (τ(Pr_Road) → Primary_Road for class Pr_Road, τ(Hauptstr) → Hauptstrasse for class Hauptstr); “Finding similarities”; “Schema Integration”; “Interaction with integrator for generating new classes”; “Integrated schema and the relations to the local schemata”. The legend distinguishes the commitment of domain ontologies to higher-level ontologies, the flow of information, interaction, and the production of information.]

Figure 6.1. Actions to be taken for the integration by the proposed approach.

be of importance when adopting an existing ontology or when the names of schema items do not comply with the terms in the ontology. The third manual task is to interact with the integration process. There are cases in the integration task in which the system needs to consult the integration administrator. In our implementation (as shown in the algorithm in Table 6.2), in the case of an overlapping similarity the integration system requires a response from the administrator to generate a new class. The boxes drawn with dashed lines represent the automatic integration system. The similarity-finding part uses the PowerLoom reasoning system to find the relations between schema items based on the formalized ontologies. The integration part generates the integrated schema based on the approach presented in Chapter 5.

The rest of this chapter is organized as follows. Section 6.2 describes issues related to building ontologies and experiences gained from performing this process ourselves. Section 6.3 gives an overview of the prototype modules and their functionalities. Section 6.4 presents technical details of the prototype, including the class diagram of the prototype, the algorithm based on the approach in Chapter 5, and how similarity relations are found by the PowerLoom reasoning system. Section 6.5 discusses limitations of the solution in specific cases. In Section 6.6, we draw conclusions and suggest further work to enhance the solution.
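The commitment τ introduced in this section can be pictured as a lookup from schema item names to ontology terms. The following Python sketch is an illustration, not the prototype's implementation: the dictionary `TAU` and the helper names are assumptions, while the two sample entries are the ones shown in Figure 6.1.

```python
# Commitment tau: schema item name -> ontology term.
# The two entries are the examples from Figure 6.1; the data structure is assumed.
TAU = {
    "Pr_Road": "Primary_Road",
    "Hauptstr": "Hauptstrasse",
}


def tau(schema_item: str) -> str:
    """Return the ontology term that a schema item commits to."""
    if schema_item not in TAU:
        raise KeyError(f"schema item {schema_item!r} is not related to any ontology term")
    return TAU[schema_item]


def uncommitted(schema_items):
    """List the schema items that still have to be related to the ontology manually."""
    return [item for item in schema_items if item not in TAU]


term = tau("Hauptstr")
missing = uncommitted(["Pr_Road", "Name"])
```

Because the lookup is explicit, the schema name “Hauptstr” does not have to match the ontology term “Hauptstrasse”, which is the flexibility described above.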

6.2. Building and Formalizing Sample Ontologies

Building an ontology is a pragmatic and fundamental topic for applying this approach. We need well-defined ontologies to successfully practice our approach. Therefore, we discuss the details of defining and evaluating ontologies. Since access to data from a geographic community with an existing ontology was not possible, we based our work on geographic standards. Two geographic standards, GDF [GDF 1995] and ATKIS [ATKIS 1998], have been used for our experiments. We used the guidelines presented in [Jones, 1998, Uschold, 1996] to build the ontologies.

6.2.1. Methodology for Building Ontologies

Based on the practical experiences with GDF and ATKIS, the following phases are suggested for building ontologies.


Phase 1. Scope clarification: The purpose of the work should be clear, as well as the essential sources from which to extract the ontologies. Although we specify a domain, we stay away from application domain requirements and try to be as general as possible. This is due to the fact that ontologies describe the universal intension of terms for a community. In our case we are building an ontology for transportation network features for the geographic community. For that purpose, we selected the GDF and ATKIS standards, but we do not take into consideration any specific application requirements. The ontology may be used for a traffic control application, road maintenance or tourism applications. Application-independence is crucial for inter-application communication and reusability of the ontology.

Phase 2. Concept and relation extraction: Based on the defined scope we extract relevant terms and phrases (concepts and relations) from the documents or other information sources in the context. One should note that some concepts or relations are named by phrases or expressions consisting of several terms (e.g., Verkehrsweg Klasse 6), rather than a single term. A set of sample terms extracted from the ATKIS standard (see footnote 1) is shown in Table 6.1.

TABLE 6.1.

  Term                    English gloss
  Straße                  Road
  Schienenbahnen          Railway
  Fläche                  Surface
  Straßenkörper           Road-body
  Bahnkörper              Railway-body
  Verkehrsweg             Route
  Fahrbahn                Roadway or lane
  Bahnstrecke             Railway-track
  Begrenzungslinie        Side line
  breite                  width
  Komplexes_objekt        Complex-object
  bestehend-aus           part-of or component-of
  grenze                  limits or bounds
  abstand                 distance
  Verkehrsweg_Klasse6     Route_class6
  Verkehrsweg_Klasse9     Route_class9
  Verkehrsweg_Klasse12    Route_class12
  Verkehrsweg_Klasse15    Route_class15
  ...

1. The terms are extracted from Seite 3.E, Blätter 1-3.

Phase 3. Definition extraction: All statements that can be used for the definition of every concept or relation are extracted (or marked). Some typical examples of such statements are those that state that a concept is a specialization (or generalization) of another one, or state a constraint or relation for a concept. For example, ATKIS states that roads (Strassen) and railways (Schienenbahnen) can be regarded as complex objects and that they are composed of track and body. Therefore, we define them as subconcepts of complex objects (we discuss this topic further in Section 6.2.2). One may also refer to complementary sources such as dictionaries or members of the community, in the case where definitions of terms are not explicitly mentioned in the selected documents. For example, while adopting the ATKIS ontology for our ontology of transportation, we inserted the phrase “Roads transport automobiles”, considering it as an implicit assumption.

Phase 4. Complementary terms: The definitions from the last phase are likely to contain other concepts or relations which are not directly at the center of our scope but necessary for our definitions, e.g., automobile in the definition of a road. We can treat these terms in two different ways. These terms can have a simple definition (that is, leaving their interpretation to common sense). Alternatively, we can take their definition from another ontology (a higher-level ontology) which is already agreed upon by other communities. The latter approach is preferred because such agreement with a higher-level ontology facilitates relating our produced ontologies to those of other communities (see Sections 3.4.4 and 5.1). For example, we use the definition of “component-of” from our aggregation ontology to define “bestehend-aus”, and “automobile” in the definition of street is taken from the Vehicle ontology. However, one should make sure that the adopted definition coincides with the conceptualization of the community that the ontology is built for.

Phase 5. Formalization: Finally, we formalize the extracted definitions in a logical language that can be processed by reasoning systems and computers. Examples mentioned in phase 3 from the ATKIS standard are formalized as follows:

  (defconcept Strasse (?s Komplexes_Objekt)
    :documentation "Strasse is Komplexes_Objekt")

  ;; Every Strasse (road) must have at least one Strassenkoerper (surfacing).
  (assert (forall (?s)
    (=> (Strasse ?s)
        (exists (?sk Strassenkoerper)
          (bestehend-aus ?s ?sk)))))

Expressiveness of a formalism plays an important role in this phase. The more expressive a formalism, the more precise the definitions will be. For example, to add the complementary phrase “Roads transport automobiles” (domain ontology Q in Appendix A.2) we insert the following statement:

  (defconcept Strasse (?r Transportation_Path)
    : (exists (?a Automobile) (transports ?r ?a)))

The above statement expresses that a road (Strasse) transports at least one automobile, whereas a precise definition of road should refer to its ability to transport automobiles, or the possibility for every automobile to move on it. Such precise definitions can be achieved by using higher order logic (in this case modal logic [Blackburn et al., 2001]); however, analyzing such formalizations is beyond the capability of existing reasoning systems. The availability of a reasoning system to perform the desired processes is the important factor in selecting a formalism (Chapter 4 mainly discusses issues relevant to this phase).

Natural language definitions, produced in phase 3 and phase 4, need to agree with the conceptualization of the community. Since our ontologies are extracted from geographic standards, we took this agreement for granted. We do not give any guideline on how the members of the community should reach an agreement or commit to the ontology. While emphasizing its importance, we consider this to be an administrative issue and outside the scope of this thesis.

A practical question often asked is, “Whose role is it to build an ontology?”. The person building an ontology should have a good understanding of the vocabulary and the conceptualization of the community. Such knowledge helps in practice to ensure the accordance of ontologies with the community’s conceptualization as a measure of quality for ontologies (pointed out in Section 5.7). Ontologies should be used by a community, not only for a single geographic information system or geo-database. There is, however, nothing preventing a database or information system designer from building an ontology, so long as the requirements of an application do not influence the ontologies. We show a result of such influence in spatial modeling in Section 6.2.3. Ontologies are long-term assets that can be used in different application domains and, more importantly, for resolving semantic conflicts in communication between different application domains. Therefore, ontologies should not depend on a specific application domain or application requirements.

6.2.2. Importance of Extracted Statements

An essential task during the building of ontologies is recognizing the important concepts and their relations to other concepts, in phase 3. Apart from the is-a-type-of relation, which is important for the taxonomy hierarchy, we distinguish between three kinds of axioms used for the definition of concepts:

1. Necessary and sufficient characteristics: Taking part in a relation can be necessary and sufficient to be of a certain type. For example, a railway is a geographic feature that can transport a train. That means any geographic feature that can transport a train is a railway and vice versa. Relations such as “transports” are essential to build a non-primitive definition (Section 4.3) for railways. While building ontologies it is of major importance to extract necessary and sufficient properties.

2. Necessary characteristics: Taking part in a relation can be necessary to be of a certain type. For example, all railway tracks end at two places (train stations). That implies that if a spatial feature is classified under railway, then it has two end points, but not vice versa. These relations contribute to building primitive definitions (Section 4.3). Such axioms were also useful in determining intensional definitions of terms in our ontologies. Note that sometimes the conjunction of a few necessary relations is also sufficient for an individual to be classified under a concept.

3. Prototypical properties: Taking part in such relations may not be necessary to be of a certain type. For example, a railway may be called by a name. The fact that a railway can have a name plays no role in the classification of a spatial feature to be of the type railway. This type of axiom does not play a significant role in our approach of building and using ontologies. However, this type of relation is essential for modeling because it helps to capture aspects of the mini-world that are relevant to an application and to organize data in databases.

The motivation for the above classification of relations is their relevance to defining concepts. Capturing these relations and the constraints on them specifies the semantics of those concepts. Extraction of the first and the second type of relations is important for this work because it is based on the componential theory (see Section 3.5) for specifying semantics. The first two types of relations express characteristic features of a concept. When relying on prototype theory rather than componential theory (or both), one must capture all relations including the third type. Using knowledge of the third category of axioms for the purposes of classification is not the concern of the existing reasoning systems.

6.2.3. Using GDF and ATKIS Ontologies

To perform a suitable similarity matching between ontologies we need basic agreement between application ontologies. Such agreement is articulated by means of higher-level ontologies (Section 3.4.2). Our ontologies are at the level of application ontology and domain ontology as described in [Guarino, 1998a] (Figure 6.2). We use geographic standards to build our higher-level ontologies. This is in spite of the work already done on top-level ontologies. The reason behind this decision is that the standards are closer to the content of the spatial

databases than top-level ontologies are, as top-level ontologies are very general [Guarino, 1998a]. Works such as [Borgo et al., 1996, Coenen and Visser, 1998, Smith and Mark, 1998, Casati et al., 1998], although they address essential issues in spatial data modeling, would not be directly helpful for our approach, especially because we do not use the spatial characteristics of the objects. As an example, while the semantic similarity of class street with other classes is being checked, we are concerned with the interpretation of class Road in people’s understanding, rather than with how a road occupies space or what the theory of position and shape of a road is.

[Figure 6.2: diagram not reproduced. Recoverable labels: top-level ontology; task ontology; domain ontology; application ontology.]

Figure 6.2. Kinds of ontology from [Guarino, 1998a].

We took two known geographic standards to build our example ontologies. After extracting logical axioms from the ATKIS and GDF standards and evaluating the formalization approaches (presented in Chapter 4), we faced a difficulty. The difficulty with both standards, and especially the GDF standard, was the influence of representation issues on the standards. At the beginning, the focus of the work was the GDF standard. However, GDF is mainly concerned with the representation of geographic features, and that makes it suitable for conceptual schema development rather than building ontologies. For instance, GDF specifies, in detail, how a road is represented rather than what a road is or how an object is classified under the concept “road” (part of the axioms extracted from GDF are presented in Appendix A.10). Originally, the main goal of the GDF standard was to present a framework for geographic data exchange, but not at the semantic level. The second standard used in this thesis is the ATKIS object-domain catalog. Unlike GDF, ATKIS offers more specifications (in terms of definitions) of geographic features.
For instance, ATKIS explains the specifications of “strasse” in terms of its components, rather than how to represent it. While building the sample ontologies for the prototype, many of the definitions from both of the standards have been adopted and used. However, we also added extra axioms to the definitions to build our sample ontologies. These axioms state our implicit assumptions about the semantics of terms used in the standards.

[Figure 6.3: diagram not reproduced. Recoverable labels: Geographic Feature; Point Feature; Linear Feature; Areal Feature; Street; Building.]

Figure 6.3. Geographic features classified in three main classes.

We used OntoClean [Guarino and Welty, 2002] to show a problem with the taxonomy tree of spatial features. OntoClean guidelines apply mainly to the is-a relations of a taxonomy tree. The guidelines can be applied during the extraction of the definitions or applied afterwards to the results produced in phase 3 and phase 4. GDF (like many geo-spatial systems) classifies spatial objects by primitive types, namely point (0D), line (1D) and area (2D). A spatial object inherits its representation depending on its representation dimensions. This approach is very efficient and useful for modeling spatial objects. However, the representation of such objects in a database can change depending on the scale and application domain, while remaining associated with the same individual in the domain. For instance, a street is either a linear or an areal object, and a building is either a point or an areal object (Figure 6.3). Using the notion of rigidity introduced in OntoClean, one can clarify the source of the heterogeneities caused by conceptual modeling in this example. While being a building is a rigid property, being an areal or point feature is not a rigid property of a spatial feature. In other words, the dimension in which an object is represented is not a rigid property. Therefore, considering a street as a specialization of linear feature or a building as a specialization of point feature is not justified. In fact, object classes should be related to their representational dimension, not inherit from it (see Figure 6.4). [Sowa, 2000] also observes the result of this problem in the work of [Warren and Fernando, 1982]. We used this perspective (Figure 6.4) for building and formalizing our GDF ontology in Appendix A.10.

[Figure 6.4: diagram not reproduced. Recoverable labels: Geometric Representation; Point; Line; Area; Geographic Feature; Building; relation “represents”.]

Figure 6.4. Geographic features are represented by either point or line or area.

Finally, we built the formalized Transportation ontology and the domain ontologies P and Q by adopting parts of the axioms extracted from GDF and ATKIS. These

ontologies are independent of representation as far as possible. In particular, the Transportation ontology, as a higher-level ontology, does not concern representation issues.

6.2.4. Inter-Ontology Relations

An ontology for a large number of communities cannot be complete or highly specialized. The more detailed the definitions in an ontology for a community are, the more difficult it is to reach a consensus within the community, or between communities. A community can adopt a higher-level ontology (see Section 3.4.2) and specialize it by adding its own definitions to it. As a result, a specialized ontology cannot remove any constraint or term of a higher-level ontology without the agreement of the communities already committed to that ontology [Visser and Cui, 1998]. In other words, modification of an ontology needs agreement among all communities committed to it.

Modular structuring of formalized ontologies facilitates developing and reusing them. Two relations, includes and uses, have been utilized in previous work to organize and manage ontologies [Heflin and Hendler, 2000, Visser and Cui, 1998, Visser et al., 1998] ([Farquhar et al., 1997] suggest more relations between ontologies). PowerLoom supports the necessary mechanisms to define a relation between ontologies in order to adopt terms from their higher-level ontologies. Three mechanisms called include, use and referencing may be used for this purpose. The same mechanisms are provided by Ontolingua.

The include mechanism makes an ontology inherit all the definitions from another ontology (a higher-level ontology) and all those that are included in it; that is, include is transitive. One should notice here that when a higher-level ontology is adopted by the include mechanism, the new ontology should still comply with the higher-level definitions of the adopted ontology and its parents, i.e., they should be coherent. Include is the only inter-ontology relation we use for our prototype.

[Figure 6.5: diagram not reproduced. Recoverable labels: Math Relation; Aggregation; GDF axioms; PhysicalQuantity; ATKIS axioms; Vehicle; Spatial Relation; Transportation; Application Domain P; Application Domain Q.]

Figure 6.5. Organization of ontologies P and Q and their high-level ontologies (arcs illustrate the include relations). As explained in Section 6.2, we also build a set of axioms extracted from GDF and ATKIS standards. Some of the axioms are used during our experiments with the prototype.

Another mechanism to adopt an ontology is the use mechanism. When an ontology uses a higher-level ontology, it inherits definitions only from the specified higher-level ontologies but not from the higher-level ontologies that these use, which implies that use is not transitive. The use mechanism is not suitable for defining ontologies in our solution, as we rely on the transitivity of inheriting definitions from higher-level ontologies.

The last mechanism, referencing, allows an ontology to adopt a single definition from another ontology. In the case where a community can only adopt part of a higher-level ontology, the referencing mechanism offered by Ontolingua and PowerLoom can be applied. Such a mechanism allows an ontology to take some of the definitions (concept or relation definitions) in another ontology and adopt them. Supporting this mechanism can cause much complication for the ontology management module (Section 6.3), while the same result can be achieved by defining an ontology containing (a copy of) the required definitions and using the include mechanism.

A complementary mechanism, which is essential in multilingual environments, is called translation. This mechanism allows one to adopt the definition of a concept or relation but change the term used to refer to that concept. It helps in


cases of differences between the (natural) language of the community that builds the ontology and the (natural) language of the communities which will use the terms of the higher-level ontology. Such a translation requires a proper understanding of the definitions in the higher-level ontology. For example, translating “street” in a higher-level ontology to “strasse” without paying attention to the definitions can cause unintended results from a reasoning system.

The only mechanism currently used by the prototype for adopting higher-level ontologies is the include mechanism. Figure 6.5 illustrates how our ontologies are organized in modules adopting higher-level ontologies.
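The difference between the transitive include mechanism and the non-transitive use mechanism can be illustrated with a small sketch. It is not part of the prototype: the function names are hypothetical, the edges are a subset of the include relations declared in the defmodule forms of Appendix A, and the same edges are reinterpreted as use edges purely for contrast.

```python
from typing import Dict, List, Set

# Include edges taken from the defmodule forms in Appendix A (subset).
INCLUDES: Dict[str, List[str]] = {
    "QDOMAIN": ["TRANSPORTATION"],
    "TRANSPORTATION": ["VEHICLE_ONTOLOGY", "SPATIAL_RELATION",
                       "AGGREGATION", "PHYSICAL-QUANTITY"],
    "SPATIAL_RELATION": ["MATH_RELATION"],
    "AGGREGATION": ["MATH_RELATION"],
    "VEHICLE_ONTOLOGY": [],
    "PHYSICAL-QUANTITY": [],
    "MATH_RELATION": [],
}


def adopted_by_include(module: str, graph: Dict[str, List[str]]) -> Set[str]:
    """All higher-level modules reachable through include (transitive)."""
    seen: Set[str] = set()
    stack = list(graph.get(module, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, []))
    return seen


def adopted_by_use(module: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Only the directly named higher-level modules (not transitive)."""
    return set(graph.get(module, []))
```

Under include, QDOMAIN sees the definitions of MATH_RELATION through TRANSPORTATION and SPATIAL_RELATION; under use it would see only TRANSPORTATION, which is why the solution relies on include.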

6.3. Overall Functionalities of the Prototype System

This section shows how the prototype system can help to integrate database schemata. We show the functionality of the system by a use case and its diagram (Figure 6.6). The solution consists of two main modules. The first is the Ontology Management module, which provides the necessary tools to define, modify and organize the ontologies as well as to process them. The second is the Schema Integration module, which facilitates the integration by helping the integration administrator. The integration module is distributed; that is, several instances of the integration module may run on different systems. Unlike the integration module, the ontology management module runs on a single centralized system. In future it can be enhanced into a single central management module that stores the ontologies on distributed systems.

The ontology management module provides the following services:

• Register Community: One can use this service to introduce one’s community. The same service is used by a community to commit to an existing ontology. Committing to an ontology has two major effects. First, the terms defined in the ontology are added to the scope of the community. That means the community is enabled to relate its database schema elements to the terms defined in the ontology. As a consequence, the Ontology Management module performs the necessary checks to make sure the ontology does not contain a term already existing in the scope of the community. Second, in the case of a request for modification of a committed ontology, the respective communities are consulted and their agreement is obtained to authorize such modifications. (However, the prototype does not yet support the operations related to the second issue.)

• Add Ontology: A registered community can create a new ontology. The new ontology may include any number of higher-level ontologies, as long as no term is defined in multiple ontologies. PowerLoom offers more possibilities, such as translating terms, but the prototype only uses the include feature. A community may also need to copy and modify an ontology for its own specific purposes.

• Add Term: A registered community can propose adding a term to an existing ontology. The service checks that the term has not already been defined in the ontology or in one of its higher-level ontologies. It also checks that the definition of the term is consistent in the context of the ontology. We are also considering an agreement process between issuing a request and actually adding a term, which guarantees the agreement of other communities. That is, if a community commits to an ontology and consequently its applications are based on that ontology, it should agree with any changes proposed by the other communities before the ontology can be modified.

• Modify Axiom: A registered community can propose adding an axiom to an existing ontology. The service checks that the new axiom is syntactically correct, that it is consistent with the ontology and that the respective terms of the axiom are defined in the ontology. A process similar to that of adding terms is required here to guarantee the consensus of communities on a requested modification.

• Finding similarities: This service is the main reason we need a reasoning system. It finds the similarity relations using PowerLoom. Details of how this service interacts with PowerLoom are presented in Section 6.4.2. The service is used by the Schema Integrator and determines the type of similarity between two terms requested by the Schema Integrator.

• Reasoning: This service performs a set of common services to interact with PowerLoom, for example, loading an ontology into PowerLoom, or temporarily adding a new definition to a loaded ontology just to check its consistency with the existing terms and axioms.

[Figure 6.6 shows the use cases Register Community, Add Ontology, Add Term, Modify Axioms, Finding Similarities and Reasoning in the Ontology Management module, the use cases Commitment Editor and Schema Integrator in the Schema Integration module, and the actors Community, Database Designer and Integration Administrator.]

Figure 6.6. A use case diagram to show the functionalities of the prototype.

The two modules are designed with the maximum possible independence. The services offered by the ontology module are general semantics-related services. The methods are available through the CORBA remote invocation standard. CORBA IDL interfaces for using the above services are provided in Appendix B.1. The Schema Integration module is designed in such a way that it can run in a distributed environment. That is, different instances of the Schema Integration module can run and use the Ontology Management module. The Schema Integration module remotely invokes the required methods in the Ontology Management module.

The Schema Integration module in Figure 6.6 consists of two services:

• Commitment editor: A database designer or integration administrator uses this service to relate a schema element (class and attribute name) to a term in the communities’ ontologies. This service sets up the hash function denoted by “τ” in Section 5.5. Prior to integration, a database designer (or the integration administrator) should link schema elements to the terms defined in the ontology server. These links can be established during database design or at any time before the database integration (in the case of legacy systems).
All schema elements should be related to terms defined in the committed ontologies of a community. This means that the database schema elements in one schema can be linked only to terms in the scope of one single community (not a combination of communities).

• Schema integrator: One uses this service to integrate schemata based on their commitments and the similarities. This service suggests adding new classes to the global schema and establishes the necessary subclass relations as described in Section 5.5. The algorithm is also presented in Section 6.4.1.
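The role of the commitment editor can be sketched as a mapping from schema elements to terms, restricted to the scope of a single community. The class and method names below are hypothetical and do not reflect the prototype's CORBA interfaces; the dictionary only stands in for the function τ of Section 5.5.

```python
from typing import Dict, Iterable, List, Set


class CommitmentEditor:
    """Illustrative stand-in for the commitment mapping of one schema."""

    def __init__(self, community_scope: Iterable[str]) -> None:
        # Terms of all ontologies the schema's single community committed to.
        self.scope: Set[str] = set(community_scope)
        # tau: schema element name -> ontology term
        self.tau: Dict[str, str] = {}

    def link(self, schema_element: str, term: str) -> None:
        """Relate a class or attribute name to a term in the community scope."""
        if term not in self.scope:
            raise ValueError(f"{term!r} is outside the community scope")
        self.tau[schema_element] = term

    def unlinked(self, schema_elements: Iterable[str]) -> List[str]:
        """Schema elements that still need a commitment before integration."""
        return [e for e in schema_elements if e not in self.tau]
```

A schema is ready for integration, in this sketch, once `unlinked` returns an empty list for all of its class and attribute names.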


6.4. Technical Specification of the Prototype

The class diagram of the implementation is illustrated in Figure 6.7. It shows the major classes in the prototype system. The upper part of the class diagram shows the model in the Ontology Management module and the lower part illustrates the classes in the Schema Integration module. The description of the classes in the ontology module is as follows:

• Term: This class contains the word and links to all the axioms related to the term. Each term requires at least one simple axiom to state its existence in exactly one ontology. A major method in this class evaluates the similarity of the term with any other given term.

• Logical_Axiom: This class contains an axiom and the verbal documentation for it. The class contains one axiom in DL for PowerLoom, while it can be extended to contain a translation into other languages (e.g., FLogic) or standards (e.g., OIL). Every axiom must state a fact about at least one term.

• Ontology: This class offers several methods for inserting a term or axiom into the ontology, including methods for checking the consistency of new terms and axioms. Another method finds the similarities between all terms contained in one ontology and those of another one and stores them in the database. It also contains a method to load all the axioms defining the terms in the ontology.

• Community: This class allows a community to request a modification of ontologies (e.g., a request for adding a term or modifying a definition). This class requires improvement to help communities to reach an agreement.

• Modification Request: We foresee the need to store modification requests for every ontology. This class helps to keep track of those communities that have committed to an ontology on which a modification has been requested. Prior to the actual modification, the agreement of the communities should be obtained. As mentioned in Section 6.3 (Register Community), we did not develop the necessary methods for this class any further.
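As a rough illustration of the Term and Ontology classes described above, the following sketch enforces two of the stated invariants: every term carries at least one axiom, and a term may not already be defined in the ontology or in one of its higher-level ontologies. The names are hypothetical, and the consistency check, which needs the reasoning system, is deliberately left out.

```python
from typing import Dict, List, Sequence, Set


class Ontology:
    """Illustrative model of an ontology with included higher-level ontologies."""

    def __init__(self, name: str, includes: Sequence["Ontology"] = ()) -> None:
        self.name = name
        self.includes = list(includes)
        # word -> axioms stating facts about that term
        self.axioms_by_term: Dict[str, List[str]] = {}

    def all_terms(self) -> Set[str]:
        """Own terms plus all terms inherited through include (transitive)."""
        terms = set(self.axioms_by_term)
        for parent in self.includes:
            terms |= parent.all_terms()
        return terms

    def add_term(self, word: str, axiom: str) -> None:
        """Insert a term together with the axiom that states its existence."""
        if not axiom.strip():
            raise ValueError("a term needs at least one axiom")
        if word in self.all_terms():
            raise ValueError(f"term {word!r} is already defined")
        self.axioms_by_term[word] = [axiom]
```

In the prototype the corresponding data are stored in a database; the in-memory dictionary here is only a simplification.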
The integration module should model the database schema definitions. During our work we described our schemata (p1 and q1) by their classes and the attributes of every class according to the following model. In general, a final product should import the schemata from the data dictionary of a database system. The description of the classes in the integration module is as follows:

• Schema: This class represents the database schemata and contains the Class Elements. The major method in this class produces a global schema by receiving another schema as an input. The algorithms producing the global schema are presented in Section 6.4.1. Every Schema has a specific community that limits the term definitions to what the community committed to. The schema consists of a set of Class Elements.

• Class Element: This class represents the class definitions in the database schema. It has one and only one Schema. It consists of a set of Attribute Elements. It has a set of super classes in the same schema. Every class in the global schema has a link to at least one class in a local schema. The correspond attribute is null for all classes in the local schemata. For Class Elements in a generated global schema, correspond points to the underlying classes (at least one) in the local schemata.

• Attribute Element: This class represents an attribute in the database schemata. It belongs to one and only one Class Element. The correspond attribute is set to null for all attributes in a local schema (similar to Class Element). For Attribute Elements in a generated global schema, correspond points to one and only one underlying class in the local schemata.

[Figure 6.7 shows the classes Ontology (name, documentation), Logical Axioms (axiomDL, documentation), ModificationReq., Term (word, verbalDefinition), Community, Similarity Rel. (similarityType), Schema Element, Attribute Element, Class Element and Schema, with the associations includes, similar attribute, contains, belongs to, superclass and corresponds.]

Figure 6.7. The class diagram of the system (see Appendix B for details).

The prototype is implemented in Java. It uses the Oracle database to store the relevant data in both modules. More details of the class definitions are presented in Appendix B.3 and Appendix B.2. We used PowerLoom 2.0.alpha as the underlying reasoning system.

6.4.1. Integration Process

Table 6.2 and Table 6.3 show the integration algorithms. The algorithms are based on the approach described in Section 5.5. The inputs of the algorithm are two Schema classes (see Figure 6.7), Sp and Sq, and the output is another Schema, SG, which is the result of the integration. The interface definitions for the methods in the following algorithm and their documentation are presented in Appendix B.2.

TABLE 6.2. The class integration algorithm implements the approach described in Section 5.5.1.

  // Initializing the global schema SG with Sq.
  Sort (Sq); // Moves all super classes before their
             // subclasses in the list of classes.
  for (i = 1; i [...]> (exists (?endra Residential_Area) (ends-at ?r ?endra)))

(defconcept Railway (?r)
  :documentation "A Transportation Path is a Railway if and only if it transports Train."
  :<=> (exists (?t Train) (transports ?r ?t))
  :=>> (forall (?x vehicle) (=> (transports ?r ?x) (Train ?x))))

;; Specifying disjoint concepts
(assert (forall (?x Street) (not (Road ?x))))
(assert (forall (?x Road) (not (Street ?x))))
(assert (forall (?x Railway) (not (Street ?x))))
(assert (forall (?x Railway) (not (Road ?x))))
(assert (forall (?x Railway) (not (Highway ?x))))

;; Only using second order logic one can define the following four surfacings
;; as concepts and assign intensional definitions to them. Definitions would
;; not cause a major problem here but full reasoning is not supported.
(defconcept SurfacingType (?pt)
  :documentation "SurfacingType is one of the following unpaved, tiled or asphalt."
  :<=> (setof unpaved tiled asphalt ?pt))

(deffunction surfaced-by (?x) :-> (?pt SurfacingType)
  :documentation "'surfaced-by' assigns a SurfacingType only to the following concepts: Street, Highway or Road."
  :axioms (or (domain surfaced-by Street)
              (domain surfaced-by Highway)
              (domain surfaced-by Road))
  :=>> (or (Street ?x) (Highway ?x) (Road ?x)))

(defconcept Primary_Road (?pr Road)
  :documentation "Primary Road has a high amount of traffic, that is, having at least 1500 Automobiles per day running over it."
  :<=> (>= (vehicle_traffic ?pr) 1500))

(defconcept Secondary_Road (?sr Road)
  :documentation "Secondary Road has a low amount of traffic, that is, having less than 1500 Automobiles per day running over it."
  :<=> (< (vehicle_traffic ?sr) 1500))

;; Primary and Secondary Road are disjoint concepts
(assert (forall (?x Primary_Road) (not (Secondary_Road ?x))))
(assert (forall (?x Secondary_Road) (not (Primary_Road ?x))))

(assert (Constant_Quantity Meter25))
(assert (unit-of Meter25 Meter))
(assert (val Meter25 25))

(defconcept Narrow_Road (?nr Primary_Road)
  :documentation "Narrow Road is a primary road with a width less than 25."
  :<=> (greater-quantity Meter25 (width ?nr)))

(assert (Constant_Quantity Meter30))
(assert (unit-of Meter30 Meter))
(assert (val Meter30 30))

(defconcept Wide_Road (?wr Primary_Road)
  :documentation "Wide Road is a primary road with a width greater than 30."
  :<=> (greater-quantity (width ?wr) Meter30))

;; Narrow and Wide Road are disjoint concepts (but not exhaustive)
(assert (forall (?x Narrow_Road) (not (Wide_Road ?x))))
(assert (forall (?x Wide_Road) (not (Narrow_Road ?x))))

APPENDIX A.2. Ontology of the application domain Q.

(in-package "STELLA")

(defmodule "QDOMAIN" :includes ("TRANSPORTATION"))
(in-module "QDOMAIN")
(in-dialect KIF)

(defconcept Strasse (?r Transportation_Path)
  :documentation "Strasse is a subconcept of Transportation path that transports cars."
  :<=> (exists (?a Automobile) (transports ?r ?a))
  :=>> (forall (?x vehicle) (=> (transports ?r ?x) (Automobile ?x))))

;; Defining a quantity of 50 Km/H.
(assert (Constant_Quantity KMPerH50))
(assert (unit-of KMPerH50 KMPerH))
(assert (val KMPerH50 50))

(defconcept Hauptstrasse (?h Strasse)
  :documentation "Main Route has a speed limit of more than 50 KM/H. Note: independent of where it is located."
  :<=> (greater-quantity (speed_limit ?h) KMPerH50))

(defconcept Nebenstrasse (?n Strasse)
  :documentation "Secondary Route has a speed limit of less than 50 KM/H. Note: independent of where it is located."
  :<=> (greater-quantity KMPerH50 (speed_limit ?n)))

(assert (forall (?x Nebenstrasse) (not (Hauptstrasse ?x))))

;; Defining a quantity of 100 Km/H.
(assert (Constant_Quantity KMPerH100))
(assert (unit-of KMPerH100 KMPerH))
(assert (val KMPerH100 100))

(defconcept Schnellstrasse (?ss Strasse)
  :documentation "Schnellstrasse is a fast driving route for Automobiles. It is a subconcept of Strasse that has a speed limit of more than 100 KM/H and has at least 6 lanes."
  :<=> (and (greater-quantity (speed_limit ?ss) KMPerH100)
            (>= (cardinality has-lane ?ss) 6)))

(defrelation strassenkreuzung ((?x Strasse) (?y Strasse))
  :documentation "'strassenkreuzung' is an intersection between two 'Strasse'n."
  :<=> (and (intersect ?x ?y) (Strasse ?x) (Strasse ?y)))

(defconcept Schienenbahn (?f)
  :documentation "A Transportation Path is a Railway if and only if it transports Train. The term is taken from ATKIS standard but the semantics is different."
  :<=> (exists (?t Train) (transports ?f ?t)))

(assert (forall (?x Schienenbahn) (not (Strasse ?x))))

(deffunction breite ((?x Thing)) :-> (?q Constant_Quantity)
  :documentation "breite is the same as width (width is not shadowed)."
  :=>> (= (quantity-dimension ?q) length)
  :<=> (width ?x ?q))
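The speed-limit definitions above rest on greater-quantity, which compares magnitudes, and the physical quantity ontology in Appendix A.8 computes the magnitude of a quantity as the magnitude of its unit times its value. The following sketch mirrors that idea outside PowerLoom. The unit table, the class and the function names are assumptions made for illustration, and the check that both quantities share a dimension is omitted.

```python
from dataclasses import dataclass
from typing import List

# Assumed unit magnitudes relative to metres per second (not from the thesis).
UNIT_MAGNITUDE = {"KMPerH": 1000 / 3600, "MPerS": 1.0}


@dataclass(frozen=True)
class ConstantQuantity:
    val: float
    unit: str

    def magnitude(self) -> float:
        # magnitude(q) = magnitude(unit-of q) * val(q)
        return UNIT_MAGNITUDE[self.unit] * self.val


def greater_quantity(q1: ConstantQuantity, q2: ConstantQuantity) -> bool:
    """True if q1 is strictly greater than q2 after unit conversion."""
    return q1.magnitude() > q2.magnitude()


KMPERH50 = ConstantQuantity(50, "KMPerH")
KMPERH100 = ConstantQuantity(100, "KMPerH")


def classify_strasse(speed_limit: ConstantQuantity, lanes: int) -> List[str]:
    """Concepts of ontology Q that a Strasse with these properties falls under."""
    labels = ["Strasse"]
    if greater_quantity(speed_limit, KMPERH50):
        labels.append("Hauptstrasse")
    elif greater_quantity(KMPERH50, speed_limit):
        labels.append("Nebenstrasse")
    if greater_quantity(speed_limit, KMPERH100) and lanes >= 6:
        labels.append("Schnellstrasse")
    return labels
```

Note that a speed limit of exactly 50 KM/H satisfies neither the Hauptstrasse nor the Nebenstrasse definition, since both use a strict comparison.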

APPENDIX A.3. Transportation ontology.

Parts of this ontology are adopted from the ATKIS and GDF standards.

(in-package "STELLA")

(defmodule "TRANSPORTATION"
  :includes ("VEHICLE_ONTOLOGY" "SPATIAL_RELATION"
             "AGGREGATION" "PHYSICAL-QUANTITY"))
(in-module "TRANSPORTATION")
(in-dialect KIF)

(defconcept Transportation_Path (?tp))
(defconcept Lane (?l))

(defrelation has-lane ((?tp Transportation_Path) (?l Lane))
  :documentation "'Transportation_Path' is composed of lane."
  :=>> (composed-of ?tp ?l))

(defconcept Sideline (?s)
  :documentation "A Transportation_Path is surrounded by a left and a right side line. It is adopted from ATKIS 'linienhaft'.")

(deffunction left-surrounded-by ((?tp Transportation_Path)) :-> (?ls Sideline)
  :documentation "A Transportation_Path is surrounded by a single Sideline at its left. The side line is part of the Transportation_Path and it is at its left side."
  :=>> (composed-of ?tp ?ls)
  :=>> (left-of ?ls ?tp))

(deffunction right-surrounded-by ((?tp Transportation_Path)) :-> (?rs Sideline)
  :=>> (composed-of ?tp ?rs)
  :=>> (right-of ?rs ?tp))

(deffunction transports ((?tp Transportation_Path)) :-> (?vw Wheeled_Vehicle)
  :documentation "transports maps every Transportation Path to one and only one Wheeled Vehicle.")

(deffunction starts-at ((?tp Transportation_Path)) :-> (?x Thing))
(deffunction ends-at ((?tp Transportation_Path)) :-> (?x Thing))

(defconcept Transportation_Path (?tp)
  :documentation "An individual is a Transportation Path if and only if it transports Wheeled Vehicle. A Transportation Path is surrounded by two Sidelines and has at least one Lane. The definition is adopted from ATKIS 'verkehrsweg' with minor changes."
  :<=> (exists (?wv Wheeled_Vehicle) (transports ?tp ?wv))
  :=>> (exists (?l Lane) (has-lane ?tp ?l))
  :=>> (exists (?ls Sideline) (left-surrounded-by ?tp ?ls))
  :=>> (exists (?rs Sideline) (right-surrounded-by ?tp ?rs))
  :=>> (exists (?x) (starts-at ?tp ?x))
  :=>> (exists (?y) (ends-at ?tp ?y)))

(deffunction belongs-to-path ((?l Lane)) :-> (?tp Transportation_Path)
  :=>> (part-of ?l ?tp))

(assert (forall (?l Lane)
  (exists (?tp Transportation_Path) (belongs-to-path ?l ?tp))))
(assert (forall (?l Lane)
  (= (cardinality (belongs-to-path ?l)) 1)))
(assert (forall ((?l Lane) (?tp Transportation_Path))
  (=> (belongs-to-path ?l ?tp) (has-lane ?tp ?l))))
(assert (forall ((?l Lane) (?tp Transportation_Path))
  (=> (has-lane ?tp ?l) (belongs-to-path ?l ?tp))))

(deffunction vehicle_traffic ((?tp Transportation_Path)) :-> (?n Number)
  :documentation "vehicle_traffic (Automobiles per day) maps every Transportation Path to one and only one Number.")

(deffunction speed_limit ((?tp Transportation_Path)) :-> (?q Constant_Quantity)
  :documentation "speed limit maps every Transportation_Path to one and only one Constant_Quantity."
  :=>> (quantity-dimension ?q LengthPerTime))

(defrelation intersect ((?x Transportation_Path) (?y Transportation_Path))
  :documentation "'intersect' relates the following concepts. Nothing intersects itself. No further detail is stated! This is in conflict with the GDF definition of intersection.")

(deffunction distance (?x ?y) :-> (?q Constant_Quantity)
  :=>> (quantity-dimension ?q Length))

(deffunction width ((?tp Transportation_Path)) :-> (?q Constant_Quantity)
  :documentation "width is distance between two sides of a Transportation_Path. It is a function defined for Transportation_Path and gives a quantity with quantity-dimension of length. It is adopted from ATKIS standard."
  :<=> (= (distance (left-surrounded-by ?tp) (right-surrounded-by ?tp)) ?q)
  :=>> (quantity-dimension ?q Length))

(defconcept Residential_Area (?ra Area))


APPENDIX A.4. Ontology of spatial relations

(in-package "STELLA")

(defmodule "SPATIAL_RELATION" :includes ("MATH_RELATION"))
(in-module "SPATIAL_RELATION")
(clear-module "SPATIAL_RELATION")
(clear-instances "SPATIAL_RELATION")
(reset-features)
(in-dialect KIF)

(defconcept Area (?r))

(defrelation inside ((?x Thing) (?y Area))
  :documentation "It states that something is entirely inside an area. It is a transitive relation.")
(assert (transitive-relation inside))

(defrelation outside ((?x Thing) (?y Area))
  :documentation "It is NOT a transitive relation.")

;; inside and outside are disjoint.
(assert (forall (?x ?y) (=> (inside ?x ?y) (not (outside ?x ?y)))))
(assert (forall (?x ?y) (=> (outside ?x ?y) (not (inside ?x ?y)))))

;; outside is symmetric if both individuals are areas
(assert (forall (?x ?y)
  (=> (and (outside ?x ?y) (Area ?x) (Area ?y))
      (outside ?y ?x))))

(defrelation left-of ((?x Thing) (?y Thing))
  :documentation "left-of is a transitive relation between two things.")
(assert (transitive-relation left-of))

(defrelation right-of ((?x Thing) (?y Thing))
  :documentation "right-of is a transitive relation between two things.")
(assert (transitive-relation right-of))

APPENDIX A.5. Ontology of mathematical relations

(in-package "STELLA")

(defmodule "/PL-KERNEL/PL-USER/MATH_RELATION")
(in-module "MATH_RELATION")
(clear-module "MATH_RELATION")
(clear-instances "MATH_RELATION")
(reset-features)
(in-dialect KIF)

(defrelation transitive-relation ((?r RELATION))
  :<=> (forall (?x ?y ?z)
         (=> (and (?r ?x ?y) (?r ?y ?z))
             (?r ?x ?z))))

(defrelation symmetric-relation ((?r RELATION))
  :<=> (forall (?x ?y) (=> (?r ?x ?y) (?r ?y ?x))))

(defrelation antisymmetric-relation ((?r RELATION))
  :<=> (forall (?x ?y) (=> (?r ?x ?y) (not (?r ?y ?x)))))

;; symmetric and antisymmetric are disjoint
(assert (forall (?r)
  (=> (symmetric-relation ?r) (not (antisymmetric-relation ?r)))))
(assert (forall (?r)
  (=> (antisymmetric-relation ?r) (not (symmetric-relation ?r)))))
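For a finite relation given as a set of pairs, the three definitions above can be checked directly. The sketch below is only an illustration with hypothetical function names; it follows the definitions exactly as stated in this module, so the antisymmetry test forbids any pair from appearing in both directions.

```python
from typing import Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def is_transitive(rel: Set[Pair]) -> bool:
    """(r x y) and (r y z) imply (r x z)."""
    return all((x, z) in rel
               for (x, y) in rel
               for (y2, z) in rel
               if y == y2)


def is_symmetric(rel: Set[Pair]) -> bool:
    """(r x y) implies (r y x)."""
    return all((y, x) in rel for (x, y) in rel)


def is_antisymmetric(rel: Set[Pair]) -> bool:
    """(r x y) implies not (r y x), as defined in this module."""
    return all((y, x) not in rel for (x, y) in rel)
```

For example, a part-of relation containing (lane, path), (path, network) and (lane, network) is transitive and antisymmetric in this sense, but not symmetric.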

APPENDIX A.6. Aggregation ontology

We only distinguish two types of aggregation relations here. One type is transitive and is called part-of (its inverse is called composed-of). The other type is not transitive and is called component-of (its inverse is called consist-of). Some aggregation relations have monopolizing characteristics. That means that something which is part of a specific type of whole cannot be part of another instance of the same type of whole. For example, a province is part of a country and cannot be part of any other country. Other aggregations do not have this property. For example, a street can be part of many addresses. How useful such a consideration would be remains to be seen. We did not define the relations according to the latter criterion here.

(in-package "STELLA")

(defmodule "AGGREGATION" :includes "MATH_RELATION")
(in-module "AGGREGATION")
(in-dialect KIF)

(defrelation part-of (?p ?w))
(assert (transitive-relation part-of))
(assert (antisymmetric-relation part-of))

(defrelation composed-of (?w ?p))
(assert (transitive-relation composed-of))
(assert (antisymmetric-relation composed-of))

;; composed-of and part-of are inverse relations!
(assert (forall (?w ?p) (=> (composed-of ?w ?p) (part-of ?p ?w))))
(assert (forall (?w ?p) (=> (part-of ?p ?w) (composed-of ?w ?p))))

(defrelation component-of (?p ?w)
  :documentation "similar to part-of relation, but not transitive.")
(assert (antisymmetric-relation component-of))

(defrelation consist-of (?w ?p)
  :documentation "similar to composed-of relation, but not transitive.")
(assert (antisymmetric-relation consist-of))

;; consist-of and component-of are inverse relations!
(assert (forall (?w ?p) (=> (consist-of ?w ?p) (component-of ?p ?w))))
(assert (forall (?w ?p) (=> (component-of ?p ?w) (consist-of ?w ?p))))

APPENDIX A.7. An ontology of Vehicles.

(in-package "STELLA")

(defmodule "/PL-KERNEL/PL-USER/VEHICLE_ONTOLOGY"
  :documentation "An Ontology for Vehicles.")
(in-module "VEHICLE_ONTOLOGY")
(in-dialect KIF)

(deffunction carries ((?v Vehicle)) :-> (?x Thing))

(defconcept Vehicle (?v)
  :documentation "Vehicles carry some Thing."
  :<=> (exists (?x Thing) (carries ?v ?x)))

(defconcept Wheeled_Vehicle (?wv)
  :documentation "Wheeled Vehicle is a Thing.")

(defconcept Automobile (?a Wheeled_Vehicle)
  :documentation "Automobile is a type of Wheeled Vehicle.")

(defconcept Train (?t Wheeled_Vehicle)
  :documentation "Train is a type of Wheeled Vehicle.")

(assert (forall (?x Automobile) (not (Train ?x))))

APPENDIX A.8. Ontology of physical quantities adopted from the library of ontologies on Ontolingua.

This ontology of Physical Quantity is simplified and adopted from the Ontolingua library of higher-level ontologies (available at http://WWW-KSLSVC.stanford.edu:5915). The main concern of this ontology is to handle unit conversion at the ontology level, while keeping the expression of semantics independent of units of measure. We believe the problem of unit conversion at the data level is solved; where it is not, this ontology is also useful for unit conversion at the data conversion level.

(in-package "STELLA")

(defmodule "/PL-KERNEL/PHYSICAL-QUANTITY"
  :documentation "This module is defining the physical quantity which is a simplified version of Physical-Quantity adopted from Ontolingua library of ontologies.")
(in-module "PHYSICAL-QUANTITY")
(clear-module "PHYSICAL-QUANTITY")
(reset-features)
(in-dialect KIF)
(add-trace :CLASSIFIER-INFERENCES)

(defconcept Physical_Quantity (?q Thing))

(defconcept Physical_Dimension (?d Thing)
  :documentation "The concept of Physical_Dimension is mainly used to distinguish comparable quantities - e.g., length, time, length per time.")

;; Physical_Quantity and Physical_Dimension are disjoint.
(assert (forall (?x)
  (=> (Physical_Quantity ?x) (not (Physical_Dimension ?x)))))

(deffunction quantity-dimension ((?q Physical_Quantity))
  :documentation "Every quantity can have only one Physical_Dimension."
  :-> (?d Physical_Dimension))

(defconcept Physical_Quantity (?q Thing)
  :documentation "A Physical_Quantity must have a Physical_Dimension"
  :=>> (exists (?d Physical_Dimension) (quantity-dimension ?q ?d)))

(defrelation compatible-quantities ((?q1 Physical_Quantity) (?q2 Physical_Quantity))
  :documentation "Two physical quantities are compatible if their physical-dimensions are equal."
  :<=> (= (quantity-dimension ?q1) (quantity-dimension ?q2)))

(defconcept Constant_Quantity (?c Physical_Quantity))
(deffunction val ((?c Constant_Quantity)) :-> (?n Number))

(defconcept Unit_of_Measure (?u Constant_Quantity))
(deffunction unit-of ((?c Constant_Quantity)) :-> (?u Unit_of_Measure))

(defconcept Constant_Quantity (?c Physical_Quantity)
  :=>> (exists (?n Number) (val ?c ?n))
  :=>> (exists (?u Unit_of_Measure) (unit-of ?c ?u)))

(deffunction magnitude ((?q Constant_Quantity)) :-> (?mag Number)
  :=>> (= (* (magnitude (unit-of ?q)) (val ?q)) ?mag))

(assert (forall (?q Constant_Quantity)
  (= (magnitude ?q) (* (magnitude (unit-of ?q)) (val ?q)))))

(defrelation greater-quantity ((?q1 Constant_Quantity) (?q2 Constant_Quantity))
  :documentation "A quantity is greater than the other if and only if its magnitude is greater and has the same Quantity_Dimension."
  :=>> (compatible-quantities ?q1 ?q2)
  :[...]> (consist-of ?w ?p))

(defconcept Komplexes_Objekt (?w thing)
  :documentation "complex-objects are objects consist-of (bestehend-aus) others."
  :<=> (exists (?p thing) (bestehend-aus ?w ?p)))

(defconcept Strasse (?s Komplexes_Objekt)
  :documentation "Strasse is Komplexes_Objekt")

(defconcept Strassenkoerper (?sk Flaeche)
  :documentation "Strassenkoerper (Road-body) is a type of Flaeche (Surfacing)")

(defconcept Fahrbahn (?fb Verkehrsweg)
  :documentation "Fahrbahn is a type of route (Verkehrsweg).")

;; Anything consisting of Strassenkoerper (Road-body) is a Strasse (road) -
;; i.e., Strassenkoerper can only be part of Strasse.
(assert (forall (?x ?sk)
  (=> (and (Strassenkoerper ?sk) (bestehend-aus ?x ?sk))
      (Strasse ?x))))

;; Anything consisting of Fahrbahn (roadway) is a Strasse (road) -
;; i.e., Fahrbahn can only be part of Strasse.
(assert (forall (?x ?fb)
  (=> (and (Fahrbahn ?fb) (bestehend-aus ?x ?fb))
      (Strasse ?x))))

;; Every Strasse (road) must have at least one Strassenkoerper (surfacing).
(assert (forall (?s)
  (=> (Strasse ?s)
      (exists (?sk Strassenkoerper) (bestehend-aus ?s ?sk)))))

;; Every Strasse (road) must have at least one Fahrbahn.
(assert (forall (?s)
  (=> (Strasse ?s)
      (exists (?fb Fahrbahn) (bestehend-aus ?s ?fb)))))

;; Strasse (road) can be only composed of Strassenkoerper (Surfacing) and roadway.
(assert (forall (?s ?x)
  (=> (and (Strasse ?s) (bestehend-aus ?s ?x))
      (or (Strassenkoerper ?x) (Fahrbahn ?x)))))

(defconcept Schienenbahn (?sb Komplexes_Objekt)
  :documentation "Schienenbahn is Komplexes_Objekt.")

;; Schienenbahn and Strasse are disjoint concepts.
(assert (forall (?x Schienenbahn) (not (Strasse ?x))))

(defconcept Bahnkoerper (?bk Flaeche))

;; Bahnkoerper and Strassenkoerper are disjoint concepts
(assert (forall (?x Bahnkoerper) (not (Strassenkoerper ?x))))

(defconcept Bahnstrecke (?bs Verkehrsweg))

;; Bahnstrecke and Fahrbahn are disjoint concepts
(assert (forall (?x Bahnstrecke) (not (Fahrbahn ?x))))

;; Anything composed of railway-body is a railway -
;; i.e., railway-body can only be part of railway.
(assert (forall (?x ?bk)
  (=> (and (Bahnkoerper ?bk) (bestehend-aus ?x ?bk))
      (Schienenbahn ?x))))


;; Anything composed of railway-line is a railway -
;; i.e., railway-line can only be part of railway.
(assert (forall (?x ?bs)
  (=> (and (Bahnstrecke ?bs) (bestehend-aus ?x ?bs))
      (Schienenbahn ?x))))

;; Every railway must have at least a railway-body.
(assert (forall (?sb)
  (=> (Schienenbahn ?sb)
      (exists (?bk Bahnkoerper) (bestehend-aus ?sb ?bk)))))

;; Every railway must have at least a Bahnstrecke.
(assert (forall (?sb)
  (=> (Schienenbahn ?sb)
      (exists (?bs Bahnstrecke) (bestehend-aus ?sb ?bs)))))

;; Railway can be only composed of railway-line and railway-body.
(assert (forall (?sb ?x)
  (=> (and (Schienenbahn ?sb) (bestehend-aus ?sb ?x))
      (or (Bahnkoerper ?x) (Bahnstrecke ?x)))))

(defconcept Begrenzungslinie (?bl Thing)
  :documentation "Begrenzungslinie (boundary line) bounds Verkehrsweg (Route).")

(defrelation begrenzt ((?v Verkehrsweg) (?bl Begrenzungslinie))
  :documentation "Verkehrsweg is bounded by (wird begrenzt) side lines (Begrenzungslinie)."
  :=>> (cardinality begrenzt ?v 2))
;; NOTE: In the next release of powerloom 'cardinality' changes to
;; 'range-cardinality' - Anyway, its semantics (consequent reaction of
;; powerloom) is not implemented, yet!

(deffunction grenze ((?bl Begrenzungslinie)) :-> (?v Verkehrsweg)
  :documentation "Begrenzungslinie bounds (grenze) a route (Verkehrsweg). It is the inverse of begrenzt (is bound by) relation"
  :=>> (begrenzt ?v ?bl))

(deffunction abstand ((?x Thing) (?y Thing)) :-> (?q Constant_Quantity)
  :documentation "abstand is a function defined for two things and its result is a quantity with dimension of length. It is commutative."
  :=>> (and (= (quantity-dimension ?q) length)
            (= (abstand ?x ?y) (abstand ?y ?x))))

(deffunction breite ((?x Thing)) :-> (?q Constant_Quantity)
  :documentation "breite (width) is a function defined for any thing and gives a quantity with quantity-dimension of length."
  :=> (= (quantity-dimension ?q) length))

;; 'Verkehrsweg Klass 6' is a type of Verkehrsweg (Route) with a width (breite) greater than 6 meter.
(assert (Constant_Quantity Meter6))
(assert (unit-of Meter6 Meter))
(assert (val Meter6 6))
(defconcept Verkehrsweg_Klass6 (?x Verkehrsweg)
  :<=> (greater-quantity Meter6 (breite ?x)))

;; 'Verkehrsweg Klass 9' is a type of Verkehrsweg (Route) with a width (breite) greater than 9 meter.
(assert (Constant_Quantity Meter9))
(assert (unit-of Meter9 Meter))
(assert (val Meter9 9))
(defconcept Verkehrsweg_Klass9 (?x Verkehrsweg)
  :<=> (greater-quantity Meter9 (breite ?x)))

;; 'Verkehrsweg Klass 12' is a type of Verkehrsweg (Route) with a width (breite) greater than 12 meter.
(assert (Constant_Quantity Meter12))
(assert (unit-of Meter12 Meter))
(assert (val Meter12 12))
(defconcept Verkehrsweg_Klass12 (?x Verkehrsweg)
  :<=> (greater-quantity Meter12 (breite ?x)))

;; 'Verkehrsweg Klass 15' is a type of Verkehrsweg (Route) with a width (breite) greater than 15 meter.
(assert (Constant_Quantity Meter15))
(assert (unit-of Meter15 Meter))
(assert (val Meter15 15))
(defconcept Verkehrsweg_Klass15 (?x Verkehrsweg)
  :<=> (greater-quantity Meter15 (breite ?x)))

;; 'Verkehrsweg Klass 18' is a type of Verkehrsweg (Route) with a width (breite) greater than 18 meter.
(assert (Constant_Quantity Meter18))
(assert (unit-of Meter18 Meter))
(assert (val Meter18 18))


(defconcept Verkehrsweg_Klass18 (?x Verkehrsweg)
  :<=> (greater-quantity Meter18 (breite ?x)))
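The width classes defined above amount to threshold tests on breite. The following is a minimal Java sketch of that reading, assuming (as the comments state) that Verkehrsweg_KlassN holds for every route wider than N meters. The class WidthClasses and its method are hypothetical illustrations and are not part of the ontology or of the prototype.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the ontology.
class WidthClasses {
    // Thresholds in meters, taken from the Verkehrsweg_Klass definitions.
    static final int[] THRESHOLDS_M = {6, 9, 12, 15, 18};

    // Returns the names of all width classes whose threshold the width exceeds.
    static List<String> classesFor(double widthMeters) {
        List<String> result = new ArrayList<>();
        for (int threshold : THRESHOLDS_M) {
            if (widthMeters > threshold) {
                result.add("Verkehrsweg_Klass" + threshold);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A route of width 10.5 m exceeds the 6 m and 9 m thresholds only.
        System.out.println(classesFor(10.5));
    }
}
```

Because the thresholds nest, any route that falls into Verkehrsweg_Klass9 in this sketch also falls into Verkehrsweg_Klass6, which mirrors the subsumption one would expect between the corresponding concepts.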

APPENDIX A.10. Part of the ontology of spatial features extracted from GDF.

The following is a set of logical axioms extracted from the GDF standard (Ver. 3). The GDF standard is mainly concerned with presenting a conceptual schema (or conceptual model) for representing geographic data related to the road network. It also specifies which data should be captured to represent a feature. GDF lacks a specification of the features themselves, i.e., of what the features are.

(in-package "STELLA")
(defmodule "GDF" :includes "AGGREGATION")
(in-module "GDF")
(in-dialect KIF)
(add-trace :CLASSIFIER-INFERENCES)

;;
;; Overall Data Model
;; Section 4.5 and Figure 4.2
;;
(defconcept Feature_Theme (?ft Thing))
(deffunction theme-name ((?ft Feature_Theme)) :-> (?fn String))
(deffunction theme-code ((?ft Feature_Theme)) :-> (?fc Integer))

;; Everything that has a theme-name and a theme-code is a Feature_Theme.
(assert (forall (?ft)
  (<=> (Feature_Theme ?ft)
       (and (exists (?s String) (theme-name ?ft ?s))
            (exists (?i Integer) (theme-code ?ft ?i))))))

(defconcept Feature_Class (?fc Thing))
(deffunction feature-name ((?fc Feature_Class)) :-> (?fn String))
(deffunction feature-code ((?fc Feature_Class)) :-> (?c Integer))

;; Everything that has a feature-name and a feature-code is a Feature_Class.
(assert (forall (?fc)
  (<=> (Feature_Class ?fc)
       (and (exists (?s String) (feature-name ?fc ?s))
            (exists (?i Integer) (feature-code ?fc ?i))))))

(deffunction belongs-to-theme ((?f Feature_Class)) :-> (?ft Feature_Theme))

;; Every Feature_Class belongs to at least one Feature_Theme.
(assert (forall (?x)
  (=> (Feature_Class ?x)
      (exists (?y Feature_Theme) (belongs-to-theme ?x ?y)))))

(defconcept Feature (?f Thing))

;; The relation belongs-to-feature-class relates every subconcept of Feature
;; to one Feature_Class. Note that the relation is between subconcepts of
;; Feature and instances of Feature_Class.
(deffunction belongs-to-feature-class (?f) :-> (?fc Feature_Class)
  :=> (subrelation ?f Feature))

;; Every subconcept of Feature belongs to at least one Feature_Class.
(assert (forall (?f)
  (=> (subrelation ?f Feature)
      (exists (?fc Feature_Class) (belongs-to-feature-class ?f ?fc)))))

(defconcept Attribute (?a Thing))
(defrelation carries-attribute ((?f Feature) (?a Attribute)))
(defconcept Simple_Feature (?sf Feature))
(defconcept Complex_Feature (?cf Feature))

;; A Feature is either complex or simple (in other words,
;; Complex Feature and Simple Feature are disjoint).
(assert (forall (?sf Simple_Feature)
  (not (Complex_Feature ?sf))))
(assert (forall (?cf Complex_Feature)
  (not (Simple_Feature ?cf))))

;; The fact that every complex feature should contain at least one feature
;; (complex or simple) is not stated here, because it is not obeyed in the
;; standard itself! E.g., according to Figure 5.2, a Road does not
;; necessarily contain any simple feature.

(defconcept Spatial_Dimension (?sd Thing)
  :<=> (member-of ?sd (setof zeroD oneD twoD)))
(assert (Spatial_Dimension zeroD))
(assert (Spatial_Dimension oneD))
(assert (Spatial_Dimension twoD))

(deffunction representation-dimension ((?sf Simple_Feature)) :-> (?sd Spatial_Dimension))

(assert (forall (?sf Simple_Feature)
  (exists (?d Spatial_Dimension) (representation-dimension ?sf ?d))))

(defconcept Point_Feature (?p Simple_Feature)
  :<=> (representation-dimension ?p zeroD))
(defconcept Line_Feature (?l Simple_Feature)
  :<=> (representation-dimension ?l oneD))
(defconcept Area_Feature (?a Simple_Feature)
  :<=> (representation-dimension ?a twoD))

(defrelation bounds ((?x Feature) (?y Feature))
  :=>> (=> (Simple_Feature ?x)
           (and (representation-dimension ?x zeroD)
                (representation-dimension ?y oneD)))
  :=>> (=> (Complex_Feature ?x) (Complex_Feature ?y)))

(deffunction starts-at ((?x Feature)) :-> (?y Feature)
  :=>> (=> (Simple_Feature ?x)
           (and (representation-dimension ?x oneD)
                (representation-dimension ?y zeroD)))
  :=>> (=> (Complex_Feature ?x) (Complex_Feature ?y))
  :=>> (bounds ?y ?x))

(deffunction ends-at ((?x Feature)) :-> (?y Feature)
  :=> (=> (Simple_Feature ?x)
          (and (representation-dimension ?x oneD)
               (representation-dimension ?y zeroD)))
  :=> (=> (Complex_Feature ?x) (Complex_Feature ?y))
  :=> (bounds ?y ?x))

(defrelation is-bounded-by ((?x Feature) (?y Feature))
  :=> (=> (Simple_Feature ?x)
          (and (representation-dimension ?x twoD)
               (representation-dimension ?y oneD)))
  :=> (=> (Complex_Feature ?x) (Complex_Feature ?y)))

(deffunction left-bounds ((?x Feature)) :-> (?y Feature)
  :=> (and (representation-dimension ?x oneD)
           (representation-dimension ?y twoD))
  :=> (is-bounded-by ?y ?x))

(deffunction right-bounds ((?x Feature)) :-> (?y Feature)
  :=> (and (representation-dimension ?x oneD)
           (representation-dimension ?y twoD))
  :=> (is-bounded-by ?y ?x))

;;
;; Feature Theme Definitions
;; Appendix A1 of GDF v.3
;;
(assert (Feature_Theme roads_and_ferries))
(assert (theme-name roads_and_ferries "Roads and Ferries"))
(assert (theme-code roads_and_ferries 41))

(assert (Feature_Theme administrative_areas))
(assert (theme-name administrative_areas "Administrative Areas"))
(assert (theme-code administrative_areas 11))

(assert (Feature_Theme settlement_and_named_areas))
(assert (theme-name settlement_and_named_areas "Settlement and Named Areas"))
(assert (theme-code settlement_and_named_areas 31))

(assert (Feature_Theme railways))
(assert (theme-name railways "Railways"))
(assert (theme-code railways 42))

;;
;; Feature Class Definitions
;; Appendix A1 of GDF v.3
;;
(assert (Feature_Class road_element_class))
(assert (feature-name road_element_class "Road Element"))
(assert (feature-code road_element_class 4110))
(assert (belongs-to-theme road_element_class roads_and_ferries))

(assert (Feature_Class junction_class))
(assert (feature-name junction_class "Junction"))
(assert (feature-code junction_class 4120))
(assert (belongs-to-theme junction_class roads_and_ferries))

(assert (Feature_Class ferry_connection_class))
(assert (feature-name ferry_connection_class "Ferry Connection"))
(assert (feature-code ferry_connection_class 4130))
(assert (belongs-to-theme ferry_connection_class roads_and_ferries))

(assert (Feature_Class enclosed_traffic_area_class))
(assert (feature-name enclosed_traffic_area_class "Enclosed Traffic Area"))
(assert (feature-code enclosed_traffic_area_class 4135))
(assert (belongs-to-theme enclosed_traffic_area_class roads_and_ferries))

(assert (Feature_Class road_class))
(assert (feature-name road_class "Road"))
(assert (feature-code road_class 4140))
(assert (belongs-to-theme road_class roads_and_ferries))

(assert (Feature_Class intersection_class))
(assert (feature-name intersection_class "Intersection"))
(assert (feature-code intersection_class 4145))
(assert (belongs-to-theme intersection_class roads_and_ferries))

(assert (Feature_Class ferry_class))
(assert (feature-name ferry_class "Ferry"))
(assert (feature-code ferry_class 4150))
(assert (belongs-to-theme ferry_class roads_and_ferries))

(assert (Feature_Class address_area_class))
(assert (feature-name address_area_class "Address Area"))
(assert (feature-code address_area_class 4160))
(assert (belongs-to-theme address_area_class roads_and_ferries))

(assert (Feature_Class address_area_boundary_element_class))


(assert (feature-name address_area_boundary_element_class "Address Area Boundary Element"))
(assert (feature-code address_area_boundary_element_class 4165))
(assert (belongs-to-theme address_area_boundary_element_class roads_and_ferries))

(assert (Feature_Class aggregated_way_class))
(assert (feature-name aggregated_way_class "Aggregated Way"))
(assert (feature-code aggregated_way_class 4170))
(assert (belongs-to-theme aggregated_way_class roads_and_ferries))

;;
;; Feature Definitions
;; Section 5.2 - Roads and Ferries
;; Ten features are defined in this section. First come the axioms
;; extracted from Figure 5.2.
;;
(defconcept Junction (?j Point_Feature))

(defconcept Road_Element (?re Line_Feature)
  :=> (and (exists (?sj Junction) (starts-at ?re ?sj))
           (exists (?ej Junction) (ends-at ?re ?ej))))

;; The standard does not state explicitly whether an Enclosed Traffic Area
;; is a point, line or area feature. It is defined as an area feature here
;; because of its name.
(defconcept Enclosed_Traffic_Area (?eta Area_Feature))

(defconcept Ferry_Connection (?fc Line_Feature)
  :=>> (and (exists (?sj Junction) (starts-at ?fc ?sj))
            (exists (?ej Junction) (ends-at ?fc ?ej))))

(defconcept Intersection (?i Complex_Feature))

;; A Junction can only be a component of an Intersection, and of only one
;; Intersection.
(assert (forall ((?j Junction) (?cf Complex_Feature))
  (=> (component-of ?j ?cf)
      (and (Intersection ?cf)
           (cardinality component-of ?j 1)))))
;; Note: cardinality will change to range-cardinality in the new release of
;; PowerLoom. Its semantics is not implemented, anyway.

(defconcept Road (?r Complex_Feature)
  :=>> (and (exists (?si Intersection) (starts-at ?r ?si))
            (exists (?ei Intersection) (ends-at ?r ?ei))))

(defconcept Ferry (?f Complex_Feature)
  :=>> (and (exists (?si Intersection) (starts-at ?f ?si))
            (exists (?ei Intersection) (ends-at ?f ?ei))))

(defconcept Aggregated_Way (?aw Complex_Feature))

(defconcept Address_Area_Boundary_Element (?aabe Line_Feature)
  :=>> (and (exists (?sj Junction) (starts-at ?aabe ?sj))
            (exists (?ej Junction) (ends-at ?aabe ?ej))))

(defconcept Address_Area (?aa Area_Feature)
  :=>> (and (exists (?aabe Address_Area_Boundary_Element)
              (is-bounded-by ?aa ?aabe))))

;; A Road Element is started and ended only by Junctions.
(assert (forall (?re Road_Element)
  (=> (bounds ?j ?re)
      (Junction ?j))))

;; A Ferry Connection is started and ended only by Junctions.
(assert (forall (?fc Ferry_Connection)
  (=> (bounds ?j ?fc)
      (Junction ?j))))

;; Aggregated Way contains only Road_Element or Junction.
(assert (forall ((?aw Aggregated_Way) (?sf Simple_Feature))
  (=> ( ?aw ?sf)
      (or (Road_Element ?sf) (Junction ?sf)))))

;; At least one Road Element should be part of Aggregated Way.
(assert (forall (?aw Aggregated_Way)
  (exists (?re Road_Element) (consist-of ?aw ?re))))

;; Junction bounds only Road Element or Ferry Connection.
(assert (forall ((?j Junction) (?lf Line_Feature))
  (=> (bounds ?j ?lf)
      (or (Road_Element ?lf)
          (Ferry_Connection ?lf)))))

;; Every Junction should at least bound some feature (the
;; type of such feature is specified in the previous
;; assertion).
(assert (forall (?j Junction)
  (exists (?lf Line_Feature) (bounds ?j ?lf))))

;; Road contains only Road_Element
(assert (forall (?r Road)
  (=> (consist-of ?r ?x)
      (Road_Element ?x))))

;; Ferry contains only Ferry_connection
(assert (forall (?f Ferry)
  (=> (consist-of ?f ?x)
      (Ferry_Connection ?x))))

;; Ferry connection is the only part of Ferry
(assert (forall (?fc Ferry_Connection)
  (=> (exists (?x) (component-of ?fc ?x))
      (Ferry ?x))))

;; Intersection bounds only Road or Ferry.
(assert (forall ((?i Intersection) (?cf Complex_Feature))
  (=> (bounds ?i ?cf)
      (or (Road ?cf) (Ferry ?cf)))))

;; Every Intersection should at least bound some feature.
(assert (forall (?i Intersection)
  (exists (?cf Complex_Feature) (bounds ?i ?cf))))

;; At least one Junction should be part of Intersection.
(assert (forall (?i Intersection)
  (exists (?j Junction) (consist-of ?i ?j))))

;; A Junction can be a component of only one Intersection.
(assert (forall ((?j Junction) (?i Intersection))
  (=> (component-of ?j ?i)
      (cardinality component-of ?j 1))))
;; cardinality will change to range-cardinality in next release of
;; PowerLoom. Its semantics is not implemented, yet.

(assert (forall ((?re Road_Element) (?r Road))
  (=> (component-of ?re ?r)
      (cardinality component-of ?re 1))))


(assert (forall ((?re Road_Element) (?i Intersection))
  (=> (component-of ?re ?i)
      (cardinality component-of ?re 1))))

(assert (belongs-to-feature-class Road_Element road_element_class))
(assert (belongs-to-feature-class Road road_class))
(assert (belongs-to-feature-class Junction junction_class))
(assert (belongs-to-feature-class Intersection intersection_class))
(assert (belongs-to-feature-class Ferry ferry_class))
(assert (belongs-to-feature-class Ferry_Connection ferry_connection_class))
(assert (belongs-to-feature-class Aggregated_Way aggregated_way_class))
(assert (belongs-to-feature-class Address_Area_Boundary_Element address_area_boundary_element_class))
(assert (belongs-to-feature-class Address_Area address_area_class))
(assert (belongs-to-feature-class Enclosed_Traffic_Area enclosed_traffic_area_class))

;; More axioms extracted from the text.
(defconcept Vehicle_Type (?vt Thing))
(assert (Vehicle_Type wheeled_vehicle))

(defrelation moves-on ((?vt Vehicle_Type) (?f Feature)))

(deffunction transports ((?f Feature)) :-> (?vt Vehicle_Type)
  :<=> (moves-on ?vt ?f))

;; structured-traffic unstructured-traffic
(defconcept Traffic_Flow (?tf Thing)
  :<=> (member-of ?tf (setof structured unstructured)))
(assert (Traffic_Flow structured))
(assert (Traffic_Flow unstructured))

(deffunction traffic ((?f Feature)) :-> (?tf Traffic_Flow)
  :=>> (transports ?f wheeled_vehicle))

;; Wheeled vehicles move on the Road Elements
(assert (forall (?re Road_Element)
  (transports ?re wheeled_vehicle)))

;; Road Elements transport structured traffic
(assert (forall (?re Road_Element)
  (traffic ?re structured)))


;; Enclosed Traffic Areas transport wheeled vehicles
(assert (forall (?eta Enclosed_Traffic_Area)
  (transports ?eta wheeled_vehicle)))

;; Enclosed Traffic Areas transport unstructured traffic
(assert (forall (?eta Enclosed_Traffic_Area)
  (traffic ?eta unstructured)))

(deffunction conveys ((?f Feature)) :-> (?vt Vehicle_Type))

;; Ferry Connections convey wheeled vehicles
(assert (forall (?fc Ferry_Connection)
  (conveys ?fc wheeled_vehicle)))

(defrelation connected ((?sf1 Simple_Feature) (?sf2 Simple_Feature))
  :documentation "A junction connects (physical connection) its adjoining Road Elements and Ferry Connections."
  :<=> (and (exists (?j Junction)
              (and (bounds ?j ?sf1) (bounds ?j ?sf2)))
            (or (Road_Element ?sf2) (Ferry_Connection ?sf2))
            (or (Road_Element ?sf1) (Ferry_Connection ?sf1))))

(deffunction valency ((?j Junction)) :-> (?i Integer)
  :documentation "The number of Road Elements or Ferry Connections joining at a junction")
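The connected relation and the valency function above can be illustrated on a small in-memory network. The following Java sketch assumes that the bounds facts between junctions and line features (Road Elements or Ferry Connections) are given explicitly. The class JunctionNetwork, its methods and the identifiers j1, re1, fc1, etc. are hypothetical and serve only as an illustration. It is not how PowerLoom evaluates the axioms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the 'bounds' facts; for illustration only.
class JunctionNetwork {
    // junction id -> ids of the Road Elements / Ferry Connections it bounds
    private final Map<String, Set<String>> bounds = new HashMap<>();

    // Records the fact (bounds junction lineFeature).
    void addBounds(String junction, String lineFeature) {
        bounds.computeIfAbsent(junction, k -> new HashSet<>()).add(lineFeature);
    }

    // valency: number of distinct line features joining at the junction.
    int valency(String junction) {
        Set<String> features = bounds.get(junction);
        return features == null ? 0 : features.size();
    }

    // connected: true if some junction bounds both line features.
    boolean connected(String sf1, String sf2) {
        for (Set<String> features : bounds.values()) {
            if (features.contains(sf1) && features.contains(sf2)) {
                return true;
            }
        }
        return false;
    }

    // Example network: j1 joins re1, re2 and fc1; j2 joins re2 and re3.
    static JunctionNetwork example() {
        JunctionNetwork n = new JunctionNetwork();
        n.addBounds("j1", "re1");
        n.addBounds("j1", "re2");
        n.addBounds("j1", "fc1");
        n.addBounds("j2", "re2");
        n.addBounds("j2", "re3");
        return n;
    }

    public static void main(String[] args) {
        JunctionNetwork n = example();
        System.out.println(n.valency("j1"));           // 3
        System.out.println(n.connected("re1", "fc1")); // true, both adjoin j1
        System.out.println(n.connected("re1", "re3")); // false, no shared junction
    }
}
```

Note that connected, as defined, is a direct adjacency: re1 and re3 are not connected even though a path via re2 exists.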


APPENDIX B

Implementation

This appendix contains details of the implementation of the prototype for the MIGI project.

APPENDIX B.1. CORBA IDL definitions for the Ontology Management module.

module OntoLibModule {
  struct OntologyStruct {
    string ontologyName;
    string documentation;
  };

  exception OntoException {
    string explanation;
  };

  struct TermStruct {
    string termName;
    string verbalDefinition;
    OntologyStruct name_space;
  };

  struct SimilarityRelationStruct {
    TermStruct term1;
    TermStruct term2;
    octet similarityType;
  };

  struct Result {
    string message;
    boolean error;
  };

  interface OntoLib {
    Result SimilarityEvaluation (
      in TermStruct term1,
      in TermStruct term2,
      out octet similarityRel );
    Result MergeOntologies (
      in OntologyStruct onto1,
      in OntologyStruct onto2 );
    Result AddOntology (
      in OntologyStruct onto ) raises (OntoException);
    Result RegisterCommunity ();
    Result NewTermRequest (
      in TermStruct newterm ) raises (OntoException);
    Result NewAxiomRequest (
      in string newaxiom );
  };
};
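The octet returned in similarityRel has to be decoded by the caller. The Java sketch below shows one way to do this, under the assumption that the octet values follow the similarity constants of the Schema class in Appendix B.2 (1 = equal, 2 = generalization, 3 = specialization, 4 = overlap, 5 = disjoint). The enum Similarity and its methods are hypothetical and not part of the IDL or the prototype.

```java
// Hypothetical enum; the numeric codes are assumed to mirror the constants
// of class Schema in Appendix B.2.
enum Similarity {
    EQUAL(1), GENERALIZATION(2), SPECIALIZATION(3), OVERLAP(4), DISJOINT(5);

    final int code;

    Similarity(int code) {
        this.code = code;
    }

    // Decodes an octet value such as the one returned in similarityRel.
    static Similarity fromCode(int code) {
        for (Similarity s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        throw new IllegalArgumentException("unknown similarity code: " + code);
    }

    // Relation that holds when the two terms are swapped: generalization and
    // specialization are inverses, the other relations are symmetric.
    Similarity inverse() {
        switch (this) {
            case GENERALIZATION: return SPECIALIZATION;
            case SPECIALIZATION: return GENERALIZATION;
            default: return this;
        }
    }
}
```

For example, if the evaluation of (term1, term2) decodes to GENERALIZATION, then inverse() gives the relation for (term2, term1) without a second remote call.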

APPENDIX B.2. The Class Definition of the Schema Integration Module

public class Schema {
    public static final int
        equalSimilarity = 1,
        generalizationSimilarity = 2,
        specializationSimilarity = 3,
        overlapSimilarity = 4,
        disjointSimilarity = 5;
    public String schemaName;
    public String communityName;
    private int numberOfClasses;
    public ClassElement schemaClass[];

    public int getNumberOfClasses() {
        return numberOfClasses;
    }

    /** Adds a class to the list of classes in a schema. */
    public void AddClass(ClassElement newClassElem) { }

    /** Integrates two schemata. This method implements the algorithm in
     *  Table 6.2 and Table 6.3. */
    public void IntegrateSchema(Schema Sq, Schema Sp) { }

    /** This method finds a synonym class of c which corresponds to
     *  schema q. That is done by finding the similarity relation 'Equal' in
     *  the underlying ontologies. */
    public ClassElement EqualClass(ClassElement c, Schema q) { }

    /** This method finds a synonym attribute of a which belongs to
     *  class c. That is done by finding the similarity relation 'Equal' in
     *  the underlying ontologies. */
    public ClassElement EqualAttr(AttributeElement a, ClassElement c) { }

    /** This method finds the immediate generalizations of class c
     *  which correspond to schema q. */
    void Immediate_Generalization(ClassElement c, Schema q, ClassElement[] result) { }

    /** This method moves all superclasses before their
     *  subclasses in a list of classes. */
    public void Sort(ClassElement[] cl, int n) { }

    /** Sorts all the classes in a schema. */
    public void Sort() { }
}

public class ClassElement {
    public AttributeElement attr[];
    private int numberOfAttr = 0;
    public ClassElement supperclass[];
    public ClassElement correspond;
    public Schema belongTo;
    public int numberOfSupperclasses = 0;
    public String className[];
    public int numberOfNames = 0;
    public TermStruct term;

    /** Creates a new class with the specified name. */
    public ClassElement(String className) { }

    /** Adds a new attribute to a class. */
    public void AddNewAttr(AttributeElement attr) { }

    /** Sets a correspondence to the specified class in the local schema.
     *  Also sets its name to the name of the local class. Note: by
     *  correspondence we refer to the relation between a class in the global
     *  schema and its original class in a local schema. */
    public void SetCorrClass(ClassElement corrClass) { }

    /** Returns the corresponding class in the local schema. */
    public ClassElement GetCorrClass() { }

    /** Removes a redundant superclass relation. This method is used
     *  whenever there is a possibility that we generate a redundant
     *  superclass relation - see Figure 5.10. */
    public void BreakSupperclassTriangle() { }

    /** This method adds a new superclass to the list of superclasses
     *  of class c. */
    public void SetSupperclass(ClassElement c) { }

    /** Removes a class from the list of superclasses. */
    public void RemoveSuperclass(ClassElement c) { }

    /** Returns true if c is in the list of superclasses. */
    public boolean Superclass(ClassElement c) { }
}

class Term {
    public String termName;
    public String verbalDefinition;
    public Ontology nameSpace;
    public LogicalAxiom LogicalDefinition;

    public Term(Ontology _ontology, String _termName, String _verbalDefinition) { }

    public void InsertTerm() throws OntoException { }

    public boolean exists() throws OntoException { }
}

class Ontology {
    public String ontologyName;
    public String documentation;

    public Ontology(String ontologyName, String documentation) { }

    public void InsertOntology() throws OntoException { }

    public boolean exists() throws OntoException { }
}

public class Community {
    private int name;
    public Ontology commited_onto[];

    public void NewTermReq() { }

    public void NewAxiomReq() { }

    public void RegisterCommunity() { }
}
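The Sort method of Schema is documented above as moving all superclasses before their subclasses. The listing does not show how this is done. One common way to obtain such an order is a depth-first topological sort, sketched below in Java. The classes ClassNode and SuperclassFirstSort are hypothetical stand-ins (ClassNode reduces ClassElement to a name and its superclasses), and the sketch assumes that the superclass graph is acyclic. It is not the implementation of the prototype.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for ClassElement: a name plus its direct superclasses.
class ClassNode {
    final String name;
    final List<ClassNode> superclasses = new ArrayList<>();

    ClassNode(String name) {
        this.name = name;
    }
}

class SuperclassFirstSort {
    // Returns the given classes ordered so that every class comes after all
    // of its superclasses. Assumes the superclass graph has no cycles.
    static List<ClassNode> sort(List<ClassNode> classes) {
        Set<ClassNode> ordered = new LinkedHashSet<>();
        for (ClassNode c : classes) {
            visit(c, ordered);
        }
        List<ClassNode> result = new ArrayList<>(ordered);
        result.retainAll(classes); // drop superclasses that were not in the input
        return result;
    }

    // Depth-first visit: emit all superclasses before the class itself.
    private static void visit(ClassNode c, Set<ClassNode> ordered) {
        if (ordered.contains(c)) {
            return;
        }
        for (ClassNode s : c.superclasses) {
            visit(s, ordered);
        }
        ordered.add(c);
    }

    // Demo: Road_Element < Line_Feature < Feature, given subclass-first.
    static List<String> demo() {
        ClassNode feature = new ClassNode("Feature");
        ClassNode line = new ClassNode("Line_Feature");
        ClassNode road = new ClassNode("Road_Element");
        line.superclasses.add(feature);
        road.superclasses.add(line);
        List<String> names = new ArrayList<>();
        for (ClassNode n : sort(List.of(road, line, feature))) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Processing classes in this order means that, when a class is handled, all of its superclasses have already been handled.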

APPENDIX B.3. Object Type Definitions for Oracle database (Ver 9i)

Object types used in the Ontology Library Management module are as follows:

CREATE OR REPLACE TYPE Term_typ AS OBJECT (
  termname CHARACTER VARYING(50),
  verbalDefinition CHARACTER VARYING(300),
  nameSpace REF Ontology_typ);

CREATE OR REPLACE TYPE Ontology_typ AS OBJECT (
  ontologyName CHARACTER VARYING(50),
  documentation CHARACTER VARYING(300));

CREATE OR REPLACE TYPE LogicalAxiom_typ AS OBJECT (
  term REF Term_typ,
  axiomDL CHARACTER VARYING(200));

CREATE OR REPLACE TYPE SimilarityRelation_typ AS OBJECT (
  term1 REF Term_typ,
  term2 REF Term_typ,
  similarityType number);

CREATE OR REPLACE TYPE onto_nested AS Table of REF ontology_typ;

CREATE OR REPLACE TYPE Community_typ AS OBJECT (
  communityName CHARACTER VARYING(50),
  commitedOntologies onto_nested);

Object types used in the Schema Integration module are as follows. Note that, to keep the two modules independent, the relations between schema elements and terms, as well as between schema and community, are preserved by foreign keys rather than by internal references (REF). This allows the two modules to run independently on two different databases. The two relations are shown by dashed lines in Figure 6.7.

CREATE OR REPLACE TYPE name_nested AS Table of CHARACTER VARYING(50);

CREATE OR REPLACE TYPE superclass_nested AS Table of REF ClassElm_typ;

CREATE OR REPLACE TYPE localclass_nested AS Table of REF ClassElm_typ;

CREATE OR REPLACE TYPE ClassElm_typ AS OBJECT (
  ontologyName CHARACTER VARYING(50),
  termname CHARACTER VARYING(50),
  className name_nested,
  superclass superclass_nested,
  correspond localclass_nested);

CREATE OR REPLACE TYPE attribute_nested AS Table of REF AttributeElm_typ;

CREATE OR REPLACE TYPE AttributeElm_typ AS OBJECT (
  ontologyName CHARACTER VARYING(50),
  termName CHARACTER VARYING(50),
  attrName CHARACTER VARYING(50),
  similarTo attribute_nested,
  correspond REF AttributeElm_typ,
  attrType CHARACTER VARYING(50));

CREATE OR REPLACE TYPE class_nested AS Table of REF ClassElm_typ;

CREATE OR REPLACE TYPE Schema_typ AS OBJECT (
  schemaName CHARACTER VARYING(50),
  communityName CHARACTER VARYING(50),
  contains class_nested);


References

[Alonso and Abbadi, 1994] Alonso, G. and Abbadi, A. E. (1994). Cooperative modeling in applied geographic research. International Journal of Intelligent and Cooperative Information Systems, 3(1):83–102.

[ATKIS 1998] ATKIS (1998). Amtliches Topographisch-Kartographisches Informationssystem - objektartenkatalog (in German). Arbeitsgemeinschaft der Vermessungsverwaltungen (AdV), http://www.atkis.de.

[Atzeni et al., 1999] Atzeni, P., Ceri, S., Paraboschi, S., and Torlone, R. (1999). Database Systems: Concepts, Languages and Architectures. McGraw-Hill.

[Baader et al., 2003] Baader, F., Calvanese, D., McGuinness, D., Nardi, D., and Patel-Schneider, P., editors (2003). The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press.

[Batini et al., 1992] Batini, C., Ceri, S., and Navathe, S. B. (1992). Conceptual Database Design: An Entity-Relationship Approach. Benjamin/Cummings.

[Bayardo et al., 1997] Bayardo, R. J., Bohrer, W., Brice, R., Cichocki, A., Fowler, J., Helal, A., Kashyap, V., Ksiezyk, T., Martin, G., Nodine, M., Rashid, M., Rusinkiewicz, M., Shea, R., Unnikrishnan, C., Unruh, A., and Woelk, D. (1997). InfoSleuth: Agent-based semantic integration of information in open and dynamic environments. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 195–206.

[Behm, 2001] Behm, A. (2001). Migrating Relational Databases to Object Technology. PhD thesis, University of Zurich.

[Benjamins and Fensel, 1998] Benjamins, V. R. and Fensel, D. (1998). The ontological engineering initiative (KA)2. In Guarino, N., editor, Formal Ontology in Information Systems, pages 287–301. IOS Press, Trento-Italy.

[Bergamaschi et al., 1998] Bergamaschi, S., Castano, S., di Vimercati, S. D. C., Montanari, S., and Vincini, M. (1998). An intelligent approach to information integration. In Guarino, N., editor, Formal Ontology in Information Systems, pages 253–267. IOS Press.

[Berners-Lee, 1998] Berners-Lee, T. (1998). Semantic web road map. http://www.w3.org/DesignIssues/Semantic.html.

[Bishr, 1997] Bishr, Y. (1997). Semantic Aspects of Interoperable GIS. PhD thesis, ITC, The Netherlands. ITC publication number 56.

[Bishr et al., 1999] Bishr, Y. A., Pundt, H., Kuhn, W., and Radwan, M. (1999). Probing the concept of information communities - a first step toward semantic interoperability. In Goodchild, M., Egenhofer, M., Fegeas, R., and Kottman, C., editors, Interoperating Geographic Information Systems, pages 55–69. Kluwer Academic.

[Blackburn et al., 2001] Blackburn, P., de Rijke, M., and Venema, Y. (2001). Modal Logic. Cambridge University Press.

[Borgo et al., 1996] Borgo, S., Guarino, N., and Masolo, C. (1996). A pointless theory of space based on strong connection and congruence. In Aiello, L. C., Doyle, J., and Shapiro, S., editors, KR’96: Principles of Knowledge Representation and Reasoning, pages 220–229. Morgan Kaufmann.

[Brachman et al., 1991] Brachman, R. J., McGuinness, D. L., Patel-Schneider, P. F., Resnick, L. A., and Borgida, A. (1991). Living with CLASSIC: When and how to use a KL-ONE-like language. In Sowa, J., editor, Principles of Semantic Networks: Explorations in the Representation of Knowledge, pages 401–456. Morgan Kaufmann.

[Brill, 1993] Brill, D. (1993). Loom Reference Manual Version 2.0. Information Sciences Institute, University of Southern California, http://www.isi.edu/isd/LOOM/documentation/LOOM-DOCS.html.


[Buehler and McKee, 1998] Buehler, K. and McKee, L. (1998). The OpenGIS Guide: Introduction to Interoperable Geoprocessing and the OpenGIS Specification. Open GIS Consortium Technical Committee, third edition.

[Buttenfield, 1993] Buttenfield, B. P. (1993). Research initiative 3, multiple representations closing report. Technical report, NCGIA National Center for Geographic Information and Analysis.

[Casati et al., 1998] Casati, R., Smith, B., and Varzi, A. C. (1998). Ontological tools for geographic representation. In Guarino, N., editor, Formal Ontology in Information Systems, pages 77–85. IOS press.

[Coenen and Visser, 1998] Coenen, F. and Visser, P. (1998). A general ontology for spatial reasoning. In Miles, R., Moulton, M., and Bramer, M., editors, Research and Development in Expert Systems XV, proceedings of ES’98, pages 44–57. Springer.

[Decker et al., 1998] Decker, S., Brickley, D., Saarela, J., and Angele, J. (1998). A query and inference service for RDF. http://www.w3.org/TandS/QL/QL98/pp/queryservice.html.

[Decker et al., 1999] Decker, S., Erdmann, M., Fensel, D., and Studer, R. (1999). ONTOBROKER: Ontology based access to distributed and semi-structured information. In Meersman, R., Tari, Z., and Stevens, S., editors, Semantic Issues in Multimedia Systems, Proceedings of Eighth Working Conference on Database Semantics (DS8), IFIP TC2/WG2.6, pages 351–369. Kluwer Academic Publisher. ftp://ftp.aifb.uni-karlsruhe.de/pub/mike/dfe/paper/rdf3.ps.

[Decker et al., 2000] Decker, S., Melnik, S., Harmelen, F. V., Fensel, D., Klein, M., Broekstra, J., Erdmann, M., and Horrocks, I. (2000). The semantic web: The role of XML and RDF. IEEE Internet Computing.

[Domenig and Dittrich, 2000] Domenig, R. and Dittrich, K. R. (2000). A query based approach for integrating heterogeneous data sources. In Proc. of 9th Int’l Conf. on Information and Knowledge Management.

[Egenhofer and Herring, 1991] Egenhofer, M. J. and Herring, J. R. (1991). Categorizing binary topological relations between regions, lines and points in geographic databases. Technical report, Department of Surveying Engineering, University of Maine, Orono.


[Elmasri and Navathe, 2000] Elmasri, R. and Navathe, S. B. (2000). Fundamentals of Database Systems. Addison-Wesley, third edition.

[Erickson, 1997] Erickson, T. (1997). Social interaction on the net: Virtual community as participatory genre. In Nunamaker, J. F. and Sprague, R. H., editors, Proceedings of the Thirtieth Hawaii International Conference on System Science, volume 6, pages 23–30. IEEE Computer Society Press, Los Alamitos, CA. http://www.pliant.org/personal/Tom_Erickson/VC_as_Genere.html.

[European Commission, 2000a] European Commission (2000). Adopting a multiannual community programme to stimulate the development and use of European digital content on the global networks and to promote the linguistic diversity in the information society: A proposal for council decision. http://europa.eu.int/ISPO/docs/econtent/COM2000_323_en.pdf.

[European Commission, 2000b] European Commission (2000). Commercial exploitation of Europe’s public sector information: Executive summary. Technical report, Pira International Ltd., University of East Anglia and Knowledge Ltd., ftp://ftp.cordis.lu/pub/econtent/docs/2000_1558_en.pdf.

[Farquhar et al., 1997] Farquhar, A., Fikes, R., and Rice, J. (1997). The Ontolingua server: a tool for collaborative ontology construction. International Journal of Human-Computer Studies, 46:707–727. ftp://ftp.ksl.stanford.edu/pub/KSL_Report/KSL_96_26.ps.

[Fellbaum, 1998] Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. The MIT Press.

[Fensel, 2001] Fensel, D. (2001). Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce. Springer-Verlag.

[Fensel et al., 1999] Fensel, D., Angele, J., Decker, S., Erdmann, M., Schnurr, H.-P., Staab, S., Studer, R., and Witt, A. (1999). On2Broker: Semantic-based access to information sources at the WWW. In Proceedings of the World Conference on the WWW and Internet (WebNet 99), pages 366–371. ftp://ftp.aifb.uni-karlsruhe.de/pub/mike/dfe/paper/webnet.pdf.


[Fernandez et al., 1997] Fernandez, M., Gomez-Perez, A., and Juristo, N. (1997). METHONTOLOGY: from ontological art to ontological engineering. In Workshop on Ontological Engineering AAAI’97, Stanford USA. http://delicias.dia.fi.upm.es/miembros/ASUN/SSS97.ps.

[Fowler et al., 1999] Fowler, J., Perry, B., Nodine, M., and Bargmeyer, B. (1999). Agent-based semantic interoperability in InfoSleuth. SIGMOD Record, 28(1).

[Frank and Kuhn, 1998] Frank, A. and Kuhn, W. (1998). A specification language for interoperable GIS. In Goodchild, M. F., Egenhofer, M. J., Fegeas, R., and Kottman, C., editors, Interoperating Geographic Information Systems, pages 123–132. Kluwer.

[Garcia-Solaco et al., 1996] Garcia-Solaco, M., Saltor, F., and Castellanos, M. (1996). Semantic heterogeneity in multidatabase systems. In Bukhres, O. A. and Elmagarmid, A. K., editors, Object-oriented Multidatabase Systems: A Solution for Advanced Applications, chapter 5, pages 129–202. Prentice-Hall.

[GDF 1995] GDF (1995). Geographic Data Files Standard Version 3.0 (European Standard). European Committee for Standardization (CEN/278), http://www.ertico.com/links/gdf/gdfdoc/gdfdoc.htm.

[Gibson, 1986] Gibson, J. J. (1986). The Ecological Approach to Visual Perception. Lawrence Erlbaum Associates.

[Goh et al., 1999] Goh, C. H., Bressan, S., Madnick, S., and Siegel, M. (1999). Context interchange: New features and formalisms for the intelligent integration of information. ACM Transactions on Information Systems, 17(3):270–290.

[Gomez-Perez et al., 1996] Gomez-Perez, A., Fernandez, M., and De Vicente, A. J. (1996). Towards a method to conceptualize domain ontologies. In Workshop on Ontological Engineering, ECAI’96. http://delicias.dia.fi.upm.es/miembros/ASUN/ECAI96.ps.

[Goodchild et al., 1997] Goodchild, M. F., Egenhofer, M. J., and Fegeas, R. (1997). Interoperating GISs, report of a specialist meeting held under the auspices of the Varenius project. Technical report, National Center for Geographic Information and Analysis (NCGIA).


[Grefen, 1992] Grefen, P. W. P. J. (1992). Integrity Control in Parallel Database Systems. PhD thesis, University of Twente, The Netherlands.

[Greiner et al., 2001] Greiner, R., Darken, C., and Santoso, N. I. (2001). Efficient reasoning. ACM Computing Surveys, 33(1):1–30.

[Gruber, 1993] Gruber, T. R. (1993). Toward principles for the design of ontologies used for knowledge sharing. In Guarino, N. and Poli, R., editors, Formal Ontology in Conceptual Analysis and Knowledge Representation, International Workshop on Ontology. Kluwer Academic.

[Grüninger and Fox, 1995] Grüninger, M. and Fox, M. S. (1995). Methodology for design and evaluation of ontologies. In Proceedings of Workshop on Basic Ontological Issues in Knowledge Sharing, IJCAI’95. http://www.eil.utoronto.ca/enterprise-modelling/papers/gruninger-ijcai95.pdf.

[Guarino, 1998a] Guarino, N. (1998). Formal ontology and information systems. In Guarino, N., editor, Formal Ontology in Information Systems, Proceedings of FOIS’98, pages 3–17, Trento, Italy. IOS Press, Amsterdam.

[Guarino, 1998b] Guarino, N., editor (1998). Formal Ontology in Information Systems. IOS Press, Amsterdam.

[Guarino and Welty, 2000a] Guarino, N. and Welty, C. (2000). A formal ontology of properties. In Dieng, R., editor, Proc. of 12th Int’l Conf. on Knowledge Engineering and Knowledge Management. Springer Verlag.

[Guarino and Welty, 2002] Guarino, N. and Welty, C. (2002). Evaluating ontological decisions with OntoClean. Communications of the ACM, 45(2):61–65.

[Hakimpour and Geppert, 2001] Hakimpour, F. and Geppert, A. (2001). Resolving semantic heterogeneity in schema integration: An ontology based approach. In Welty, C. and Smith, B., editors, Formal Ontology in Information Systems: Collected Papers from the Second Int’l Conf., FOIS’01, pages 297–308. ACM Press.

[Hakimpour and Geppert, 2002] Hakimpour, F. and Geppert, A. (2002). Global schema generation using formal ontologies. In Spaccapietra, S., March, S. T., and Kambayashi, Y., editors, Proc. of the 21st Int’l Conf. on Conceptual Modeling (ER2002), LNCS 2503, pages 307–320. Springer Verlag.

[Hammer and McLeod, 1993] Hammer, J. and McLeod, D. (1993). An approach to resolving semantic heterogeneity in a federation of autonomous, heterogeneous database systems. International Journal of Intelligent and Cooperative Information Systems, 2(1):51–83. http://www-db.stanford.edu/pub/hammer/usc/ijicis-93.ps.

[Handschuh and Staab, 2002] Handschuh, S. and Staab, S. (2002). Authoring and annotation of web pages in CREAM. In The Eleventh International World Wide Web Conference (WWW2002).

[Heflin and Hendler, 2000] Heflin, J. and Hendler, J. (2000). Semantic interoperability on the web. In Extreme Markup Languages 2000. http://www.cs.umd.edu/projects/plus/SHOE/pubs/extreme2000.pdf.

[Hillery, 1955] Hillery, G. A. (1955). Definitions of communities: Areas of agreement. Rural Sociology, 20:111–123.

[Horroks et al., 2000] Horroks, I., Fensel, D., Broekstra, J., Decker, S., Erdmann, M., Goble, C., van Harmelen, F., Klein, M., Staab, S., Studer, R., and Motta, E. (2000). The ontology inference layer OIL. http://www.cs.vu.nl/~dieter/oil/Tr/oil.pdf.

[Hudak, 2000] Hudak, P. (2000). The Haskell School of Expression: Learning Functional Programming through Multimedia. Cambridge University Press.

[Jones, 1998] Jones, D. (1998). Developing shared ontologies in multi-agent systems. In ECAI’98 Workshop on Intelligent Information Integration, Brighton, U.K.

[Karp et al., 1999] Karp, P. D., Chaudhri, V. K., and Thomere, J. (1999). XOL: An XML-based ontology exchange language. http://xml.coverpages.org/xol-03.html.

[Kashyap and Sheth, 1998] Kashyap, V. and Sheth, A. (1998). Semantic heterogeneity in global information systems: The role of metadata, context and ontologies. In Papazoglou, M. P. and Schlageter, G., editors, Cooperative Information Systems: Current Trends and Directions, pages 139–178. Academic Press Ltd.

[Keller, 2000] Keller, S. F. (2000). INTERLIS Version 2.0, Reference Manual. Federal Office of Topography, Seftigenstrasse 264, CH-3084 Wabern, Switzerland. http://www.interlis.ch/refdocs/iliv2-refman-09-04e.zip.

[KIF, 1998] Knowledge Interchange Format: Draft Proposal American National Standards, Stanford Logic Group. http://logic.stanford.edu/kif/depans.html.

[Kifer et al., 1995] Kifer, M., Lausen, G., and Wu, J. (1995). Logical foundations of object-oriented and frame-based languages. Journal of the ACM, 42(4):741–843.

[Kim et al., 1993] Kim, W., Choi, I., Gala, S., and Scheevel, M. (1993). On resolving schematic heterogeneity in multidatabase systems. Distributed and Parallel Databases, 1(3):251–277.

[Kim and Seo, 1991] Kim, W. and Seo, J. (1991). Classifying schematic and data heterogeneity in multidatabase systems. IEEE Computer, 24(12):12–18.

[Kuhn 1994] Kuhn, W. (1994). Defining semantics for spatial data transfers. In Waugh, T. C. and Healey, R. G., editors, Proc. of the Sixth Int’l. Symp. on Spatial Data Handling SDH94, pages 973–987.

[Kuhn, 2001] Kuhn, W. (2001). Ontologies in support of activities in geographic space. International Journal of Geographical Information Science, 15(7):613–631.

[Kuijpers et al., 1995] Kuijpers, B., Paredaens, J., and Vandeurzen, L. (1995). Semantics in spatial databases. In Thalheim, B. and Libkin, L., editors, Semantics in Databases, Lecture Notes in Computer Science 1358. Springer Verlag.

[Lakoff, 1987] Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. University of Chicago Press.

[Larson et al., 1989] Larson, J. A., Navathe, S. B., and Elmasri, R. (1989). A theory of attribute equivalence in databases with application to schema integration. IEEE Transactions on Software Engineering, 15(4):449–463.

[Lenat, 1995] Lenat, D. B. (1995). CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38. http://www.cs.umbc.edu/471/papers/cyc95.pdf.

[Lopez, 1999] Lopez, F. (1999). Overview of methodologies for building ontologies. In Proceedings of the IJCAI-99 Workshop on Ontologies and Problem-Solving Methods: Lessons Learned and Future Trends. CEUR Publication.

[Lord et al. 2003] Lord, P. W., Stevens, R. D., Brass, A., and Goble, C. A. (2003). Semantic similarity measures as tools for exploring the Gene Ontology. In Proceedings of the Eighth Pacific Symposium on Biocomputing, pages 601–612.

[Loucopoulos, 1992] Loucopoulos, P. (1992). Conceptual modeling. In Loucopoulos, P. and Zicari, R., editors, Conceptual Modeling, Databases, and CASE: An Integrated View of Information Systems Development, chapter Introduction, pages 1–26. John Wiley & Sons, Inc.

[MacGregor et al., 1997] MacGregor, R. M., Chalupsky, H., and Melz, E. R. (1997). PowerLoom Manual. University of Southern California, http://www.isi.edu/isd/LOOM/PowerLoom/documentation/manual.pdf.

[Madhavan et al., 2001] Madhavan, J., Bernstein, P. A., and Rahm, E. (2001). Generic schema matching with Cupid. In Apers, P. M. G., Atzeni, P., Ceri, S., Paraboschi, S., Ramamohanarao, K., and Snodgrass, R. T., editors, VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, September 11-14, 2001, Roma, Italy, pages 49–58. Morgan Kaufmann.

[Marcotty and Ledgard, 1987] Marcotty, M. and Ledgard, H. (1987). The World of Programming Languages. Springer-Verlag.

[May, 2000] May, W. (2000). How to Write F-Logic Programs in FLORID: A Tutorial for the Database Language F-Logic. Institute for Computer Science, University of Freiburg, Germany, http://www.informatik.uni-freiburg.de/~dbis/florid/tutorial.ps.gz, version 3 (floxml) edition.

[Mena et al., 1998] Mena, E., Kashyap, V., Illarramendi, A., and Sheth, A. (1998). Domain specific ontologies for semantic information brokering on the global information infrastructure. In Guarino, N., editor, Formal Ontology in Information Systems. IOS Press.

[Miller et al., 2000] Miller, R. J., Haas, L. M., and Hernández, M. A. (2000). Schema mapping as query discovery. In Abbadi, A. E., Brodie, M. L., Chakravarthy, S., Dayal, U., Kamel, N., Schlageter, G., and Whang, K.-Y., editors, VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10-14, 2000, Cairo, Egypt, pages 77–88. Morgan Kaufmann.

[Molenaar et al., 1994] Molenaar, M., Kufoniyi, O., and Bouloucos, T. (1994). Modeling topological relationships in vector maps. In Waugh, T. C. and Healey, R. G., editors, Advances in GIS Research: Proceedings of the Sixth International Symposium on Spatial Data Handling, volume 1, pages 112–126.

[Mosses, 1990] Mosses, P. D. (1990). Denotational semantics. In van Leeuwen, J., editor, Formal Models and Semantics, volume B of Handbook of Theoretical Computer Science, chapter 11, pages 577–631. Elsevier.

[Mosses, 1992] Mosses, P. D. (1992). Action Semantics. Cambridge University Press. Number 26 in Cambridge Tracts in Theoretical Computer Science.

[Nodine et al., 1999] Nodine, M., Bohrer, W., and Ngu, A. H. H. (1999). Semantic brokering over dynamic heterogeneous data sources in InfoSleuth. In Kitsuregawa, M., Maciaszek, L., Papazoglou, M., and Pu, C., editors, Proceedings of the Fifteenth International Conference on Data Engineering, pages 23–26, Sydney, Australia.

[OGC, 1999a] OGC (1999). Features, The OpenGIS Abstract Specification Topic 5. OpenGIS Consortium, 35 Main Street, Suite 5, Wayland, MA 01778, version 4 edition.

[OGC, 1999b] OGC (1999). Semantics and Information Communities, The OpenGIS Abstract Specification Topic 14. OpenGIS Consortium, 35 Main Street, Suite 5, Wayland, MA 01778, version 4 edition.

[OntoWeb 2002] Fensel, D. and Gómez Pérez, A. (2002). A survey on ontology tools. http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/D13_v1-0.zip.

[Palopoli et al., 1999] Palopoli, L., Sacca, D., and Ursino, D. (1999). Semi-automatic techniques for deriving interscheme properties from database schemes. Data and Knowledge Engineering, 30(3):239–273.

[Parent et al., 1996] Parent, C., Spaccapietra, S., and Devogele, T. (1996). Conflicts in spatial database integration. In Proceedings of the Ninth International Conference on Parallel and Distributed Computing Systems, pages 772–778, Dijon, France.

[Patel-Schneider et al., 1996] Patel-Schneider, P. F., Abrahams, M., Resnick, L. A., McGuinness, D. L., and Borgida, A. (1996). NeoClassic Reference Manual: Version 1.0. Artificial Intelligence Principles Research Department, AT&T Labs Research.

[Rodríguez and Egenhofer, 2003] Rodríguez, A. and Egenhofer, M. J. (2003). Determining semantic similarity among entity classes from different ontologies. IEEE Transactions on Knowledge and Data Engineering, 15(2):442–456.

[Rosenthal and Sciore, 1995] Rosenthal, A. and Sciore, E. (1995). Description, conversion, and planning for semantic interoperability. In Meersman, R. and Mark, L., editors, Database Application Semantics, Proc. of Conf. on Data Semantics, IFIP WG6.2, pages 140–164.

[Sato and Fujimoto, 2000] Sato, H. and Fujimoto, K. (2000). A new approach to semantic word-matching for knowledge acquisition from text containing daily-used words. In Advances in Intelligent Systems: Theory and Applications, volume 59, pages 135–140. IOS Press.

[Sheth, 1998] Sheth, A. P. (1998). Changing focus on interoperability in information systems: from system, syntax, structure to semantics. In Goodchild, M. F., Egenhofer, M. J., Fegeas, R., and Kottman, C. A., editors, Interoperating Geographic Information Systems, pages 5–30. Kluwer.

[Singh et al., 1997] Singh, M. P., Cannata, P. E., Huhns, M. N., Jacobs, N., Ksiezyk, T., Ong, K., Sheth, A. P., Tomlinson, C., and Woelk, D. (1997). The Carnot heterogeneous database project: Implemented applications. Distributed and Parallel Databases, 5(2):207–225.

[Smith and Mark, 1998] Smith, B. and Mark, D. (1998). Ontology and geographic kinds. In Peucker, T. and Chrisman, N., editors, Proceedings, International Symposium on Spatial Data Handling (SDH’98), pages 308–320. Taylor and Francis. Vancouver, Canada.

[Sowa, 1984] Sowa, J. F. (1984). Conceptual Structures: Information Processing in Mind and Machine. Addison Wesley.

[Sowa, 2000] Sowa, J. F. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole - Thomson Learning.

[Uitermark et al., 1999a] Uitermark, H., Vogels, A., and van Oosterom, P. (1999). Semantic and geometric aspects of integrating road networks. In Vckovski, A., Brassel, K. E., and Schek, H.-J., editors, Interoperating Geographic Information Systems, Second International Conference, INTEROP ’99 (LNCS 1580), pages 177–188.

[Uitermark et al., 1999b] Uitermark, H. T., van Oosterom, P. J. M., Mars, N. J. I., and Molenaar, M. (1999). Ontology-based geographic data set integration. In Böhlen, M. H., Jensen, C. S., and Scholl, M. O., editors, Proceedings International Workshop on Spatio-Temporal Database Management STDBM’99 (LNCS 1678), pages 60–78, Scotland. Springer.

[Uschold, 1996] Uschold, M. (1996). Building ontologies: Towards a unified methodology. In 16th Annual Conf. of the British Computer Society Specialist Group on Expert Systems, Cambridge, UK. ftp://ftp.aiai.ed.ac.uk/pub/documents/1996/96-es96unified-method.ps.gz.

[Uschold and Grüninger, 1996] Uschold, M. and Grüninger, M. (1996). Ontologies: Principles, methods and applications. The Knowledge Engineering Review, 11(2):93–156.

[Uschold and King, 1995] Uschold, M. and King, M. (1995). Towards a methodology for building ontologies. In Workshop on Basic Ontological Issues in Knowledge Sharing, held in conjunction with IJCAI-95. ftp://ftp.aiai.ed.ac.uk/pub/documents/1995/95-ontijcai95-ont-method.ps.gz.

[Visser et al., 1999] Visser, P. R., Jones, D. M., Beer, M., Bench-Capon, T., Diaz, B., and Shave, M. (1999). Resolving ontological heterogeneity in the KRAFT project. In 10th International Conference and Workshop on Database and Expert Systems Applications DEXA’99. University of Florence, Italy.

[Visser and Cui, 1998] Visser, P. R. S. and Cui, Z. (1998). Heterogeneous ontology structures for distributed architectures. In ECAI-98 Workshop on Applications of Ontologies and Problem-solving Methods, pages 112–119.

[Visser et al., 1998] Visser, P. R. S., Jones, D. M., Bench-Capon, T. J. M., and Shave, M. J. R. (1998). Assessing heterogeneity by classifying ontology mismatches. In Guarino, N., editor, Formal Ontology in Information Systems, Proceedings of FOIS’98, pages 148–162, Trento, Italy. IOS Press.

[Warren and Fernando, 1982] Warren, D. H. D. and Fernando, P. C. N. (1982). An efficient easily adaptable system for interpreting natural language queries. Computational Linguistics, 8(3-4):110–122.

[Weisstein, 1999] Weisstein, E. W. (1999). CRC Concise Encyclopedia of Mathematics. CRC Press.

[Welty and Guarino, 2001] Welty, C. and Guarino, N. (2001). Supporting ontological analysis of taxonomic relationships. Data and Knowledge Engineering, 39(1):51–74.

[Welty and Smith, 2001] Welty, C. and Smith, B., editors (2001). Proceedings of the International Conference on Formal Ontology in Information Systems, volume 2001, Ogunquit, Maine, USA. ACM/SIGART, ACM Press.

[Woelk et al., 1996] Woelk, D., Cannata, P., Huhns, M., Jacobs, N., Ksiezyk, T., Lavender, R. G., Meredith, G., Ong, K. L., Shen, W. M., Singh, M., and Tomlinson, C. (1996). Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, chapter 18: Carnot Prototype, pages 621–651. Prentice Hall, http://www.csc.ncsu.edu/faculty/mpsingh/papers/databases/omdbook.pdf.

[Woods and Schmolz, 1992] Woods, W. A. and Schmolz, J. G. (1992). The KL-ONE family. In Lehmann, F., editor, Semantic Networks in Artificial Intelligence, pages 133–177. Pergamon Press.

[Worboys and Deen, 1991] Worboys, M. F. and Deen, S. M. (1991). Semantic heterogeneity in distributed geographic databases. SIGMOD Record, 20(4):30–34.

INDEX

Symbols
(KA)2 21, 68

A
ABOX 58–59, 64
Affordance 21
ATKIS 8, 52, 56, 95, 100

B
Building ontologies 24, 95–99

C
Carnot project 17
Characteristic features 47, 55
COIN project 18
Community 2, 38, 41–43
Componential theory 47, 99
Concept 53
    Non-primitive concepts 55
    Primitive concepts 55
Conceptual model 4, 34
Conceptualization 7, 39–40
Consistency 4–5, 59–60
Constraints 54

D
Database Schema, See Schema
Description Logic 8, 52, 64, 77, 120
Disjoint, See Semantic similarity
Domain 39

E
Extensional relations 39

F
FLORID 52, 60, 62–63, 65
Formal ontologies, See Ontologies
Frame-based Logic 8, 52, 64, 120

G
GDF 8, 52, 57, 95, 100–??
Grammar 29
    See also Syntax

H
Higher-level ontology 42, 76, 77
Homonym 68, 76, 89
Hypernym 16
Hyponym 16, 74
    See also Semantic similarity

I
Inconsistency, See Consistency
Individuals 39, 40, 48, 55
InfoSleuth project 17–18
Instance 54
Intensional definitions 44
    Non-primitive definitions 55
    Primitive definitions 55
Intensional relations 39
INTERLIS 15, 29, 32

K
Knowledge 38, 44, 48
Knowledge representation 27, 38, 55, 64
KRAFT project 18, 19, 76

L
Loom 55

M
Merged-ontology 77
METHONTOLOGY 20
Mini-world 33
Model 33, 35

N
NeoClassic 52, 55, 60

O
OBSERVER 18–19, 52, 55, 71, 76
OIL 23
On2Broker 21, 22–23, 52, 68, 71
OntoClean 20
Ontolingua 20, 52, 76
Ontologies 6–7, 38
Open GIS Consortium 2, 13–14
Overlap, See Semantic similarity

P
PowerLoom 55, 64, 65, 77, 120
Prototype theory 47, 99

R
RDF 23, 25, 54
Relation 53–54
Resource Description Framework, See RDF

S
Schema
    Conceptual schema 34–37, 38
    Database schema 32–34, 37
Schema heterogeneity 4, 15–16
Semantic heterogeneity 4, 5, 31, 32
Semantic similarity 72
    Disjoint 73
    Equal terms 74
    Overlap 73
    Specialized terms 74
Semantic Web 21–23
Semantics 2, 5–6, 28
SHOE project 21, 22, 52, 55, 69, 71
SiLRI 52
Similarity relations 16, 19
    See also Semantic similarity
State of the mini-world 33
State of the world 39
Symbols 29, 38
    Constant-symbols 31
    Identifiers 31
    Non-terminal symbols 29
    Terminal symbols 31
Synonym 16, 68, 79, 85, 89
    See also Semantic similarity
Syntax 5–6, 29–31

T
TBOX 58–59, 64
Terminology 3, 19
Terms 38
Thesaurus 16, 24
TOVE project 19–20

U
UML 54

W
WordNet 39, 76

Curriculum Vitae

Personal Information

Last name: Hakimpour

First name: Farshad

Date of Birth: 30 June 1966

Place of Birth: Teheran, Iran

Education

1999-2003: Ph.D. Degree from University of Zurich, Switzerland. Thesis topic: ‘Using Ontologies to Resolve Semantic Heterogeneity for Integrating Spatial Database Schemata’

1995-1996: M.Sc. Degree from International Institute for Aerospace Survey and Earth Science (ITC), The Netherlands. Thesis topic: ‘Metrics for Logical Consistency in Spatial Relations’

1993-1993: Postgraduate Diploma in ‘Integrated Map and Geoinformation Production’ from International Institute for Aerospace Survey and Earth Science (ITC), The Netherlands.

1984-1990: B.Sc. Degree in ‘Computer Engineering, Software Specialization’ from Teheran University, Iran.

1980-1984: High school Diploma in ‘Mathematics and Physics’, Iran