Integrating Terminological and Deductive Reasoning - ETH E-Collection

Integrating Terminological and Deductive Reasoning
Urs Badertscher and Robert Marti
Institut für Informationssysteme, ETH-Zentrum, CH-8092 Zürich, Switzerland
e-mail: fbadertscher, [email protected]
March 1995

Abstract

We present a data model which supports two kinds of reasoning. Firstly, it supports terminological reasoning in the style of KL-ONE, mainly subsumption of concepts (types) and instantiation of objects, a kind of type inference for objects. Secondly, it also supports deductive inferencing and integrity checking in the style of logic programming languages and deductive databases. This data model may also be considered as a typed deductive database with type inferences. In order to get a common semantics for both features of our data model, we map the terminological inferences to a logic program. The data model has been implemented as a prototype in Prolog.

Keywords: Terminological logics, deductive databases, integrity constraints, frame-based languages, object-oriented databases.

Contents

1 Introduction
2 The Terminological Component
    2.1 Syntax
    2.2 Semantics
    2.3 Concept Subsumption
3 The Object Component
    3.1 Syntax
    3.2 Semantics
    3.3 Object Instantiation
4 The Integrity Constraint Component
    4.1 Syntax
    4.2 Semantics
    4.3 Updates
5 The Data Model as Deductive Database
    5.1 The Pre-Defined and the User-Defined Database
    5.2 Syntactic Restrictions of Deduction Rules
    5.3 Efficient Evaluation of Integrity Constraints
6 Extensions of the Data Model
    6.1 Terminological Cycles
    6.2 Subroles
    6.3 Minimum-1 Semantics
7 Comparison and Outlook
Bibliography
A Appendix
    A.1 Definition of Concepts and Roles
    A.2 Compatibility Conditions
    A.3 Concept Normalization
    A.4 The Subsumption Algorithm
    A.5 Correctness of Subsumption
    A.6 The Instantiation Algorithm
    A.7 Pre-defined Integrity Constraints
    A.8 An Example Database

1 Introduction

The goal of knowledge representation is to find accurate means to formally represent structured knowledge and draw inferences from this knowledge. KL-ONE [Brachman & Schmolze 85] is an important and influential knowledge representation system. It has evolved from so-called frame-based languages [Hayes 79, Fikes & Kehler 85]. KL-ONE separates domain knowledge into two components: In the terminological component (T-Box), the terminology to describe the real world has to be defined. The assertional component (A-Box) contains the world descriptions, namely individuals (objects) and their relationships to other individuals.

KL-ONE-like systems have many features in common with object-oriented languages, in particular with object-oriented database languages [Cattell 94]. For example, they both separate the meta level (T-Box, schema, types, classes, concepts) from the data level (A-Box, objects, instances, individuals). A second similarity is the presence of a type, class or concept hierarchy. Some object-oriented databases additionally distinguish between types and classes.

The systems based on KL-ONE are commonly called terminological systems and their respective languages are typically termed terminological logic or sometimes concept language [Levesque & Brachman 87, Nebel 90, Schmidt-Schauss & Smolka 91, Woods & Schmolze 92]. In the following, we use the notions of concepts and objects, as is customary in the context of terminological logics. Relations between concepts are called roles. An important difference between terminological logics and most object-oriented languages is that the concept subsumption relationship (subclassing) and the classification of objects are inferred from the properties of the concept definitions rather than specified explicitly, e.g. using an IS-A hierarchy. However, some object-oriented databases also offer these inferences, e.g. COCOON [Scholl et al. 92].

Research in deductive databases has evolved from efforts to integrate logic programming and databases. The main idea is to use logic as a uniform database language for representing factual data, deduction rules and integrity constraints as well as queries and updates [Lloyd 87, Bry et al. 88, Ceri et al. 90, Das 92, Cremers et al. 94].

The aim of this work is the integration of terminological and deductive reasoning. For this purpose we have defined a data model with formal semantics and implemented it in Prolog. The main parts of our data model have also been implemented in ProQuel [Pfister 94]. ProQuel is a deductive database system supporting purely declarative semantics, set-oriented computation and persistence [Burse 92]. ProQuel is implemented on top of the commercial relational database system Oracle.


Our data model consists of three components. In the terminological component, concepts are defined. From the definition of a concept it is possible to infer whether it is a subconcept of other concepts. This inference is called concept subsumption. In the object component, objects are specified which may be related to other objects via roles (relationships). The object component can be seen as a relational database with binary relations or - when incorporating deduction rules - as a deductive database. In the integrity constraint component, pre-defined and user-defined integrity constraints can be formulated declaratively. The integrity constraints are simplified and checked after a (semantic) transaction. If the database is not consistent after a transaction - i.e. not all integrity constraints are satisfied - then all changes made within the transaction are rejected.

Both the terminological and the object components have their own syntax and semantics and can be used separately. The instantiation algorithm associates objects with concepts, thus linking the two components together. Instantiation can be seen as concept (type) inference for objects.

Alternatively, we can separate our data model into the following two parts: The pre-defined database consists of the terminological component, the fact part of the object component and the pre-defined integrity constraints. The user-defined database consists of the rest, namely the rule part of the object component and the user-defined integrity constraints. The main difference between our data model and terminological logics is the presence of this user-defined database. This view of our data model as a deductive database is further described in section 5.

This work has emerged from the project Hybrides Wissensbanksystem within the Schwerpunktprogramm Informatik [Reimer et al. 94]. The goals of this project are (1) to extend the frame model FRM [Reimer 89] with deduction rules and integrity constraints and (2) to implement FRM efficiently by mapping it to the object data model COCOON [Scholl et al. 92]. This mapping is described in [Norrie et al. 94].

The next three sections describe the three components of our data model - the terminological, object and integrity constraint components - with their corresponding inferences. In section 5, we consider the data model from the deductive database point of view and discuss the corresponding restrictions and semantics. Section 6 contains some extensions of the data model and section 7 a short comparison and outlook. The main text is kept rather informal. Technical and implementation details are covered in the appendix.


2 The Terminological Component

2.1 Syntax

The two main syntactic constructs of the terminological component are concepts and roles, similar to entities and relationships in the ER-model or frames and slots in frame models. A concept can also be considered as a type or a class or as a unary relation. A role can be seen as a binary relation between concepts or as a function from a concept to a set of concepts. This follows from the semantics given in section 2.2.

Atomic concepts are those concepts introduced explicitly as primitive - stating only necessary conditions for individuals to be considered as instances of them - or as defined - stating necessary and sufficient conditions. Starting with atomic concepts and roles, one can construct complex concept expressions using various concept forming operators. Concept expressions are formed as follows.

If C is an atomic concept (see below), then C is a concept.
If C1 and C2 are concepts, then so are "C1 and C2" and "C1 or C2".
If R is a role and C is a concept, then "all R : C" and "ex R : C" are also concepts.
If R is a role and N is a number, then "min R : N" and "max R : N" are also concepts.

The terminological language is thus very close to a subset of a first-order language, where and, or, all and ex stand for $\land$, $\lor$, $\forall$ and $\exists$ respectively. The main differences are the absence of negation, of explicit variables and of function symbols.

Primitive and defined concepts are those concepts introduced explicitly.

<primitive-concept> :< .
<primitive-concept> :< <concept>.
<defined-concept>   := <concept>.

Cyclic definitions of concepts are not allowed (but see the extension in section 6). Pre-defined concepts are anything, nothing, integer, string and boolean. Primitive, defined and pre-defined concepts are called atomic concepts. Roles have to be introduced explicitly and are atomic. A terminology is a finite nonempty set of atomic concepts and roles. The Prolog implementation of concept definition is straightforward and given in appendix A.1. Note, for example, that the definitions of the predicates concept and defined_concept are mutually recursive.
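The formation rules for concept expressions and the ban on cyclic definitions can be illustrated with a short sketch. It is written in Python rather than the report's Prolog, purely for illustration, and it is not the implementation of appendix A.1: the nested-tuple encoding and the names `is_concept`, `atoms` and `has_cycle` are assumptions made for this example.

```python
# Concept expressions as nested tuples (an encoding assumed for this sketch):
#   "employee"                      atomic concept (a plain string)
#   ("and", C1, C2), ("or", C1, C2)
#   ("all", R, C), ("ex", R, C)     restrictions on the fillers of role R
#   ("min", R, N), ("max", R, N)    number restrictions on role R

PREDEFINED = {"anything", "nothing", "integer", "string", "boolean"}


def is_concept(expr, atomic, roles):
    """Return True if expr is a well-formed concept expression."""
    if isinstance(expr, str):
        return expr in atomic or expr in PREDEFINED
    if not isinstance(expr, tuple) or len(expr) != 3:
        return False
    op, left, right = expr
    if op in ("and", "or"):
        return is_concept(left, atomic, roles) and is_concept(right, atomic, roles)
    if op in ("all", "ex"):
        return left in roles and is_concept(right, atomic, roles)
    if op in ("min", "max"):
        return left in roles and isinstance(right, int) and right >= 0
    return False


def atoms(expr):
    """Collect the atomic concept names occurring in expr."""
    if isinstance(expr, str):
        return {expr}
    op, left, right = expr
    if op in ("and", "or"):
        return atoms(left) | atoms(right)
    if op in ("all", "ex"):
        return atoms(right)
    return set()  # min/max mention only a role and a number


def has_cycle(definitions):
    """Detect cyclic concept introductions by depth-first search.

    definitions maps each introduced concept name to the expression on the
    right-hand side of ':<' or ':=' (None for 'name :< .').
    """
    visiting, done = set(), set()

    def visit(name):
        if name in done or name not in definitions:
            return False
        if name in visiting:
            return True  # reached a concept that is still being expanded
        visiting.add(name)
        body = definitions[name]
        cyclic = body is not None and any(visit(a) for a in atoms(body))
        visiting.discard(name)
        done.add(name)
        return cyclic

    return any(visit(name) for name in definitions)
```

With this encoding, `department :< .` becomes `{"department": None}` and `office :< department.` becomes `{"office": "department"}`. A pair of introductions that refer to each other makes `has_cycle` return `True`.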

Example 2.1 We present an example terminology with departments and employees. The database is fully listed in appendix A.8.

department :< .                             % primitive concepts
office     :< department.
worker     :< employee.

employee := all address : string and        % defined concepts
            ex works_in : department and
            all age : integer and max age:1.
leader   := all address : string and min address:1 and
            ex leads : department and
            ex works_in : office and
            all age : integer and max age:1.
allrounder := worker and leader.
boss := leader.
address [...]

[...], where D is any set (of objects), called the domain, and E is an extension function. E assigns an element of the domain to any object, i.e. $E : T \to D$ (where T is the set of objects). It has to satisfy

\begin{align*}
\mathit{inst}(\mathit{Object}, \mathit{Concept}) &\longrightarrow E[\mathit{Object}] \in E[\mathit{Concept}] \\
\mathit{rel}(\mathit{Object}, \mathit{Role}, \mathit{RelObject}) &\longrightarrow \langle E[\mathit{Object}], E[\mathit{RelObject}] \rangle \in E[\mathit{Role}].
\end{align*}

A model is called hybrid if it is both a model of an object base and of a terminology. For a given object base and terminology, there may exist no hybrid model (for example if an object is instance of a concept violating max-restrictions).

Example 3.2 We construct a hybrid model for the terminology from example 2.2 and the following object base.

department :< .
office :< department.
employee := ex works_in:department.

inst(production, department).
inst(sales, office).
inst(bill, employee).
rel(john, works_in, sales).

Similar to example 2.2, we have to define a domain and an extension function. However, here we do not choose an arbitrary set as domain, but we take as domain D a superset of the set of objects $T = \{\mathit{bill}, \mathit{john}, \mathit{sales}, \mathit{production}\}$. Then we compute E for primitive concepts from instance facts only, and finally we determine the extension of defined concepts and roles from instance and relation facts and the compatibility conditions of the extension function.

By the instance fact inst(sales,office), we know that $\mathit{sales} \in E[\mathit{office}]$ and further $\mathit{sales} \in E[\mathit{department}]$ because $E[\mathit{office}] \subseteq E[\mathit{department}]$. Similarly we get $\mathit{production} \in E[\mathit{department}]$. Now, by the facts rel(john,works_in,sales), inst(sales,office) and office :< department, we get $\mathit{john} \in E[\mathit{ex\ works\_in{:}department}] = E[\mathit{employee}]$. We further conclude by the fact inst(bill,employee) that $\mathit{bill} \in E[\mathit{employee}] = E[\mathit{ex\ works\_in{:}department}]$. So we know that there exists an element $c \in D$ such that $\langle \mathit{bill}, c \rangle \in E[\mathit{works\_in}]$ and $c \in E[\mathit{department}]$. Observe that we could choose c = production or c = sales as well.

We finally define the domain as $D := \{\mathit{bill}, \mathit{john}, \mathit{sales}, \mathit{production}, c\}$ and the extension function $E[\mathit{department}] := \{\mathit{sales}, \mathit{production}, c\}$, $E[\mathit{office}] := \{\mathit{sales}\}$, $E[\mathit{employee}] := \{\mathit{bill}, \mathit{john}\}$ and $E[\mathit{works\_in}] := \{\langle \mathit{john}, \mathit{sales} \rangle, \langle \mathit{bill}, c \rangle\}$. Note that we can express some kind of incomplete knowledge: We know that Bill works in a department, but we do not know in which one. □
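The model constructed in example 3.2 can be checked mechanically. The following is a minimal sketch in Python (not the report's Prolog, and not part of its implementation). Every object is interpreted as itself, and the function names `ex`, `satisfies_object_base`, `satisfies_terminology` and `is_hybrid_model` are assumptions introduced for this illustration.

```python
def ex(role_ext, concept_ext):
    """Extension of 'ex R : C': elements with at least one R-filler in C."""
    return {x for (x, y) in role_ext if y in concept_ext}


# Domain and extension function as defined at the end of example 3.2;
# "c" stands for the unknown department in which Bill works.
domain = {"bill", "john", "sales", "production", "c"}
E = {
    "department": {"sales", "production", "c"},
    "office": {"sales"},
    "employee": {"bill", "john"},
    "works_in": {("john", "sales"), ("bill", "c")},
}

# Object base of example 3.2.
inst_facts = [("production", "department"), ("sales", "office"), ("bill", "employee")]
rel_facts = [("john", "works_in", "sales")]


def satisfies_object_base(E, inst_facts, rel_facts):
    """Every inst fact and every rel fact must hold in the extension."""
    return (all(obj in E[concept] for obj, concept in inst_facts)
            and all((obj, value) in E[role] for obj, role, value in rel_facts))


def satisfies_terminology(E):
    """office :< department  and  employee := ex works_in:department."""
    return (E["office"] <= E["department"]
            and E["employee"] == ex(E["works_in"], E["department"]))


def is_hybrid_model(E, inst_facts, rel_facts):
    """A hybrid model satisfies both the object base and the terminology."""
    return satisfies_object_base(E, inst_facts, rel_facts) and satisfies_terminology(E)
```

Dropping the pair for Bill from the extension of works_in breaks the definition of employee. This is why the element c has to be added to the domain.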

3.3 Object Instantiation

Objects can be assigned to concepts either by the user (predicate inst/2) or by the instantiation algorithm (predicates in/2, in_dir/2). Predicate in_dir determines the most specific concepts of which an object is an instance. Similar to the subsumption algorithm of section 2.3, we discuss the clauses for the constructs and and all of the predicate in_dir/2. The instantiation algorithm is listed fully in appendix A.6.

in_dir(I, C and D) :- in_dir(I,C), in_dir(I,D).       % "and"
in_dir(I, all R:C) :- \+ (rel(I,R,V), \+ in(V,C)).    % "all" (CWA)
in_dir(I, Cd)      :- Cd := C, in_dir(I,C).           % def.concept

The and-clause expresses that if an object is an instance of both concepts C and D, then it is also an instance of the concept C and D. An object is an instance of the concept all R:C if all its entries in relation facts of role R are instances of concept C. Note the use of the Closed World Assumption (CWA) in this case. The all-clause is the translation of $\forall I, R, C\ ((\forall V\ \mathit{rel}(I,R,V) \to \mathit{in}(V,C)) \to \mathit{in\_dir}(I, \mathit{all}\ R{:}C))$ into Prolog notation. Observe that objects may not be inferred to be instances of primitive concepts, because primitive concepts do not state sufficient conditions for objects to belong to them.
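The three clauses discussed above can be mirrored in a small sketch. It is written in Python instead of Prolog and is not the algorithm of appendix A.6. It folds in/2 and in_dir/2 into one function, handles only asserted instance facts, defined concepts, and the and and all constructs, and does not compute most specific concepts. The names `make_in`, `in_concept`, `dept_only` and `lab` are hypothetical.

```python
def make_in(inst_facts, rel_facts, definitions):
    """Build a membership test over a small fact base.

    inst_facts:  set of (object, concept) pairs asserted by the user
    rel_facts:   set of (object, role, value) triples
    definitions: dict mapping defined concept names to their expressions
    """
    def in_concept(obj, concept):
        if isinstance(concept, str):
            if (obj, concept) in inst_facts:      # asserted instance fact
                return True
            if concept in definitions:            # defined concept: unfold it
                return in_concept(obj, definitions[concept])
            return False                          # primitive: never inferred
        op, left, right = concept
        if op == "and":                           # instance of both conjuncts
            return in_concept(obj, left) and in_concept(obj, right)
        if op == "all":                           # CWA: only known fillers count
            return all(in_concept(value, right)
                       for (o, role, value) in rel_facts
                       if o == obj and role == left)
        raise ValueError(f"construct not covered by this sketch: {op}")

    return in_concept


# Hypothetical data: 'dept_only' and 'lab' do not occur in the report.
in_concept = make_in(
    inst_facts={("sales", "department"), ("production", "department")},
    rel_facts={("john", "works_in", "sales"), ("bill", "works_in", "lab")},
    definitions={"dept_only": ("all", "works_in", "department")},
)
```

Under the closed-world reading, an object with no works_in facts at all vacuously satisfies `all works_in : department`. An object with a filler that is not known to be a department fails the test.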

Example 3.3 We now present an example session for the use of the instantiation algorithm and for querying. We use the object base of example 3.1 and the terminology of example 2.1.

| ?- Object in department.
Object = production ?
Object = sales ?
no
| ?- bob in Concept.
Concept = employee ?
no
| ?- bill in Concept.
Concept = boss ?
Concept = employee ?
Concept = leader ?
no
| ?- Object in employee.
Object = bill ?
Object = bob ?
Object = john ?
no
| ?- mary in Concept.
no
| ?- rel(E, works_in, production).
E = bob ?
E = bill ?
E = john ?
no
| ?- rel(E, works_in, production), rel(E, age, A), A < 38.
A = 35, E = bill ?
no

Bob is an employee because (1) all his addresses are strings (in fact he has none), (2) he works in a department (namely production) and (3) he has no known age. He is not a leader, because there is no rel(bob,leads,D) relation fact, where D is a department. Neither is he an allrounder. Bill is an employee because he works in the sales department (by the second deduction rule); further, he is a leader (and therefore also a boss) because he leads the sales department and has an address (in fact, he has two). Note that John has at least one address (because he is said to be a leader), but it is unknown. Therefore, if there were no inst(john,leader) instance fact, we could not conclude that Bill is a leader. The last query asks for employees in the production department who are younger than 38. □


4 The Integrity Constraint Component

4.1 Syntax

In the integrity constraint component, a user may declaratively specify integrity constraints, which are checked after updates or on demand. A first-order formula is said to be in restricted quantification form if all $\forall$-quantified variables are restricted by a range (see [Bry et al. 88]). An integrity constraint is a first-order formula in restricted quantification form in which only the pre-defined predicate symbols of the terminological and object components plus the usual equality and inequality symbols may occur. We choose the syntax ic(Identifier, Formula), where the Formula has to be in restricted quantification form. In order to distinguish the logical symbols from the terminological constructs, we use & (and), # (or), [...] (not), -> (implication) as logical connectors and forall and exists as quantifiers in integrity constraints. We distinguish pre-defined and user-defined integrity constraints. An example of a pre-defined integrity constraint is

ic(mic:unique-introduction-of-concepts,
   forall([C,D1,D2], C := D1 & C := D2 -> D1 = D2)).

which requires the unique introduction of concepts. It is therefore a functional dependency constraint. Other pre-defined constraints guarantee that an object component together with its terminological component is consistent, namely that there are no instances of incoherent concepts and that the instances do not violate max-restrictions. The complete list of pre-defined integrity constraints is contained in appendix A.7.

We allow abbreviations for the following three classes of user-defined classification constraints:

disjoint(c1, c2)      for:  $\forall I\ \mathit{in}(I, c_1) \mathbin{\&} \mathit{in}(I, c_2) \to \mathit{false}$
cover(c1, c2, c)      for:  $\forall I\ \mathit{in}(I, c) \to \mathit{in}(I, c_1) \mathbin{\#} \mathit{in}(I, c_2)$
partition(c1, c2, c)  for:  disjoint(c1, c2) & cover(c1, c2, c)

The disjoint classification constraints are defined in the following way.

ic(udc:disjoint(C1,C2), forall([I], in(I,C1) & in(I,C2) -> false)) :-
    disjoint(C1,C2).
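The three classification constraints have a direct set-theoretic reading, sketched here in Python rather than in the report's Prolog. The sketch assumes that the instances of each concept have already been computed and are available as a mapping `ext` from concept names to sets. This mapping and the function signatures are assumptions for the illustration. The extensions of employee and department follow the session of example 3.3, whereas the contents of leader and worker are partly hypothetical.

```python
def disjoint(ext, c1, c2):
    """No object is an instance of both c1 and c2."""
    return not (ext[c1] & ext[c2])


def cover(ext, c1, c2, c):
    """Every instance of c is an instance of c1 or of c2."""
    return ext[c] <= (ext[c1] | ext[c2])


def partition(ext, c1, c2, c):
    """c1 and c2 are disjoint and together cover c."""
    return disjoint(ext, c1, c2) and cover(ext, c1, c2, c)


ext = {
    "employee": {"bill", "bob", "john"},     # as in example 3.3
    "department": {"production", "sales"},   # as in example 3.3
    "leader": {"bill", "john"},
    "worker": {"bob"},                       # hypothetical extension
}
```

With these extensions, `disjoint(ext, "employee", "department")` holds, and leader and worker partition employee. Adding an object to both leader and worker would violate disjointness and therefore the partition.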

Example 4.1 Some examples of user-defined integrity constraints.

disjoint(employee, department).
ic(udc:exists-leader-in-department,
   forall([D], in(D,department) -> exists([L], rel(L,leads,D)))).
ic(udc:only-young-salespersons,
   forall([E,A], rel(E,works_in,sales) & rel(E,age,A) -> A < 40)).
[...] -> E1=E2)).
ic(udc:exists-employee, exists([E], in(E,employee))).

The first constraint is a classification constraint stating that an object may not be an instance of both employee and department. The second constraint is a referential integrity constraint requiring that every department has somebody leading it. The third constraint is a domain restriction of employees in the sales department to persons younger than 40. The fourth integrity constraint is a functional dependency constraint: The leader is functionally dependent on the department, i.e. there is at most one leader in any department. The last constraint requires that there is at least one instance of the concept employee. □

4.2 Semantics

The semantics of integrity constraints is defined according to interpretations in classical logic.

4.3 Updates

We allow database updates by adding or removing facts F and rules. The syntax is:

add(F)
remove(F)

A transaction is a set of updates $U_i$, i.e. transaction([$U_1$, ..., $U_n$]). A re-definition of a concept has to be executed as removal of the old concept definition and introduction of the new concept definition.
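The transaction behaviour described in this report (constraints are checked after the whole set of updates, and all changes are rejected if any constraint is violated) can be sketched as follows. This is a Python illustration, not the Prolog implementation. Facts are encoded as tuples, the constraint is a hand-written check for the referential constraint exists-leader-in-department, and membership in department is read from asserted inst facts only instead of the inferred in/2. The names `leaders_exist` and `transaction` are assumptions.

```python
def leaders_exist(facts):
    """Every object asserted to be a department has somebody leading it."""
    departments = {f[1] for f in facts if f[0] == "inst" and f[2] == "department"}
    led = {f[3] for f in facts if f[0] == "rel" and f[2] == "leads"}
    return departments <= led


def transaction(facts, updates, constraints):
    """Apply all updates to a copy and commit only if every constraint holds.

    updates is a list of ("add", fact) or ("remove", fact) pairs.
    Returns the resulting fact set and a flag telling whether it was committed.
    """
    candidate = set(facts)
    for op, fact in updates:
        if op == "add":
            candidate.add(fact)
        elif op == "remove":
            candidate.discard(fact)
        else:
            raise ValueError(f"unknown update: {op}")
    if all(check(candidate) for check in constraints):
        return candidate, True
    return set(facts), False  # reject every change of the transaction


facts = {("inst", "sales", "department"), ("rel", "bill", "leads", "sales")}
new_department = ("add", ("inst", "administration", "department"))
new_leader = ("add", ("rel", "john", "leads", "administration"))
```

Adding the new department on its own is rejected, because the constraint is violated in the resulting state. Adding the department and its leader in one transaction is accepted, because the constraint is only evaluated once both updates have been applied.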

Example 4.2 We present an example session for database updates. Note that updates may be performed on both the object and terminological component.

| ?- add(inst(administration, department)).
The integrity constraint udc:exists-leader-in-department :
forall([A],in(A,department)->exists([B],rel(B,leads,A)))
is not satisfied.
Following instance(s) are violated :
exists([A],rel(A,leads,administration)) = in(administration,department) -> exists([A],rel(A,leads,administration))
The fact has not been added.
no
| ?- transaction([add(inst(administration, department)), add(rel(john,leads,administration))]).
yes
| ?- rel(john, leads,X).
X = production ?
X = administration ?
no
| ?- add(manager := leader and min leads : 2).
yes
| ?- SubConcept [...]

[...] where the set of nodes N consists of the set of predicates occurring in the rules and $\langle p_i, p_j \rangle \in E$ iff predicate $p_j$ occurs in the body of a deduction rule with $p_i$ in the head. Edges are labelled as positive if $p_j$ occurs positively in the rule body and negative if $p_j$ occurs negated in the rule body. The dependency graph of the pre-defined deduction rules is given in figure 1.

A set of deduction rules is said to be stratified if the dependency graph does not contain any cycle having a negative edge. Stratification is needed to guarantee the existence of a distinguished minimal model, the so-called perfect model. The perfect model can be computed following a level mapping, which is derivable from the dependency graph. Unfortunately, a set of user-defined deduction rules containing negation in the rule body is not stratified, for example the dependency graph of the deduction rule

rel(E,stock_bonus,0) :- rel(E,works_in,D), not rel(E,leads,D).
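The stratification test (no cycle of the dependency graph contains a negative edge) can be sketched in a few lines. This is a Python illustration under stated assumptions, not the report's Prolog. Edges are triples (head predicate, body predicate, sign), the direction from head to body is a choice made for this sketch, and the names `reachable` and `is_stratified` are hypothetical. The test itself does not depend on the direction, as long as it is used consistently.

```python
def reachable(edges, start):
    """All nodes reachable from start along zero or more edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for head, body, _sign in edges:
            if head == node and body not in seen:
                seen.add(body)
                stack.append(body)
    return seen


def is_stratified(edges):
    """True iff no cycle of the dependency graph contains a negative edge.

    A negative edge (head, body) lies on a cycle exactly when head can be
    reached again from body.
    """
    return not any(sign == "-" and head in reachable(edges, body)
                   for head, body, sign in edges)


# Dependency graph of the stock_bonus rule: rel depends on rel both
# positively (works_in) and negatively (leads), since roles share one predicate.
stock_bonus_edges = [("rel", "rel", "+"), ("rel", "rel", "-")]

# Hypothetical rule p :- q, not r. with three distinct predicates.
layered_edges = [("p", "q", "+"), ("p", "r", "-")]
```

For the stock_bonus rule the negative edge is a self-loop on rel, so the check reports that the rule is not stratified. The hypothetical rule over three distinct predicates passes.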

primitive_concept (: