Deductive Databases with Incomplete Information

Appears in: Joint Int. Conf. and Symp. on Logic Programming, Washington, D.C., USA, November 1992, MIT Press.

Deductive Databases with Incomplete Information (Extended Abstract)

Fangqing Dong

and

Laks V.S. Lakshmanan

Dept. of Computer Science, Concordia University, Montreal, Quebec, Canada H3G 1M8

Abstract

We consider query processing in deductive databases with incomplete information in the form of null values. We motivate the problem of extracting the maximal information from a (deductive) database in response to queries, and formalize this in the form of conditional answers. We give a sound and complete top-down proof procedure for generating conditional answers. We also extend the well-known magic sets method to handle null values, and show that the transformed program executed by semi-naive evaluation (with minor extensions) is correct in the sense that it will generate all and only valid conditional answers w.r.t. the original program.

1 Introduction

Most work on deductive databases has considered only a complete information model for the set of facts available for the EDB (or base) relations. For many applications, the available information is typically incomplete. One form of incomplete information that has been researched extensively in the context of relational databases is the well-known null value (see [AKG 91] for a survey). Of the many different types of null values, the kind most researched is the so-called "exists but unknown" type. Both logical (e.g., Gallaire et al. [GMN 84], Reiter [Re 86], Vardi [Va 86]) and algebraic (e.g., Abiteboul et al. [AKG 91]) approaches have been investigated in the literature. The main concerns have been completeness and complexity of query processing. It is well known that query processing in the presence of nulls is computationally intractable, and tractability is achieved either by restricting the class of queries considered [Re 86, Va 86] or by sacrificing [Re 86, Va 86] or weakening [La 89] completeness.

The question of query processing in deductive databases in the presence of incomplete information (e.g., in the form of nulls) has received relatively little attention. Demolombe and Cerro [DC 88], Liu [Li 90], and Abiteboul et al. [AKG 91] are the representative works (see Section 6 for more details). In this paper, we consider query processing in deductive databases in the presence of nulls. Firstly, we set out the objective of extracting the maximal amount of information from the database in answering queries. This means that even when a tuple d is not provably an answer to a query p(X), we may want to know that if certain conditions held, then d would be an answer. Aside from the theoretical interest, we believe information extracted in this manner will find applications in hypothetical query answering (see Naqvi and Rossi [NR 90]) and in answering queries in the context of design databases, where specifications are often incomplete and one may want to know what the eventual outcomes would be if various design alternatives were chosen. We formalize the notion of extracting maximal information from databases using conditional answers (Section 2).

The traditional approach of basing the semantics of programs on Herbrand models will obviously fail in the presence of null values. Indeed, unlike normal constants, nulls cannot be viewed syntactically. Also, during query processing care must be taken to ensure that the constraints on nulls are respected. To this end, we formalize the idea that nulls may be mapped either to some normal constant or to a completely new element in the domain, and preprocess a given datalog query program incorporating this idea (Section 3). We then give a sound and complete proof procedure called SLD⊥-refutation for query processing (Section 4). On the bottom-up side, we develop a rewriting method which is an extension of the well-known magic sets method (see [BR 86, BR 87]) to handle null values. We also propose a complementary evaluation procedure, which is a simple extension of semi-naive evaluation. Finally, we show that all valid (conditional) answers will be generated by (extended) semi-naive evaluation of the rewritten program above and only valid answers will be generated in this manner (Section 5). We compare our work with related work in Section 6, and summarize our results and discuss future work in Section 7.
We conclude this section with an example to motivate our approach of using conditional answers to queries. Consider a file system design situation where it is desired to make use of available file organization strategies and their strengths in terms of efficiently supporting various types of queries. Suppose that information known to the database administrator (DBA) is represented in the form of the relations good_for(Strategy, Query-type) and implemented(File, Strategy), where Strategy refers to file organization strategies and the other attributes and relations have the obvious meaning. Suppose the available knowledge is represented as the following facts together with the constraint C = {⊥1 ≠ ⊥2}. Here, b = B+-tree, h = hashing, m = multilist, s = simple, r = range, bl = boolean, and f1, f2, f3 denote files.

r1: good_for(b, s).
r2: good_for(b, r).
r3: good_for(m, bl).
r4: good_for(h, s).
r5: good_for(⊥2, r).
r6: implemented(f1, h).
r7: implemented(f2, ⊥1).
r8: implemented(f3, ⊥2).

Here, r5 corresponds to the DBA's knowledge that there is a strategy which is good for range queries, and this strategy could be one of the known ones, or could be something he did not encounter before (perhaps a recent invention). Also, r7, r8, and C correspond to the fact that the access strategies for files f2 and f3 have not been decided on yet, although there is a constraint to implement them with different strategies. Let supports(F, Q) mean that file F supports queries of type Q efficiently. This can be defined as

r9: supports(F, Q) ← implemented(F, S), good_for(S, Q).

Now, consider the query supports(F, r), which asks for the files supporting range queries. The idea behind conditional answers is to extract tuples which would be answers if certain conditions held. Mechanically resolving the given query against rule r9, and resolving the second subgoal in the resulting goal against r2, gives us the new goal implemented(F, b). Under the usual least Herbrand model semantics, an attempt to unify this subgoal with r7 fails, essentially because b and ⊥1 are treated as distinct entities. However, what we really need is to be able to match the null ⊥1 with a (normal) constant like b as long as the constraints on the null values are not violated. In our case, since constraints are not violated, we would like to be able to conclude "supports(f2, r) provided the condition ⊥1 = b holds". This reasoning is formalized in the next sections and we will eventually derive this conditional answer formally (Example 4.1).

For lack of space, we suppress the proofs of our results in this extended abstract. Complete details are available in [DL 92] and will appear in the full paper as well.
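The motivating example can be replayed with a small Python sketch. It is only an illustration under stated assumptions, not the proof procedure of this paper: nulls are written as the strings "?1" and "?2", a null may be equated with any constant, two distinct normal constants may never be equated, and the helper names (consistent, match, supports) are hypothetical.

```python
from itertools import product

# Nulls are written "?1", "?2" as stand-ins for the two nulls of the example.
NULLS = {"?1", "?2"}
NEQ = [("?1", "?2")]  # the constraint set C: the two nulls must differ

GOOD_FOR = [("b", "s"), ("b", "r"), ("m", "bl"), ("h", "s"), ("?2", "r")]
IMPLEMENTED = [("f1", "h"), ("f2", "?1"), ("f3", "?2")]


def consistent(conds):
    """Check a set of equalities against the unique-name assumption and NEQ."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in conds:
        parent[find(a)] = find(b)
    # Group every symbol seen so far by its equivalence class.
    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    # Two distinct normal constants may never be equated.
    if any(len(members - NULLS) > 1 for members in classes.values()):
        return False
    # No inequality constraint may be collapsed.
    return all(find(a) != find(b) for a, b in NEQ)


def match(x, y):
    """Condition needed to equate x and y, or None if they can never be equal."""
    if x == y:
        return frozenset()
    if x in NULLS or y in NULLS:
        return frozenset({tuple(sorted((x, y)))})
    return None


def supports(query_type):
    """Conditional answers (file, conditions) to the query supports(F, query_type)."""
    answers = set()
    for (f, s1), (s2, q) in product(IMPLEMENTED, GOOD_FOR):
        c1, c2 = match(s1, s2), match(q, query_type)
        if c1 is None or c2 is None:
            continue
        conds = c1 | c2
        if consistent(conds):
            answers.add((f, conds))
    return answers
```

Under these assumptions, supports("r") contains f2 with the single condition ?1 = b, which is the conditional answer discussed above, and it rejects the combination ?1 = ?2 because that would violate the constraint C. It also returns f3 both unconditionally and under ?2 = b, which shows that redundant condition sets can arise when nothing checks for them.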

2 Datalog⊥ Theories

In this section, we formalize the intuition developed in the previous section. We assume the reader is familiar with the general notions of deductive databases and logic programming, SLD-refutation, bottom-up evaluation, and the magic sets query rewriting method [Ul 89, Ll 87].

Datalog, the language of function-free Horn clauses, is the vehicle query language for deductive databases [Ul 89]. A datalog query program consists of (i) a finite set of unit clauses representing facts for the base (EDB) predicates, (ii) a finite set of Horn clause rules defining the derived (IDB) predicates, and (iii) a goal clause, representing the query. In this paper, we restrict attention to "pure" datalog, in which only database (base/derived) predicates (and no arithmetic predicates) are allowed in the rules. We next extend datalog programs (whose EDB contains null values) to datalog⊥ theories, using an extension of Reiter's [Re 86] formulation of extended relational theories.

Consider a first-order language L, with a vocabulary consisting of finitely many constant symbols, denoted by D, finitely many predicate symbols p_i, q_j, and infinitely many variables X_i, Y_j, Z_k. The constants d_i in D are either normal constants c_j or nulls ⊥_k. We assume the vocabulary includes the arithmetic relations =, ≠, <, and ≤. (The relations > and ≥ are redundant.) Let Π be any datalog program. Then associated with Π is a logical theory P of L, called the datalog⊥ theory of Π, consisting of:

(i) Unique Name Axioms (UNA): for every pair of distinct normal constants c_i and c_j in Π, an axiom c_i ≠ c_j;
(ii) Domain Closure Axiom (DCA): the axiom ∀X [X = d1 ∨ ... ∨ X = dn], where d1, ..., dn are all the constants mentioned in the program Π;
(iii) Completion Axioms (COMP): for any predicate symbol p in the program Π, defined by the rules p(t_i) ← A_i1 ∧ ... ∧ A_im_i, i = 1, ..., k, the axiom ∀X [p(X) ↔ (X = t_1 ∧ A_11 ∧ ... ∧ A_1m_1) ∨ ... ∨ (X = t_k ∧ A_k1 ∧ ... ∧ A_km_k)] (the A_ij's are positive literals, t_i = (t_i1, ..., t_in), the t_ij's are constants or variables, X = (X1, ..., Xn), and X = t_i is shorthand for X1 = t_i1 ∧ ... ∧ Xn = t_in);
(iv) Constraints: there is a set C of constraints of the form d_i R d_j, where R is one of the arithmetic relations =, ≠, < [...]

Given an ordered list E of conditions, the f-term representation E_f of E is defined recursively as follows: E_f = ε if E = {}; E_f = f(⊥1, d1, E'_f) if E = {⊥1 = d1, ⊥2 = d2, ..., ⊥n = dn}, where E'_f is the f-term representing E' = {⊥2 = d2, ..., ⊥n = dn}. However, for simplicity, we use the same symbol E to denote a set of conditions (or their conjunction) and sometimes the term corresponding to the condition set. Besides, in rewriting programs incorporating sideways information passing (SIP), we (suggestively) use E, E', E^i, ... as variables ranging over such condition terms. The intended meaning should be clear from the context.

For any adorned datalog rule r^a, we modify the rule by adding subgoals involving the predicate sip so that the resulting rule will implement SIP, respecting the semantics of nulls. For ease of exposition, in the following algorithm, we restrict attention to the case where no variable occurs in a subgoal more than once. The algorithm can be easily extended to deal with the general case. We remark that when we later apply the magic sets transformation, auxiliary predicates including magic and supplementary predicates will be added as subgoals in the rewritten rules. Clearly, these are derived predicates and hence conditions must be associated with each of them. Anticipating this, we suppose in the following algorithm that the initial condition set (corresponding to the magic predicate) in a rule body is E^0. Algorithm SIPN transforms each adorned rule in order to implement SIP in a correct manner.
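The f-term representation of a condition list defined above can be sketched in Python as nested tuples. This is a minimal illustration only: the string "eps" standing for the empty-condition constant, the "?n" spelling of nulls, and the function names are assumptions made for the sketch.

```python
EMPTY = "eps"  # hypothetical spelling of the constant for the empty condition set


def to_fterm(conds):
    """Encode an ordered list of (null, value) conditions as a nested f-term."""
    term = EMPTY
    # Build from the last condition outwards so the first one ends up outermost.
    for null, value in reversed(conds):
        term = ("f", null, value, term)
    return term


def conditions_of(term):
    """Collect the set of (null, value) conditions stored in an f-term."""
    found = set()
    while term != EMPTY:
        _, null, value, term = term
        found.add((null, value))
    return found
```

For example, to_fterm([("?1", "b"), ("?2", "h")]) yields the nested term ("f", "?1", "b", ("f", "?2", "h", "eps")), and conditions_of recovers the two conditions from it. Encoding the set as a term is what lets a condition set travel as an ordinary argument of a predicate.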

Algorithm SIPN (SIP for nulls)

Input: An adorned datalog rule r^a: p^a(t) ← B_1, ..., B_m, where the B_i's are positive literals. V_b will denote the set of bound variables at any given time.
Output: A modified adorned rule r^a', which correctly incorporates SIP.

Begin

V_b := {bound variables in p^a(t)}; E := E^0;
For i = 1 to m do {
  1. If B_i is a derived literal q(t'), then modify this literal to q(t', E*), and add a literal union(E*, E, E') right after this subgoal; E := E';
     /* The function of union is to check if the union of condition sets E' = E* ∪ E is consistent. */
  2. If B_i is a base literal q(t_1, ..., t_k), then
     For j = 1 to k do {
       (a) If t_j is a variable X ∈ V_b, then replace any occurrence of X in B_i or subsequent subgoals by a new variable X', and add a literal sip(X, X', E, E') just before B_i; E := E';
       (b) If t_j is a constant d, then replace t_j by a new variable X', and add a literal sip(d, X', E, E') just before B_i; E := E';
     };
  3. V_b := V_b ∪ {X'_1, ..., X'_k}; /* Here the X'_i's are all the newly introduced variables. */
};
Change the head literal to p(t, E).
End

Example 5.1 Let us apply Algorithm SIPN to the following datalog rules, after adornment w.r.t. the query sg(a, Y).

sg(X, Y) ← flat(X, Y).
sg(X, Y) ← up(X, Z), sg(Z, W), down(W, Y).

Then we get the following modified rules:

sg^bff(X, Y, E') ← sip^bfbf(X, X', E^0, E'), flat^bf(X', Y).
sg^bff(X, Y, E') ← sip^bfbf(X, X', E^0, E^1), up^bf(X', Z), sg^bff(Z, W, E^2), union(E^2, E^1, E^3), sip^bfbf(W, W', E^3, E'), down^bf(W', Y).

Notice that the adornment is extended to the arguments representing the conditions. Clearly, this argument is free.

Let r^a' be a resulting adorned rule of the form

p(t, E) ← A_1, ..., A_k, sip(t_l, X'_l, E_l, E'_l), ..., sip(t_n, X'_n, E_n, E'_n), A_{k+1}, ..., A_m,

where the A_i's can be database literals, or sip or union literals. Specifically, A_{k+1} is a base literal with X'_j, j = l, ..., n, among its arguments. For all j = l, ..., n:

(1) If t_j is a variable X_j, then corresponding to the literal sip(t_j, X'_j, E_j, E'_j), we include a pair of rules representing the mapping patterns associated with nulls (see footnote 4):

match(X_j, X'_j) ← A_1, ..., A_k, null(X_j), A_{k+1}, C(X_j, X'_j).
match(X_j, X'_j) ← A_1, ..., A_k, A_{k+1}, null(X'_j), C(X'_j, X_j).

(2) If t_j is a normal constant c_j, then corresponding to the literal sip(c_j, X'_j, E_j, E'_j), we include a rule:

match(c_j, X'_j) ← A_{k+1}, null(X'_j), C(X'_j, c_j).

(3) If t_j is a null ⊥_j, then we include a rule:

match(⊥_j, X'_j) ← A_{k+1}, C(⊥_j, X'_j).

The literal match(X, X') expresses the mapping patterns between nulls and (normal or null) constants, and asserts that both X and X' are mapped to the same individual. We make use of a built-in predicate null(X) which distinguishes nulls from normal constants. Intuitively, C(X, X') is a meta-literal whose function is to test the consistency of the mapping patterns between X and X'. A precise definition of this test as well as details of its implementation in a bottom-up framework are discussed in Section 5.3.

Now, we give the precise definition of the predicate sip. The literal sip(X, X', E, E') expresses SIP from X to X' under condition set E. Given the binding for X and E (the current conditions), it first generates a binding for X' and then tests if E ∪ {X = X'} is consistent. Then (i) E' = E if X = X' is valid, or X = X' ∈ E; (ii) E' = {X = X'} ∪ E if X = X' ∉ E and {X = X'} ∪ E is consistent. To generate a binding for X', sip uses match and confirms that the generated binding is legal w.r.t. the constraints on nulls. Finally, the definition of the predicate sip is as follows:

sip(X, X, E, E).
sip(X, X', E, E') ← match(X, X'), null(X), verify(X, X', E, E').
sip(X, X', E, E') ← match(X, X'), null(X'), verify(X', X, E, E').
verify(X, X', E, E) ← verify1(X, X', E, f(X, X', E')).
verify(X, X', E, f(X, X', E)) ← verify1(X, X', E, ε).
verify1(X, X', E, E') ← verify1(X, X', E, f(Y, Y', E')), X ≠ Y, C*(X, Y, X', Y').
verify1(X, X', E, E') ← verify1(X, X', E, f(Y, Y', E')), X' ≠ Y', C*(X, Y, X', Y').
verify1(X, X', E, E) ← m_sip(X, E), match(X, X'), null(X).
verify1(X', X, E, E) ← m_sip(X, E), match(X, X'), null(X').
verify1(X, X', E, E) ← m_union(f(X, X', E'), E).
union(ε, E, E).
union(f(X, X', E'), E'', E) ← verify(X, X', E'', E0), union(E', E0, E).

The literal verify(X, X', E, E') verifies if the condition X = X' is consistent with the current condition set E, and generates a condition set E' defined as: (i) E' = E if the constraint X = X' ∈ E; (ii) E' = {X = X'} ∪ E if X = X' ∉ E and {X = X'} ∪ E is consistent. In fact, verify is defined in a bottom-up manner using verify1. The literal verify1(X, X', E, E') expresses that the condition X = X' is consistent with the condition set E − E', provided E' is a subset of E (and that X = X' remains to be confirmed to be consistent with E'). The "starting point" of the predicate verify1 is defined using the magic predicates m_sip (m_union) associated with the sip (union) predicates. The literal union(E1, E2, E) verifies if the union of E1 and E2 is consistent and rewrites the union (in term form) as E. Notice that we use a special constant ε to represent the empty set of conditions. C*(X, Y, X', Y') is a meta-literal, used to test the consistency between two mapping patterns from X and Y to X' and Y' respectively. Details associated with the two meta-predicates C and C* are discussed in Section 5.3.

We remark that some of the rules involving the auxiliary predicates (like sip) appear unsafe, but because of the binding pattern (i.e., sip^bfbf) with which they will be accessed, no problem will arise. This will be clear when we discuss a complete example in Section 5.2.
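The intended behaviour of verify and union can also be read procedurally. The sketch below is a simplified Python illustration, not the bottom-up rule set above: condition sets are frozensets of (X, X') pairs, None signals inconsistency, and the consistency test is passed in as a callback. The stand-in test single_valued, which only rejects a null mapped to two different values, is a hypothetical placeholder for the real constraint check.

```python
def verify(x, xp, conds, consistent):
    """Return conds extended with x = xp, or None if the result is inconsistent."""
    if x == xp or (x, xp) in conds:
        return conds                      # case (i): nothing new to add
    extended = conds | {(x, xp)}
    return extended if consistent(extended) else None   # case (ii)


def union(e1, e2, consistent):
    """Fold the conditions of e1 into e2 one at a time, as the union rules do."""
    result = e2
    for x, xp in sorted(e1):
        result = verify(x, xp, result, consistent)
        if result is None:
            return None
    return result


def single_valued(conds):
    """Stand-in consistency test: no null may be mapped to two different values."""
    seen = {}
    for null, value in conds:
        if seen.setdefault(null, value) != value:
            return False
    return True
```

With this stand-in, union(frozenset({("?1", "b")}), frozenset({("?2", "h")}), single_valued) returns both conditions, while merging ?1 = b with ?1 = h returns None. The recursion in the union rules corresponds to the loop here: each condition of the first set is verified against the set accumulated so far.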

5.2 Applying Magic Sets Transformation

We integrate the techniques developed in Section 5.1 into the magic sets transformation and apply it to a given datalog⊥ program P in three phases.

Phase 1 Applying adornments to P to generate an adorned program P^a.

Phase 2 Modifying the rules to correctly implement SIP, respecting the semantics for nulls and their constraints. More specifically, for every adorned rule r^a of P^a:
(2.1) Applying Algorithm SIPN to the adorned rule r^a;
(2.2) For every newly introduced literal sip(X, X', E, E'), generating rules to define mapping patterns.

Phase 3 Applying the magic sets transformation to the resulting program, and using supplementary predicates to eliminate redundant subexpressions. Notice that every magic predicate should have an argument position associated with condition sets. Even though this argument position is free, we should not delete it, since it plays a crucial role in answer generation. (Also, we do not apply the magic sets transformation to the predicates verify, verify1, and match, because the rules for these predicates have been designed with bottom-up computation in mind. Consequently, an application of magic sets to these predicates cannot make their evaluation any more efficient.)

Example 5.2 Consider the program of Example 5.1. On applying Algorithm SIPN and the magic sets transformation to the adorned program, we get the following program, which consists of three groups of rules: the answer-generator group, the condition-checker group, and the mapping-generator group. (We suppress the adornments of predicates to improve readability.)
Answer-generator Group

sg(X, Y, E') ← m_sg(X, E), sip(X, X', E, E'), flat(X', Y).
sg(X, Y, E') ← s_sg2(X, W, E^3), sip(W, W', E^3, E'), down(W', Y).
q(Y, E) ← sg(a, Y, E).
sip(X, X, E, E) ← m_sip(X, E).
sip(X, X', E, E') ← m_sip(X, E), match(X, X'), null(X), verify(X, X', E, E').
sip(X, X', E, E') ← m_sip(X, E), match(X, X'), null(X'), verify(X', X, E, E').
m_sg(a, ε).
m_sg(Z, E') ← m_sg(X, E), sip(X, X', E, E'), up(X', Z).
m_sip(X, E) ← m_sg(X, E).
m_sip(W, E^3) ← s_sg2(X, W, E^3).
s_sg1(X, Z, E^1) ← m_sg(X, E), sip(X, X', E, E^1), up(X', Z).
s_sg2(X, W, E^3) ← s_sg1(X, Z, E^1), sg(Z, W, E^2), union(E^2, E^1, E^3).

Condition-checker Group

- All rules for verify and verify1 (see Section 5.1)
union(ε, E, E) ← m_union(ε, E).
union(f(X, X', E'), E'', E) ← m_union(f(X, X', E'), E''), verify(X, X', E'', E0), union(E', E0, E).
m_union(E', E0) ← m_union(f(X, X', E'), E''), verify(X, X', E'', E0).
m_union(E^2, E^1) ← s_sg1(X, Z, E^1), sg(Z, W, E^2).

Mapping-generator Group

match(X, X') ← m_sg(X, E), null(X), flat(X', Y), C(X, X').
match(X, X') ← m_sg(X, E), flat(X', Y), null(X'), C(X', X).
match(X, X') ← m_sg(X, E), null(X), up(X', Y), C(X, X').
match(X, X') ← m_sg(X, E), up(X', Y), null(X'), C(X', X).
match(W, W') ← s_sg2(X, W, E), null(W), down(W', Y), C(W, W').
match(W, W') ← s_sg2(X, W, E), down(W', Y), null(W'), C(W', W).

Notice that s_sg1 and s_sg2 are supplementary predicates introduced to avoid redundant computations of common subexpressions. Also, in the rules for match, the magic predicate m_sg has been used in place of sg to improve the efficiency.

A remark about the size of the transformed program is in order. Firstly, the number of rules defining the predicates sip, verify, verify1, and union is a constant, regardless of the size of the query program. Secondly, the numbers of rules defining the predicates m_sip, m_union, and match are each proportional to the number of times SIP is performed in the program's rules. This in turn is determined by the number of subgoals in rule bodies and their argument sharing pattern. The numbers of rules for m_sg, s_sg1, and s_sg2 are just as in the usual magic sets transformation. Thus, the size of the transformed program is comparable to the size of the program obtained by traditional magic sets rewriting.
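To make step (2.1) of Phase 2 concrete, here is a minimal Python sketch of the SIP rewriting performed by Algorithm SIPN on a single rule. It rests on several assumptions that go beyond the algorithm as stated: literals are (predicate, argument list) pairs, variables start with an upper-case letter, fresh condition variables are named E1, E2, ... instead of E', every variable of a processed subgoal is treated as bound for the subgoals to its right (the usual left-to-right SIP, which is what Example 5.1 appears to use), and adornments are omitted. The function names are hypothetical.

```python
def is_var(term):
    """Datalog convention assumed here: variables start with an upper-case letter."""
    return term[0].isupper()


def sipn(head, body, derived, bound):
    """Rewrite one rule; returns (new_head, new_body)."""
    body = [(pred, list(args)) for pred, args in body]
    bound = set(bound)
    n_cond = 0      # counter for fresh condition variables E1, E2, ...
    n_const = 0     # counter for variables that replace constants
    cond = "E0"     # current condition set, initially E^0
    out = []
    for i, (pred, args) in enumerate(body):
        if pred in derived:
            # Step 1: give the derived literal a condition argument and
            # merge it with the current conditions through union.
            e_star, e_new = f"E{n_cond + 1}", f"E{n_cond + 2}"
            n_cond += 2
            out.append((pred, args + [e_star]))
            out.append(("union", [e_star, cond, e_new]))
            cond = e_new
        else:
            # Step 2: route every bound variable or constant through sip.
            for j, term in enumerate(args):
                if is_var(term) and term not in bound:
                    continue
                if is_var(term):
                    new = term + "'"
                    # Rename the variable in the subgoals still to come.
                    for _, later in body[i + 1:]:
                        later[:] = [new if a == term else a for a in later]
                else:
                    n_const += 1
                    new = f"V{n_const}"
                args[j] = new
                n_cond += 1
                e_new = f"E{n_cond}"
                out.append(("sip", [term, new, cond, e_new]))
                cond = e_new
            out.append((pred, args))
        # Assumption: all variables of the processed subgoal are now bound.
        bound |= {a for a in args if is_var(a)}
    return (head[0], list(head[1]) + [cond]), out
```

Applied to the recursive rule of Example 5.1 with X bound and sg marked as derived, the sketch produces the body sip(X, X', E0, E1), up(X', Z), sg(Z, W, E2), union(E2, E1, E3), sip(W, W', E3, E4), down(W', Y) with head sg(X, Y, E4), which agrees with the modified rule of that example up to the names of the condition variables.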

5.3 Bottom-Up Query Evaluation Based on Magic Sets

In the definition of the predicates sip and match, we introduced two special meta-predicates C* and C. In this section, we give formal definitions for these two meta-predicates and discuss how to integrate their implementation into the semi-naive evaluation algorithm. The meta-predicates C* and C are defined as follows. The literal C*(X, Y, X', Y') tests the consistency between the two mapping patterns X ↦ X' and Y ↦ Y' by testing if the constraints C[X/X', Y/Y'] are consistent.

The literal C(X, X') tests the consistency of the mapping pattern X ↦ X' by testing if the constraints C[X/X'] are consistent.

The only extra implementation detail imposed by the meta-predicates C and C* is to test the consistency of the conditions associated with their arguments. It turns out that this checking can be incorporated with a minor modification of any bottom-up evaluation algorithm such as Semi-Naive (SN) evaluation. We define SN⊥-evaluation as a modification to SN evaluation as follows. An efficient way of evaluating the meta-predicate C(X, X') is to use it as the condition for taking the θ-join of the relations in the body of the rule in which it occurs (e.g., see the rules in the mapping-generator group in Example 5.2). As regards C*(X, Y, X', Y'), it should be clear from the discussion (also see Example 5.2) that C* is always accessed with all its arguments bound. In this case, in each iteration of the SN evaluation the conditions represented by C*(X, Y, X', Y') are tested; tuples for the rule body (and head) will be generated only when the conditions being tested are satisfied. Thus, incorporating the two meta-predicates into SN evaluation can be done relatively easily while preserving the traditional advantages of bottom-up query processing.

The following result establishes the correctness of the bottom-up query evaluation method based on magic sets transformation, followed by SN⊥-evaluation.

Theorem 5.1 Let P be a datalog⊥ program, Q ≡ p(X) be a query, and MS(P) the transformed program obtained using the (extended) magic sets method. Let q(d1, E1), ..., q(dk, Ek) be the answers to the query Q obtained from MS(P), using SN⊥-evaluation. Let cond(E_i) = {⊥_j = d_j | E_i has a subterm of the form f(⊥_j, d_j, t)}. Then

‖Q‖^c_P ⊆ {(d1, cond(E1)), ..., (dk, cond(Ek))};
P ⊨ cond(E_i) → p(d_i), i = 1, ..., k;

that is, bottom-up query evaluation based on magic sets transformation and SN⊥-evaluation will generate only valid answers, and all minimal answers will be generated by this method.

Notice that the minimality of conditions associated with answers to a query based on this algorithm may not be guaranteed, because in the proposed (extended) magic sets method we do not provide a device to check redundant condition sets. Minimality can be achieved by comparing condition sets and deleting redundant ones.
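Two pieces of this section lend themselves to a small Python sketch: the consistency tests C and C*, read as "substitute and look for an outright violation", and the removal of redundant condition sets mentioned after Theorem 5.1. Both are illustrations under assumptions. Constraints are (a, op, b) triples with op one of "=", "!=", "<"; a substituted constraint counts as violated only in the syntactic cases coded below; the two nulls passed to c_star_test are assumed to be distinct; and redundancy is approximated by strict set inclusion of condition sets. All function names are hypothetical.

```python
def substitute(constraints, mapping):
    """Apply a null-to-value mapping to every constraint (a, op, b)."""
    return [(mapping.get(a, a), op, mapping.get(b, b)) for a, op, b in constraints]


def violated(a, op, b, nulls):
    """True only when the constraint is false on syntactic grounds alone."""
    if op in ("!=", "<"):
        return a == b
    if op == "=":
        # Two different normal constants can never be equal.
        return a != b and a not in nulls and b not in nulls
    return False


def c_test(x, xp, constraints, nulls):
    """C(X, X'): is the mapping X -> X' compatible with the constraints?"""
    return not any(violated(a, op, b, nulls)
                   for a, op, b in substitute(constraints, {x: xp}))


def c_star_test(x, y, xp, yp, constraints, nulls):
    """C*(X, Y, X', Y'): are X -> X' and Y -> Y' compatible when taken together?"""
    return not any(violated(a, op, b, nulls)
                   for a, op, b in substitute(constraints, {x: xp, y: yp}))


def prune_redundant(answers):
    """Drop an answer if the same tuple also appears with a strictly smaller condition set."""
    answers = set(answers)
    return {(d, e) for d, e in answers
            if not any(d2 == d and e2 < e for d2, e2 in answers)}
```

With the single constraint ?1 ≠ ?2, c_test accepts ?1 ↦ b but rejects ?1 ↦ ?2, and c_star_test rejects mapping both nulls to b. In a semi-naive loop, such tests would act as join or selection conditions on candidate tuples, which is why they need all their arguments bound.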

6 Comparison with Related Work

Although null values in relational databases have been studied in the framework of the so-called logical databases [GMN 84, Re 86, Va 86], it was only recently that deductive databases with null values have been considered. Demolombe and Cerro [DC 88] have extended conventional relational algebra with the idea of making it available for bottom-up evaluation of queries. However, unlike here, they have not proposed any query rewriting strategies to precede evaluation. In the absence of such strategies, bottom-up evaluation (even using semi-naive) can be prohibitively expensive. Besides, their algebra is only complete for a restricted class of (first-order) queries.

In some sense, the answers generated by Liu [Li 90] (also see Section 1) are similar to conditional answers. However, there are important differences between his work and ours. Firstly, Liu's framework of S-constants corresponds to assuming that null values always assume the value of one of the known individuals, unlike our approach. Secondly, unlike us, [Li 90] does not provide any optimization strategy for bottom-up query processing. For deductive databases, this is particularly important. Finally, in our framework conditional answers are generated without complicating the existing notions of unification and SLD-resolution, while Liu's approach uses a complex form of unification which tries to incorporate consistency checking as part of it. Also, he does not explicitly handle constraints on nulls, although they are implicit in the value ranges of S-constants.

Our approach in SLD⊥-refutation was inspired by constraint logic programming (see Stuckey [St 90]). Since this paradigm concerns reasoning with and about constraints, an answer to a query from a constraint logic program is essentially the answer under any model of the program generated by instantiating any variables in the constraints by constants which satisfy them. Analogously, we view null values as placeholders which can be mapped to any (normal or null) constant as long as the constraints are not violated. We remark that a direct approach of transforming a datalog⊥ program into a constraint logic program is inappropriate for the purpose at hand. The difficulties stem from two facts: (i) due to the modularity of rules, the bindings for local variables in rule bodies which are associated with nulls cannot be delivered to the heads or to other rules; (ii) without evaluating a (sub)goal, it is difficult for a subgoal to reason about the nulls associated with other subgoals. Indeed, a straightforward approach would produce a program which is exponentially larger than the original program [DL 92].

Two well-known query rewriting strategies are generalized magic sets [Ra 88] and magic conditions [M* 90], which appear to be capable of handling "generalized" bindings corresponding to conditions. A natural question is whether these methods can be directly used to handle (conditions involving) null values. A close examination will reveal that these methods can only deal with conditions operating at the level of relations, rather than (different) conditions applying to individual tuples. Thus, there was a need for a genuine extension to a method such as magic sets to deal with null values.

Abiteboul et al. [AKG 91] have shown that for their basic model of tables, the question of deciding whether a tuple is an answer to a datalog query in some possible world is NP-hard. Generation of conditional answers for recursive queries can solve the above problem, and the lower bound above trivially applies to it. However, we believe there are good reasons for considering this problem and developing query processing techniques for it. Firstly, incompleteness in (deductive) databases is a real problem in practice and we do need the functionality to deal with it in processing queries. Secondly, when the number of null values is bounded, we can show that conditional answers can be generated in polynomial time in the database size. Thirdly, we believe a framework such as the one developed in this paper can be the first step (i) in identifying types of queries/databases on which queries can be processed efficiently, and (ii) for devising strategies for deriving approximate answers (i.e., a subset of valid answers) to queries while achieving efficiency.

7 Summary and Future Research

We motivated the problem of generating conditional answers to queries on deductive databases containing null values. We developed a sound and complete proof procedure called SLD⊥-refutation. We also developed an extension to the basic magic sets rewriting method and showed that the rewritten program, evaluated bottom-up using semi-naive evaluation (with minor extensions), will generate all minimal conditional answers and will only generate valid conditional answers. We are currently working on an implementation of the extended magic sets method and SN⊥-evaluation on top of the LDL deductive DBMS [C*90]. In future research, we would like to characterize query classes and databases (based on their structure) for which conditional answers can be generated efficiently. Another attractive direction is to identify weaker forms of completeness (as was done for first-order queries; see [La 89]) w.r.t. which conditional answers for recursive queries can be generated efficiently.

Acknowledgments

The authors wish to thank V.S. Alagar for stimulating discussions. The research was supported by grants from NSERC (Canada) and FCAR (Quebec).

Footnotes

1 Since we suppose that the constraints are fully specified, E' is inconsistent iff it has a constraint of the form c ≠ c, c < c, X ≠ X, or X < X. Fully specified constraints are used only as a theoretical device to simplify consistency checking. More efficient implementation of consistency checking for constraints is possible using graph-theoretic techniques.
2 Notice that answers obtained using SLD⊥-refutation could contain redundant ones, as we do not check minimality in the refutation process.
3 In general, minimality of the conditions is not guaranteed.
4 Evidently, there are many joins which are redundantly computed several times. This can be easily avoided by using the supplementary predicates, used with magic sets rewriting. The details are discussed in Section 5.2.

References

[AKG 91] Abiteboul, S., Kanellakis, P. and Grahne, G.: "On the representation and querying of sets of possible worlds," Theoretical Computer Science 78 (1991), 159-187.
[BR 86] Bancilhon, F. and Ramakrishnan, R.: "An amateur's introduction to recursive query processing strategies," Proc. ACM-SIGMOD Int. Conf. on Management of Data (1986), 16-52.
[BR 87] Beeri, C. and Ramakrishnan, R.: "On the power of magic," Proc. of 6th ACM SIGMOD Symposium on PODS (1987), 269-283.
[C*90] Chimenti, D. et al.: "The LDL system prototype," IEEE Trans. on Knowledge and Data Eng., Vol. 2, No. 1 (1990), 76-90.
[DC 88] Demolombe, R. and Cerro, L.F.D.: "An algebraic evaluation method for deduction in incomplete data bases," The Journal of Logic Programming, No. 5 (1988), 183-205.
[DL 92] Dong, F. and Lakshmanan, V.S.: "Deductive databases with incomplete information," Tech. Report, Dept. of Computer Science, Concordia University (March 1992).
[GMN 84] Gallaire, H., Minker, J. and Nicolas, J.-M.: "Logic and databases: a deductive approach," Computing Surveys, Vol. 16, No. 2 (June 1984), 151-185.
[La 89] Lakshmanan, V.S.: "Query evaluation with null values: how complex is completeness?," Proc. 9th Int. Conf. Foundation of Software Technology and Theoretical Computer Science, LNCS vol. 405, Springer-Verlag (1989), 204-222.
[Li 90] Liu, Y.: "Null values in definite programs," Proc. North American Conference on Logic Programming (1990), 273-288.
[Ll 87] Lloyd, J.W.: Foundations of Logic Programming, Springer-Verlag, New York (1987).
[M* 90] Mumick, I.S., Finkelstein, S.J., Pirahesh, H. and Ramakrishnan, R.: "Magic conditions," Proc. of 9th ACM SIGMOD Symposium on PODS (1990), 161-171.
[NR 90] Naqvi, S.A. and Rossi, F.: "Reasoning in inconsistent databases," Proc. North American Conference on Logic Programming (1990), 255-272.
[NR 91] Naughton, J.F. and Ramakrishnan, R.: "Bottom-up evaluation of logic programs," to appear in Journal of Logic Programming.
[Ra 88] Ramakrishnan, R.: "Magic templates: a spellbinding approach to logic programs," Proc. Int. Conf. and Symp. on Logic Programming (1988), 140-159.
[Re 86] Reiter, R.: "A sound and sometimes complete query evaluation algorithm for relational databases with null values," JACM, Vol. 33, No. 2 (April 1986), 349-370.
[St 90] Stuckey, P.J.: "Constructive negation for constraint logic programming," manuscript (1991).
[Ul 89] Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. I & II, Comp. Sci. Press, MD (1989).
[Ul 89a] Ullman, J.D.: "Bottom-up beats top-down for datalog," Proc. ACM Symposium on PODS (1989).
[Va 86] Vardi, M.Y.: "Querying logical databases," Journal of Computer and System Sciences, No. 33 (1986), 142-160.