UPDATING RELATIONAL DATABASES THROUGH WEAK INSTANCE INTERFACES

Paolo Atzeni
Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy

Riccardo Torlone
IASI-CNR, Viale Manzoni 30, 00185 Roma, Italy

A preliminary version of this paper appears in the Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Philadelphia, March 1989. This work was partially supported by Ministero della Pubblica Istruzione, within the Project "Metodi formali e strumenti per basi di dati evolute", and by Consiglio Nazionale delle Ricerche, within "Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo, Obiettivo LOGIDATA+". Part of this work was performed while the first author was with IASI-CNR and then with Università di Napoli.

Abstract

The problem of updating databases through interfaces based on the weak instance model is studied, thus extending previous proposals that considered them only from the query point of view. Insertions and deletions of tuples are considered. As a preliminary tool, a lattice on states is defined, based on the information content of the various states. Potential results of an insertion are states that contain at least the information in the original state and that in the new tuple. Sometimes there is no potential result, and in the other cases there may be many of them. We argue that the insertion is deterministic if the state that contains the information common to all the potential results (the greatest lower bound, in the lattice framework) is a potential result itself. Effective characterizations for the various cases exist. A symmetric approach is followed for deletions, with fewer cases, since there are always potential results; determinism is characterized consequently.

1 Introduction

In a relational database, the universe of discourse is represented by means of a set of relations. The weak instance model [5, 24, 31, 32, 33, 35] provides a framework to consider a database as a whole, regardless of the way attributes appear in the various relation schemes. The information content of a database state can be considered to be embodied in the representative instance, a sort of relation with variables, obtained by extending all relations to the global set of attributes (called the universe) and then by chasing [30] their union by means of the dependencies defined on the database scheme. The weak instance model has been used as the basis for an interface for querying databases with reference only to attributes, ignoring relation names, and therefore obtaining a high level of logical independence [35]: query answering is performed by first computing the total projection of the representative instance on the set of attributes involved in the query, and then executing whatever further operations are needed. Any subset of the universe can in principle be queried.

Example 1 Figures 1, 2, and 3 respectively show a database state, the corresponding representative instance, and a total projection, which refer to a database scheme with the relation schemes $R_1(EP)$, $R_2(ED)$, $R_3(DM)$, and the functional dependencies $E \to D$, $D \to M$ as constraints.

Other interfaces exist that allow the formulation of queries referring only to attribute names; they are indicated under the generic name of "universal relation interfaces" [31, 38, 40]. The difference between the weak instance interface and the others is that in the latter query answering is based on heuristics, whereas in the former it relies on a solid basis [32, 33]; the various approaches are compared by Maier et al. [31, 32]. Weak instance interfaces (as well as other universal relation interfaces) have been essentially proposed as tools for querying databases: in most of the proposals, it is assumed that updates are performed only on the database relations. There is therefore a marked contrast with queries, which can be asked without even referring

r1:
    Employee  Project
    John      A
    John      B
    Bob       B
    Tom       C

r2:
    Employee  Dept
    John      CS
    Bob       EE
    Jim       MS

r3:
    Dept  Manager
    CS    Smith
    IE    White

Figure 1: A database state.

    Employee  Project  Dept  Manager
    John      A        CS    Smith
    John      B        CS    Smith
    Bob       B        EE    v1
    Tom       C        v2    v3
    John      v4       CS    Smith
    Bob       v5       EE    v1
    Jim       v6       MS    v7
    v8        v9       CS    Smith
    v10       v11      IE    White

Figure 2: A representative instance.

    Employee  Manager
    John      Smith

Figure 3: A total projection of the representative instance.

to relation schemes. Various authors (including Maier et al. [31], Codd [15], and Vardi [39]) have noted this asymmetry, observing that even the basic definitions are problematic. Let us discuss the main difficulties. In order to obtain a uniform management of queries and updates, the latter should be definable without referring to relation schemes. Let us consider insertions: the interface should allow the user to insert tuples defined over any subset of the universe. Intuitively, the result of an insertion of a tuple in a state should be a state that contains (1) the information in the original state and (2) the information embodied in the new tuple. Part (1) can be implemented by requiring that the result be obtained only by adding tuples to the relations in the original state; part (2), by requiring the new tuple to appear in the appropriate total projection of the representative instance of the result. Some problems arise if we want to insert a tuple that is defined on a set of attributes that does not coincide with any relation scheme.

Example 2 If we want to insert into the state in Figure 1 a tuple defined on $EM$, with values Dan for $E$ and Moore for $M$, we obtain a state that satisfies conditions (1) and (2) above only if we add one tuple to $r_2$ and one tuple to $r_3$, with the same value for $D$; however, there is no hint as to what this value should be: any value could be acceptable, provided that the constraints are not violated.

On the other hand, there are insertions that do not refer to relation schemes, and at the same time can be performed in a reasonable way.

Example 3 Suppose we want to insert, again in the state in Figure 1, a tuple defined on $EM$, with values Jim for $E$ and White for $M$; by just adding a tuple $t$ to $r_3$ with values MS for $D$ and White for $M$, we obtain a state that satisfies conditions (1) and (2) above, and with a minimal effort: because of the dependency $D \to M$, the chase would combine $t$ with the tuple $\langle \text{Jim}, \text{MS} \rangle$, already in $r_2$, generating a tuple in the representative instance with values Jim for $E$ and White for $M$.

In sum, Examples 2 and 3 confirm that, among the updates involving sets of attributes that span two or more underlying relations, some are problematic, whereas others can be performed soundly. Therefore, we believe that it would be important to extend the weak instance approach, in order to give a reasonable understanding of updates. As a matter of fact, there have been a few recent proposals [12, 27] for managing updates, but they are constrained by rather restrictive assumptions. In this paper, we provide a sound foundation to the problem of updates in the weak instance model, by giving general definitions, which distinguish the easy cases from the difficult ones, and effective characterizations. We consider insertions and deletions of tuples, defined over any subset of the universe. As we said, the general idea is that the result of an insertion (a somewhat dual approach can be taken for deletions) of a tuple in a state is a state that contains (1) the information in the original state and (2) the information embodied in the new tuple. In general, there are many potential results, with different modifications with respect to the original state: for example, other tuples may be added to the relations, besides those strictly needed to generate the new tuple as a result of queries. To compare the information embodied in the various potential results, it is useful to introduce a partial order among states, which extends a known notion of equivalence [33]. Therefore, property (1) above can be formalized by saying that a potential result follows the original state in the partial order. As a consequence, it makes sense to say that, if there is a minimum potential result (that is, a result that precedes, in the partial order, all the others), then it is an "ideal" result, because it includes only information that is strictly needed. Unfortunately, the minimum result need not exist, as sometimes (as in Example 2) there may be several, incomparable "minimal" results; so, we distinguish between deterministic and nondeterministic insertions, depending on the existence of the minimum result.
As integrity constraints are defined on databases, it is important to concentrate on consistent states (states that satisfy the integrity constraints) and to require the potential results to be consistent. It turns out that there are cases where there is no consistent result, because of conflicts between the original state and the new tuple. This is demonstrated in the next example.

Example 4 If the tuple to be inserted in Example 3 had values John for $E$ and White for $M$, we could not find any consistent state that contains both the information in the original state and that in the new tuple, because, given the functional dependencies, they contradict each other, assigning two distinct managers to the same employee.

r1:
    Professor  Course
    White      CS101

r2:
    Student  Course
    John     CS101

Representative instance:
    Professor  Student  Course
    White      v1       CS101
    v2         John     CS101

Figure 4: A database state and its representative instance for the scheme in Example 5.

There are other cases in which no potential result exists, which arise for schemes that show, with respect to the weak instance approach, clear deficiencies, as shown in the following example.

Example 5 Consider the universe P(rofessor), S(tudent), C(ourse), and the database with the relation schemes $R_1(PC)$, $R_2(SC)$, with the empty set of constraints. Since there are no constraints, the representative instance is produced with a degenerate chase (no modifications), and so it contains only tuples from each relation extended with variables, and no total tuples over the universe $PSC$ (see an example in Figure 4). It follows that the insertion of a tuple $t$ over $PSC$ has no potential results, as there is no state for which the $PSC$-total projection of the representative instance contains $t$. Clearly, we can argue that this database scheme is definitely not suited for being accessed through a weak instance interface, as nothing can be derived besides the information in the base relations.

This paper is devoted to the formalization and complete characterization of the issues discussed above. It is organized as follows. In Section 2 we review the needed background. In Section 3 we introduce the partial order on states and show that it is indeed a complete lattice. In Sections 4 and 5 we give the formal definitions concerning insertions and deletions, respectively, and characterize the main properties in an effective manner. In Section 6 we briefly compare our approach with relevant literature. Finally, in Section 7 we summarize our results and sketch further research directions.

2 Background Definitions and Notation

2.1 Relations, Databases, and Tableaux

The universe $U$ is a finite set of symbols $\{A_1, A_2, \ldots, A_m\}$, called attributes. Following common practice in relational database literature, we use the same notation $A$ to indicate both the single attribute $A$ and the singleton set $\{A\}$; also, we indicate the union of attributes (or sets thereof) by means of the juxtaposition of their names. A relation scheme is an object $R(X)$, where $R$ is the name of the relation scheme and $X$ is a subset of $U$. A database scheme is a collection of relation schemes $\mathbf{R} = \{R_1(X_1), \ldots, R_n(X_n)\}$, with distinct relation names (which therefore can be used to identify the relation schemes) and such that the union of the $X_i$'s is the universe $U$. The domain $D$ is the disjoint union of two countably infinite sets, the set of constants and the set of variables (for the sake of simplicity we assume that all attributes have the same domain). A tuple on a set of attributes $X$ is a function $t$ from $X$ to $D$. If $t$ is a tuple on $X$, and $Y \subseteq X$, then $t[Y]$ denotes the restriction of the mapping $t$ to $Y$, and is therefore a tuple on $Y$. With the usual abuse of notation, if $A$ is an attribute in $X$, then $t[A]$ indicates both a value and the restriction of a tuple. A tuple $t$ on $X$ is $Y$-total if $t[Y]$ does not involve variables; if $t$ is $X$-total, we just say that it is total. An (ordinary) tableau (see Footnote 1) $T$ is a set of tuples on the universe $U$; a variable is unique in $T$ if it appears only once in it. A relation on a relation scheme $R(X)$ is a finite set of total tuples on $X$. A state (or a database) of a database scheme $\mathbf{R}$ is a function $r$ that maps each relation scheme $R_i(X_i) \in \mathbf{R}$ to a relation on $R_i(X_i)$; with a slight abuse of notation, given $\mathbf{R} = \{R_1, \ldots, R_n\}$, we write $r = \{r_1, \ldots, r_n\}$. The total projection ($\downarrow$) is an operator on tableaux that produces relations, generating, given a tableau $T$ and a subset $X$ of $U$, the set of total tuples on $X$ that are restrictions of tuples in $T$: $\downarrow_X(T) = \{ t[X] \mid t \in T \text{ and } t[X] \text{ is total} \}$. We will also use a generalization of the total projection that still operates on tableaux, but produces database states: the projection of a tableau $T$ on a database scheme $\mathbf{R}$ (indicated with $\downarrow_{\mathbf{R}}(T)$) is the state obtained by totally projecting $T$ on the various relation schemes.

Footnote 1: We use ordinary to distinguish these tableaux from the inconsistent tableau, to be introduced in Section 2.3. However, as in most cases no confusion may arise, the adjective "ordinary" will often be omitted.

2.2 Constraints: local satisfaction and global consistency

Associated with a database scheme there is usually a set of constraints, that is, properties that are satisfied by the legal states. In the framework of the weak instance model, there are two notions of satisfaction: local satisfaction, defined on individual relations, and global satisfaction, or consistency, defined on the database state. We introduce the two concepts in turn. Various classes of constraints have been defined in the literature [29, 37]; here, we limit our attention to functional dependencies. Let $YZ \subseteq X \subseteq U$; a relation $r$ over a scheme $R(X)$ (locally) satisfies the functional dependency (FD) $Y \to Z$ if, for every pair of tuples $t_1, t_2 \in r$ such that $t_1[Y] = t_2[Y]$, it is the case that $t_1[Z] = t_2[Z]$. Without loss of generality, we will often assume that all FDs have the form $Y \to A$, where $A$ is a single attribute. Let $\mathbf{R} = \{R_1(X_1), \ldots, R_n(X_n)\}$ be a database scheme; we associate with $\mathbf{R}$ a set of FDs $F = \bigcup_{i=1}^{n} F_i$, where, for every $1 \le i \le n$, the FDs in $F_i$ are defined on $R_i(X_i)$. A state $r = \{r_1, \ldots, r_n\}$ globally satisfies [24] a set of FDs $F$ if there is a relation $w$ on the universe $U$ (called a weak instance for $r$ with respect to $F$) that (locally) satisfies $F$ and contains the relations of $r$ in its projections over the respective relation schemes: $\pi_{X_i}(w) \supseteq r_i$, for $1 \le i \le n$. A state that globally satisfies the set of dependencies associated with the database scheme is also said to be (globally) consistent.
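The two notions of satisfaction can be illustrated with a minimal sketch. It assumes a hypothetical encoding that is not part of the paper: tuples of a candidate weak instance are Python dicts from attribute names to constants, relations of a state are sets of value tuples, and the toy data over the attributes E, D, M is invented for illustration.

```python
from itertools import combinations

def satisfies_fd(relation, lhs, rhs):
    """Local satisfaction of the FD lhs -> rhs on a list of tuples (dicts)."""
    for t1, t2 in combinations(relation, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_on_rhs = all(t1[a] == t2[a] for a in rhs)
        if agree_on_lhs and not agree_on_rhs:
            return False
    return True

def project(relation, attrs):
    """Projection on attrs, as a set of value tuples in attrs order."""
    return {tuple(t[a] for a in attrs) for t in relation}

def is_weak_instance(w, state, schemes, fds):
    """w satisfies every FD and its projections contain the relations of the state."""
    return (all(satisfies_fd(w, lhs, rhs) for lhs, rhs in fds)
            and all(state[name] <= project(w, schemes[name]) for name in schemes))

# Hypothetical toy data (assumption: not taken from the paper's figures).
schemes = {"r2": "ED", "r3": "DM"}
fds = [("E", "D"), ("D", "M")]
state = {"r2": {("John", "CS")}, "r3": {("CS", "Smith")}}

w_good = [{"E": "John", "D": "CS", "M": "Smith"}]
w_bad = w_good + [{"E": "John", "D": "EE", "M": "Smith"}]   # violates E -> D
w_missing = [{"E": "Bob", "D": "CS", "M": "Smith"}]         # lacks (John, CS)

good = is_weak_instance(w_good, state, schemes, fds)
bad = is_weak_instance(w_bad, state, schemes, fds)
missing = is_weak_instance(w_missing, state, schemes, fds)
```

The check only validates a given candidate $w$; it does not search for one, which is why the representative instance is needed in practice.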

2.3 The Chase Procedure and the Representative Instance

The definition of global satisfaction is interesting, but not practical: in general there may be many weak instances (often infinitely many), and there is no direct way to find any of them. However, the existence of a weak instance (and some other interesting properties) can be studied by means of the notion of representative instance, a tableau over the universe $U$ of the attributes, defined, for each database state $r$, by means of the notions of state tableau and chase. For each database state $r$, the state tableau for $r$ is a tableau (indicated with $T_r$) formed by taking the union of all the relations in $r$ extended to $U$ by means of unique variables. The chase [30] is a procedure that receives as input a tableau $T$ and generates a tableau $\mathrm{CHASE}_F(T)$ that, if possible, satisfies the given dependencies $F$; if only functional dependencies are considered, the process modifies values in the tableau, by equating variables and "promoting" variables (that is, changing them to constants). If a functional dependency tries to identify two constants, then we say that the chase encounters a contradiction, and the process stops. In the literature [13, 31, 32], it is assumed that, in this case, the result of the chase is the empty tableau. For technical reasons, which will become clear in Section 3, this is not convenient in our approach. Therefore, we extend the set of tableaux with another element, the inconsistent tableau, indicated with $T_\infty$. So, a tableau is either an ordinary tableau (as defined in Section 2.1), or the inconsistent tableau. The latter is an abstract entity, defined by means of its properties, which we will introduce when relevant. The first property is the following: if a contradiction is encountered while chasing a tableau $T$ with respect to a set of FDs $F$, then $\mathrm{CHASE}_F(T) = T_\infty$; also, for every $F$, $\mathrm{CHASE}_F(T_\infty) = T_\infty$.

The other important properties will be defined at the beginning of Section 2.4 (about tableau containment) and at the end of Section 3.2 (definition of inconsistent relation and complete inconsistent state). The representative instance for a state $r$, indicated with $RI_r$, is the tableau obtained by chasing the state tableau $T_r$ of $r$ with respect to the dependencies associated with the database scheme: $RI_r = \mathrm{CHASE}_F(T_r)$. The main property of the representative instance is that a database state is consistent if and only if the corresponding representative instance is not the inconsistent tableau (that is, it is generated without encountering contradictions during the chase process) [24]. Also, for every consistent state $r$ and for every $X$, the $X$-total projection of the representative instance of $r$ is equal to the set of tuples that appear in the projection on $X$ of every weak instance of $r$ [32]. The weak instance approach to query answering allows queries to be formulated on databases as if they were composed of just one relation over the universe $U$. For every query, with $X \subseteq U$ the set of attributes involved, the evaluation requires a first step that computes the relation over $X$ implied by the current state: by the above consideration, the $X$-total projection of the representative instance is the natural content of this relation. We say that a tuple $t$ over a set of attributes $X$ x-belongs (in symbols $t \mathrel{\hat{\in}} r$) to a consistent state $r$ of a database scheme $\mathbf{R}$ with a universe $U \supseteq X$ if $t$ belongs to the $X$-total projection of the representative instance of $r$.
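The construction of the representative instance can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's algorithm as published: the class `Var`, the use of `None` to stand for the inconsistent tableau, and the function names are all hypothetical choices, and the chase is a naive fixpoint loop. The data is the state of Figure 1.

```python
import itertools

class Var:
    """A variable of a tableau; each instance is a distinct symbol."""
    _ids = itertools.count(1)
    def __init__(self):
        self.name = f"v{next(Var._ids)}"
    def __repr__(self):
        return self.name

INCONSISTENT = None  # assumption: None stands in for the inconsistent tableau

def state_tableau(state, schemes, universe):
    """Extend every tuple of every relation to the universe with fresh variables."""
    rows = []
    for name, attrs in schemes.items():
        for tup in sorted(state[name]):
            known = dict(zip(attrs, tup))
            rows.append({a: known[a] if a in known else Var() for a in universe})
    return rows

def chase(rows, fds):
    """Naive chase with FDs given as (lhs, rhs) pairs of attribute strings."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1, r2 in itertools.combinations(rows, 2):
                if any(r1[a] != r2[a] for a in lhs):
                    continue
                for a in rhs:
                    x, y = r1[a], r2[a]
                    if x == y:
                        continue
                    if not isinstance(x, Var) and not isinstance(y, Var):
                        return INCONSISTENT  # two distinct constants: contradiction
                    old, new = (x, y) if isinstance(x, Var) else (y, x)
                    for r in rows:           # replace the variable everywhere
                        for b in r:
                            if r[b] is old:
                                r[b] = new
                    changed = True
    return rows

def total_projection(rows, attrs):
    """Tuples over attrs that contain no variables."""
    return {tuple(r[a] for a in attrs) for r in rows
            if not any(isinstance(r[a], Var) for a in attrs)}

# The state of Figure 1, with the FDs E -> D and D -> M.
schemes = {"r1": "EP", "r2": "ED", "r3": "DM"}
universe = "EPDM"
fds = [("E", "D"), ("D", "M")]
state = {
    "r1": {("John", "A"), ("John", "B"), ("Bob", "B"), ("Tom", "C")},
    "r2": {("John", "CS"), ("Bob", "EE"), ("Jim", "MS")},
    "r3": {("CS", "Smith"), ("IE", "White")},
}
ri = chase(state_tableau(state, schemes, universe), fds)
em = total_projection(ri, "EM")   # the projection shown in Figure 3

conflict = {"r1": set(), "r2": {("John", "CS"), ("John", "EE")}, "r3": set()}
ri_conflict = chase(state_tableau(conflict, schemes, universe), fds)
```

On the state of Figure 1 the `EM` projection contains the single tuple (John, Smith), in agreement with Figure 3; the second, invented state violates $E \to D$ and so its chase yields the inconsistent tableau.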

2.4 Tableau Containment and Equivalence

Given two tableaux $T_1$, $T_2$, we say that $T_1$ is contained in $T_2$ (in symbols $T_1 \le T_2$) if $T_2$ is the inconsistent tableau $T_\infty$, or both $T_1$ and $T_2$ are ordinary tableaux and there is a partial function $\psi$ (called containment mapping) from $D$ to $D$ that (i) is defined on all the symbols appearing in $T_1$, (ii) is the identity on constants, and (iii) if extended to rows and tableaux, maps $T_1$ into a subset of $T_2$ (that is, for each row $t_1 \in T_1$, there is a row $t_2 \in T_2$ such that $\psi(t_1[A]) = t_2[A]$, for every $A \in U$). If both $T_1 \le T_2$ and $T_2 \le T_1$, the two tableaux are equivalent ($T_1 \equiv T_2$) (see Footnote 2). The following are useful properties of the chase that we will often use in the sequel.

Footnote 2: Different, but equivalent, definitions of containment and equivalence of tableaux can be found in the literature [2, 3].

Lemma 1 For all tableaux $T$, $T_1$ and for every set $F$ of FDs the following statements hold:

(a) $T \le \mathrm{CHASE}_F(T)$;
(b) if $T \le T_1$, then $\mathrm{CHASE}_F(T) \le \mathrm{CHASE}_F(T_1)$;
(c) if $T \le T_1$ and $T_1 = \mathrm{CHASE}_F(T_1)$, then $\mathrm{CHASE}_F(T) \le T_1$.

Proof.

(a) If $T = T_\infty$, then $\mathrm{CHASE}_F(T) = T_\infty$ and the claim follows; if $\mathrm{CHASE}_F(T) = T_\infty$, then $T \le \mathrm{CHASE}_F(T)$ follows by definition of containment; otherwise, let $\psi$ be the function that maps each symbol $s$ appearing in $T$ to the symbol to which it is changed by the chase (which is the same for all occurrences of $s$ in $T$); then, (i) $\psi$ is defined on all symbols in $T$; (ii) it is the identity on constants (as the chase does not modify constants); (iii) it maps $T$ into $\mathrm{CHASE}_F(T)$; that is, $\psi$ is a containment mapping, and so $T \le \mathrm{CHASE}_F(T)$.

(b) Let $T \le T_1$, assuming $T \ne T_\infty$, as the case $T = T_\infty$ is immediate; we prove the claim by induction on the number $n$ of steps needed to chase $T$; that is, we show that, for every $0 \le i \le n$, indicating with $T^i$ the tableau produced after the $i$-th step in chasing $T$, there is a tableau $T_1^i$ produced as an intermediate result in chasing $T_1$ such that $T^i \le T_1^i$. The basis is immediate as $T^0 = T \le T_1$ by hypothesis, and we may assume $T_1^0 = T_1$. With respect to the induction, let $T^{i-1} \le T_1^{i-1}$; if $T_1^{i-1} = T_\infty$, then the claim follows directly; otherwise, there is a containment mapping $\psi$ from $T^{i-1}$ to $T_1^{i-1}$. Then, consider the $i$-th step in the chase process of $T$: it equates two symbols, say $s_1$ and $s_2$, in $T^{i-1}$, because of an FD $Y \to A \in F$ and of two tuples $t_1, t_2 \in T^{i-1}$, with $t_1[Y] = t_2[Y]$ and $t_1[A] = s_1 \ne s_2 = t_2[A]$; since $\psi$ is a containment mapping, we have that $t_1' = \psi(t_1)$ and $t_2' = \psi(t_2)$ are tuples in $T_1^{i-1}$, with $t_1'[Y] = t_2'[Y]$. Now, we have two cases:

- if $t_1[A]$ and $t_2[A]$ are distinct constants, we have that the chase of $T$ encounters a contradiction, and so $T^i = T_\infty$, and, since $\psi$ is a containment mapping, also $t_1'[A]$ and $t_2'[A]$ are distinct constants, and so also the chase of $T_1$ encounters a contradiction, and therefore $T_\infty$ is an intermediate tableau in the chase of $T_1$, and the claim is proved;
- if $s_1 = t_1[A]$ and $s_2 = t_2[A]$ are not distinct constants, then the chase equates them, and $T^i$ is obtained from $T^{i-1}$ by replacing all the occurrences of one of the symbols, say $s_2$, with occurrences of the other; we have two subcases again:
    - if $\psi(s_1) = \psi(s_2)$, then $\psi$ is also a containment mapping from $T^i$ to $T_1^{i-1}$, and so we can define $T_1^i$ as identical to $T_1^{i-1}$;
    - if $\psi(s_1) \ne \psi(s_2)$, then a chase step that equates them can be applied to $T_1^{i-1}$ (because of $t_1'$, $t_2'$ and $Y \to A$); let $T_1^i$ be the tableau obtained as a result, with all occurrences of $\psi(s_2)$ changed to $\psi(s_1)$; it follows that $\psi$ is a containment mapping from $T^i$ to $T_1^i$.

Therefore, for $i = n$, we have that $\mathrm{CHASE}_F(T) = T^n \le T_1^n$, and so, by part (a) of this lemma, we have that $\mathrm{CHASE}_F(T) \le \mathrm{CHASE}_F(T_1^n)$; now, $T_1^n$ is an intermediate tableau in the chase of $T_1$, and so (because the chase enjoys the Church-Rosser property, that is, its result is independent of the order of application of the individual transformations [30]), $\mathrm{CHASE}_F(T_1) = \mathrm{CHASE}_F(T_1^n)$, and so the claim is proved.

(c) Immediate from (b). □
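Tableau containment as defined at the start of this section can be tested by a brute-force search for a containment mapping. The sketch below is illustrative only and rests on assumptions not made in the paper: rows are tuples of strings, variables are strings starting with `?`, and `None` stands for the inconsistent tableau. The search is exponential in the worst case. The sample tableaux are the representative instances of Figure 6.

```python
def is_var(symbol):
    """Assumed convention: variables are strings starting with '?'."""
    return symbol.startswith("?")

def contained_in(t1, t2):
    """True if t1 is contained in t2, i.e. a containment mapping exists.

    Tableaux are lists of equal-length tuples; None is the inconsistent tableau.
    """
    if t2 is None:
        return True          # every tableau is contained in the inconsistent one
    if t1 is None:
        return False

    def extend(mapping, row, target):
        """Try to map row onto target, consistently with the mapping so far."""
        new = dict(mapping)
        for s, image in zip(row, target):
            if not is_var(s):
                if s != image:               # the mapping is the identity on constants
                    return None
            elif new.setdefault(s, image) != image:
                return None                  # variable already mapped to another symbol
        return new

    def search(i, mapping):
        if i == len(t1):
            return True
        for target in t2:
            new = extend(mapping, t1[i], target)
            if new is not None and search(i + 1, new):
                return True
        return False

    return search(0, {})

# Representative instances of Figure 6 (attribute order: Employee, Dept, Manager).
ri1 = [("John", "CS", "Smith"), ("?v1", "CS", "Smith")]
ri2 = ri1 + [("John", "?v2", "Smith")]
ri3 = ri1 + [("John", "?v2", "White")]

equivalent_1_2 = contained_in(ri1, ri2) and contained_in(ri2, ri1)
r1_below_r3 = contained_in(ri1, ri3)
r3_below_r1 = contained_in(ri3, ri1)
```

With these inputs the first two tableaux are equivalent, while the third strictly contains them, since no row of the first tableau has White as manager.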


3 A lattice on states

3.1 State equivalence and completeness

In the weak instance approach, two states $r_1$, $r_2$ are equivalent ($r_1 \equiv r_2$) if they have the same set of weak instances [33]; this holds if and only if their representative instances are (tableau) equivalent. Therefore, two states are equivalent if and only if, for every $X$, their $X$-total projections are equal, that is, if they have identical query answering behaviour. Therefore, it makes sense to consider equivalence classes of states; as representatives of the various classes, we will often use states that enjoy the interesting property of completeness, as follows. A consistent state $r$ is complete if it coincides with the projection of its representative instance on the database scheme [33], that is, if $r = \downarrow_{\mathbf{R}}(RI_r)$. It is known that each consistent state is equivalent to one and only one complete state [33, Section 3].

3.2 A partial order on states

We say that a state $r_1$ is weaker than a state $r_2$ ($r_1 \preceq r_2$) if every weak instance of $r_2$ is also a weak instance of $r_1$. The relation $\preceq$ is a partial order on the set of the complete states, since it is reflexive, antisymmetric, and transitive. (Note that it is not antisymmetric on the set of consistent states.) Also, it is strongly related to (tableau) containment of representative instances, and (set) containment of total projections, and relations, as stated in the next theorem, whose proof requires a few lemmata, which refer to a pair of consistent states $r_1 = \{r_{1,1}, \ldots, r_{1,n}\}$ and $r_2 = \{r_{2,1}, \ldots, r_{2,n}\}$ over the same scheme.

Lemma 2 If $r_1 \preceq r_2$ then $\downarrow_X(RI_{r_1}) \subseteq \downarrow_X(RI_{r_2})$, for every $X \subseteq U$.

Proof. By [32, Theorem 1], for every consistent state $r$, for every $X \subseteq U$ the $X$-total projection of $RI_r$ equals the intersection of the projections over $X$ of the weak instances of $r$. Then, by definition of the partial order $\preceq$, if $r_1 \preceq r_2$ then the set of weak instances of $r_1$ contains the set of weak instances of $r_2$; and so, for every $X$, the intersection of the projections over $X$ of the weak instances of $r_1$ is a subset of the intersection of the projections over $X$ of the weak instances of $r_2$, and so the claim follows. □

Lemma 3 If $r_1$ and $r_2$ are complete and $r_1 \preceq r_2$ then $r_{1,i} \subseteq r_{2,i}$, for every $R_i \in \mathbf{R}$.

Proof. By Lemma 2, if $r_1 \preceq r_2$, then $\downarrow_X(RI_{r_1}) \subseteq \downarrow_X(RI_{r_2})$, for every $X \subseteq U$; therefore, for every $R_i \in \mathbf{R}$, $\downarrow_{X_i}(RI_{r_1}) \subseteq \downarrow_{X_i}(RI_{r_2})$, and so, since, by definition of completeness, $r_{1,i} = \downarrow_{X_i}(RI_{r_1})$ and $r_{2,i} = \downarrow_{X_i}(RI_{r_2})$, we have $r_{1,i} \subseteq r_{2,i}$, for every $R_i \in \mathbf{R}$. □

Lemma 4 If $r_{1,i} \subseteq r_{2,i}$ for every $R_i \in \mathbf{R}$ then $r_1 \preceq r_2$.

Proof. Let $w$ be a weak instance of $r_2$: by definition $\pi_{X_i}(w) \supseteq r_{2,i}$ for every $R_i \in \mathbf{R}$; since $r_{2,i} \supseteq r_{1,i}$, for every $R_i \in \mathbf{R}$, we have $\pi_{X_i}(w) \supseteq r_{1,i}$, for every $R_i \in \mathbf{R}$; then, since $w$ satisfies $F$ (as it is a weak instance for $r_2$), we have that it is also a weak instance for $r_1$. So, every weak instance of $r_2$ is also a weak instance of $r_1$, that is, by definition, $r_1 \preceq r_2$. □

Theorem 1 Let $r_1 = \{r_{1,1}, \ldots, r_{1,n}\}$ and $r_2 = \{r_{2,1}, \ldots, r_{2,n}\}$ be two consistent states. Properties (1), (2), and (3) below are equivalent; if the states are complete, then (4) is also equivalent to the others.

(1) The state $r_1$ is weaker than the state $r_2$: $r_1 \preceq r_2$.
(2) The representative instance of $r_1$ is contained in the representative instance of $r_2$: $RI_{r_1} \le RI_{r_2}$.
(3) For every $X \subseteq U$, the $X$-total projection of $RI_{r_1}$ is a subset of the $X$-total projection of $RI_{r_2}$: $\downarrow_X(RI_{r_1}) \subseteq \downarrow_X(RI_{r_2})$.
(4) The state $r_1$ is a relationwise subset of the state $r_2$: for every $R_i \in \mathbf{R}$, it is the case that $r_{1,i} \subseteq r_{2,i}$.

Proof. We show that (1) $\Rightarrow$ (2) $\Rightarrow$ (3) $\Rightarrow$ (1), thus proving the claim for general states; with respect to complete states, the claim follows by Lemma 3 ((1) $\Rightarrow$ (4) for complete states) and Lemma 4 ((4) $\Rightarrow$ (1)).

(1) $\Rightarrow$ (2) A known property of representative instances is that, for every state $r$, for every weak instance $w$ of $r$, there is a containment mapping from $RI_r$ to $w$ [24]. Let $r_1 \preceq r_2$; let $\bar{r}_1$, $\bar{r}_2$ be the complete states equivalent to $r_1$ and $r_2$ respectively. In order to prove the claim, since equivalent states have equivalent representative instances, it suffices to show that $RI_{\bar{r}_1} \le RI_{\bar{r}_2}$. By Lemma 3, since $\bar{r}_1 \preceq \bar{r}_2$ and both are complete, we have that $\bar{r}_1$ is a relationwise subset of $\bar{r}_2$; therefore, if $T_{\bar{r}_1}$ and $T_{\bar{r}_2}$ are their respective state tableaux, we have $T_{\bar{r}_1} \le T_{\bar{r}_2}$, and so, by Lemma 1(b), $\mathrm{CHASE}_F(T_{\bar{r}_1}) \le \mathrm{CHASE}_F(T_{\bar{r}_2})$, that is, $RI_{\bar{r}_1} \le RI_{\bar{r}_2}$.

(2) $\Rightarrow$ (3) It follows directly from the definitions of containment mapping and total projection.

(3) $\Rightarrow$ (1) Again, consider the complete states $\bar{r}_1$ and $\bar{r}_2$ equivalent to $r_1$ and $r_2$ respectively. If $\downarrow_X(RI_{r_1}) \subseteq \downarrow_X(RI_{r_2})$, for every $X$, then, by equivalence of states, we have $\downarrow_X(RI_{\bar{r}_1}) \subseteq \downarrow_X(RI_{\bar{r}_2})$, for every $X$, and so $\downarrow_{X_i}(RI_{\bar{r}_1}) \subseteq \downarrow_{X_i}(RI_{\bar{r}_2})$, for every $R_i \in \mathbf{R}$; by definition of completeness, this implies $\bar{r}_{1,i} \subseteq \bar{r}_{2,i}$, for every $R_i \in \mathbf{R}$, and so, by Lemma 4, $\bar{r}_1 \preceq \bar{r}_2$; finally, since $r_1 \equiv \bar{r}_1$ and $r_2 \equiv \bar{r}_2$, we have $r_1 \preceq r_2$. □

Example 6 Consider the database states $r_1$, $r_2$ and $r_3$ in Figure 5, and the corresponding representative instances, in Figure 6. They all refer to the database scheme $\mathbf{R} = \{R_1(ED), R_2(DM), R_3(EM)\}$, with the only functional dependency $D \to M$ as constraint. It is easy to see that $r_1 \equiv r_2$ since $RI_{r_1} \equiv RI_{r_2}$ (as there are containment mappings in both directions, with $v_1$ always mapped to `John' and $v_2$ to `CS'); note that $r_2$ is complete whereas $r_1$ is not. Moreover, we have $r_1 \preceq r_3$ and $r_2 \preceq r_3$, since $RI_{r_1} \le RI_{r_3}$ and $RI_{r_2} \le RI_{r_3}$; this is because $r_3$ contains a further piece of information besides those contained in both $r_1$ and $r_2$, namely: "a manager of John's is White".

r1:
    Employee  Dept        Dept  Manager        Employee  Manager
    John      CS          CS    Smith          (no tuples)

r2:
    Employee  Dept        Dept  Manager        Employee  Manager
    John      CS          CS    Smith          John      Smith

r3:
    Employee  Dept        Dept  Manager        Employee  Manager
    John      CS          CS    Smith          John      White

Figure 5: Database states of Example 6.

RI_{r1}:
    Employee  Dept  Manager
    John      CS    Smith
    v1        CS    Smith

RI_{r2}:
    Employee  Dept  Manager
    John      CS    Smith
    v1        CS    Smith
    John      v2    Smith

RI_{r3}:
    Employee  Dept  Manager
    John      CS    Smith
    v1        CS    Smith
    John      v2    White

Figure 6: The representative instances of the states of Example 6.

Theorem 1 can be extended to inconsistent states if the notions of relation and state are extended, in the same way as we did for that of tableau:

- for every set of attributes $X \subseteq U$, the inconsistent relation over $X$ is defined as the $X$-total projection of the inconsistent tableau; we assume that the inconsistent relation over $X$ is a superset of every relation over $X$; also, every tuple over $X$ belongs to it; it follows that every tuple $t$ over $X$ x-belongs to every inconsistent state;
- the complete inconsistent state $r_\infty$ is the state obtained by projecting the inconsistent tableau on the database scheme; its representative instance is the inconsistent tableau; note that the complete inconsistent state is unique.

It follows that $r \preceq r_\infty$ for every state $r$.

3.3 The partial order is a lattice

We show that, for each database scheme $\mathbf{R}$, the partial order $\preceq$ is indeed a lattice [11] with respect to the set of states obtained by adding the complete inconsistent state to the set of the complete consistent states of $\mathbf{R}$. That is, every pair of complete states $r_1$ and $r_2$ has both a greatest lower bound and a least upper bound. A lower bound of $r_1$ and $r_2$ is a state that is weaker than both $r_1$ and $r_2$; the greatest lower bound (glb) of $r_1$ and $r_2$ is a lower bound of $r_1$ and $r_2$ such that every other lower bound is weaker than it. It follows that, if there is a glb, it is unique. The least upper bound (lub) of $r_1$ and $r_2$ is defined dually: an upper bound is a state such that both $r_1$ and $r_2$ are weaker than it; the lub is an upper bound that is weaker than all other upper bounds. A known property of lattices is the associativity of both the glb and lub operations: therefore, we can speak of glb and lub of finite sets of states. A lattice over a domain $D$ is complete [11] if each (finite or infinite) subset of $D$ has both a glb and a lub. The next two lemmata confirm that the partial order $\preceq$ induces a complete lattice on the set of complete states.

Lemma 5 For each set $\mathcal{R}$ of complete states there is a glb; if $\mathcal{R}$ contains at least one consistent state, then the glb of $\mathcal{R}$ is equivalent to the state obtained as the relationwise intersection of the states in $\mathcal{R}$; otherwise, the glb is the inconsistent state.

Proof. As there is only one complete inconsistent state, $\mathcal{R}$ contains only inconsistent states only if it is a singleton: $\mathcal{R} = \{r_\infty\}$; in this case, the glb of $\mathcal{R}$ is clearly $r_\infty$. Otherwise, for every $1 \le j \le n$, let $r_j$ be the relation obtained by intersecting, for all the consistent states in $\mathcal{R}$, all the relations on the scheme $R_j$. The state $r = \{r_1, r_2, \ldots, r_n\}$ is consistent (because each weak instance for any consistent state in $\mathcal{R}$ is also a weak instance for it) and weaker than each of the states in $\mathcal{R}$ (by definition of $r_\infty$ and by Lemma 1(b) and the equivalence of (1) and (2) in Theorem 1, since the state tableau of $r$ is contained in all the state tableaux of the elements of $\mathcal{R}$). The state $r$ need not be complete, but the projection $r_g$ of its representative instance on the database scheme is equivalent to it and complete; therefore, $r_g$ is a lower bound of $\mathcal{R}$. Let $r'$ be any other lower bound: since it is complete, by the equivalence of (1) and (4) in Theorem 1, it follows that each of its relations is a subset of the intersection of the respective relations of the states in $\mathcal{R}$; therefore, it is weaker than $r_g$. □

Lemma 6 For each set $\mathcal{R}$ of complete states there is a lub, equivalent to the state obtained as the relationwise union of the states in $\mathcal{R}$ or to the inconsistent state.

Proof. First of all, if $\mathcal{R}$ includes $r_\infty$, then the lub is $r_\infty$, as it is not weaker than any other state. Otherwise, we can follow an argument that is dual to that in the proof of Lemma 5. Let $r = \{r_1, r_2, \ldots, r_n\}$ be the relationwise union of the states in $\mathcal{R}$; let us distinguish three cases (note that in Section 2 we assumed our relations to be finite):

1. if some $r_i \in r$ is an infinite set of tuples (see Footnote 3), then, by the equivalence of (1) and (4) in Theorem 1, no complete state $r'$ with finite relations can be an upper bound for $\mathcal{R}$; then, since $r_\infty$ is always an upper bound, it is the lub for $\mathcal{R}$;

2. if $r$ contains only finite relations, but it is inconsistent, then, by the equivalence of (1) and (4) in Theorem 1, no complete consistent state $r'$ can be an upper bound for $\mathcal{R}$; then, as above, $r_\infty$ is the lub for $\mathcal{R}$;

3. otherwise, we can proceed as in the proof of Lemma 5, with a dual argument, and show that the lub is the complete state equivalent to $r$. □

Footnote 3: We recall that in our definitions a relation is a finite set of tuples, and therefore an infinite set of tuples is not a relation and cannot belong to a state.

Lemmata 5 and 6 confirm the existence of a complete lattice over the set of complete states. For this reason, in the following we will essentially refer to complete states, but this will not cause a loss of generality, as for each state there is one and only one equivalent complete state.

4 Insertions 4.1 Definitions Let $R = \{R_1, \ldots, R_k\}$ be a database scheme, with $U = R_1 R_2 \cdots R_k$. Given a state r of R and a tuple t over a set of attributes $X \subseteq U$, we consider the insertion of t into r.

Definition 1 A state $r_p$ is a potential result for the insertion of t into r if r is weaker than $r_p$ and t x-belongs to $r_p$.

For each state r and each tuple t, there is at least one potential result for the insertion of t in r, since the complete inconsistent state $r_\infty$ is a potential result for each r and t. It is also possible that $r_\infty$ is the only potential result; as shown in the examples in the introduction, this may have very different causes. In Example 5, the reason for the nonexistence of a consistent potential result is that the dependencies cannot generate any X-total tuple in the representative instance. In Example 4, the reason is a violation of the constraints.

Definition 2 The insertion of a tuple t in a state r is possible if there is a consistent state $r'$ such that t x-belongs to $r'$.

Definition 3 A possible insertion is consistent if it has a consistent potential result.

So, the insertion in Example 5 is impossible, whereas that in Example 4 is inconsistent.

The definition of a potential result only requires the state to contain the information in the original state and that in the new tuple; any other piece of information may be included. It is interesting to consider the information that is "common" to all potential results: in the lattice framework described in the previous section, this is the glb of the potential results. It should be noted that this state need not be a potential result: the glb of the potential results of the insertion of the tuple <Dan, Moore> in Example 2 is the original state itself, to which the new tuple does not x-belong. If the glb of the potential results is not a potential result, then there are various potential results, each of which represents a way to insert the new tuple while altering the state as little as possible.

Definition 4 A state $r_m$ is a minimal potential result for an insertion if there is no other potential result $r_p$ for the same insertion such that $r_p$ is weaker than $r_m$.

Since each minimal potential result has the same right to be considered "the result" of the insertion, it is reasonable to talk of determinism and nondeterminism.

Definition 5 A possible and consistent insertion is deterministic if the glb of the potential results is a potential result.

4.2 Characterizations In this section, we show necessary and sufficient conditions for possibility, consistency, and determinism. We will show that possibility can be tested at scheme level, since this property depends only on the FDs defined on the scheme and is independent of the database state. For consistency and determinism, the characterizations are based on the construction of a particular tableau obtained by adding the tuple to be inserted to the representative instance of the initial state. Such a tableau enjoys nice properties with regard to our purposes: we will show that (i) an insertion is consistent if and only if, applying the chase procedure to the tableau, we do not encounter contradictions, and (ii) it is deterministic if and only if the chased tableau can be reconstructed by means of its projection on the database scheme; in this case such a projection is the glb of the potential results for the insertion (and so the "natural" result).

Let t be a tuple on X, and consider the insertion of t into the state r of the database scheme $R = \{R_1, \ldots, R_k\}$.

Theorem 2 The insertion of t in a state is possible if and only if there is a relation scheme $R_i(X_i) \in R$ such that F implies the FD $X_i \to X$.

Proof. (Only if) Let the insertion be possible, $r'$ be any consistent state such that t x-belongs to $r'$, and $T_{r'}$ its state tableau; also, let $t_0$ be the tuple in $T_{r'}$ that is transformed by the chase into a tuple $t'$ such that $t[X] = t'[X]$, $r'_{i_0}$ the relation (over the scheme $R_{i_0}(X_{i_0})$) it originates from, and Z the set of attributes $A \in U$ on which $t'$ is "significant", that is, either $t'[A]$ is a constant or it is a variable that appears elsewhere in $\mathrm{CHASE}_F(T_{r'})$ (clearly, $X \subseteq Z$). During the execution of the chase, the set of attributes on which $t_0$ is significant evolves from $X_{i_0}$ to Z, with intermediate values $W_0 = X_{i_0}, W_1, \ldots, W_n = Z$, with $W_j = W_{j-1} A_{i_j}$ and $A_{i_j} \notin W_{j-1}$. We show that F implies $X_{i_0} \to W_j$, for every $0 \le j \le n$, proceeding by induction. The basis is immediate, as $X_{i_0} \to X_{i_0}$ is a trivial FD. With regard to the induction step, assume that F implies $X_{i_0} \to W_{j-1}$ and consider the chase step that transforms $t_0[A_{i_j}]$ into a significant symbol: it involves $t_0$, another tuple $t_j$ in the current tableau, and an FD $Y_{i_j} \to A_{i_j} \in F$ such that $t_j[Y_{i_j}] = t_0[Y_{i_j}]$; since $t_j[Y_{i_j}] = t_0[Y_{i_j}]$, $t_0$ has significant symbols for all the attributes in $Y_{i_j}$ and so $Y_{i_j} \subseteq W_{j-1}$; then, by the properties of FDs, since $Y_{i_j} \to A_{i_j} \in F$ and F implies $X_{i_0} \to W_{j-1}$ (by the inductive hypothesis), we have that F implies $X_{i_0} \to W_{j-1} A_{i_j}$, that is, $X_{i_0} \to W_j$. For $j = n$, F implies $X_{i_0} \to Z$ and therefore, since $X \subseteq Z$, $X_{i_0} \to X$.

(If) Let $t'$ be any tuple obtained by extending t, by means of constants, to the universe U, and $r'$ the state obtained by projecting it on the various relation schemes. Also, let $\langle f_1 : Y_1 \to A_1, \ldots, f_m : Y_m \to A_m \rangle$ be a derivation sequence of $X_i$ covering X, that is, a finite sequence of FDs used in deriving $X_i \to X$, such that, for all $1 \le j \le m$: (i) $Y_j \subseteq X_i A_1 \cdots A_{j-1}$, (ii) $A_j \notin X_i A_1 \cdots A_{j-1}$, and (iii) $X_i A_1 \cdots A_m \supseteq X$. By induction on the length m of this sequence, we show that, for every $0 \le j \le m$, the projection $t_i$ of $t'$ onto $X_i$ can be extended by the chase in j steps to $X_i A_1 \cdots A_j$, with the same values as $t'$. The basis is trivial, as for $j = 0$, $t_i = t'[X_i]$. Regarding the induction step, suppose that after $j - 1$ steps $t_i[X_i A_1 \cdots A_{j-1}] = t'[X_i A_1 \cdots A_{j-1}]$. Consider the FD $Y_j \to A_j$: by (i) we have $Y_j \subseteq X_i A_1 \cdots A_{j-1}$, and so $t_i[Y_j] = t'[Y_j]$. Moreover, by the initial hypothesis, $Y_j \to A_j$ is embedded in some $R_{k_j} \in R$; therefore, the projection $t_{k_j}$ of $t'$ onto $X_{k_j}$ is in the tableau, with $t_{k_j}[Y_j A_j] = t'[Y_j A_j]$. So, since by (ii) $A_j \notin X_i A_1 \cdots A_{j-1}$, we can apply a chase transformation to $t_i$ because of $t_{k_j}$ and the FD $Y_j \to A_j$, thus obtaining $t_i[X_i A_1 \cdots A_j] = t'[X_i A_1 \cdots A_j]$: this completes the induction proof. For $j = m$, this implies that $t'[X_i]$ can be extended to $X_i A_1 \cdots A_m$, and therefore to X. Since $t'[X] = t[X]$, it follows that t x-belongs to $r'$, and so the insertion is possible. □

From the above theorem it turns out that, given a scheme R over a universe U, there may exist a set $X \subseteq U$ such that, for every tuple t over X and for every state r of R, the insertion of t in r is not possible. On the other hand, it is possible to give conditions on R such that insertions are possible on all the subsets of the universe. We recall that a database scheme has the lossless join property (or simply, it is lossless) if, for all relations u over the universe U, it is the case that $u = \bowtie_{i=1}^{n} \pi_{X_i}(u)$, that is, u can be reconstructed by joining its projections on the various relation schemes. We have the following result.

Corollary 1 Insertions in a state with scheme R are possible on all the subsets of the universe if and only if R is lossless.

Proof. (Only if) Let t be any tuple on U and $r = \pi_R(t)$. Chasing $T_r$ corresponds exactly to testing for losslessness [1]: if $\mathrm{CHASE}_F(T_r)$ contains a row $t'$ entirely composed of constants, then R is lossless. Now, by hypothesis, the insertion of t is possible, and therefore, by Theorem 2, there is a relation scheme $R_i(X_i) \in R$ such that F implies the FD $X_i \to U$. Then, as in the (if) part of the proof of Theorem 2, we can show by induction on the length of the derivation of $X_i \to U$ that the projection of t onto $X_i$ can be extended to U by chasing $T_r$. It follows that the above test is successful and R is lossless.

(If) Let t be a tuple on any $X \subseteq U$; consider the tuple $t_u$ obtained by extending t to U by means of constants not appearing in t, and let $r = \pi_R(t_u)$. Again, chasing $T_r$ corresponds to testing for losslessness [1]. If R is lossless, then $RI_r$ contains a row $t'$ entirely composed of constants. Clearly $t' = t_u$; therefore $t \in \pi^{\downarrow}_X(RI_r)$, and so the insertion is possible. □

Testing for possibility requires the computation of closures of sets of attributes (Theorem 2), an operation that can be performed in time $O(\|F\|)$, where $\|F\|$ is the size of the description of F, by using the algorithm in [10]. Since the closure has to be computed $|R|$ times in the worst case, where $|R|$ is the number of relation schemes in R, it follows that testing for possibility is bounded by $O(|R| \cdot \|F\|)$. Note that this is polynomial in the size of the database scheme and independent of the state; therefore, possibility can be verified very efficiently.

We now turn our attention to consistency, assuming that the insertion is possible. Let $RI_r$ be the representative instance of r, and $T_{t,r}$ be the tableau obtained by adding to $RI_r$ a tuple $t_e$ obtained by extending t to the universe U by means of unique variables.
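The possibility test of Theorem 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: FDs are given as pairs of attribute sets, the function names `closure` and `insertion_possible` are hypothetical, and the closure is computed with a simple fixpoint loop rather than with the linear-time algorithm of [10]. The schemes and FDs in the usage lines are invented for the example.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs.

    Naive fixpoint computation; not the linear-time algorithm cited in the text.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def insertion_possible(schemes, fds, x):
    """Theorem 2: possible iff some scheme X_i has X inside its closure."""
    return any(set(x) <= closure(xi, fds) for xi in schemes)


# Hypothetical scheme: R1(E, D), R2(D, M), R3(P, M) with E -> D and D -> M.
schemes = [{"E", "D"}, {"D", "M"}, {"P", "M"}]
fds = [({"E"}, {"D"}), ({"D"}, {"M"})]

print(insertion_possible(schemes, fds, {"E", "M"}))  # closure of ED is EDM
print(insertion_possible(schemes, fds, {"E", "P"}))  # no closure contains EP
```

Note that the test never looks at a database state, in line with the remark that possibility is a scheme-level property.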

Lemma 7 For every potential result $r_p$, it is the case that $T_{t,r}$ is contained in $\mathrm{CHASE}_F(T_{t,r})$, which is in turn contained in $RI_{r_p}$.

Proof. We prove that $T_{t,r}$ is contained in $RI_{r_p}$; then the two containments in the claim will follow from Lemma 1, parts (a) and (c), respectively. If $r_p$ is an inconsistent state, then its representative instance $RI_{r_p}$ is the inconsistent tableau, and, by definition, $T_{t,r}$ is contained in $RI_{r_p}$. Therefore, assume $r_p$ to be a consistent potential result for the insertion of t into r; then, by definition, r is weaker than $r_p$ and t x-belongs to $r_p$; therefore, $RI_r$ is contained in $RI_{r_p}$ and $t \in \pi^{\downarrow}_X(RI_{r_p})$, and so, since all the variables in $t_e$ are unique, $T_{t,r} = RI_r \cup \{t_e\}$ is contained in $RI_{r_p}$. □

Theorem 3 Let the insertion of t in r be possible. It is consistent if and only if $\mathrm{CHASE}_F(T_{t,r}) \neq T_\infty$.

Proof. (Only if) Let the insertion be possible and consistent: there is a potential result $r_p$ such that $RI_{r_p} \neq T_\infty$; by Lemma 7, $\mathrm{CHASE}_F(T_{t,r})$ is contained in $RI_{r_p}$. Assume, by way of contradiction, that $\mathrm{CHASE}_F(T_{t,r}) = T_\infty$: by definition of tableau containment, it would follow that $RI_{r_p} = T_\infty$, against the hypothesis.

(If) Assume that the insertion is possible and that $\mathrm{CHASE}_F(T_{t,r}) \neq T_\infty$; consider the tableau $T'$ obtained by changing all the variables in $\mathrm{CHASE}_F(T_{t,r})$ to distinct constants appearing neither in r nor in t, and the state $r_p$ obtained by projecting $T'$ onto the various relation schemes. By construction, $t \in \pi^{\downarrow}_X(T')$; then, since the insertion is possible, by Theorem 2, there is a relation scheme $R_i(X_i) \in R$ such that F implies the FD $X_i \to X$, and so t can be reconstructed by the chase from the projections on the database scheme of a U-total tuple that contains it: therefore, t x-belongs to $r_p$. Also, $T_{t,r}$ is contained in $T'$, and so $\pi^{\downarrow}_R(T_{t,r})$ is weaker than $\pi^{\downarrow}_R(T')$; since r is weaker than $\pi^{\downarrow}_R(T_{t,r})$ (by construction of $T_{t,r}$), and $r_p = \pi^{\downarrow}_R(T')$ (by definition of $r_p$), it follows that r is weaker than $r_p$, and therefore $r_p$ is a consistent potential result. □

We now turn our attention to the characterization of deterministic insertions. Let $r^+$ be the state obtained by (totally) projecting $\mathrm{CHASE}_F(T_{t,r})$ on the database scheme: $r^+ = \pi^{\downarrow}_R(\mathrm{CHASE}_F(T_{t,r}))$.

Lemma 8 Let the insertion of t in r be possible and consistent. Then, $r^+$ is the glb of the potential results.

Proof. Let us first prove that $r^+$ is a lower bound of the potential results: it suffices to prove that, if $r_p$ is a potential result, then $r^+$ is weaker than $r_p$. We have that $\mathrm{CHASE}_F(T_{t,r})$ is contained in $RI_{r_p}$ (by Lemma 7) and that $RI_{r^+}$ is contained in $\mathrm{CHASE}_F(T_{t,r})$ (by Lemma 1(c), since, by construction, $T_{r^+}$ is contained in $\mathrm{CHASE}_F(T_{t,r})$), and therefore $RI_{r^+}$ is contained in $RI_{r_p}$.

Now, let $r_g$ be the glb of the potential results; we will complete the proof by showing that $r_g$ is weaker than $r^+$. Consider $\mathrm{CHASE}_F(T_{t,r})$, and two injective functions $\phi_1$, $\phi_2$ that map all the variables in $\mathrm{CHASE}_F(T_{t,r})$ to constants that do not appear in $\mathrm{CHASE}_F(T_{t,r})$; also, let the set of new constants of each mapping be disjoint from the corresponding set of the other. Let $T_1 = \phi_1(\mathrm{CHASE}_F(T_{t,r}))$, $T_2 = \phi_2(\mathrm{CHASE}_F(T_{t,r}))$, and $r_1 = \pi^{\downarrow}_R(T_1)$, $r_2 = \pi^{\downarrow}_R(T_2)$. Since the insertion is possible, by Theorem 2, there is a relation scheme $R_i$ such that F implies $X_i \to X$. As a consequence, the chase can reconstruct t from the projections of total tuples that contain it, and therefore, since $T_1$ and $T_2$ contain only total tuples, t x-belongs to both $r_1$ and $r_2$. So, since, by construction, r is weaker than both $r_1$ and $r_2$, it follows that both $r_1$ and $r_2$ are potential results. Since all constants in each of $r_1$ and $r_2$ are either new constants (appearing neither in the other state nor in $\mathrm{CHASE}_F(T_{t,r})$) or constants in $\mathrm{CHASE}_F(T_{t,r})$, it follows that $r^+$ is the relationwise intersection of $r_1$ and $r_2$; therefore, $r^+$ is the glb of $r_1$ and $r_2$. Now, $r_g$ is the glb of the set of the potential results, a set of states that includes $r_1$ and $r_2$, and therefore it is weaker than the glb of $r_1$ and $r_2$: that is, $r_g$ is weaker than $r^+$. Since we already established the converse ($r^+$ is weaker than $r_g$), $r^+$ is equivalent to $r_g$, and so, since $r^+$ is complete (being constructed as the projection of a tableau), it is the glb of the potential results. □

Theorem 4 Let the insertion of t in r be possible and consistent. It is deterministic if and only if $\mathrm{CHASE}_F(T_{t,r})$ is equivalent to $RI_{r^+}$.

Proof. Since the insertion is possible and consistent, by Theorem 3 we have that $\mathrm{CHASE}_F(T_{t,r}) \neq T_\infty$.

(If) It suffices to show that if $\mathrm{CHASE}_F(T_{t,r})$ is equivalent to $RI_{r^+}$, with $RI_{r^+} \neq T_\infty$, then $r^+$ is a potential result.

1. t x-belongs to $r^+$: by construction of $T_{t,r}$, we have $t \in \pi^{\downarrow}_X(\mathrm{CHASE}_F(T_{t,r}))$; then, since $\mathrm{CHASE}_F(T_{t,r})$ is equivalent to $RI_{r^+}$, it follows that $t \in \pi^{\downarrow}_X(RI_{r^+})$;

2. r is weaker than $r^+$: by construction of $T_{t,r}$, again, $RI_r$ is contained in $\mathrm{CHASE}_F(T_{t,r})$; then, since (by hypothesis) $\mathrm{CHASE}_F(T_{t,r})$ is equivalent to $RI_{r^+}$, we have that $RI_r$ is contained in $RI_{r^+}$, and so r is weaker than $r^+$.

(Only if) If the insertion of t into r is deterministic, then, by definition, the glb of the potential results is a potential result itself; as a consequence, by Lemma 8, $r^+$ is a potential result, and so, by Lemma 7, $\mathrm{CHASE}_F(T_{t,r})$ is contained in $RI_{r^+}$; since, by construction of $r^+$, $RI_{r^+}$ is also contained in $\mathrm{CHASE}_F(T_{t,r})$, it follows that $\mathrm{CHASE}_F(T_{t,r})$ is equivalent to $RI_{r^+}$. □

Corollary 2 Let the insertion of t in r be possible and consistent; it is deterministic if and only if $t \in \pi^{\downarrow}_X(RI_{r^+})$.

Proof. (If) If the insertion is possible and consistent, then, by Theorem 3, $RI_{r^+}$ is a consistent tableau and $RI_r$ is contained in $RI_{r^+}$; if $t \in \pi^{\downarrow}_X(RI_{r^+})$, then $T_{t,r}$ is contained in $RI_{r^+}$, and so, by Lemma 1(c), $\mathrm{CHASE}_F(T_{t,r})$ is contained in $RI_{r^+}$; by Lemma 8, $RI_{r^+}$ is also contained in $\mathrm{CHASE}_F(T_{t,r})$, and so $RI_{r^+}$ is equivalent to $\mathrm{CHASE}_F(T_{t,r})$; therefore, by Theorem 4, the insertion is deterministic.

(Only if) If the insertion is possible, consistent, and deterministic, then, by Theorem 4, $RI_{r^+}$ is equivalent to $\mathrm{CHASE}_F(T_{t,r})$, and so, since (by construction of $T_{t,r}$) t belongs to $\pi^{\downarrow}_X(\mathrm{CHASE}_F(T_{t,r}))$, we have that $t \in \pi^{\downarrow}_X(RI_{r^+})$. □

Corollary 2 gives an effective characterization of determinism: given r and t, we can build $T_{t,r}$, chase it with respect to the given constraints, then generate $r^+$ and compute its representative instance $RI_{r^+}$, and finally check whether the total projection $\pi^{\downarrow}_X(RI_{r^+})$ contains t.
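The procedure suggested by Theorem 3 and Corollary 2 can be sketched in Python. This is a minimal illustration, not an efficient implementation: the class `Var`, the functions `chase`, `state_tableau`, `total_projection`, and `check_insertion`, and the encoding of states as dicts from attribute tuples to sets of tuples are all assumptions of the sketch. The sample data are read off Figures 7 to 9 (attributes abbreviated E, P, D, M), while the FDs E → D and D → M are assumed, since they are not restated in this section. The sketch also presumes that the initial state is consistent and that the insertion is possible.

```python
import itertools


class Var:
    """A tableau variable; distinct instances are never equal."""


def chase(rows, fds):
    """Chase rows (dicts attr -> constant or Var) with FDs (lhs_attrs, rhs_attr).

    Returns the chased rows, or None when two distinct constants clash.
    """
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            for r1, r2 in itertools.combinations(rows, 2):
                if all(r1[b] == r2[b] for b in lhs) and r1[a] != r2[a]:
                    v1, v2 = r1[a], r2[a]
                    if not isinstance(v1, Var) and not isinstance(v2, Var):
                        return None  # contradiction: inconsistent tableau
                    old, new = (v1, v2) if isinstance(v1, Var) else (v2, v1)
                    for r in rows:  # replace the variable everywhere
                        for b in r:
                            if r[b] is old:
                                r[b] = new
                    changed = True
    return rows


def state_tableau(state, universe):
    """Pad every tuple of every relation to the universe with fresh variables."""
    rows = []
    for scheme, rel in state.items():
        for tup in rel:
            row = {a: Var() for a in universe}
            row.update(zip(scheme, tup))
            rows.append(row)
    return rows


def total_projection(rows, attrs):
    """Tuples over attrs taken from rows that hold only constants on attrs."""
    return {tuple(r[a] for a in attrs) for r in rows
            if not any(isinstance(r[a], Var) for a in attrs)}


def check_insertion(state, universe, fds, x, t):
    """Return (consistent, r_plus, deterministic) for inserting t over x."""
    ri = chase(state_tableau(state, universe), fds)          # RI_r
    te = {a: Var() for a in universe}
    te.update(zip(x, t))                                     # t_e
    chased = chase(ri + [te], fds)                           # CHASE_F(T_{t,r})
    if chased is None:
        return False, None, False
    r_plus = {s: total_projection(chased, s) for s in state}
    ri_plus = chase(state_tableau(r_plus, universe), fds)    # RI_{r+}
    return True, r_plus, tuple(t) in total_projection(ri_plus, x)


universe = ("E", "P", "D", "M")
fds = [(("E",), "D"), (("D",), "M")]  # assumed FDs
state = {
    ("E", "P"): {("John", "A"), ("John", "B"), ("Bob", "B"), ("Tom", "C")},
    ("E", "D"): {("John", "CS"), ("Bob", "EE"), ("Jim", "MS")},
    ("D", "M"): {("CS", "Smith"), ("IE", "White")},
}
consistent, r_plus, deterministic = check_insertion(
    state, universe, fds, ("E", "M"), ("Jim", "White"))
print(consistent, deterministic, sorted(r_plus[("D", "M")]))
```

Under the assumed FDs, the insertion of <Jim, White> is consistent and deterministic, and the only change in $r^+$ is the addition of <MS, White> to the relation on Dept and Manager, as in Figure 9.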

Example 7 Consider again the insertion in Example 3. Following the definitions, we can build the tableau $T_{t,r}$ (Figure 7), chase it (Figure 8), and project the result on the database scheme (Figure 9), thus obtaining exactly the state we suggested as a result.

The basic operation to be performed for testing both consistency and determinism of an insertion is the chase of $T_{t,r}$, that is, the chase of a tableau that involves the whole database state. Since computing the chase of a tableau takes polynomial time in the number of rows of the tableau [1], it follows that, in the general case, the tests for consistency and determinism of an insertion require time (and space) polynomial with respect to the size of the database state. On the other hand, we note that no restrictions are imposed on the database schemes, and the above characterizations work for any relational database. Besides, we have recently shown [7] that, similarly to what has been done with respect to query answering, efficient algorithms to characterize updates can be given for specific classes of schemes. In particular, for the significant class of independent schemes [23], we proved that testing for consistency and determinism does not require the chase of $T_{t,r}$, since in this case it is sufficient to consider just the relevant portion of the database.


Employee  Project  Dept  Manager
John      A        CS    Smith
John      B        CS    Smith
Bob       B        EE    v1
Tom       C        v2    v3
John      v4       CS    Smith
Bob       v5       EE    v1
Jim       v6       MS    v7
v8        v9       CS    Smith
v10       v11      IE    White
Jim       v12      v13   White

Figure 7: $T_{t,r}$ for Example 7.

Employee  Project  Dept  Manager
John      A        CS    Smith
John      B        CS    Smith
Bob       B        EE    v1
Tom       C        v2    v3
John      v4       CS    Smith
Bob       v5       EE    v1
Jim       v6       MS    White
v8        v9       CS    Smith
v10       v11      IE    White

Figure 8: $\mathrm{CHASE}_F(T_{t,r})$ for Example 7.

Employee  Project
John      A
John      B
Bob       B
Tom       C

Employee  Dept
John      CS
Bob       EE
Jim       MS

Dept  Manager
CS    Smith
IE    White
MS    White

Figure 9: $r^+$ for Example 7.

5 Deletions 5.1 Definitions The definitions are largely symmetric to those concerning insertions. However, this case is easier, because no problem arises regarding consistency and possibility.

Definition 6 A state $r_p$ is a potential result for the deletion of a tuple t from a state r if $r_p$ is weaker than r and t does not x-belong to $r_p$.

The empty state is a consistent potential result for every deletion, and so there is no need to define the notions of possible and consistent results for deletions. Again, we consider as candidates for the result of a deletion the potential results obtained by altering the original state "as little as possible".

Definition 7 A state $r_M$ is a maximal potential result for a deletion if there is no other potential result $r_p$ for the same deletion such that $r_M$ is weaker than $r_p$.

However, as for insertions, there may be several, incomparable maximal potential results.

Example 8 If we want to delete the tuple <John, Smith> on EM from the database in Figure 1, then it suffices to eliminate either the tuple <John, CS> from $r_1$ or the tuple <CS, Smith> from $r_2$. The two states obtained in these ways are incomparable; both are potential results, and both are maximal among the potential results.


It is therefore reasonable to introduce a notion of deterministic result, in a way that is dual with respect to that followed for insertions.

Definition 8 A deletion is deterministic if the lub of the potential results is a potential result.

5.2 Characterization of Determinism It turns out that deletions are deterministic only in very restricted cases.

Lemma 9 Let r be a consistent state and t be a tuple on X that x-belongs to r. The deletion of t from r is deterministic only if there is a relation scheme $R_i(X_i)$ such that $X \subseteq X_i$.

Proof. If there is no relation scheme $R_i(X_i)$ such that $X_i$ contains X, then t x-belongs to r only if there is a set of tuples $T = \{t_1 \in r_{i_1}, \ldots, t_h \in r_{i_h}\}$ that allows the chase to generate a tuple $t' \in RI_r$ such that $t[X] = t'[X]$. In general, there may be several nonredundant sets of tuples with this property: $T_1 = \{t_{1,1}, \ldots, t_{1,H_1}\}, \ldots, T_q = \{t_{q,1}, \ldots, t_{q,H_q}\}$. The states obtained by removing from the relations in r one tuple from each of the $T_j$'s are all potential results (since they are weaker than r and, by the nonredundancy assumption, t does not x-belong to any of them). It is clear that the lub of this set of states is r itself, which is not a potential result, since t x-belongs to r. □

The condition in Lemma 9 is not sufficient, in general, to guarantee determinism (although it is indeed sufficient in some restricted cases, as we will see); a further condition on the state is needed, similar to the one we found for insertions. Let $r^-$ be the state obtained by removing, from each relation $r_i$ such that $X \subseteq X_i$, each tuple $t'$ such that $t'[X] = t[X]$.

Lemma 10 Let r be a consistent state and t be a tuple on X that x-belongs to r. The deletion of t from r is deterministic only if t does not x-belong to $r^-$.

Proof. If t x-belongs to $r^-$, then it is constructed by the chase from a set of tuples none of which coincides with t on X; therefore, we can reason as in the proof of Lemma 9. □


Lemma 11 Let r be a consistent state and t be a tuple on X that x-belongs to r. If (i) there is a relation scheme $R_i(X_i)$ such that $X \subseteq X_i$ and (ii) t does not x-belong to $r^-$, then the deletion of t from r is deterministic.

Proof. By construction, the state $r^-$ is weaker than r; by hypothesis, t does not x-belong to $r^-$; therefore, $r^-$ is a potential result. At the same time, no potential result can contain any tuple that coincides with t on X, and therefore every potential result is weaker than $r^-$. Hence $r^-$ is the lub of the potential results, and so the deletion is deterministic. □

Theorem 5 Let r be a consistent state and t be a tuple on X that x-belongs to r. The deletion of t from r is deterministic if and only if (i) there is a relation scheme $R_i(X_i)$ such that $X \subseteq X_i$ and (ii) t does not x-belong to $r^-$.

Proof. Follows from Lemmata 9, 10, and 11. □
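The two conditions of Theorem 5 can be checked mechanically, as in the following Python sketch. Everything here is illustrative: the names `Var`, `chase`, `x_belongs`, and `deletion_is_deterministic`, the encoding of states as dicts from attribute tuples to sets of tuples, and the three-relation state with FDs E → D and D → M are assumptions made for the example, not data taken from the paper. The sketch presumes a consistent state to which t x-belongs.

```python
import itertools


class Var:
    """A tableau variable; distinct instances are never equal."""


def chase(rows, fds):
    """Chase rows (dicts attr -> value) with FDs given as (lhs_attrs, rhs_attr)."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            for r1, r2 in itertools.combinations(rows, 2):
                if all(r1[b] == r2[b] for b in lhs) and r1[a] != r2[a]:
                    v1, v2 = r1[a], r2[a]
                    if not isinstance(v1, Var) and not isinstance(v2, Var):
                        return None  # two distinct constants: inconsistent
                    old, new = (v1, v2) if isinstance(v1, Var) else (v2, v1)
                    for r in rows:
                        for b in r:
                            if r[b] is old:
                                r[b] = new
                    changed = True
    return rows


def x_belongs(state, universe, fds, x, t):
    """True if t is in the total projection on x of the representative instance."""
    rows = []
    for scheme, rel in state.items():
        for tup in rel:
            row = {a: Var() for a in universe}
            row.update(zip(scheme, tup))
            rows.append(row)
    chased = chase(rows, fds)  # the state is assumed to be consistent
    return any(all(r[a] == v for a, v in zip(x, t)) for r in chased)


def deletion_is_deterministic(state, universe, fds, x, t):
    """Theorem 5: (i) X inside some scheme, (ii) t does not x-belong to r^-."""
    if not any(set(x) <= set(s) for s in state):
        return False, None
    r_minus = {}
    for scheme, rel in state.items():
        if set(x) <= set(scheme):
            idx = [scheme.index(a) for a in x]
            # Drop every tuple that coincides with t on X.
            rel = {u for u in rel if tuple(u[i] for i in idx) != tuple(t)}
        r_minus[scheme] = set(rel)
    return not x_belongs(r_minus, universe, fds, x, t), r_minus


universe = ("E", "D", "M")
fds = [(("E",), "D"), (("D",), "M")]  # assumed FDs
state = {
    ("E", "D"): {("John", "CS")},
    ("D", "M"): {("CS", "Smith")},
    ("E", "M"): {("John", "Smith")},
}
print(deletion_is_deterministic(state, universe, fds, ("E", "M"), ("John", "Smith"))[0])
print(deletion_is_deterministic(state, universe, fds, ("E", "D"), ("John", "CS"))[0])
```

In the first call the tuple on EM is regenerated by the chase from the other two relations, so condition (ii) fails; in the second call nothing regenerates <John, CS>, so the deletion is deterministic and $r^-$ is its result.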

If a further assumption on the database scheme is made, then the condition in Lemma 9 is also sufficient to guarantee determinism of deletions. This is important, because such a condition involves only the database scheme and not the current state, and can therefore be tested more efficiently. A database scheme R is embedded-complete [14] if, for every state r of R and for every set of attributes X such that $X \subseteq X_i$ for some relation scheme $R_i(X_i)$, the X-total projection of the representative instance equals the union of the X-projections of the relations whose scheme contains X. Informally, a database scheme is embedded-complete if every piece of information that is defined on a subset of a relation scheme and follows from the state and the constraints is explicitly contained in one of the relations in the state.

Corollary 3 Let R be embedded-complete, r be a consistent state and t be a tuple on X that x-belongs to r. The deletion of t from r is deterministic if and only if there is a relation scheme $R_i(X_i)$ such that $X \subseteq X_i$.

Proof. The claim follows from Theorem 5 and the fact that, if the database scheme is embedded-complete, then no tuple over $X \subseteq R_i$ can be generated by the chase from other tuples, and so condition (ii) in the claim of the theorem follows from condition (i). □


Example 9 Consider the database scheme of Example 6. If we want to delete the tuple t = <John, Smith> on EM from the state r2, it does not suffice to remove such a tuple from r3 (thus obtaining the state r1), since t can be reconstructed by the chase from the tuple <John, CS> of r1 and <CS, Smith> of r2 (note that t x-belongs to r1). On the other hand, the database of Example 1 is defined on an embedded-complete scheme; therefore, if we want to delete the tuple $t'$ = <John, CS> on ED from the database, we can argue that it is a deterministic deletion by observing that $ED \subseteq X_2$, and we can perform such a deletion by simply removing $t'$ from the relation r2.

6 Comparison with the related work In this section we compare our work with a number of approaches presented in the literature for problems more or less closely related to the problem of updating databases through weak instance interfaces. The comparison is organized in four parts, beginning with the approaches that are more similar to ours:

1. other approaches to updates in the weak instance model;
2. updates in the partition model;
3. updates to databases represented as logical theories;
4. updates to databases through views.

6.1 Updating representative instances Two recent papers have studied updates in the weak instance model [12, 27]. Langerak [27] studied updates through a representative instance interface, restricting his attention to the class of BCNF (that is, with FDs corresponding to keys) independent schemes; also, he allows null values in base relations. His update algorithms are based on the expressions that compute the total projections of the representative instance. For BCNF independent schemes, these expressions always exist, as shown by Sagiv [34, 35]. However, there exist schemes, outside this class, for which the total projections cannot be computed by means of fixed expressions: Langerak's approach cannot be easily extended to them. As a comment, we can say that this approach is "operational", whereas ours is "semantical": our definitions are based on basic concepts in the weak instance model, and therefore hold in general. Also, it turns out that, for the comparable cases, our approach essentially reduces to Langerak's. Brosda and Vossen [12] also follow an "operational" approach; they allow insertion and deletion of tuples defined over sets of attributes (called "objects") defined during the design of the database scheme. In fact, the design itself is influenced by the objects: each object is required to be included in a relation scheme and to contain its key. This approach is somewhat orthogonal to ours, as we put no a priori limitation on the sets of attributes involved in an update, but characterize the desirable (deterministic) cases a posteriori. Another difference is that Brosda and Vossen studied effective algorithms for implementing updates only under (a variation of) the uniqueness condition [34], whereas we do not impose any kind of restriction. Summarizing, the main difference between our proposal and the two discussed in this subsection is the semantic foundation of ours, based on the lattice of states, which holds for all schemes in the weak instance model and is not related to any specific class. As a matter of fact, both Brosda and Vossen [12] and Langerak [27] refer to the "representative instance model", which is the operational counterpart of the "weak instance model", and to its specific properties with respect to independent schemes: this confirms the operational nature of their approaches.

6.2 Updates in the partition model Lécluse and Spyratos [28] studied the problem of updating databases organized by means of a version of the weak instance model derived from the partition model. The partition model [36] is an original approach to databases, with clean set-theoretic semantics, which builds a sound basis for deductive query answering and updates. It turns out [16, 28] that it can be seen as a variant of the weak instance model that (i) considers a weaker version of the chase (which does not equate nulls), and (ii) allows the original tuples to be defined on any subset of the universe, and not only on those fixed by the database scheme; this second difference is in essence more important in this framework, as the first one is often only apparent [4, 5]. In this context a database can be seen as a set of tuples, defined over any subset of the universe, with an associated set of dependencies; it turns out that every consistent state in the weak instance model has a counterpart in the partition model, whereas the converse is not true, because in the partition model there is no fixed database scheme, and so tuples can be defined on every subset of the universe. Updates in the partition model have been considered as insertions and deletions of single tuples [28]. The approach is also based on a partial order on states, and the definitions of insertions and deletions are based on the same principles as those we presented in the previous sections. The main differences between the two approaches are consequences of the differences between the two models, and specifically of the existence of the fixed database scheme in our approach and its absence in the partition approach; a more detailed discussion can be found elsewhere [6]. It should be noted that, although the two approaches were developed independently, Lécluse and Spyratos's was published first; however, the lattice on states in the weak instance model is definitely a nontrivial extension of the lattice in the partition model, again because of the fixed database scheme. Also, the similarity ends at the definition level, as the differences between the models imply very different characterizations: our main results on possibility, consistency, and determinism have no counterpart in the partition approach.

6.3 Updates to logical theories Fagin et al. [19, 20] have studied the problem of updating databases represented as logical theories and sets thereof. As it is possible to associate logical theories with database states in the weak instance model [32], it is interesting to compare the approach of Fagin et al. with ours. Their definitions are also based on an implicit partial order among theories, with a notion of accomplishment of an operation, which is similar to our notion of potential result, and a notion of minimal accomplishment, which corresponds to our minimal (maximal) result for insertions (deletions). However, there are a number of important differences between the two approaches. First of all, ours is more restricted, as we do not consider general logical databases, but only those corresponding to ordinary databases accessed through weak instances; however, in this way we have been able to find more effective characterizations, based on the properties of the weak instance model. As a consequence of this freedom, Fagin et al. actually require the fact (tuple) to be inserted to appear explicitly in the theory; this is in general impossible in our approach, because the tuple need not correspond to a relation scheme. Second, in the case of multiple minimal results, they propose two alternative approaches, one considering the logical disjunction of the minimal results as "the" result, and the other based on viewing the database as a set of theories (called a flock), rather than a single theory. However, in neither case is there an effective characterization of determinism. Finally, at least in principle, they handle updates leading to inconsistent results, by giving definitions that correspond to a minimal correction of the theory.

6.4 View updates The weak instance interface can be considered as a view (or a multiview), and therefore our proposal could be compared with the various approaches to view updating [9, 17, 18, 21, 22, 25, 26]. The view update problem can be formulated as follows: a view is defined by means of a function f that associates a view state with each database state; given a view state s' = f(s), corresponding to a database state s, an update operation u' on the view defines a new state u'(s') for the view; the problem is to find an update operation u on the database such that f(u(s)) = u'(s'), for each state s of the database. Both Fagin et al. [20, p.360] and Gottlob et al. [22, p.521] noted that there is a major difference between the "classical" approach to view updates [9, 17, 18, 21, 22, 25, 26] and the problem of updating logical databases as considered by Fagin et al. [19, 20], because the first approach is essentially "imperative", as updates indicate exactly how the view should be changed as a result of their application, whereas the second is "declarative", as the requests indicate that some facts have to appear (or not appear) in the respective results, without further specification of them. Our approach is also declarative, as we require the result of an insertion (deletion) to contain (not contain) a tuple, with no further specification, except for consistency, minimality and, if possible, determinism. As a consequence, despite an apparent similarity, the problem tackled in this paper is significantly different from the view update problem, and so no quantitative comparison is possible; a general comparison could be possible, but it would reduce to a comparison of the declarative and imperative programming paradigms, and it is therefore outside the scope of this paper.
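The classical view update condition f(u(s)) = u'(s') formulated in this section can be made concrete with a small sketch. Everything in it is an illustrative assumption and not part of the paper: the toy schema of (employee, department) pairs, the projection used as the view function `f`, and the helper names `view_delete`, `db_delete`, `translates` and `all_states`. The sketch checks by brute force, over all states of a tiny database, whether a candidate database update translates a given view update.

```python
from itertools import combinations

# Hypothetical toy schema: a database state is a frozenset of (emp, dept) pairs.
def f(state):
    """View function f: project each (emp, dept) pair onto emp."""
    return frozenset(emp for emp, _ in state)

def view_delete(emp):
    """View update u': remove one employee name from the view state."""
    return lambda view_state: view_state - {emp}

def db_delete(emp):
    """Candidate database update u: drop every pair that mentions emp."""
    return lambda state: frozenset(p for p in state if p[0] != emp)

def translates(u, u_prime, states):
    """Check f(u(s)) == u'(f(s)) for every database state s in `states`."""
    return all(f(u(s)) == u_prime(f(s)) for s in states)

def all_states(pairs):
    """Enumerate every subset of `pairs` as a database state."""
    return [frozenset(c) for r in range(len(pairs) + 1)
            for c in combinations(pairs, r)]

PAIRS = [("ann", "cs"), ("ann", "math"), ("bob", "cs")]
STATES = all_states(PAIRS)

# Deleting all pairs for "ann" realizes the view deletion in every state.
ok = translates(db_delete("ann"), view_delete("ann"), STATES)

# Deleting only ("ann", "cs") does not: "ann" may survive via ("ann", "math").
bad = translates(lambda s: s - {("ann", "cs")}, view_delete("ann"), STATES)
```

In this imperative formulation the view update says exactly what the new view state must be, and the database update is judged only by whether it reproduces that state; the declarative approach of this paper instead asks only that a tuple appear (or not appear) in the result.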

7 Conclusions In this paper we have proposed an extension to the weak instance model that lays the basis for a sound semantics of update operations (insertions and deletions of tuples). This is particularly significant, as it had been argued that weak instance interfaces (as well as all other interfaces in the wider "universal relation" family) could be used only for querying databases [15, 39]. As a preliminary tool, we have defined a lattice on states, based on a partial order that extends a known notion of equivalence of states [33]. Potential results of an insertion are states that contain at least the information in the original state and that in the new tuple. Sometimes there is no potential result (because of impossibility or inconsistency), and in other cases there may be many of them. In the latter case, we have argued that the insertion is deterministic if the greatest lower bound of the set of potential results is a potential result itself. We have followed a symmetric approach for deletions, with fewer cases, since there are always potential results. We have shown effective characterizations for possibility, consistency, and determinism of insertions, and for determinism of deletions.

With respect to nondeterministic updates, we have not suggested an implementation policy, as we have just defined and characterized the fundamental property. We can say that nondeterminism corresponds to ambiguity at the instance level (that is, actual values). Consider the second insertion in Example 2: the name of the department is missing, and supplying it would resolve the case. Therefore, disambiguation could be performed at operation time, for example by means of a dialogue with the user: the system should obviously try to minimize the number of requests.

Two research directions have been pursued as follow-ups of the work reported in this paper. First, as we said at the end of Section 4, we have specialized our results to the highly significant class of independent schemes (with respect to any set of embedded FDs), showing interesting characterizations and efficient algorithms [7]. Second, we have studied the problem of updating intensional predicates in Datalog databases, with an approach based on a lattice defined on the set of states (and their corresponding least Herbrand models); again, there are notions of possibility, minimality, and determinism; the operational counterpart, which in this paper is based on the chase, is there based on SLD-resolution [8].
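The determinism criterion recalled in these conclusions (the greatest lower bound of the potential results must itself be a potential result) can be illustrated with a deliberately simplified sketch. It is not the paper's construction: it replaces the information ordering of the weak instance model by plain set inclusion on sets of facts, replaces the chase by a hard-coded consistency check, and takes set intersection as the greatest lower bound. The fact encoding, the tiny universe `FACTS`, and the names `consistent`, `entails`, `potential_results` and `is_deterministic` are all assumptions made for illustration.

```python
from itertools import combinations

DEPTS = ("cs", "math")

# Toy universe of facts (hypothetical): ("works", emp, dept) and ("mgr", dept, manager).
FACTS = [("works", e, d) for e in ("ann", "bob") for d in DEPTS] + \
        [("mgr", d, "smith") for d in DEPTS]

def consistent(state):
    """Toy stand-in for the dependencies: one dept per employee, one manager per dept."""
    keys = [(kind, key) for kind, key, _ in state]
    return len(keys) == len(set(keys))

def entails(state, emp, manager):
    """True if the state links emp to manager through some department."""
    return any(("works", emp, d) in state and ("mgr", d, manager) in state
               for d in DEPTS)

def potential_results(original, emp, manager):
    """Consistent states containing the original state and the new (emp, manager) link."""
    states = (frozenset(c) for r in range(len(FACTS) + 1)
              for c in combinations(FACTS, r))
    return [s for s in states
            if original <= s and consistent(s) and entails(s, emp, manager)]

def is_deterministic(original, emp, manager):
    """Deterministic iff the glb (here: intersection) of the potential results is one of them."""
    results = potential_results(original, emp, manager)
    if not results:
        return False  # no potential result: the insertion is impossible or inconsistent
    glb = frozenset.intersection(*results)
    return glb in results

# ann's department is known, so only one minimal way exists to add the link.
known_dept = is_deterministic(frozenset({("works", "ann", "cs")}), "ann", "smith")

# bob's department is unknown: either department would do, so the glb loses the link.
unknown_dept = is_deterministic(frozenset(), "bob", "smith")
```

In the second call the missing department name is exactly the kind of instance-level ambiguity discussed above: once a department is supplied for bob, the insertion falls back into the first, deterministic pattern.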

Acknowledgement We would like to thank the anonymous referees, who provided very helpful comments.

References

[1] A.V. Aho, C. Beeri, and J.D. Ullman. The theory of joins in relational databases. ACM Trans. on Database Syst., 4(3):297-314, 1979.
[2] A.V. Aho, Y. Sagiv, and J.D. Ullman. Efficient optimization of a class of relational expressions. ACM Trans. on Database Syst., 4(4):435-454, 1979.
[3] A.V. Aho, Y. Sagiv, and J.D. Ullman. Equivalence of relational expressions. SIAM Journal on Computing, 8(2):218-246, 1979.
[4] P. Atzeni and M.C. De Bernardis. A new basis for the weak instance model. In Sixth ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems, pages 79-86, 1987.
[5] P. Atzeni and M.C. De Bernardis. A new interpretation for null values in the weak instance model. Journal of Comp. and System Sc., 41(1):25-43, August 1990.
[6] P. Atzeni and R. Torlone. Approaches to updates over weak instances. In MFDBS'89 (Mathematical Fundamentals of Data Base Systems), Visegrad, Hungary, Lecture Notes in Computer Science 364, pages 12-23, Springer-Verlag, 1989.
[7] P. Atzeni and R. Torlone. Efficient updates to independent schemes in the weak instance model. In ACM SIGMOD International Conf. on Management of Data, pages 84-93, 1990.
[8] P. Atzeni and R. Torlone. Updating intensional predicates in Datalog. Rapporto R.295, IASI-CNR, Roma, 1990. Presented at the Workshop "Information Systems '90", Kiev, October 1990.
[9] F. Bancilhon and N. Spyratos. Update semantics of relational views. ACM Trans. on Database Syst., 6(4):557-575, 1981.
[10] C. Beeri and P.A. Bernstein. Computational problems related to the design of normal form relational schemas. ACM Trans. on Database Syst., 4(1):30-59, 1979.
[11] G. Birkhoff. Lattice Theory. Colloquium Publications, Volume XXV, American Mathematical Society, third edition, 1967.
[12] V. Brosda and G. Vossen. Update and retrieval in a relational database through a universal scheme interface. ACM Trans. on Database Syst., 13(4):449-485, 1988.
[13] E.P.F. Chan. Optimal computation of total projections with unions of simple chase join expressions. In ACM SIGMOD International Conf. on Management of Data, pages 149-163, 1984.
[14] E.P.F. Chan and A.O. Mendelzon. Answering queries on embedded-complete database schemes. Journal of the ACM, 34(2):349-375, 1987.
[15] E.F. Codd. 'Universal' relation fails to replace relational model (letter to the editor). IEEE Software, 5(4):4-6, 1988.
[16] S.S. Cosmadakis, P.C. Kanellakis, and N. Spyratos. Partition semantics for relations. Journal of Comp. and System Sc., 33(1):203-233, 1986.
[17] S.S. Cosmadakis and C.H. Papadimitriou. Updates of relational views. Journal of the ACM, 31(4):742-760, October 1984.
[18] U. Dayal and P.A. Bernstein. On the correct translation of update operations on relational views. ACM Trans. on Database Syst., 7(3):381-416, September 1982.
[19] R. Fagin, G.M. Kuper, J.D. Ullman, and M.Y. Vardi. Updating logical databases. In P.C. Kanellakis and F.P. Preparata, editors, Advances in Computing Research, Vol.3, pages 1-18, JAI Press, 1986.
[20] R. Fagin, J.D. Ullman, and M.Y. Vardi. On the semantics of updates in databases. In Second ACM SIGACT SIGMOD Symp. on Principles of Database Systems, pages 352-365, 1983.
[21] A. Furtado, K. Sevcik, and C.S. Dos Santos. Permitting updates through views of databases. Information Systems, 4(4), 1979.
[22] G. Gottlob, P. Paolini, and R. Zicari. Properties and update semantics of consistent views. ACM Trans. on Database Syst., 13(4):486-524, December 1988.
[23] M.H. Graham and M. Yannakakis. Independent database schemas. Journal of Comp. and System Sc., 28(1):121-141, 1984.
[24] P. Honeyman. Testing satisfaction of functional dependencies. Journal of the ACM, 29(3):668-677, 1982.
[25] A.M. Keller. Algorithms for translating view updates to database updates for views involving selections, projections, and joins. In Fourth ACM SIGACT SIGMOD Symp. on Principles of Database Systems, pages 154-163, 1985.
[26] A.M. Keller and J.D. Ullman. On complementary and independent mappings on databases. In ACM SIGMOD International Conf. on Management of Data, 1984.
[27] R. Langerak. View updates in relational databases with an independent scheme. ACM Trans. on Database Syst., 15(1):40-66, March 1990.
[28] C. Lecluse and N. Spyratos. Implementing queries and updates on universal scheme interfaces. In Fourteenth International Conference on Very Large Data Bases, Los Angeles, pages 62-75, 1988.
[29] D. Maier. The Theory of Relational Databases. Computer Science Press, Potomac, Maryland, 1983.
[30] D. Maier, A.O. Mendelzon, and Y. Sagiv. Testing implications of data dependencies. ACM Trans. on Database Syst., 4(4):455-468, 1979.
[31] D. Maier, D. Rozenshtein, and D.S. Warren. Window functions. In P.C. Kanellakis and F.P. Preparata, editors, Advances in Computing Research, Vol.3, pages 213-246, JAI Press, 1986.
[32] D. Maier, J.D. Ullman, and M. Vardi. On the foundations of the universal relation model. ACM Trans. on Database Syst., 9(2):283-308, 1984.
[33] A.O. Mendelzon. Database states and their tableaux. ACM Trans. on Database Syst., 9(2):264-282, 1984.
[34] Y. Sagiv. Can we use the universal instance assumption without using nulls? In ACM SIGMOD International Conf. on Management of Data, pages 108-120, 1981.
[35] Y. Sagiv. A characterization of globally consistent databases and their correct access paths. ACM Trans. on Database Syst., 8(2):266-286, 1983.
[36] N. Spyratos. The partition model: a deductive database model. ACM Trans. on Database Syst., 12(1):1-37, 1987.
[37] J.D. Ullman. Principles of Database Systems. Computer Science Press, Potomac, Maryland, second edition, 1982.
[38] J.D. Ullman. Universal relation interfaces for database systems. In IFIP Congress, pages 243-252, 1983.
[39] M.Y. Vardi. Response to a letter to the editor. IEEE Software, 5(4):4-6, 1988.
[40] M.Y. Vardi. The universal relation data model for logical independence. IEEE Software, 5(2):80-85, 1988.
