Reduced MVD’s and Minimal Covers*

Z. Meral Ozsoyoglu and Li-Yan Yuan
Computer Engineering and Science Department and Center for Automation and Intelligent Systems
Case Western Reserve University
Cleveland, Ohio 44106

ABSTRACT

Multivalued dependencies (MVD’s) are data dependencies which appear frequently in the ‘real world’, and play an important role in designing relational database schemes. Given a set of MVD’s to constrain a database scheme, it is desirable to obtain an equivalent set of MVD’s which do not have any redundancies. In this paper, we define such a set of MVD’s, called reduced MVD’s, and present an algorithm to obtain reduced MVD’s. We also define a minimal cover of a set of MVD’s, which is a set of reduced MVD’s, and give an efficient method to find such a minimal cover. The significance and properties of reduced MVD’s are also discussed in the context of database design (e.g. 4NF decomposition) and conflict-free MVD’s.

*This research is supported in part by the NSF under grant-8306616. The paper appears in ACM Transactions on Database Systems, 12(3):377-394, 1987.

1. Introduction

A universal relation scheme is a set of attributes. Given a set of attributes, data dependencies are used to constrain a database scheme and to specify the structure of attribute relationships, so that the database scheme is a better model for the real world. Functional and multivalued dependencies (FD’s and MVD’s) are two kinds of dependencies that appear naturally in the ‘real world’. Various algorithms concerning relational databases, such as the 3NF synthesis algorithm [BB], 4NF decomposition [Fa, L1], and many others, use a cover of data dependencies as part or all of their input. Since the performance of these algorithms depends on the cover used, it is important to obtain a ‘minimal’ cover of the dependencies under consideration, without any redundancies. For sets of FD’s, different notions of minimality of covers have been studied and compared in [Ma]. Maier has also shown that a cover with the smallest number of FD’s can be found in polynomial time, while finding a cover with the smallest number of attribute symbols is an NP-complete problem. There are also different notions of minimality of covers for sets of MVD’s. One common definition of a minimal cover of a set of MVD’s is as follows [Li, BFMY, ZM]. A set M_1 of MVD’s is a minimal cover of

another set M of MVD’s if M_1 is a cover of M (i.e. has the same closure as M), and no proper subset of M_1 is a cover of M. However, this definition is not sufficient to distinguish some redundancies in sets of MVD’s. Consider the following example from [BFMY]. Let U=ABCDE, and let M={E →→ B, EA →→ C} and N={E →→ B, EAB →→ C} be two sets of MVD’s over U. With the above definition, M and N are minimal covers of each other, since no dependency can be removed from either set without altering the closure. However, the attribute B in EAB →→ C of N is redundant, and hence M is a ‘better’ minimal cover than N. Moreover, M is a conflict-free set of MVD’s and enjoys all the desired properties of conflict-free sets of MVD’s [Li, BFMY, Sc], while N is not conflict-free. In this example, the MVD EAB →→ C in N has a redundant attribute on the left-hand side (LHS), i.e. it is not left-reduced. Similarly, an MVD may not be right-reduced (see Section 3 for terminology and definitions). Sets of right-reduced MVD’s, called full versions, are considered in [BFMY]. However, the MVD’s in the full version M_1 of M have the same LHS’s as those in M, and hence may not be left-reduced. (We note that the purpose of defining the full version in [BFMY] was not to remove all redundancies in a set of MVD’s.) A procedure to find a minimal cover of a set of MVD’s is outlined in [ZM], which is also concerned with redundant attributes in the left- and right-hand sides of MVD’s as well as redundant MVD’s. In this paper, we define a reduced MVD to be a nontrivial MVD which is left- and right-reduced and nontransferable*. Given a set M of MVD’s, a set of attributes which is the LHS of a reduced MVD implied by M is called a key of M. We give an algorithm to find all reduced MVD’s in M^+, which is polynomial in the number of keys**. A minimal cover of a set M of MVD’s is defined as a set M' of reduced MVD’s such that M' is a cover of M and no proper subset of M' is a cover of M.
We give an efficient method to find a minimal cover for any given set of MVD’s. The motivation for requiring MVD’s in a minimal cover to be reduced MVD’s stems from the need to eliminate redundancies. (In the example above, EAB →→ C is not reduced, so N is not a minimal cover under this requirement.) By using this definition, we show that the set of left-hand sides of any minimal cover (i.e. the essential keys) of a conflict-free set of MVD’s is unique. In fact, this property of unique LHS’s in a minimal cover is shown to hold for a strictly broader class than conflict-free sets of MVD’s, i.e. the sets of MVD’s which do not split their LHS’s. We also show that a minimal cover of a conflict-free set of MVD’s is conflict-free. This is not true if a minimal cover is defined without the requirement of reduced MVD’s. The concept of reduced MVD’s is useful in database design problems, such as 4NF decomposition [Fa, L1] and finding a normal form for nested relations [OY]. As indicated by Lien, to find a 4NF database scheme with respect to a set M of MVD’s, the input for Fagin’s decomposition algorithm should be a 4NF covering M_1 of M, that is, for any set R ⊆ U, if R is not decomposable w.r.t. any MVD in M_1, then R is not decomposable w.r.t. any MVD in M^+ (see Sections 2 and 5 for formal definitions). Obviously, M^+ is a 4NF covering, but a large number of MVD’s in M^+ are useless for decomposition. Thus, an interesting problem is to find a minimal 4NF covering, i.e. a 4NF covering no proper subset of which is a 4NF covering. Lien also gives two necessary conditions for MVD’s in a minimal 4NF covering [L1], but they are not sufficient, as will be shown in the paper. Using the concept of reduced MVD’s, we give an algorithm to find a 4NF covering M^* of a set M of MVD’s, which is a proper subset of the set of all reduced MVD’s implied by M. If M is conflict-free, then we show that M^* is a minimal 4NF covering, which is also a minimal cover for M.
Furthermore, by utilizing M^*, we show that, if M does not split LHS(M), using the full version of M as input to a decomposition algorithm [Fa, L1] is also sufficient to guarantee a 4NF decomposition. That is, if M does not split LHS(M), then a 4NF decomposition requires time polynomial in the size of M. In the general case (i.e. M may split LHS(M)), the 4NF covering M^* may not be minimal, and the time required to compute M^* may be exponential in the size of M, although it is polynomial in the number of keys. Grahne and Raiha [GR] present an interesting algorithm to find a 4NF decomposition without explicitly using a 4NF covering. Their algorithm may also require exponential time in the general case, while it is polynomial in the size of M if M does not split its LHS’s. That is, the algorithm in [GR] is polynomial for precisely the class of MVD’s for which any 4NF decomposition algorithm [Fa, L1] using a full version of M as input also guarantees a 4NF decomposition. The concepts of reduced MVD’s and minimal covers are also utilized in obtaining a normal form for nested relations [OY]. That is, the attributes are structured into nested relation schemes according to reduced MVD’s so that the dependencies represented by the resulting database scheme can be equivalent (if possible) to the given set of dependencies. The normal form for nested relations is discussed in [OY].

The rest of the paper is organized as follows. In Section 2, we discuss some fundamental concepts and notations. Section 3 defines reduced MVD’s, keys of a set of MVD’s, and minimal covers. The algorithm to derive all keys and reduced MVD’s of M is presented in Section 4. In Section 5, we discuss the use of reduced MVD’s in finding a 4NF covering of a set of MVD’s. In Section 6, the properties of reduced MVD’s and the 4NF covering for sets of MVD’s which do not split their LHS’s are presented.

* Informally, an MVD is nontransferable if no attribute on the LHS can be transferred to its RHS. A formal definition is given in Section 3.
** The number of keys may be exponential in the size of the set M of MVD’s.

2. Terminology and Basic Concepts

We assume that the reader is familiar with the theory of functional dependencies, multivalued dependencies, and join dependencies [Ul]. The following usual notations are used: X → Y and X →→ Y denote an FD and an MVD respectively, X →→ Y | Z is shorthand for X →→ Y and X →→ Z, MD(X, W, V) denotes that the MVD X →→ W holds in V, and *(R) denotes a join dependency (JD) of R. For our purpose, we need the following complete set of inference rules for MVD’s [Ul].

M1 (complementation): If X →→ Y, then X →→ (U−XY).
M2 (reflexivity): If Y ⊆ X, then X →→ Y.
M3 (augmentation): If X →→ Y and Z ⊆ W, then XW →→ YZ.
M4 (transitivity): If X →→ Y and Y →→ Z, then X →→ Z−Y.

M^+ is used to denote the closure of a set M of MVD’s, i.e. the set of all MVD’s derived from M by using the inference rules M1−M4. Two sets M_1 and M_2 of MVD’s are said to be covers of each other iff M_1^+ = M_2^+. As usual, DEP(X) denotes the dependency basis of X w.r.t. a set of MVD’s. If D is a set of FD’s and MVD’s, LHS(D) is used to denote the set of left-hand sides of all dependencies in D.

Let U be a set of attributes. We define a database scheme R = {R_1, ..., R_n} to be a set of subsets of U, and the R_i’s are called relation schemes. A database scheme R = {R_1, ..., R_n} is a decomposition of U w.r.t. a set D of dependencies if R_1 ∪ ... ∪ R_n = U and D ⊨ *(R). A relation scheme R_i is in 4NF w.r.t. a set D of dependencies if whenever X →→ Y holds in R_i and XY ⊂ R_i, then D ⊨ X → R_i [Fa]. A 4NF database scheme R is a decomposition of U such that each relation scheme in R is in 4NF w.r.t. D. 4NF is desirable since it minimizes redundancies in relations [BBG, Sc].

Let g be X →→ Y. We say R ⊆ U is decomposable w.r.t. g, or g-decomposable, if X ⊂ R, and R ∩ (Y−X) and R−XY are nonempty. R is decomposable w.r.t. a set M of MVD’s, or M-decomposable, if there is at least one MVD g in M such that R is g-decomposable [L1]. Considering FD’s as MVD’s [L1, GR], it turns out that R is in 4NF w.r.t. a set M of MVD’s iff R is not M^+-decomposable. A better treatment of FD’s and MVD’s, which distinguishes their semantic differences for database design problems such as 4NF decomposition, can be found in [YO].

A set of attributes X splits R ⊆ U if there are two dependents V_1 and V_2 of X such that V_1 ∩ R and V_2 ∩ R are nonempty. R is split by a set M of MVD’s if there is at least one X in LHS(M) such that X splits R. A set M of MVD’s is conflict-free [L2, BFMY] if
(1) M does not split any X in LHS(M); and
(2) (DEP(X) ∩ DEP(Y)) ⊆ DEP(X ∩ Y), for any X, Y in LHS(M).

Conflict-free sets of MVD’s have several desirable properties [BFMY]. Moreover, it is claimed [Sc] that ‘real world’ sets of MVD’s are conflict-free. In fact, R ⊆ U being decomposable w.r.t. X →→ Y implies that X splits R. R is in 4NF w.r.t. a set M of MVD’s iff for each X in LHS(M^+), X ⊂ R implies that X does not split R. Sets of MVD’s which do not split their LHS’s also have desirable properties, as will be discussed in this paper. For convenience, we call such sets of MVD’s split-free. Obviously, the class of conflict-free sets of MVD’s is a proper subclass of the class of split-free sets of MVD’s.
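The notions of this section can be made concrete with a small sketch. The Python below is an illustration only, under the assumption that DEP(X) is taken as a partition of U − X (as in the examples of this paper) and is computed by the usual block-refinement procedure. The names fs, dep_basis, splits, is_split_free and is_conflict_free are ours and do not come from the paper. The sketch checks the two conflict-free conditions on the sets M and N from the introduction.

```python
def fs(s):
    """Hypothetical helper: attribute set from a string such as 'EAB'."""
    return frozenset(s)

def dep_basis(X, M, U):
    """Dependency basis DEP(X) as a set of blocks partitioning U - X.

    A block B is split by an MVD (lhs ->> rhs) of M when lhs is disjoint
    from B and rhs cuts B into two nonempty parts; repeat until stable.
    """
    blocks = {U - X} if U - X else set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in M:
            for b in list(blocks):
                if b & lhs:
                    continue
                inside, outside = b & rhs, b - rhs
                if inside and outside:
                    blocks.remove(b)
                    blocks.update((inside, outside))
                    changed = True
    return blocks

def splits(X, R, M, U):
    """X splits R if at least two dependents of X meet R."""
    return sum(1 for b in dep_basis(X, M, U) if b & R) >= 2

def is_split_free(M, U):
    """Condition (1): no left-hand side of M splits a left-hand side of M."""
    lhs = {l for l, _ in M}
    return not any(splits(X, Y, M, U) for X in lhs for Y in lhs)

def is_conflict_free(M, U):
    """Conditions (1) and (2) of the conflict-free definition."""
    lhs = {l for l, _ in M}
    return is_split_free(M, U) and all(
        dep_basis(X, M, U) & dep_basis(Y, M, U) <= dep_basis(X & Y, M, U)
        for X in lhs for Y in lhs)

# The two sets of MVD's from the introduction, over U = ABCDE.
U = fs("ABCDE")
M = [(fs("E"), fs("B")), (fs("EA"), fs("C"))]
N = [(fs("E"), fs("B")), (fs("EAB"), fs("C"))]
print(is_conflict_free(M, U), is_conflict_free(N, U))
```

Under these assumptions the sketch reports M as conflict-free and N as not conflict-free, because E splits EAB in N. This agrees with the claim made about this example in the introduction.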

3. Reduced MVD’s and Keys

In an attempt to eliminate redundancies in a given set of MVD’s, we first define reduced MVD’s. The concept of reduced MVD’s is then used to find a minimal cover of the given set M of MVD’s, with desirable properties such as: if M has a conflict-free cover, then any such minimal cover is conflict-free and has a unique set of left-hand sides. Reduced MVD’s are also used to find a 4NF covering of a given set M of MVD’s, which is indeed a minimal cover if M is conflict-free. The use of reduced MVD’s in obtaining a normal form for nested relations (i.e. relations with repeating groups) can be found in [OY].

Definition 3.1. Let M be a set of MVD’s on U. An MVD X →→ W in M^+ is said to be
(1) trivial, if XW = U or W ⊆ X;
(2) left-reducible, if there is an X' ⊂ X such that X' →→ W is in M^+;
(3) right-reducible, if there is a W' ⊂ W such that X →→ W' is a nontrivial MVD in M^+;
(4) transferable, if there is an X' ⊂ X such that X' →→ (X − X')W is in M^+.

An MVD X →→ W is said to be reduced if it is nontrivial, left-reduced (non-left-reducible), right-reduced (non-right-reducible), and nontransferable. Let M^- = {X →→ W | X →→ W is a reduced MVD in M^+}. Obviously, M^- ⊆ M^+. Proposition 3.1 below shows that M^- is a cover of M. First we give some lemmas.

Lemma 3.1. Let X →→ W be a nontrivial, left- and right-reduced MVD, and X' →→ (X − X')W, where X' ⊆ X, be a nontransferable MVD. Then X' →→ (X − X')W is reduced.

Proof: Let m be X' →→ (X − X')W. Since X →→ W is nontrivial, m is nontrivial. We need to show that m is left- and right-reduced. If m is left-reducible, then there is an X'' ⊂ X' s.t. X'' →→ (X − X')W, hence X''(X − X') →→ W, which contradicts that X →→ W is left-reduced. If m is right-reducible, then X' →→ X_1W_1 | X_2W_2, where X_1X_2 = (X − X') and W_1W_2 = W. Therefore, we have X'X_1 →→ W_1, X'X_2 →→ W_2, and X →→ W_1 | W_2. For i = 1, 2, if W_i = ∅, then X_i ≠ ∅, so X →→ W is left-reducible; if both W_1 and W_2 are nonempty, then X →→ W is right-reducible.

Lemma 3.2. If X →→ Y in M^+ is nontrivial and right-reduced, then there exists an MVD Z →→ W in M^-, where Z ⊆ X, such that Z →→ W ⊨ X →→ Y and Y ⊆ W ⊆ Y(X−Z).

Proof: If X →→ Y is left-reducible, then there exists an X' ⊂ X such that X' →→ Y in M^+ is left-reduced. Otherwise, let X' = X. The fact that X →→ Y is right-reduced implies that X' →→ Y is right-reduced. If X' →→ Y is nontransferable, it is itself reduced and is the required MVD. If X' →→ Y is transferable, then there exists an X'' ⊂ X' such that X'' →→ (X' − X'')Y is nontransferable. By Lemma 3.1, X'' →→ (X' − X'')Y is reduced. Moreover, X'' →→ (X' − X'')Y ⊨ X' →→ Y ⊨ X →→ Y, that is, X'' →→ (X' − X'')Y is an MVD Z →→ W in M^- which implies X →→ Y.

Proposition 3.1. M^- ⊨ M^+.

Proof: Let M_1 be the set of all nontrivial and right-reduced MVD’s in M^+. For any nontrivial MVD X →→ W in M^+, W is the union of some dependents of X and some attributes in X [Ul]. This implies that there exist W_1, ..., W_n such that X →→ W_i, i = 1, ..., n, n > 0, is nontrivial and right-reduced, and {X →→ W_i | i = 1, ..., n} ⊨ X →→ W. That is, M_1 ⊨ M^+. By Lemma 3.2, M^- ⊨ M_1. It follows that M^- ⊨ M^+.

Elements in LHS(M^-) are called keys of M. (Note that, in [BFMY, L1, L2], keys of M are defined as the elements of LHS(M), not as the elements of LHS(M^-). Since we consider reduced MVD’s, referring to the elements of LHS(M^-) as keys of M is more convenient in our context.) RDEP(X) denotes the set {W | W ∈ DEP(X) and X →→ W is reduced}. By definition, RDEP(X) ⊆ DEP(X), and RDEP(X) is nonempty iff X is a key.

Example 3.1. Let U = ABCDGH and M = {A →→ G, B →→ H, GH →→ C | D}, from an example in [L1]. Then M^- = {A →→ G | BCDH, B →→ H | ACDG, GH →→ AB | C | D, AH →→ B | C | D, BG →→ A | C | D, AB →→ C | D}, and A, B, GH, AH, BG, AB are the keys.

In this example, we can see that for each key X of M, there is an X' in LHS(M) such that X' ⊆ X. The following lemma ensures that this is also true in general.

Lemma 3.3. Let M be a set of MVD’s. If X →→ W in M^+ is nontrivial, then there is an X' in LHS(M) such that X' ⊆ X.

Proof: Assume not, i.e. there is no X' ⊆ X in LHS(M). Let r be a relation which consists of two tuples that agree on X and disagree on the rest of the attributes. Relation r satisfies all MVD’s whose left-hand side is not contained in X. Thus, r satisfies M but violates the MVD X →→ W.

From Example 3.1, it is easy to observe that for any Z ∈ LHS(M^-) and any dependent W of Z, Z →→ W is a nontrivial and nontransferable MVD. The following lemma shows that this is also true in general.

Lemma 3.4. Let M be a set of MVD’s, and Z be a key of M. Then for any W ∈ DEP(Z), Z →→ W is nontrivial and nontransferable.

Proof: The nontrivial part is obvious. We need only show that Z →→ W is nontransferable. Assume not; then there is a Z' ⊂ Z s.t. Z' →→ (Z − Z')W, so Z' →→ U − ZW by complementation. It implies that for each V ∈ DEP(Z), where V ≠ W, Z' →→ V; therefore, Z has no reduced dependents. A contradiction.

The following lemmas state some useful properties of keys which will be utilized later in the paper.

Lemma 3.5. Let M be a set of MVD’s, Z be a key of M, and X ⊂ Z. Then there exists a V ∈ DEP(X) such that
(a) Z ⊂ XV (i.e. X does not split Z),
(b) for each W ∈ RDEP(Z), W ⊂ V,
(c) X →→ V is left-reduced, and if X is a key then V ∈ RDEP(X),
(d) Z splits V, and
(e) X does not split the union of all reduced dependents of Z.

Proof: (a) Assume X splits Z; then there is a V ∈ DEP(X) s.t. V_1 = V ∩ Z ≠ ∅ and V_2 = V̄ ∩ Z ≠ ∅, where V̄ = U − XV. X →→ V implies Z →→ V, hence, for each W ∈ DEP(Z), either W ⊆ V or W ⊆ V̄. If W ⊆ V, then since XV_1 →→ XV_1V̄ and XV_1V̄ ⊃ Z, XV_1 →→ W, i.e. Z →→ W is left-reducible. If W ⊆ V̄, similarly, Z →→ W is left-reducible. Thus for each W ∈ DEP(Z), Z →→ W is left-reducible. This is a contradiction since Z is a key.

(b) X ⊂ Z implies that X does not split any W ∈ DEP(Z). That is, for each W in DEP(Z), either W ∩ V = ∅ or W ⊆ V. But if W ∩ V = ∅, then X →→ W, i.e. Z →→ W is left-reducible.

(c) If X →→ V is not left-reduced, then X' →→ V for some proper subset X' of X. Thus X' splits Z, which contradicts (a).

(d) Z →→ V since V ∈ DEP(X) and X ⊂ Z. If Z does not split V, then V − Z is in DEP(Z). Since X →→ V, and V = (V − Z)(V ∩ Z) = (V − Z)(Z − X), this implies that Z →→ (V − Z) is transferable, which contradicts Lemma 3.4.

(e) Directly follows from (b).

Lemma 3.6. Let M be a set of MVD’s, and Z be a key of M. Then for any W ∈ DEP(Z), there exists a key X ⊆ Z such that W ∈ RDEP(X).

Proof: Assume W ∉ RDEP(Z). By Lemma 3.4, Z →→ W is left-reducible, i.e. there exists an X ⊂ Z such that X →→ W is left-reduced. Obviously, it is nontrivial and right-reduced. But if X →→ W is transferable, i.e. there exists an X' ⊂ X such that X' →→ W(X−X'), then X' splits Z, which contradicts Lemma 3.5 (a). Hence X →→ W is reduced, i.e. X is a key and W ∈ RDEP(X).

Definition 3.2. A set M of MVD’s is said to be minimal if (1) each MVD in M is reduced, and (2) no proper subset of M is a cover of M.

The difference between this definition of a minimal cover and the usual one, e.g. the one defined in [L1], is that we require each MVD in the cover to be reduced. The reason is natural: a non-reduced MVD itself implies some redundancy. Let M be a minimal cover. Elements in LHS(M) are called essential keys of M, and elements in LHS(M^-)−LHS(M) are called nonessential keys of M. Given a set M of MVD’s, a minimal cover of M can be found in polynomial time, which is shown in the following proposition; finding M^- will be discussed in the next section.

Proposition 3.2. Given a set M of MVD’s, a minimal cover of M can be found in time polynomial in the input size.

Proof: Without loss of generality, we assume that all MVD’s in M are nontrivial and right-reduced, since calculating dependency bases can be done efficiently [Ga]. A minimal cover of M can be obtained in two steps. First, for each X →→ V in M, find a reduced MVD Z →→ W such that Z →→ W ⊨ X →→ V, to form a set M' of reduced MVD’s such that M' ⊨ M. By Lemma 3.2, such an M' always exists.
Then, finding and deleting all redundant MVD’s in M' will return a minimal cover of M. Since the second step can be done efficiently, we need only show that M' can be found in polynomial time. Let X →→ V be in M. Calculate DEP(X − A) for each A in X. If X →→ V is nonreduced, then it is either left-reducible or transferable. In either situation, there exists an MVD (X − A) →→ V' which implies X →→ V, where V ⊆ V' ⊆ VA, for some A in X. Obviously, (X − A) →→ V' is nontrivial and right-reduced. If (X − A) →→ V' is nonreduced, this method can be applied recursively until a reduced MVD Z →→ W ⊨ X →→ V is found. Since DEP(X − A) can be obtained in polynomial time [Ga] and the number of attributes in X is bounded by the number of attributes in U, Z →→ W can be obtained in polynomial time with respect to the input size.
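The first step of this proof can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: DEP(X) is taken as a partition of U − X, the input MVD is assumed nontrivial and right-reduced (V is a block of DEP(X)), and the names fs, dep_basis and reduce_mvd are hypothetical. For each attribute A of X the sketch looks for a block V' of DEP(X − A) with V ⊆ V' ⊆ VA and recurses on (X − A) →→ V'. Attributes are tried in alphabetical order, so a different order may reach a different reduced MVD.

```python
def fs(s):
    """Hypothetical helper: attribute set from a string such as 'EAB'."""
    return frozenset(s)

def dep_basis(X, M, U):
    """DEP(X) as a partition of U - X, by repeated block refinement."""
    blocks = {U - X} if U - X else set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in M:
            for b in list(blocks):
                if b & lhs:
                    continue
                inside, outside = b & rhs, b - rhs
                if inside and outside:
                    blocks.remove(b)
                    blocks.update((inside, outside))
                    changed = True
    return blocks

def reduce_mvd(X, V, M, U):
    """Shrink a nontrivial, right-reduced MVD X ->> V to a reduced Z ->> W.

    If some block V' of DEP(X - A) satisfies V <= V' <= V + {A}, then
    (X - A) ->> V' implies X ->> V: either A was redundant on the left
    (V' == V) or A is transferred to the right (V' == V + {A}).
    """
    for A in sorted(X):
        smaller = X - {A}
        for block in dep_basis(smaller, M, U):
            if V <= block <= V | {A}:
                return reduce_mvd(smaller, block, M, U)
    return X, V  # no single attribute can be dropped or transferred

# N from the introduction: EAB ->> C has the redundant attribute B.
U = fs("ABCDE")
N = [(fs("E"), fs("B")), (fs("EAB"), fs("C"))]
Z, W = reduce_mvd(fs("EAB"), fs("C"), N, U)
print("".join(sorted(Z)), "->>", "".join(sorted(W)))
```

On N the sketch turns EAB →→ C into EA →→ C, the MVD that appears in M in the introduction. Deleting redundant MVD’s from the resulting set, the second step of the proof, is not shown.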

4. Algorithm

In this section, we present an algorithm to obtain M^- from a given set of MVD’s. To obtain M^-, we need to find all keys of M. The following proposition gives us a method to do so.

Proposition 4.1. Let M be a set of MVD’s, and let Z ∉ LHS(M) be a key of M. Then there exist a key X ⊂ Z, a V ∈ RDEP(X), and a Y ∈ LHS(M) which splits V, such that Z = X(Y ∩ V). Furthermore, if Z has more than one reduced dependent, then X splits some Y' in LHS(M).

Example 4.1. Consider M in Example 3.1. A, B, GH, AH, BG, AB are all keys of M, while AH, BG, and AB are not in LHS(M). For the key AH, the X, V, and Y in Proposition 4.1 are A, BCDH, and GH respectively. That is, A ⊂ AH, BCDH ∈ RDEP(A) and GH ∈ LHS(M) such that AH = A(GH ∩ BCDH). From Example 3.1, it can be seen that AH has three reduced dependents, namely B, C and D. Thus, A splits GH ∈ LHS(M), as stated in the proposition. Similarly, for the keys BG and AB, BG = B(GH ∩ ACDG), where GH ∈ LHS(M) and ACDG ∈ RDEP(B); and AB = A(B ∩ BCDH), where B ∈ LHS(M) and BCDH ∈ RDEP(A). The keys BG and AB each have more than one reduced dependent. The key B used in obtaining the key BG also splits GH ∈ LHS(M).

To prove the proposition, we need some lemmas and definitions. Let M be a set of MVD’s on U, and V ⊆ U. Then an operator Π_V is defined as follows:

Π_V(X →→ W) = MD(X ∩ V, W ∩ V, V).
Π_V(M) = {Π_V(g) | g ∈ M and Π_V(g) is nontrivial}.

(Note that Π_V(X →→ W) is an MVD which is not necessarily implied by M.) The following technical lemma is utilized in Lemma 4.2.

Lemma 4.1. Π_V(M) ⊨ Π_V(M^+).

Proof: Let r be a relation on V, and r' be the relation formed by padding r out to U with the same constant in all tuples, i.e., any two tuples of r' agree on all attributes of U−V. For any MVD g, r' satisfies g if and only if r satisfies Π_V(g), by the definition of an MVD and the construction of r'. It follows that: r satisfies Π_V(M) ⇔ r' satisfies M ⇔ r' satisfies M^+ ⇔ r satisfies Π_V(M^+).

Lemma 4.2. Let M be a set of MVD’s on U, V be a dependent of some set of attributes, and X be a key of M. If X splits V, then there exists a Y ∈ LHS(M) such that Y splits V and (X ∩ V) ⊇ (Y ∩ V) ≠ ∅.

Proof: Let Y_1, Y_2, ..., Y_n be all elements in LHS(M) which split V. Since V is a dependent of some set of attributes, for each i, 1 ≤ i ≤ n, V_i = Y_i ∩ V ≠ ∅; otherwise, V is not a dependent of any set of attributes. For the same reason, X_0 = X ∩ V ≠ ∅ and there is a W ∈ DEP(X) s.t. W_0 = W ∩ V ≠ ∅ and V − X_0W_0 ≠ ∅, i.e. Π_V(X →→ W) is nontrivial. By Lemma 4.1, Π_V(M) ⊨ Π_V(X →→ W), and by Lemma 3.3, there exists a V_0 ∈ LHS(Π_V(M)) such that V_0 ⊆ X_0. But LHS(Π_V(M)) ⊆ {V_1, V_2, ..., V_n}. It follows that (X ∩ V) ⊇ (Y ∩ V) ≠ ∅ for some Y ∈ LHS(M) which splits V.

Lemma 4.3. Let X ∉ LHS(M) be a key of M which has more than one reduced dependent. Then
(1) there exist a key X' ⊂ X, a V ∈ DEP(X'), and a Y ∈ LHS(M) such that X' splits Y, Y ∩ V ≠ ∅, and Y splits W_0 = W_01...W_0n, where {W_01, ..., W_0n} = RDEP(X);
(2) for any X' ⊂ X, if X' →→ V then X' splits Y.

Proof: (1) Let M' = M − {Y →→ W | Y →→ W is in M and Y splits W_0}. Then M' does not split W_0, and neither does X with respect to M'. Assume DEP_M'(X) = {W_1', ..., W_m', W_0'}, where W_0' ⊇ W_0. Thus, M ⊨ X →→ W_0', but X →→ W_0' is right-reducible with respect to M. By Algorithm 7.6 in [Ul], there exists a Y ∈ LHS(M) such that Y ∩ W_0' = ∅ and Y splits W_0. Thus Y ⊈ X, since otherwise it contradicts Lemma 3.5 (e). That is, Y − X ≠ ∅, i.e. there exists a V ∈ DEP_M(X) such that Y ∩ V ≠ ∅. Y ∩ W_0' = ∅ implies that X →→ V is nonreduced. By Lemma 3.6, there exists a key X' ⊂ X such that V ∈ RDEP(X'). Thus, X' is a key, Y ∩ V ≠ ∅, and Y splits W_0. Now we show that X' splits Y. Assume not; then Y ⊆ X'V. But Y splits W_0 and V ∩ W_0 = ∅, so it follows that X' splits W_0. This contradicts Lemma 3.5 (e).

(2) Assume there exists an X' ⊂ X with X' →→ V, but X' does not split Y. Then Y ⊆ X'V. But Y splits W_0 and V ∩ W_0 = ∅, so it follows that X' splits W_0. This contradicts Lemma 3.5 (e).

Proof of Proposition 4.1. Let W_0 = W_01...W_0m, where {W_01, ..., W_0m} = RDEP(Z). Since Z is not in LHS(M), by Lemma 3.3, there exists a Z' ⊂ Z in LHS(M). Since M is nontrivial, by Lemma 3.2, there exists a key X_A ⊆ Z', i.e. there exists a key X_A of M such that X_A ⊂ Z. If Z has more than one reduced dependent, then let X_A be a key of M such that there exist a V_A ∈ DEP(X_A) and a Y ∈ LHS(M) such that Y splits W_0, Y ∩ V_A ≠ ∅, and X_A splits Y, as in Lemma 4.3. By Lemma 3.5, there exists a W_A ∈ RDEP(X_A) such that Z ⊂ X_A W_A and Z splits W_A. By Lemma 4.2, there exists a Y_A ∈ LHS(M), where Y_A splits W_A, such that (Z ∩ W_A) ⊇ (Y_A ∩ W_A) ≠ ∅. Thus, X_A(Y_A ∩ W_A) ⊆ Z. Let X_A^+ = X_A(Y_A ∩ W_A). If X_A^+ = Z, we are done. Assume X_A^+ ⊂ Z. Y_A splits W_A implies that X_A^+ splits W_A. X_A^+ ⊂ Z implies that there exists a W_A^+ ∈ DEP(X_A^+) such that W_0 ⊆ W_A^+, Z ⊆ X_A^+ W_A^+, and X_A^+ →→ W_A^+ is left-reduced. Since X_A ⊂ X_A^+ is a key, X_A^+ →→ W_A^+ is nontrivial. By Lemma 3.1, there exist an X_B ⊆ X_A^+ and W_B = W_A^+(X_A^+ − X_B) such that X_B →→ W_B is reduced. It follows that X_B ⊂ Z. The fact that X_A ⊂ X_A^+, (W_A^+ ∩ W_A) ⊇ W_0 ≠ ∅, and X_A^+ splits W_A implies that W_A^+ ⊂ W_A, so (W_A^+ − Z) ⊂ (W_A − Z). But (W_A^+(X_A^+ − X_B)) − Z = W_A^+ − Z = W_B − Z, so W_B − Z ⊂ W_A − Z. If Z has more than one reduced dependent, then the fact that X_A →→ V_A, X_B →→ W_B, and X_B W_B ⊃ Z ⊃ X_A implies X_B →→ V_A. By Lemma 4.3 (2), X_B splits Y. By Lemma 4.2, there exists a Y_B ∈ LHS(M), where Y_B splits W_B, such that (Z ∩ W_B) ⊇ (Y_B ∩ W_B) ≠ ∅. If X_B(Y_B ∩ W_B) = Z, we are finished. Otherwise, using the same strategy, we can get another set of X_C, Y_C, and W_C. But the fact that W_B − Z ⊂ W_A − Z and W_B ⊇ W_0 ≠ ∅ guarantees that at some step we do have X_B(Y_B ∩ W_B) = Z.

Proposition 4.1 gives us a method to derive all keys from a given set M of MVD’s. Assume X is a key of M.
For each V in RDEP(X), if there is an element Y in LHS(M) which splits V, then XV_0 = X(Y ∩ V) will be taken as a candidate key. Let CDEP(XV_0) = {W | W ∈ DEP(XV_0) and W ⊂ V}; then RDEP(XV_0) ⊆ CDEP(XV_0). XV_0 is a candidate key, but may not be a key. For any such candidate key XV_0, the procedure COMPARE(X', XV_0) compares XV_0 with a key or another candidate key X' ⊂ XV_0. For each W ∈ CDEP(XV_0), if X' →→ W then W is removed from CDEP(XV_0), since W cannot be a reduced dependent of XV_0. Furthermore, if X' →→ HW for some W ∈ CDEP(XV_0), where (XV_0) ⊇ H ≠ ∅, then by Lemma 3.4, XV_0 does not have any reduced dependents, i.e. it cannot be a key. In this case, CDEP(XV_0) is set to ∅ in the procedure COMPARE(X', XV_0).

Procedure COMPARE(X', X);
begin
    for each W ∈ CDEP(X) do
        if X' →→ W then CDEP(X) := CDEP(X) − {W}
        else if X' →→ HW, where H ⊆ (X−X'), then
            begin CDEP(X) := ∅; EXIT end;
end.

The algorithm to derive all keys of M is given as follows. In the algorithm, a priority queue CANDIDATE is used to store all candidate keys. The queue gives the highest priority to the candidate with the least number of attributes. Initially, all elements in LHS(M) are put into the queue. In each iteration of step 3, the first element in the queue is picked as the key X, and compared with the other candidate keys X' ⊃ X in the queue. If CDEP(X') = ∅ for any candidate X', then X' is deleted from the queue (step 3.2). Any candidate Z that is to be inserted into the priority queue CANDIDATE in step 3.3 is first compared with the already established keys K in KEY, and it is inserted only if CDEP(Z) ≠ ∅ at this point (step 3.3). Thus, a non-key in CANDIDATE may be deleted before or when it moves to the top of the queue. On the other hand, a key in CANDIDATE is not deleted from the queue until it is picked as the top element, which is then put into the set of keys KEY.

ALGORITHM 4.1
Input: A set M of MVD’s.
Output: The set of all keys of M and M^-.
Step 1: Let KEY := ∅; let Y_1, Y_2, ..., Y_m be all elements in LHS(M);
Step 2: for i := 1 to m do
            begin CDEP(Y_i) := DEP(Y_i); PUT(Y_i, CANDIDATE) end;
Step 3: repeat
    3.1 GET(X, CANDIDATE);
    3.2 for each X', where X' ⊃ X, in CANDIDATE do
            begin
                COMPARE(X, X');
                if CDEP(X') = ∅ then DELETE(X', CANDIDATE)
            end;
    3.3 for i := 1 to m do
            if X ⊈ Y_i and Y_i ⊈ X then
                for each V ∈ CDEP(X) which is split by Y_i do
                    begin
                        Z := X(Y_i ∩ V);
                        CDEP(Z) := {W | W ∈ DEP(Z) and W ⊂ V};
                        for each K ⊂ Z in KEY do COMPARE(K, Z);
                        if CDEP(Z) ≠ ∅ then PUT(Z, CANDIDATE)
                    end;
    3.4 KEY := KEY ∪ {X};
        until CANDIDATE is empty;
Step 4: Output KEY as all keys of M; M^- = {X →→ W | X ∈ KEY and W ∈ CDEP(X)}.
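Algorithm 4.1 can be sketched in Python as follows. This is a reading of the algorithm under several assumptions, not the authors' implementation, and all function names are hypothetical. DEP(X) is taken as a partition of U − X. The test "X' →→ HW with ∅ ≠ H ⊆ X − X'" in COMPARE is done by inspecting the block of DEP(X') that contains W. The priority queue is replaced by a linear minimum search. A candidate generated twice keeps the intersection of its CDEP sets. Finally, X is added to KEY before step 3.3 rather than after it, so that new candidates are also compared with X; this departs from the literal step order.

```python
def fs(s):
    """Hypothetical helper: attribute set from a string such as 'GH'."""
    return frozenset(s)

def dep_basis(X, M, U):
    """DEP(X) as a partition of U - X, by repeated block refinement."""
    blocks = {U - X} if U - X else set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in M:
            for b in list(blocks):
                if b & lhs:
                    continue
                inside, outside = b & rhs, b - rhs
                if inside and outside:
                    blocks.remove(b)
                    blocks.update((inside, outside))
                    changed = True
    return blocks

def splits(Y, V, M, U):
    """Y splits V if at least two dependents of Y meet V."""
    return sum(1 for b in dep_basis(Y, M, U) if b & V) >= 2

def compare(Xp, X, cdep, M, U):
    """COMPARE(X', X): prune cdep[X] using the smaller set X' (Xp)."""
    basis = dep_basis(Xp, M, U)
    for W in list(cdep[X]):
        block = next(b for b in basis if W <= b)  # block of DEP(X') holding W
        if block == W:                  # X' ->> W: W is not reduced for X
            cdep[X].discard(W)
        elif block - W <= X - Xp:       # X' ->> HW with nonempty H in X - X'
            cdep[X] = set()
            return

def order(z):
    """Priority: fewest attributes first, ties broken alphabetically."""
    return (len(z), sorted(z))

def keys_and_reduced(M, U):
    """Return a dict mapping each key X to its reduced dependents."""
    lhs = sorted({l for l, _ in M}, key=order)
    cand = {Y: dep_basis(Y, M, U) for Y in lhs}        # Step 2
    key = {}                                           # Step 1
    while cand:                                        # Step 3
        X = min(cand, key=order)                       # 3.1
        cdep_x = cand.pop(X)
        for Xp in [z for z in cand if X < z]:          # 3.2
            compare(X, Xp, cand, M, U)
            if not cand[Xp]:
                del cand[Xp]
        key[X] = cdep_x                                # 3.4, moved up (assumption)
        for Y in lhs:                                  # 3.3
            if X <= Y or Y <= X:
                continue
            for V in cdep_x:
                if not splits(Y, V, M, U):
                    continue
                Z = X | (Y & V)
                if Z in key:                           # defensive guard
                    continue
                new = {Z: {W for W in dep_basis(Z, M, U) if W < V}}
                for K in key:
                    if K < Z:
                        compare(K, Z, new, M, U)
                if Z in cand:                          # duplicate candidate
                    new[Z] &= cand[Z]
                if new[Z]:
                    cand[Z] = new[Z]
                else:
                    cand.pop(Z, None)
    return key                                         # Step 4

# Example 3.1: U = ABCDGH, M = {A ->> G, B ->> H, GH ->> C | D}.
U = fs("ABCDGH")
M = [(fs("A"), fs("G")), (fs("B"), fs("H")),
     (fs("GH"), fs("C")), (fs("GH"), fs("D"))]
for X, deps in keys_and_reduced(M, U).items():
    print("".join(sorted(X)), "->>",
          " | ".join(sorted("".join(sorted(W)) for W in deps)))
```

On the input of Example 3.1 the sketch produces the six keys A, B, GH, AH, BG, AB together with the reduced dependents listed for M^- in that example.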

The formal proof of correctness for Algorithm 4.1 is given below (proposition 4.2). The following example demonstrates the algorithm. Example 4.1. Consider M in Example 3.1 again. Assume the input of the algorithm is M. In the beginning, KEY is empty. At step 2, we get CDEP(A) = {G, BCDH}, CDEP(B) = {H, AGCD}, CDEP(GH) = {AB, C, D}, and CANDIDATE = {A, B, GH}. In step 3.1, we, first, pick up A from CANDIDATE as X. Since no X´ ⊃ A in CANDIDATE, step 3.2 does nothing. In step 3.3, consider Y 2 be B. Let Z = A(B ∩ BCDH) = AB. CDEP(AB) = {C, D, H}. COMPARE(B, AB) deletes H from CDEP(AB) since B →→ H, i.e. CDEP(AB) = {C, D}. Since CDEP(AB) ≠ ∅, AB is put into CANDIDATE, and KEY = {A}. After this loop finished, we have CANDIDATE = {B, GH, AB}. Then return back to step 3.1, and pick up B as X. Finally, the algorithm will return {A, B, GH, AB, BG, AH} as the set of all keys of M Proposition 4.2. KEY returned by Algorithm 4.1 is the set of all keys of a given set M of MVD’s. Proof: First, we can show that for any key X in CANDIDATE, X will be put into KEY. This follows from the fact that only those elements Y, where CDEP(Y)= ∅, will be deleted from CANDIDATE, and X is a key implies that CDEP(X) ≠ ∅. Second, we show that for each key Z of M, Z will be put into CANDIDATE sooner or later. If Z is in LHS(M), it is done in Step 2; otherwise, by proposition 4.1, there exist a key X, a V ∈ DEP(X), and a Y ∈ LHS(M)

- 10 -

such that Z = X(Y ∩ V). In this case X is said to be nested in Z and the nesting level is 1. (Similarly, if X is not in LHS(M), then there exist a key X´, a V´ in DEP(X´), and a Y´ in LHS(M) such that X = X´(Y´ ∩ V´). That is, X´ is nested in Z, and the nesting level is 2.) By induction on the number of nesting levels of X, we can show that X will be put into CANDIDATE. Thus, by the above argument, X will be put into KEY, and then Z will be produced and put into CANDIDATE in Step 3.3.

Finally, we need to show that any Z in CANDIDATE that is not a key must be deleted from CANDIDATE. Suppose Z is not a key. Then for each W ∈ DEP(Z), Z →→ W is nonreduced. This implies that, by Lemma 3.1, there is a key X ⊂ Z such that X →→ HW, where H ⊆ (Z − X). Since X is a key, by the above arguments, X will be put into KEY. Since CANDIDATE is a priority queue and X < Z, X is in KEY before or when Z moves to the first place in CANDIDATE. But the procedure COMPARE(X´, Z´) is applied to each pair of X´ in KEY and Z´ in CANDIDATE, where X´ ⊂ Z´. Therefore, W is deleted from CDEP(Z). This implies that CDEP(Z) = ∅, and Z is deleted from CANDIDATE.

The time complexity of Algorithm 4.1 is dominated by the following two parts: the first is the comparisons between each key in KEY and each element in CANDIDATE; the second is the computation of the dependency basis of each element in CANDIDATE. Let N be the number of all keys, |M| be the number of MVD’s in M, and |U| be the number of attributes in U. By Proposition 4.1, the number of elements in CANDIDATE, |CANDIDATE|, is less than N |M| |U|. Thus, the time required by the comparisons is bounded from above by |CANDIDATE| N |U| < N^2 |M| |U|^2. From [Ga], the time required for the computation of the dependency basis of a set of attributes is bounded by ‖M‖ log|U|, where ‖M‖ is the total number of occurrences of attributes in M. Obviously, ‖M‖ < |M| |U|. Therefore, the total time required for the computation of the dependency basis for each element in CANDIDATE is bounded by N |U|^2 |M|^2 log|U|. It follows that the time required by Algorithm 4.1, in the worst case, is O(max(N^2 |M| |U|^2, N |U|^2 |M|^2 log|U|)). However, because the number of all keys may be exponential in the input size, the complexity of the algorithm is still open. For the sake of efficiency, it is better for the input of the algorithm to be a set M of MVD’s such that there is no cover M´ of M, where M´ ⊂ M.

In the next section, we use the concept of reduced MVD’s and M^- to find a 4NF covering of a given set of MVD’s, which is useful in the design of relational database schemes.
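The dependency basis computation that appears in the second cost term can be illustrated with a minimal Python sketch. It uses the simple textbook refinement procedure (split any block that is disjoint from the left-hand side of an MVD and is cut by its right-hand side), not the almost-linear-time algorithm of [Ga]. The set M of Example 3.1 is not reproduced in this section; the sketch assumes it is {A →→ G, B →→ H, GH →→ C | D} over U = ABCDGH, which is consistent with the values listed in Example 4.1. The names mvd and dependency_basis are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

MVD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def mvd(lhs: str, rhs: str) -> MVD:
    """Build an MVD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def dependency_basis(x: Iterable[str], mvds: List[MVD],
                     universe: Iterable[str]) -> Set[FrozenSet[str]]:
    """Return DEP(x) as a partition of universe - x (naive refinement)."""
    rest = frozenset(universe) - frozenset(x)
    blocks: Set[FrozenSet[str]] = {rest} if rest else set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in mvds:
            for block in list(blocks):
                inside, outside = block & rhs, block - rhs
                # A block is split only if it is disjoint from the
                # left-hand side and the right-hand side cuts it in two.
                if not (block & lhs) and inside and outside:
                    blocks.remove(block)
                    blocks.update({inside, outside})
                    changed = True
    return blocks


# Assumed reading of Example 3.1 (not given in this section).
EXAMPLE_U = "ABCDGH"
EXAMPLE_M = [mvd("A", "G"), mvd("B", "H"), mvd("GH", "C"), mvd("GH", "D")]

if __name__ == "__main__":
    for key in ("A", "B", "GH"):
        basis = dependency_basis(key, EXAMPLE_M, EXAMPLE_U)
        print(key, sorted("".join(sorted(b)) for b in basis))
```

Under the stated assumption about M, the sketch yields the partitions {G, BCDH}, {H, ACDG} and {AB, C, D} for A, B and GH, matching the CDEP values at step 2 of Example 4.1.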

5. Reduced MVD’s and 4NF Decomposition

In [Fa], 4NF database schemes have been introduced, and a procedure to obtain a 4NF decomposition of U w.r.t. a set of FD’s and MVD’s has been given. Lien has observed that there are two points in Fagin’s algorithm which need to be improved [L1]. First, the database scheme R may contain redundant relation schemes, i.e. R may contain R_i and R_j such that R_i ⊂ R_j. Second, R is not guaranteed to be in 4NF, except when the input M is M^+, although 4NF is the original objective of the decomposition algorithm. In order to solve these two problems, Lien introduced the following two concepts: a p-ordering of elements in LHS(M), and a 4NF covering of M [L1, L2].

A sequence X_1, X_2, ..., X_n of the elements in LHS(M) is called a p-ordering if X_i ⊂ X_j implies that 1 ≤ i < j ≤ n. Obviously, a p-ordering of LHS(M) gives an ordering of the MVD’s in M, which is not necessarily unique. A 4NF covering M_1 of M is a covering of M such that for any set R ⊆ U, R is not decomposable w.r.t. M_1 iff R is not decomposable w.r.t. M^+. A 4NF covering M_1 is minimal if no proper subset of M_1 is also a 4NF covering of M.

A modified version of Fagin’s decomposition algorithm, which uses a p-ordering as the order of MVD’s to be considered in the decomposition, is given in [L1]. The output of this algorithm has no redundant schemes. However, in order to get a 4NF decomposition, for both algorithms, the input should be a minimal 4NF covering, or at least a 4NF covering [L1]. Finding a minimal 4NF covering M_1 of a given M is not an easy problem. Lien gave some necessary conditions for elements in LHS(M_1) − LHS(M), but he did not give a method to find M_1. In fact, Lemmas 3.3 and 3.5 show that any key of M satisfies the conditions given in [L1].
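The p-ordering condition is easy to check mechanically. The following Python sketch is only an illustration of the definition above: is_p_ordering tests that no element appears before one of its proper subsets, and p_ordering produces one particular p-ordering by sorting on cardinality (a proper subset always has fewer attributes). Both function names and the tie-breaking rule are assumptions made here, not part of [L1].

```python
from typing import FrozenSet, Iterable, List


def is_p_ordering(seq: Iterable[Iterable[str]]) -> bool:
    """True if no element is preceded by one of its proper supersets."""
    sets = [frozenset(s) for s in seq]
    return all(
        not (sets[j] < sets[i])          # a later element must not be a
        for i in range(len(sets))        # proper subset of an earlier one
        for j in range(i + 1, len(sets))
    )


def p_ordering(lhs_sets: Iterable[Iterable[str]]) -> List[FrozenSet[str]]:
    """One p-ordering: sort by size, then alphabetically for determinism."""
    return sorted((frozenset(s) for s in lhs_sets),
                  key=lambda s: (len(s), sorted(s)))


if __name__ == "__main__":
    print(is_p_ordering(["A", "B", "GH", "AH", "BG", "AB"]))
    print(["".join(sorted(s)) for s in p_ordering(["GH", "AB", "A", "B"])])
```

Since a p-ordering is in general not unique, sorting by cardinality is just one convenient choice; any topological order of the proper-subset relation would do.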

In this section, we show that a subset M^* of the reduced MVD’s for a given set M of MVD’s is a 4NF covering of M. M^* is obtained by removing any one of the MVD’s X →→ W, for each key X of M, from the set of all reduced MVD’s implied by M. That is, if M^- is the set of all reduced MVD’s implied by M, and M´ is the set of MVD’s obtained by arbitrarily selecting one MVD X →→ W from M^- for each X ∈ LHS(M^-), then M^* = M^- − M´.

Example 5.1. Consider M and U given in Example 3.1. Let M^* be M^* = {A →→ G, B →→ H, GH →→ C | D, AH →→ C | D, BG →→ C | D, AB →→ C}. Choose A, B, GH, AH, BG, AB as a p-ordering, i.e. the order of MVD’s in M^* is exactly the same as above. The algorithm in [L1] returns R = {AG, BH, ABC, ABD}, which is in 4NF w.r.t. M. Note that for any p-ordering, where M^* is used as the input set of MVD’s for the decomposition, the result will always be a 4NF decomposition. However, if M itself is used, where LHS(M) = {A, B, GH}, the p-ordering A, B, GH will result in R´ = {AG, ABCD, BH}. R´ is not a 4NF decomposition since ABCD is not in 4NF. In this example, M^* is a 4NF covering but not minimal. M_1 = {A →→ G, B →→ H, GH →→ C | D, AH →→ C, BG →→ C, AB →→ C} is a minimal 4NF covering, and M_1 ⊂ M^*.

To show that M^* is a 4NF covering, we need the following lemmas.

Lemma 5.1. Let M be a set of MVD’s in U, and let R ⊆ U be decomposable w.r.t. M^+. Then there are a key X of M and two reduced dependents W_1 and W_2 of X such that R is decomposable w.r.t. X →→ W_1 and X →→ W_2.

Proof: R is decomposable w.r.t. M^+ implies that there is an MVD Z →→ V in M^+ such that R is decomposable w.r.t. Z →→ V, i.e. Z ⊂ R, and R ∩ (V − Z) and R − ZV are nonempty. Then, there exist two dependents V_1 and V_2 of Z such that V_1 ∩ (R ∩ (V − Z)) ≠ ∅ and V_2 ∩ (R − ZV) ≠ ∅. Then R is decomposable w.r.t. Z →→ V_1 and Z →→ V_2. If both MVD’s are reduced, the proof is finished. Otherwise, assume Z →→ V_1 is nonreduced.
Then by Lemma 3.2, there exists a reduced MVD X →→ HV_1, where X ⊂ Z and H ⊆ (Z − X). Let W_1 = HV_1. Then, XW_1 ⊆ ZV_1 and V_2 ∩ R ≠ ∅ imply that there is a W_2 ∈ DEP(X) such that W_2 ∩ R ≠ ∅. That is, R is decomposable w.r.t. X →→ W_1 and X →→ W_2. If X →→ W_2 is nonreduced, then the same argument can be applied until both MVD’s are reduced.

Lemma 5.2. M^* is a cover of M.

Proof: Since M^- is a cover of M, by Proposition 3.1, it is sufficient to show that M^* is a cover of M^-. Assume X ∈ LHS(M^-). Let M_X = {X´ →→ W | X´ ⊆ X and X´ →→ W in M^*}. Then for any X´ ⊂ X, M_X´ ⊆ M_X. We show that for each X ∈ LHS(M^-), if V ∈ DEP(X) is such that X →→ V is not in M^*, then M_X ⊨ X →→ V, which implies that M^* is a cover of M^-. We use induction on the number of proper subsets X´ of X, where X´ ∈ LHS(M^-).

Basis: There is no X´ ∈ LHS(M^-) such that X´ ⊂ X. Then RDEP(X) = DEP(X). For all dependents W of X, except V, X →→ W is in M^*. Thus, M_X ⊨ X →→ V by the complementation rule M1.

Induction step: Assume the hypothesis is true for all X ∈ LHS(M^-) which have fewer than n proper subsets X´ in LHS(M^-). Let X ∈ LHS(M^-) have n proper subsets X´ in LHS(M^-), where n ≥ 1. Then RDEP(X) ⊂ DEP(X). Let V ∈ (DEP(X) − RDEP(X)). By Lemma 3.6, there is an X´ ⊂ X such that V ∈ RDEP(X´). By the induction hypothesis, M_X´ ⊨ X´ →→ V. Since M_X´ ⊂ M_X, M_X ⊨ X →→ V. Let V ∈ RDEP(X). Then, V is the only reduced dependent of X such that X →→ V is not in M_X. Since for all W ∈ DEP(X), except V, M_X ⊨ X →→ W, X →→ V can be derived from M_X by the complementation rule M1.

The proof of the following proposition follows directly from Lemmas 5.1 and 5.2.

Proposition 5.1. Let M be a set of MVD’s. M^* is a 4NF covering of M.
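The effect of using a 4NF covering as input, as in Example 5.1, can be reproduced with a small Python sketch. This is not the algorithm of [L1], whose details are not given here; it is a simplified Fagin-style splitting loop that applies the decomposability test from the proof of Lemma 5.1 (X ⊆ R, R ∩ (W − X) ≠ ∅ and R − XW ≠ ∅) to the MVD’s in the given order, and it makes no attempt to remove redundant schemes. The function names are hypothetical, and the set M is the same assumed reading of Example 3.1, namely {A →→ G, B →→ H, GH →→ C | D} over U = ABCDGH.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

MVD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def mvd(lhs: str, rhs: str) -> MVD:
    """Build an MVD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def decompose(universe: Iterable[str],
              ordered_mvds: List[MVD]) -> Set[FrozenSet[str]]:
    """Split schemes by each MVD in turn (simplified, Fagin-style)."""
    schemes: List[FrozenSet[str]] = [frozenset(universe)]
    for lhs, rhs in ordered_mvds:
        next_schemes: List[FrozenSet[str]] = []
        for r in schemes:
            inside = r & (rhs - lhs)      # R ∩ (W − X)
            outside = r - (lhs | rhs)     # R − XW
            if lhs <= r and inside and outside:
                # R is decomposable: replace it by R ∩ XW and R − (W − X).
                next_schemes.extend([lhs | inside, r - inside])
            else:
                next_schemes.append(r)
        schemes = next_schemes
    return set(schemes)


U = "ABCDGH"
# Assumed M of Example 3.1, in the p-ordering A, B, GH.
M = [mvd("A", "G"), mvd("B", "H"), mvd("GH", "C"), mvd("GH", "D")]
# M* of Example 5.1, in the p-ordering A, B, GH, AH, BG, AB.
M_STAR = M + [mvd("AH", "C"), mvd("AH", "D"),
              mvd("BG", "C"), mvd("BG", "D"), mvd("AB", "C")]

if __name__ == "__main__":
    for name, cover in (("M", M), ("M*", M_STAR)):
        result = decompose(U, cover)
        print(name, sorted("".join(sorted(s)) for s in result))
```

With M as input the loop stops at {AG, BH, ABCD}, while with M^* the extra MVD AB →→ C splits ABCD into ABC and ABD, in agreement with R´ and R in Example 5.1.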

For any set M of MVD’s, using the 4NF covering M^* as input to a decomposition algorithm guarantees a 4NF decomposition. Grahne and Raiha [GR] give an algorithm to obtain a 4NF decomposition without explicitly using a 4NF covering or reduced MVD’s. In the general case, the complexity of their algorithm is exponential. This is also the case if the 4NF decomposition is obtained using the 4NF covering M^* as input to a standard decomposition algorithm, since the computation of all keys (Algorithm 4.1) may require exponential time. However, in order to compute a 4NF decomposition it is not necessary to compute all the keys. That is, from the construction of M^* it follows that it is sufficient to compute only the keys with more than one reduced dependent. From Proposition 4.1, any such key which is not in LHS(M) must contain a proper subset which is a key and splits some element of LHS(M). Thus a 4NF covering M^* may be constructed more efficiently by modifying Algorithm 4.1 to generate only those keys properly containing a key which splits some element in LHS(M). The problem of computing a 4NF decomposition more efficiently is left for further study. We should also note that a similar efficiency problem exists in the algorithm of [GR], i.e. not all sets of attributes generated are useful for 4NF decomposition. However, for the set M of MVD’s in Example 5.1, the sets of attributes generated by the algorithm of [GR], LHS(M^*), and the set of all keys (i.e. LHS(M^-)) are the same, and they also coincide with the LHS of a minimal 4NF covering for M.

6. Split Free MVD’s, Minimal Cover and 4NF covering

In this section, we consider split free sets of MVD’s, i.e. sets of MVD’s which do not split any element in their LHS’s, and show some properties in the context of minimal cover and 4NF decomposition. We first show that if M is split free then the full version of M is a 4NF covering of M. Moreover, if M is conflict free then the 4NF covering M^* obtained from reduced MVD’s is in fact a minimal cover of M. That is, M^* is a minimal 4NF covering if M is conflict free. The following lemma is utilized to show this result.

Lemma 6.1. Let M be a split free set of MVD’s, and let X be a key of M which has more than one reduced dependent. Then X is in LHS(M).

Proof: If X ∉ LHS(M) then, from Proposition 4.1, there exists X´ ⊂ X such that X´ splits some Y in LHS(M). This is a contradiction, since M is split free.

For the class of split free MVD’s, the 4NF decomposition algorithm in [GR] requires polynomial time in the size of M. Proposition 6.1 below shows that if M is split free then using a full version of M as input to a decomposition algorithm [Fa, L1] also guarantees a 4NF decomposition, which obviously requires polynomial time.

Proposition 6.1. If M is split free then the full version M´, i.e. M´ = {X →→ W | X ∈ LHS(M) and W ∈ DEP(X)}, is a 4NF covering of M.

Proof: By Lemma 6.1, any key which has more than one reduced dependent is in LHS(M). From the construction of M^* and Proposition 5.1, we have LHS(M^*) ⊆ LHS(M´). Thus, M´ is a 4NF covering of M.

This proposition conceptually explains why the 4NF decomposition algorithm in [L1] works for conflict free sets of MVD’s, and why the algorithm in [GR] runs in polynomial time for split free sets of MVD’s.

Example 6.1. Let U = ABCDEFG, M = {A →→ B, AC →→ F, AD →→ G, ACD →→ E}. Obviously, M is split free. Then the set M^- of all reduced MVD’s, the 4NF covering M^*, and the full version M´ are as follows.
M^- = {A →→ B | CDEFG, AC →→ F | DEG, AD →→ G | CEF, ACD →→ E};
M^* = {A →→ B, AC →→ F, AD →→ G}; and
M´ = {A →→ B | CDEFG, AC →→ B | F | DEG, AD →→ B | G | CEF, ACD →→ B | E | F | G}.
Since LHS(M^*) ⊆ LHS(M´), M´ is also a 4NF covering of M. In fact, for this example, M^* is also a minimal cover

of M. The proposition below states that if M is conflict free then the 4NF covering M^* is a minimal cover of M, and hence is a minimal 4NF covering.

Proposition 6.2. Let M be a conflict free set of MVD’s. Then M^* is a minimal cover of M.

Proof: Let X →→ W_1 be in M^*. Showing that M´ = M^* − {X →→ W_1} ⊭ X →→ W_1 is sufficient for the proof. Suppose M´ ⊨ X →→ W_1, i.e. M´ ≡ M. Then M´ is conflict free, since M is conflict free, and, by Lemma 6.1, LHS(M´) ⊆ LHS(M). By the definition of M^*, there exists another reduced dependent W_2 of X such that X →→ W_2 is in M^- − M^*. Let V = W_1W_2. Then M´ ⊨ X →→ W_1 | W_2. By Lemma 4.1, Π_V(M´) ⊨ Π_V(X →→ W_1 | W_2) = MD(∅, W_1 | W_2, V). By Lemma 3.3, there exists an MVD MD(∅, W´, V) in Π_V(M´). Since it is nontrivial, W´ ⊂ V. That is, there exists an MVD Y →→ W in M´ such that Y →→ W splits V, and Y ∩ V = ∅. So, W ∈ RDEP(Y) and W ∩ V ≠ ∅.

Since Y splits V but not X, there exists a W_0 ∈ DEP(Y) such that W_0 ∩ V ≠ ∅ and W_0 ∩ X = ∅. Let W̄ = U − W_0, and V̄ = U − V. The fact that Y →→ W̄, W̄ ⊇ X, and X →→ V implies Y →→ (V − W̄). But W_0 ∈ DEP(Y) and (V − W̄) ⊆ W_0, so it follows that V − W̄ = W_0 ⊂ V. The fact that X →→ V̄, V̄ ⊇ Y, W_0 ∈ DEP(Y), and W_0 ⊂ V implies that W_0 ∈ DEP(X), i.e. W_0 ∈ {W_1, W_2}. M´ is conflict free, so W_0 ∈ DEP(X ∩ Y). Similarly, we can show that {W_1, W_2} ⊆ DEP(X ∩ Y). Since X →→ W_1 | W_2 is not in M´, Y ≠ X. Moreover, since Y splits V, by Lemma 3.5 (e), Y ⊈ X. Thus, X ∩ Y ⊂ Y. It follows that neither W_1 nor W_2 is in RDEP(Y), which contradicts that W ∈ RDEP(Y) and W ∩ V ≠ ∅.

Another desirable property of split free MVD’s related to reduced MVD’s is that, if M is split free then the set of essential keys of M is unique, and it is a subset of LHS(M), i.e. for every minimal cover N of M, LHS(N) is unique, and LHS(N) ⊆ LHS(M). The following lemma is utilized in showing this result.

Lemma 6.2. Let M be a set of MVD’s.
Let X be a key of M such that X has only one reduced dependent W and X →→ W is in M. Then M ≡ M − {X →→ W}.

Proof: Let N = M − {X →→ W}, and let DEP_M(X) denote DEP(X) w.r.t. M. For each V in DEP_M(X), where V ≠ W, since X →→ V is nonreduced, by Lemma 3.6, there exists a Z ⊂ X such that W ∈ RDEP(Z). Thus, if we can show that DEP_M(X´) = DEP_N(X´) for X´ ⊂ X, then X →→ W can be derived from N, i.e. M ≡ N. Assume not; i.e. DEP_M(X´) ≠ DEP_N(X´) for some X´ ⊂ X. From the result in [HITK], there exists an S ∈ DEP_N(X´) such that X ∩ S = ∅ and W ∩ S ≠ ∅. The fact that N ⊂ M implies that S is the union of some dependents of X´ w.r.t. M. By Lemma 3.5, there exists a V ∈ DEP_M(X´) such that XW ⊂ X´V, which is a contradiction.

Lemma 6.2 tells us that any essential key X of a set of MVD’s has at least two reduced dependents. Let M be a set of MVD’s such that M does not split any element in LHS(M), and let N be a minimal cover of M. Then any key X ∈ LHS(N) has more than one reduced dependent. By Lemma 6.1, this implies X ∈ LHS(M). By Lemma 6.1, for any key Y of M, if Y has more than one reduced dependent then Y ∈ LHS(N). It follows that LHS(N) is the set of all keys which have more than one reduced dependent, which completes the proof of Proposition 6.3.

Proposition 6.3. If M is split free then M has a unique set of essential keys, which is a subset of LHS(M).

From this result, if M is conflict free and N is a minimal cover of M, then LHS(N) ⊆ LHS(M). This implies that N does not split LHS(N), and for any X, Y in LHS(N), (DEP(X) ∩ DEP(Y)) ⊆ DEP(X ∩ Y). Thus N is also conflict free. That is, any minimal cover of a conflict free set of MVD’s is conflict free, which is not true for the minimum cover in the usual definition [BFMY]. Furthermore, a set of MVD’s has a conflict free cover if and only if its minimal cover is conflict free. By Proposition 3.2, this gives us an efficient method (polynomial in input size) to check if a given set of MVD’s has a conflict free cover, which is also discussed in [GT].
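The split free condition and the full version of Proposition 6.1 can be checked on Example 6.1 with a short Python sketch. It rests on two assumptions that are not spelled out in this section: that X splits Y when Y meets more than one element of DEP(X), and that it suffices to test the elements of LHS(M) against each other. The dependency basis is computed by the naive textbook refinement, not by the algorithm of [Ga], and all function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

MVD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def mvd(lhs: str, rhs: str) -> MVD:
    return frozenset(lhs), frozenset(rhs)


def dependency_basis(x: Iterable[str], mvds: List[MVD],
                     universe: Iterable[str]) -> Set[FrozenSet[str]]:
    """DEP(x) as a partition of universe - x (naive refinement)."""
    rest = frozenset(universe) - frozenset(x)
    blocks: Set[FrozenSet[str]] = {rest} if rest else set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in mvds:
            for block in list(blocks):
                inside, outside = block & rhs, block - rhs
                if not (block & lhs) and inside and outside:
                    blocks.remove(block)
                    blocks.update({inside, outside})
                    changed = True
    return blocks


def is_split_free(mvds: List[MVD], universe: Iterable[str]) -> bool:
    """Assumed test: no DEP(X), X in LHS(M), separates a Y in LHS(M)."""
    lhs_sets = {lhs for lhs, _ in mvds}
    for x in lhs_sets:
        basis = dependency_basis(x, mvds, universe)
        for y in lhs_sets:
            if sum(1 for block in basis if block & y) > 1:
                return False
    return True


def full_version(mvds: List[MVD], universe: Iterable[str]) -> Set[MVD]:
    """M' = {X ->> W | X in LHS(M), W in DEP(X)}."""
    return {(lhs, w)
            for lhs in {l for l, _ in mvds}
            for w in dependency_basis(lhs, mvds, universe)}


# Example 6.1.
U61 = "ABCDEFG"
M61 = [mvd("A", "B"), mvd("AC", "F"), mvd("AD", "G"), mvd("ACD", "E")]

if __name__ == "__main__":
    print(is_split_free(M61, U61))
    for lhs, w in sorted(full_version(M61, U61),
                         key=lambda p: (sorted(p[0]), sorted(p[1]))):
        print("".join(sorted(lhs)), "->>", "".join(sorted(w)))
```

For Example 6.1 the check reports that M is split free, and the twelve MVD’s produced by full_version are exactly those listed for M´ in the example.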

7. Conclusion

In this paper, we define reduced MVD’s and keys of a given set M of MVD’s to eliminate redundancies in M, and use the concept of reduced MVD’s to define a minimal cover of M. Using this definition, we show that the set of left-hand sides of any minimal cover of a split-free set of MVD’s is unique. We also show that any minimal cover of a conflict-free set of MVD’s is conflict-free. We present an algorithm to find all reduced MVD’s and keys of a set of MVD’s which is polynomial in the number of keys. We also give an efficient algorithm (polynomial in the input size) to find a minimal cover of a set of MVD’s. The reduced MVD’s are then utilized to find a covering of M (i.e. a 4NF covering), which guarantees a 4NF decomposition of a universal scheme U with respect to M. We show that if M is conflict-free then this 4NF covering is a minimal 4NF covering as well as a minimal cover for M. We also show that if M is split free then a full version of M is a 4NF covering; this implies that if M is split free then the 4NF decomposition can be computed in polynomial time.

Acknowledgements: We are grateful to the anonymous referees who reviewed an earlier version of this paper and made many constructive comments. In particular, a referee brought the paper by Grahne and Raiha [GR] to our attention. The proofs of lemmas 3.3 and 4.1 are due to another referee who simplified our earlier proofs for these lemmas.

References

[BB] Beeri, C., and Bernstein, P.A., Computational Problems Related To the Design of Normal Form Relation Schemes, ACM TODS, Jan. 1979, pp. 30-59.

[BBG] Beeri, C., Bernstein, P.A., and Goodman, N., A Sophisticate’s Introduction to Database Normalization Theory, Proc. VLDB, 1978, pp. 113-123.

[BFMY] Beeri, C., Fagin, R., Maier, D., and Yannakakis, M., On the Desirability of Acyclic Database Schemes, JACM, July 1983, pp. 479-513.

[Fa] Fagin, R., Multivalued Dependencies and a New Normal Form for Relational Databases, ACM TODS, Sept. 1977, pp. 262-278.

[Ga] Galil, Z., An Almost Linear-Time Algorithm for Computing a Dependency Basis in a Relational Database, JACM, Jan. 1982, pp. 96-102.

[GR] Grahne, G., and Raiha, K. J., Database Decomposition Into Fourth Normal Form, Proc. 9th VLDB, Nov. 1983, pp. 186-196.

[GT] Goodman, N., and Tay, Y. C., Synthesizing Fourth Normal Form Relations From Multivalued Dependencies, Technical Report, May 1983, TR-17-83, Aiken Computation Laboratory, Harvard University.

[HITK] Hagihara, K., Ito, M., Taniguchi, K., and Kasami, T., Decision Problems For Multivalued Dependencies in Relational Databases, SIAM J. Comput. 8,2 (May 1979), pp. 247-264.

[L1] Lien, Y.E., Hierarchical Schemata for Relational Databases, ACM TODS, March 1981, pp. 48-69.

[L2] Lien, Y.E., On the Equivalence of Database Models, JACM, April 1982, pp. 333-362.

[Ma] Maier, D., Minimum Covers in the Relational Database Model, JACM, Oct. 1980, pp. 664-674.

[OY] Ozsoyoglu, Z. M., and Yuan, L.Y., A Normal Form For Nested Relations, Proc. of the 4th ACM PODS, Mar. 1985, pp. 251-260.

[Sc] Sciore, E., Real World MVD’s, Proc. SIGMOD, 1981, pp. 121-132.

[Ul] Ullman, J.D., Principles of Database Systems, Computer Science Press, Potomac, Maryland, 1983.

[ZM] Zaniolo, C., and Melkanoff, M.A., On the Design of Relational Database Schemata, ACM TODS, Mar. 1981, pp. 1-47.

[YO] Yuan, L.Y., and Ozsoyoglu, Z.M., Unifying Functional and Multivalued Dependencies For Relational Database Design, (to appear in Proc. of the 5th ACM PODS, 1986).