ON UPDATING INHERITANCE RELATIONSHIP IN XML DOCUMENTS

Eric Pardede, J. Wenny Rahayu¹, David Taniar²

Abstract

Many XML query languages lack support for update operations. Those that offer minimal support are not concerned with preserving the documents' constraints, and consequently the updated XML documents have low integrity. In this paper we propose a methodology that accommodates XML updates without violating the constraints of the original document. The main focus is on preserving the semantics of the inheritance relationship. Based on the exclusive-disjoint constraint, we distinguish union inheritance from mutual-exclusion inheritance. Based on the number of ancestors, we distinguish single inheritance from multiple inheritance. The proposed method can be implemented in different ways; in this paper we apply the XQuery language. Since XML update requires a schema, we also propose a mapping of the inheritance relationship from the conceptual level to XML Schema. We use XML Schema for structure validation, even though the algorithm can be used with any schema language.

¹ Department of Computer Science and Computer Engineering, La Trobe University, Bundoora VIC 3083, Australia
² School of Business Systems, Monash University, Clayton VIC 3800, Australia

1. Introduction

In the last few years, interest in storing XML documents in native XML databases (NXD) has grown rapidly [1, 11]. The main idea is to store documents in their natural tree form. However, many users still prefer to use an RDBMS as their document storage. One main reason for not using an NXD is the incompleteness of its query language. Many proprietary XML query languages, and even W3C-standardized languages, still have limitations compared to SQL. One of the most important limitations is the lack of support for update operations [11].

Different NXDs apply different strategies for XML updates. Very frequently, after the update operations the XML document loses the semantics stated at the conceptual level. To the best of our knowledge, no XML query language has considered the semantic problems that emerge from update operations, such as the insertion of duplicate content or the deletion of a key. We consider this an important issue to raise and to investigate further.

This paper proposes a methodology for XML update operations. Our main contribution is that we cover the constraints embedded in the documents. The constraints discussed in this work are those found in the inheritance, or generalization, relationship. In this relationship, an object/class inherits properties from another object/class [2]. The former is called the sub-class and the latter the super-class [9]. Two constraints are involved in this relationship type: the exclusive-disjoint constraint and the number of ancestors. Other XML document constraints have been discussed in our previous work on updating the aggregation relationship [6] and the association relationship [7].

XML update operations require structure validation. For that purpose we propose the use of XML Schema [15]. Therefore, a part of this paper is devoted to developing a transformation of the inheritance relationship into XML Schema in a way that suits our proposed methodology. Even though we use XML Schema, the transformation can be carried out with any standardized or proprietary schema language.

The rest of the paper is structured as follows. Section 2 reviews related work on XML document update, including updates using the XQuery language [14]. Section 3 briefly introduces inheritance relationships in XML documents as background. Section 4 provides the transformation/mapping of the XML document inheritance relationship into the logical model, XML Schema. Section 5 presents the proposed update methodology, grouped by operation type. Section 6 discusses the proposal, and Section 7 concludes the paper.

2. Related Work: Updating XML Documents

There are several strategies for updating XML documents in NXDs [1, 11]. At the time of writing, there are three main strategies: (i) use a proprietary update language that allows updating within the server [3][10], (ii) use XUpdate, the standard proposed by the XML DB initiative for updating a distinct part of a document [5][17], and (iii) use an XML API after retrieving the document from the database [4]. None of these is concerned with the semantic constraints of the XML document being updated.

The different strategies have limited database interchangeability. To unite them, [14] proposed expressing the update processes for XML documents in an XML language. These processes are embedded into XQuery and can thus be used by any NXD that supports this language.

XQuery is a W3C-standardized language designed for processing XML data [9, 16]. When it was first introduced, XQuery was used only for document query or retrieval. [14] introduces some primitive operations embedded in XQuery for XML document update. The update is applicable to ordered and unordered XML documents, and to single or multiple levels of nodes. The figures below show the original XQuery FLWOR expression (see Fig. 1) and its extension for update operations (see Fig. 2).

    FOR $binding1 IN path expression…
    LET $binding := path expression…
    WHERE predicate, …
    ORDER BY predicate, …
    RETURN results

Figure 1. XQuery FLWOR Expressions

    FOR . . . LET . . . WHERE . . .
    UPDATE $target {
        DELETE $child |
        INSERT content [BEFORE | AFTER $child] |
        REPLACE $child WITH $content |
        {, subOp}*
    }

Figure 2. XQuery UPDATE Extensions

In [14], however, there is no checking mechanism to preserve the semantics of the XML documents. With regard to the inheritance constraint, we can briefly mention some update examples that can cause problems. They include the deletion of a super-class instance that has existing sub-class instances, the insertion of a new sub-class instance in an inheritance with the exclusive-disjoint constraint, and the replacement of a super-class instance in which the new key does not conform with the sub-class instances. Previous works do not consider the effect of updates on the semantic correctness of the updated documents. This has motivated our research on preserving relationship constraints; this paper focuses on the constraints found in the inheritance relationship type.

3. Background: Inheritance Relationships in XML Documents

An inheritance relationship defines an entity/class in terms of another [2]. It organises the classes in taxonomies based on their similarities and differences. The ancestor, or super-class, holds common information, while the descendants, or sub-classes, can inherit this information or specify additional content. The inherited information can be reused or overridden by the sub-classes. For an XML document, the instance of a class is a node or a document. Semantically, this relationship can be characterised by the exclusive-disjoint constraint and the number of ancestors [9]. Each influences how one node/document relates to another node/document.

• The exclusive-disjoint constraint states that a super-class can pass its common content on to one and only one sub-class type. With this constraint, once there is an instance of one sub-class type, we cannot create an instance of another sub-class type that inherits the same super-class content. This constraint differentiates the inheritance relationship into two types: union inheritance and mutual-exclusion inheritance. The former is inheritance without the exclusive-disjoint constraint. In this case a super-class can pass the common content on to many instances of different sub-class types. We emphasize that this does not mean the super-class instance is the union of all its sub-class instances. Mutual-exclusion inheritance, on the other hand, is the inheritance type with the exclusive-disjoint constraint. The exclusive-disjoint constraint can easily be identified in an XML data model such as the Semantic Network Diagram [2] (see Fig. 3). In this example, a staff instance can pass its properties on to an instance of type academic or of type management, but not both, because there is an exclusive-disjoint constraint.

• The number of ancestors distinguishes the inheritance relationship into single inheritance and multiple inheritance. The former indicates that a class can get its properties from only a single ancestor, while in the latter a class can get its properties from more than one ancestor. In Fig. 3, the inheritance between staff, academic and management is a single inheritance. An example of multiple inheritance is shown in Fig. 4: a tutor can inherit properties from a staff and a student. If both super-classes can pass properties on to the same sub-class instance, we have union inheritance. If a sub-class instance can inherit properties from only one of its super-classes, we have mutual-exclusion inheritance.

[Figures 3 and 4: semantic network diagrams, not recoverable from the extracted text. Surviving node labels: STAFF, STUDENT, ACADEMIC, MANAGEMENT, TUTOR, StaffID, StudentNo, Name, Address, Phone, Qualification, Research, Course, Position, Duties, Year, Schedule, Subject. Surviving edge labels: a, g, p, exclusive.]

Figure 3. Single Inheritance Example

Figure 4. Multiple Inheritance Example
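The difference between union and mutual-exclusion inheritance can be illustrated on instance data. The following is a minimal Python sketch, not the authors' implementation: the element names New_Staff, Academic and Management and the attribute StaffID follow the running example, but the document layout and the helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance data: the layout (one New_Staff holder per
# super-class key, sub-classes as child elements) is an assumption.
DOC = """
<Faculty>
  <New_Staff StaffID="MR01"><Academic/></New_Staff>
  <New_Staff StaffID="JS02"><Academic/><Management/></New_Staff>
</Faculty>
"""

SUB_CLASSES = {"Academic", "Management"}


def sub_class_types(instance):
    """Return the set of sub-class element names present in one instance."""
    return {child.tag for child in instance if child.tag in SUB_CLASSES}


def satisfies_inheritance(instance, exclusive):
    """Mutual exclusion allows at most one sub-class type per super-class
    instance; union inheritance allows any combination."""
    if not exclusive:
        return True
    return len(sub_class_types(instance)) <= 1


root = ET.fromstring(DOC)
by_id = {n.get("StaffID"): n for n in root.findall("New_Staff")}
```

Under these assumptions, the instance keyed MR01 is acceptable under either inheritance type, while the instance keyed JS02 is acceptable only under union inheritance.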

4. Mapping Inheritance Relationship to XML Schema

Updating XML documents using a query language requires a schema for validation. We select XML Schema because, in addition to its precision in defining data types, it can effectively define the semantic constraints of a document described in the previous section. In XML Schema, inheritance can be represented by using "extension". First, we define the super-class. Second, inside a sub-class element, we specify the extension base, which is the super-class. For example, continuing Fig. 3, the sub-classes academic and management are both extended from the staff super-class.
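The extension mechanism described above can be sketched as follows. The schema text is an illustrative reconstruction under assumptions, not the authors' listing: the type names Academic_TYPE and Management_TYPE and the choice of child elements are hypothetical, with only Staff_TYPE and the class names taken from the paper. The Python helper uses the standard library only to read the schema as XML and report each sub-class's extension base; it does not perform schema validation.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment: sub-classes extend the super-class type.
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:complexType name="Staff_TYPE">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Address" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="StaffID" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:complexType name="Academic_TYPE">
    <xs:complexContent>
      <xs:extension base="Staff_TYPE">
        <xs:sequence>
          <xs:element name="Qualification" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="Management_TYPE">
    <xs:complexContent>
      <xs:extension base="Staff_TYPE">
        <xs:sequence>
          <xs:element name="Position" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""


def extension_bases(schema_text):
    """Map each extended complex type name to the name of its base type."""
    root = ET.fromstring(schema_text)
    bases = {}
    for ctype in root.findall(f"{{{XS}}}complexType"):
        ext = ctype.find(f"{{{XS}}}complexContent/{{{XS}}}extension")
        if ext is not None:
            bases[ctype.get("name")] = ext.get("base")
    return bases
```

Calling extension_bases(SCHEMA) lists both sub-class types with Staff_TYPE as their single base, which is the single-inheritance case.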

Extension alone does not specify the inheritance type with regard to the exclusive-disjoint constraint. This constraint can be enforced by using "choice" inside a complex type. Under the "choice" constraint, we can specify which elements a complex type will have and whether the inheritance is a union or a mutual-exclusion inheritance. For this purpose, we create another type whose elements are of the sub-class types. For example, we create a new complex type new_staff_TYPE with elements academic and management. In this new complex type we also require the key/ID of Staff_TYPE. This key is needed for the update checks in Section 5.

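[XML Schema listing not recoverable from the extracted text; its two annotated variants were labelled "union inheritance" and "mutual-exclusion inheritance".]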
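A possible shape for the additional type is sketched below. This is an assumption-laden illustration rather than the authors' schema: mutual exclusion is modelled with xs:choice, and union inheritance with an xs:sequence of optional elements, which matches the first test in the insertion check of Section 5.2 but may differ from the original listing. The names Academic_TYPE and Management_TYPE are hypothetical and are not defined in the fragment, and the key/keyref declaration is omitted. The Python helper only inspects the schema text for a choice compositor.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def new_staff_schema(compositor):
    """Build a hypothetical schema fragment in which new_staff_TYPE groups
    the sub-class elements with the given compositor: "choice" for mutual
    exclusion, "sequence" (with optional elements) for union inheritance."""
    occurs = ' minOccurs="0"' if compositor == "sequence" else ""
    return f"""
<xs:schema xmlns:xs="{XS}">
  <xs:complexType name="new_staff_TYPE">
    <xs:{compositor}>
      <xs:element name="Academic" type="Academic_TYPE"{occurs}/>
      <xs:element name="Management" type="Management_TYPE"{occurs}/>
    </xs:{compositor}>
    <xs:attribute name="StaffID" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:element name="New_Staff" type="new_staff_TYPE"/>
</xs:schema>
"""


def has_choice_constraint(schema_text, type_name):
    """Return True if the named complex type has a direct xs:choice child."""
    root = ET.fromstring(schema_text)
    for ctype in root.findall(f"{{{XS}}}complexType"):
        if ctype.get("name") == type_name:
            return ctype.find(f"{{{XS}}}choice") is not None
    raise KeyError(type_name)
```

With this modelling, has_choice_constraint distinguishes the two inheritance types from the schema alone, which is the information the insertion check needs.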

Every time we want to create a sub-class instance, we should create it through the element New_Staff, which is an instance of new_staff_TYPE. Note that we also need to declare the key using the key/keyref mechanism [15].

[Listing "Single inheritance instance": not recoverable from the extracted text.]

The mappings so far are applicable only to single inheritance. A problem arises if we want to map a multiple inheritance, for example (see Fig. 4) if we want to declare a complex type tutor under the complex types staff and student. At the time of writing, a complex type can have only a single extension base. We propose another approach to handle this limitation. A sub-class will not have extension bases. Instead, we implement the super-classes as child elements of the sub-class. Following the example (see Fig. 4), the sub-class Tutor will have child elements Staff and Student.

[Listing "Multiple inheritance instance": not recoverable from the extracted text.]
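The child-element approach to multiple inheritance can be illustrated on instance data. The following Python sketch uses a hypothetical document: the element names Tutor, Staff, Student and Subject and the attributes StaffID and StudentNo come from the running example, while the key values and the document layout are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical instances: each Tutor carries its super-classes as children.
TUTOR_DOC = """
<Faculty>
  <Tutor>
    <Staff StaffID="MR01"/>
    <Student StudentNo="S100"/>
    <Subject>Databases</Subject>
  </Tutor>
  <Tutor>
    <Student StudentNo="S200"/>
    <Subject>Networks</Subject>
  </Tutor>
</Faculty>
"""

SUPER_CLASSES = ("Staff", "Student")


def inherited_from(tutor):
    """List the super-classes that appear as child elements of a sub-class."""
    return [name for name in SUPER_CLASSES if tutor.find(name) is not None]


tutors = ET.fromstring(TUTOR_DOC).findall("Tutor")
```

The first Tutor inherits from both super-classes, which is only acceptable under union inheritance; the second inherits from Student alone and is acceptable under either type.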

5. Proposed XML Update Methodology

Now that we have shown how to map the conceptual model into a logical XML Schema, we can propose the methodology for the XML update processes. The update can be divided into three main parts: deletion, insertion, and replacement. We only check the update process if the target node is a key node, because such an update can create a referential integrity problem. This means that our checking mechanism avoids the duplication of key attributes. Checking whether every simple attribute/element update conforms to the inheritance constraint would be an exhaustive process for the system, so we do not recommend it at this stage.

Since this work focuses on the inheritance constraint, we assume that the target node is known to be a key node. The function that checks this condition is discussed in our previous work [7]. We also emphasize that the proposed method aims to avoid constraint violation during the update. The outcome of our method is the decision to update or not to update a node/document. We do not suggest, at least at this stage, enforcing data constraints by using cascade or nullify actions. The main reason is that dynamic typechecking can be very costly and can slow down an application [12, 13]. If, for example, a super-class has a large number of sub-class instances, then upon the update we would have to check all sub-class instances and then update every one of them.

5.1. Method for Delete Operation

For the deletion of a key node, checking on instances is required. It aims to avoid a situation in which sub-class instances have no super-class instance. The following chart (see Fig. 5) shows the mechanism for the update in an inheritance relationship. It can be applied to both single and multiple inheritance; the difference lies in the target of the checking. The checking is required if the deletion is performed on a super-class instance, in which case we have to check whether there is any sub-class instance with the same key node. We are not concerned with the deletion of a sub-class instance, because the same key node might exist in another sub-class instance or in the super-class instance.

    START
    Is the key target att/ele from a base complex type?
        No:  RETURN TRUE
        Yes: Any instance of sub-class type with the same key target?
            No:  RETURN TRUE
            Yes: RETURN FALSE
    STOP

Figure 5. Checking for Deletion in Inheritance Relationship
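The decision in Figure 5 can be sketched in Python as a restrict-only check. This is a minimal sketch under assumptions, not the authors' implementation: the document layout and the helper delete_super_class are hypothetical, and check_sub_class_instance mirrors the role of the checkSubClassInstance function used in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two super-class instances, one of which has a
# sub-class instance registered under New_Staff with the same key.
FACULTY = """
<Faculty>
  <Staff StaffID="MR01"/>
  <Staff StaffID="JS02"/>
  <New_Staff StaffID="MR01"><Academic/></New_Staff>
</Faculty>
"""


def check_sub_class_instance(root, new_super_class, key_name, key_value):
    """Return False if a sub-class holder with this key exists, else True."""
    return not any(
        el.get(key_name) == key_value for el in root.iter(new_super_class)
    )


def delete_super_class(root, tag, new_super_class, key_name, key_value):
    """Delete the matching super-class instance only if the check passes.
    Returns True when the deletion was performed, False when restricted."""
    if not check_sub_class_instance(root, new_super_class, key_name, key_value):
        return False
    for el in list(root.findall(tag)):
        if el.get(key_name) == key_value:
            root.remove(el)  # removes the whole element, not only the key
    return True
```

In this sketch the deletion of the Staff instance keyed MR01 is restricted because a New_Staff instance shares the key, while the instance keyed JS02 can be deleted. No cascade or nullify action is attempted, in line with the restriction-only policy stated above.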

As an implementation example, the following XQuery shows the function used as a predicate in the deletion process for a single inheritance. The function checkSubClassInstance checks whether any sub-class instance (under the New_Class element) has the same key node as the super-class key node. If the function returns FALSE, we cannot perform the deletion.

    FUNCTION checkSubClassInstance($BindingPath, $newSuperClass, $cName, $cContent)
    RETURN BOOLEAN {
        LET $subClassInstance := $BindingPath/$newSuperClass[@$cName = $cContent]
        RETURN
            IF exists($subClassInstance)
            THEN FALSE
            ELSE TRUE
    }

Example:

    FOR $g IN document("Faculty.xml"),
        $p IN $g/Staff[@StaffID = "MR01"],
        $c IN $p/StaffID
    LET $cName := "StaffID",
        $cContent := "MR01",
        $newSuperClass := "New_Staff"
    WHERE checkSubClassInstance($g, $newSuperClass, $cName, $cContent)
    UPDATE $p {
        DELETE $p (: delete the key and the siblings :)
    }

The example following the function shows the deletion of the key StaffID in document "Faculty.xml" for the instance with StaffID equal to "MR01". We have to check for a New_Staff instance with StaffID equal to "MR01". If it exists, the deletion has to be restricted. Note that if the checkSubClassInstance function returns TRUE, we delete the whole element and not only the key StaffID. This is because we do not have a trigger-based delete, which would delete the whole element when the key is deleted.

5.2. Method for Insert Operation

For insertion, we have to check the sub-class instance to avoid violating the choice constraint. We do not, however, have to check the insertion of a new super-class instance. The following figure (see Fig. 6) depicts the insertion mechanism, which is applicable to both single and multiple inheritance. The first condition separates union inheritance from mutual-exclusion inheritance.

    START
    Does the target node have a "choice" constraint?
        No:  RETURN TRUE
        Yes: Any instance has an element with the key equal to the target key?
            No:  RETURN TRUE
            Yes: RETURN FALSE
    STOP

Figure 6. Checking for Insertion in Inheritance Relationship
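The flow in Figure 6 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the presence of the choice constraint is passed in as a boolean (has_choice) instead of being read from the schema, the document layout is hypothetical, and insert_sub_class is a made-up helper. The additional duplicate-sub-class check that Section 6 mentions for union inheritance is not modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one sub-class instance already registered.
FACULTY = """
<Faculty>
  <Staff StaffID="MR01"/>
  <New_Staff StaffID="MR01"><Academic/></New_Staff>
</Faculty>
"""


def check_sub_class_instance(root, new_super_class, key_name, key_value):
    """Return False if a sub-class holder with this key exists, else True."""
    return not any(
        el.get(key_name) == key_value for el in root.iter(new_super_class)
    )


def check_choice_constraint(root, new_super_class, key_name, key_value,
                            has_choice):
    """Without a choice constraint (union) the insertion is always allowed;
    with it (mutual exclusion) the key must not be registered yet."""
    if not has_choice:
        return True
    return check_sub_class_instance(root, new_super_class, key_name, key_value)


def insert_sub_class(root, new_super_class, sub_class, key_name, key_value,
                     has_choice):
    """Insert a sub-class element if the check passes; return the decision."""
    if not check_choice_constraint(root, new_super_class, key_name,
                                   key_value, has_choice):
        return False
    holder = next((el for el in root.findall(new_super_class)
                   if el.get(key_name) == key_value), None)
    if holder is None:
        holder = ET.SubElement(root, new_super_class, {key_name: key_value})
    ET.SubElement(holder, sub_class)
    return True
```

Under mutual exclusion, inserting a Management instance for key MR01 is rejected because an Academic instance already exists for that key; under union inheritance the same insertion is accepted and the new element joins the existing holder.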

Function checkChoiceConstraint returns TRUE if the target does not have the choice constraint. Otherwise, it will call the function checkSubClassInstance shown in the previous section.

    FUNCTION checkChoiceConstraint($BindingPath, $newSuperClass, $cName, $cContent)
    RETURN BOOLEAN {
        LET $choiceConstraint := $BindingPath/$newSuperClass/choice
        RETURN
            IF exists($choiceConstraint)
            THEN checkSubClassInstance($BindingPath, $newSuperClass, $cName, $cContent)
            ELSE TRUE
    }

Example:

    FOR $g IN document("Faculty.xml"),
        $p IN $g/New_Staff[@StaffID = "MR01"],
        $c IN $p/StaffID
    LET $cName := "StaffID",
        $cContent := "MR01",
        $newSuperClass := "New_Staff"
    WHERE checkChoiceConstraint($g, $newSuperClass, $cName, $cContent)
    UPDATE $p {
        INSERT new_att(StaffID, "MR01")
    }

The example shows the checking of a StaffID insertion with content "MR01" under the New_Staff element. If we assume that New_Staff has the choice constraint, checkSubClassInstance will be called and will return TRUE if there is no duplication.

5.3. Method for Replace Operation

Since replacement can be seen as a combination of deletion and insertion, we can reuse the functions described in the last two sub-sections. In fact, the checking is required only for the replacement of a super-class key node. It aims to make sure that no sub-class instance has the same old key value.

Replacement of key values is not supported in Object-Oriented Databases (OODB). However, this operation is widely used in Relational Databases (RDB). For XML databases we adopt the RDB practice, since in both kinds of database the users define the values of the keys. This differs from OODB, where the key (in this case the object id) is usually system generated.

For the replacement of a sub-class instance, no additional checking is required. Therefore, the checking mechanism for replacement is similar to the one in Figure 5. The only difference is that there are two target key node contents for replacement, the old and the new. For the checking, we use the old value.

The XQuery below shows an example of replacement for the StaffID element in the Staff super-class element. We want to replace the old value "MR01" with "WR01". Note that the query is similar to the deletion query in Section 5.1. The value to check is the old value, "MR01".

Example:

    FOR $g IN document("Faculty.xml"),
        $p IN $g/Staff[@StaffID = "MR01"],
        $c IN $p/StaffID
    LET $cName := "StaffID",
        $cContent := "MR01",
        $newSuperClass := "New_Staff"
    WHERE checkSubClassInstance($g, $newSuperClass, $cName, $cContent)
    UPDATE $p {
        REPLACE $c WITH new_attr(StaffID, "WR01")
    }
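The replacement check can be sketched in Python in the same restrict-only style. This is a minimal sketch under assumptions: the document layout and the helper replace_super_class_key are hypothetical, and, as in the text, only the old key value is checked; whether the new value collides with an existing key is not examined here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the Staff instance MR01 has a sub-class instance.
FACULTY = """
<Faculty>
  <Staff StaffID="MR01"/>
  <Staff StaffID="JS02"/>
  <New_Staff StaffID="MR01"><Academic/></New_Staff>
</Faculty>
"""


def check_sub_class_instance(root, new_super_class, key_name, key_value):
    """Return False if a sub-class holder with this key exists, else True."""
    return not any(
        el.get(key_name) == key_value for el in root.iter(new_super_class)
    )


def replace_super_class_key(root, tag, new_super_class, key_name,
                            old_value, new_value):
    """Replace a super-class key only if no sub-class instance still uses
    the old value. Returns True when replaced, False when restricted."""
    if not check_sub_class_instance(root, new_super_class, key_name, old_value):
        return False
    for el in root.findall(tag):
        if el.get(key_name) == old_value:
            el.set(key_name, new_value)
    return True
```

Here the replacement of "MR01" is restricted because a sub-class instance still refers to it, whereas the key "JS02" can be replaced freely.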

6. Discussion

Section 3 discusses a specific conceptual semantic in XML documents, the inheritance relationship. Using the semantic network diagram [2] we can depict one of its constraints, the exclusive-disjoint constraint. It differentiates the inheritance into union and mutual-exclusion inheritance. Even though this constraint is widely used in object-oriented conceptual models [9], XML storage applications very rarely use this semantic. This work aims to extend the modeling capability of XML databases.

Another constraint in the inheritance relationship is the number of ancestors. To the best of our knowledge, no XML document tree model can capture multiple ancestors, that is, multiple inheritance. Therefore, we modify the semantic network diagram to accommodate multiple inheritance. The rules are the same as those for single inheritance, except that the sub-class can now have many super-classes.

Section 4 proposes the transformation of the conceptual-level semantic constraints into a logical model, in this case a schema language. This step is necessary before we can proceed to the next step, the XML update methodology, because our methodology requires the document structure validation that the schema language provides. Several schema languages are available. We select XML Schema based on its modeling strength [2, 15]. XML Schema also has features that can capture the inheritance relationship constraints depicted in our semantic network diagram.

For single inheritance in our XML Schema, along with the super-class and sub-class types we propose an additional type whose child elements are of the sub-class types. This new type also has the same key as the super-class type. Every time we want to instantiate a sub-class type, we have to do so through the additional type. This practice makes two contributions:

a. We enforce the exclusive-disjoint constraint. Unlike the usual practice, where one can instantiate any number of sub-class types without considering how they relate to the super-class, we check whether the super-class can have different instantiated sub-class types.

b. We avoid the possibility of database anomalies created by the redundant insertion of a key node by more than one sub-class type.

As at the conceptual level, XML Schema cannot capture multiple inheritance. In the future, the continually improving XML Schema might gain features that can be used for this special case. At this stage, we have to implement the super-classes as elements of the sub-classes. Therefore, if a sub-class instance inherits properties from two super-classes, the instance will have two child elements of the super-class types. The schema derived from these rules is used in the next step.

Section 5 discusses the proposed methodology for updating XML documents. The method is divided into three operations: deletion, insertion, and replacement. For the examples we use the query language XQuery, not only because it is the most complete W3C-standardized language [16], but also because it is used by most NXD products [3, 4, 5, 8, 10].

For deletion, the checking is performed if we want to delete a super-class instance. A super-class instance can be deleted only if it does not pass any content on to any sub-type instance. The checking is not necessary in the other direction. In other words, an ancestor cannot be deleted if it has any descendants, while a descendant can be deleted regardless of the existence of an ancestor.

For insertion, we perform the checking before inserting a sub-class instance. In a union inheritance, a sub-class element can be added under a new class instance only if there is no duplication of the same sub-class. In a mutual-exclusion inheritance, a sub-class element cannot be added under a new class instance if there is already another sub-class element inside the instance. On the other hand, no checking is required (with regard to the inheritance constraint) to insert a super-class instance. This is possible because of the general practice that a sub-class should not be created before its super-class.

Finally, we perform checking only for super-class replacement. We avoid replacing a super-class instance that has passed its content on to one or more sub-classes. No checking is required for sub-class replacement.

To the best of our knowledge, no NXD preserves the semantics of XML documents during update operations. This work maintains the semantic constraints, focusing on the inheritance relationship, by performing checks during the update of super-class instances and/or sub-class instances.

7. Conclusion and Future Work

Many XML applications still use a DBMS based on an established data model (such as the Relational Model or the Object-Relational Model) as the document repository. Among the different arguments, the poor capability of XML query languages is one of the most frequently mentioned: XML query languages lack support for update operations.

In this paper, we propose a methodology to preserve semantic constraints during an XML update. The update operations are divided into deletion, insertion, and replacement. The focus of the paper is on the inheritance relationship type. For this purpose, we deal with the checking mechanism for key node updates in super-class and sub-class instances. By doing this, we can preserve the conceptual semantics of the XML documents. For implementation we apply one of the W3C-standardized languages, XQuery.

Since update requires document structure validation, we also propose the transformation of the document structure into a schema language. For this purpose we select XML Schema. The constraints captured are the exclusive-disjoint constraint and the number of ancestors. It is important to emphasize that this paper deals only with a subset of the semantic constraints in XML Schema, specifically those that relate to the inheritance relationship.

With this extension, XML query languages (in the form of XQuery) become more powerful. At the same time, it can increase the potential of using tree-form XML repositories such as native XML databases.

8. References

[1] BOURRET, R., XML and Databases. http://www.rpbourret.com/xml/XMLAndDatabases.htm, 2003.
[2] FENG, L., CHANG, E., DILLON, T.S., A Semantic Network-Based Design Methodology for XML Documents, in ACM Trans. Information Systems, Vol. 20, No. 4, ACM Press, 2002, 390-421.
[3] IPEDO, Ipedo XML Database. http://www.ipedo.com/html/products.html, 2004.
[4] JAGADISH, H. V., AL-KHALIFA, S., CHAPMAN, A., LAKSHMANAN, L. V. S., NIERMAN, A., PAPARIZOS, S., PATEL, J. M., SRIVASTAVA, D., WIWATTANA, N., WU, Y., YU, C., TIMBER: A native XML database, in VLDB Journal, Vol. 11, No. 4, Springer, 2002, 279-291.
[5] MEIER, W.M., eXist Native XML Database, in A.B. Chaudhri, A. Rashid, R. Zicari (eds.), XML Data Management: Native XML and XML-Enabled Database Systems, Addison Wesley, Boston, 2003, 43-68.
[6] PARDEDE, E., RAHAYU, J.W., TANIAR, D., Preserving Constraints for Aggregation Relationship Type Update in XML Document, in Proceedings of IAWTIC 2004, to appear.
[7] PARDEDE, E., RAHAYU, J.W., TANIAR, D., Preserving Referential Constraints in XML Document Association Relationship Update, in Proceedings of INTELLCOMM 2004, Springer-Verlag, 2004, to appear.
[8] ROBIE, J., XQuery: A Guided Tour, in H. Katz (ed.), XQuery from the Experts, Addison Wesley, Boston, 2004, 378.
[9] RUMBAUGH, J., BLAHA, M.R., LORENSEN, W., EDDY, F., PREMERLANI, W., Object-Oriented Modelling and Design, Prentice Hall, Englewood Cliffs, 1991.
[10] SODA Technology, SODA. http://www.sodatech.com/products.html, 2004.
[11] STAKEN, K., Introduction to Native XML Databases. http://www.xml.com/pub/a/2001/10/31/nativexmldb.html, 2001.
[12] SUCIU, D., On Database Theory and XML, in SIGMOD Record, Vol. 30, No. 3, ACM Press, 2001, 39-45.
[13] SUCIU, D., The XML Typechecking Problem, in SIGMOD Record, Vol. 31, No. 1, ACM Press, 2002, 89-96.
[14] TATARINOV, I., IVES, Z.G., HALEVY, A. Y., WELD, D. S., Updating XML, in Proceedings of ACM SIGMOD 2001, ACM Press, 2001, 413-424.
[15] VLIST, E. V-D., XML Schema, O'Reilly, Sebastopol, 2002.
[16] W3C, XQuery 1.0: An XML Query Language. http://www.w3.org/TR/xquery, 2001.
[17] XML DB, XUpdate - XML Update Language. http://www.xmldb.org/xupdate/, 2004.