
Proceedings of the 36th Hawaii International Conference on System Sciences - 2003

XML Access Control for Semantically Related XML Documents

Vijay Parmar and Hongchi Shi
Department of Computer Engineering & Computer Science
University of Missouri-Columbia
Columbia, MO 65211, USA
[email protected] [email protected]

Su-Shing Chen
Department of Computer & Information Science & Engineering
University of Florida
Gainesville, FL 32611, USA
[email protected]

Abstract

The extensible markup language (XML) is a standard for describing information on the Internet and is quickly becoming the preferred way to store and exchange information. The need to provide controlled access to such information is pressing. In this paper, we present an access control mechanism for a collection of semantically related XML documents, such as a collection of user records in XML for a distributed information system, where the existing XML access control mechanisms are not easily applicable.

1. Introduction

The extensible markup language (XML) is a standard for describing information to be stored and exchanged on the Internet. There is a pressing need to provide controlled access to such information. In this paper, we present an access control policy and mechanism for a collection of semantically related XML documents. Our access control mechanism has three features: (i) it is developed for XML documents semantically related to one another; (ii) access control conditions can be specified based on the content of the documents; and (iii) access control is role-based.

In the collection of semantically related XML documents for which we develop an access control mechanism, we assume that each XML document represents an entity playing a certain role, and each entity has certain relationships with other entities (XML documents) in the collection. An example is a collection of records (XML documents) for users of different roles in a distributed information system. The target XML document name need not be known prior to access. In fact, the target document names should be hidden, and the access requester should know only about the content, so that the identity of the entity playing a particular role is not accidentally revealed. An access request may result in data coming from more than one document in the collection. Since the XML documents have relationships, each entity playing a particular role can have access to other entities playing a different role.

For XML documents having such semantic relationships, the currently prevailing access control mechanisms are not easily applicable. As shown in the next section, the related work does not cater to such cases and is not flexible. Our access control mechanism is flexible and emphasizes placing conditions on access to any part of a document at any granularity.

The rest of this paper is organized as follows. Section 2 gives the related work and discusses the lack of applicability of different approaches for semantically related XML documents. Section 3 presents our way to specify access control policies, discussing policy and condition specification in detail. Section 4 describes our access control mechanism and its overall model. Section 5 presents an implementation of our access control mechanism. Section 6 concludes the paper with some remarks.

2. Related Work

Our access control mechanism is related to many of the previously available access control and authorization models [3-6,8,10-12,14,19]. The work by Bertino et al. provides access control over general XML document

0-7695-1874-5/03 $17.00 (C) 2003 IEEE



sources [3,4], and Wand and Tan proposed a scalable implementation [19]. However, this work lacks the capability to handle multiple XML documents efficiently. The Author-X prototype system by Bertino et al. is similar to our implementation [3], but it lacks the ability to deal with complex semantic relationships among the XML documents in the collection. It also lacks role-based access control, relying only on user-based access control. Our approach uses variable context information in the specification of access conditions, a feature that is not present in Author-X. Author-X assumes that the requester knows the location of the data prior to making the request, while our approach assumes that XML documents are stored as a collection and each document is considered an entity. Each entity may have parts of its document that are readable by the entity itself and by others assuming certain roles. Hence, an entity may not have access control over parts of its own document. Finally, the request to the system is made over the collection of entities (XML documents), and access rights are determined after checking access conditions. The query result may come from more than one XML document (entity) in the collection.

Other related work differs in that it requires the source or target to be specified in order to complete an access request [5,6,8,10-12,14]. The work by Kudo and Hada exhibits a formal access control specification language [11], but it lacks the capability to deal with more than one XML document having semantic relationships. Our approach has similarities in terms of using variable context information and providing role-based access control. The work by Damiani et al. exhibits the use of SOAP services to enforce access control [5,6], but lacks the capability to provide access control while considering multiple XML documents with semantic relationships.

Another piece of somewhat related work is XLINKIT [13]. It provides a rule language to allow rule-based link generation and consistency checking between different documents. It allows variable context information to be specified while specifying the links to maintain consistency. Our approach also uses the concept of variable context information. However, the rule language of XLINKIT is not intended for access control policy specification.

The existing work on XML access control is mostly concerned with access control for unrelated individual XML documents. The existing access control mechanisms are not flexible enough to handle semantically related XML documents and to provide a general control mechanism for accesses made to a collection of related documents rather than individual documents. All the related work requires the source and target documents to be specified explicitly. In many scenarios where XML documents are related semantically, they need to be accessed as a collection. The XML document collection is accessed in its entirety, and the access results are checked and returned according to the decisions dictated by the authorization and access control policy.

Figure 1: Sample relationships of entities playing particular roles

Figure 2: Relationship between entities (XML documents)

3. Access Control Policy Specification

The access control policy (ACP) specification in this paper is based on several assumptions and is therefore best suited to the cases of XML access control that satisfy them. First, XML documents are not accessed by document name; rather, they are conceived as entities playing certain roles. Hence, a particular entity playing a certain role may request data from the collection of XML documents (a collection of entities playing different roles) by issuing a general request over the whole collection. The requesting entity’s identity and role cause the access control mechanism to restrict its access according to the access control policy. Second, the entities are related to one another. Each entity may act by playing a certain role, defining its relationship with other entities as shown in Figure 1. Third, entities may assume different roles at different times, but an entity cannot have more than one role at any time. Fourth, since the entities are related to one another, the data in the XML document of an entity may point to data in the XML documents of other entities as shown in Figure 2. Fifth, all the XML documents in a collection should comply with the same DTD. That is, all entities, whatever roles they play, have a similar document structure but different data content, which is the main reason why our ACP specification goes one step beyond other related work and allows flexible condition specification.

3.1. Overview of access control policy specification

The access control policy specification presented here allows specification of access control policies at different granularity levels as shown in Figure 3. The specified policies are DTD based. The conditions for accessing elements can be configured for all elements and attributes of the DTD [16]. With the use of variable context information, the policies can also be considered to be document-based because the policies apply to individual documents that satisfy the conditions.

Figure 3: The access control policy DTD

The access control mechanism can be dynamically configured with the access control policy document. The policy document is an XML document conforming to the DTD shown in Figure 3. The “role” element specifies the type of role and the “operations” allowed for this role. The “operations” element defines a set of “operation” elements. An “operation” element gives the type of the operation and the allowed elements. The elements listed in the allowed element list represent the elements that can be accessed by an entity playing the particular role. However, the presence of an element name in the allowed element list does not guarantee access, since the conditions must also be satisfied for access to a particular element to be allowed. For each “allowedelement”, there are an “elementname” element to specify the name of the allowed element and a “conditions” element to specify the conditions placed on access to the element for the role. The “conditions” element has “condition” elements to specify individual conditions. The “condition” element allows the condition type, the “source_element” (in the document corresponding to the requester), and the “destination_element” (in the requested document) to be specified. The attributes “QueryRelative” and “DocumentRelative” indicate what the locations specified in the “source_element” and “destination_element” are relative to. For example, the condition could be of type “exists,” and the source element may point to an element in the document of the entity (with the use of context variables) while the destination may point to the same element in the document to be accessed.
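As an illustration of the policy structure just described, the following Python sketch parses a small policy document and lists the allowed elements, with their conditions, for one role and operation. The element names come from the description above. The root element name `policy`, the `type` attributes on `role` and `operation`, the sample values, and the helper `allowed_elements` are assumptions made for this sketch, since the DTD of Figure 3 is not reproduced here. The “QueryRelative” and “DocumentRelative” attributes are omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy document. The root name "policy" and the "type"
# attributes are assumptions; the other names follow the text.
POLICY_XML = """
<policy>
  <role type="student">
    <operations>
      <operation type="read">
        <allowedelement>
          <elementname>grades</elementname>
          <conditions>
            <condition conditiontype="equals">
              <source_element>$entityid$</source_element>
              <destination_element>userid</destination_element>
            </condition>
          </conditions>
        </allowedelement>
      </operation>
    </operations>
  </role>
</policy>
""".strip()


def allowed_elements(policy_root, role, operation):
    """Map each allowed element name to a list of
    (condition type, source element, destination element) tuples."""
    allowed = {}
    for role_el in policy_root.iter("role"):
        if role_el.get("type") != role:
            continue
        for op_el in role_el.iter("operation"):
            if op_el.get("type") != operation:
                continue
            for ae in op_el.iter("allowedelement"):
                name = ae.findtext("elementname", default="").strip()
                allowed[name] = [
                    (c.get("conditiontype"),
                     c.findtext("source_element"),
                     c.findtext("destination_element"))
                    for c in ae.iter("condition")
                ]
    return allowed


policy = ET.fromstring(POLICY_XML)
print(allowed_elements(policy, "student", "read"))
```

A role or operation that does not appear in the policy yields an empty mapping, which matches the rule that an element absent from the allowed element list cannot be accessed.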

3.2. Operation types and execution

There are four types of operations, namely, read, write, create, and delete. The operations are performed by first querying the XML document collection with the XPATH query expression provided in the access request. Our access control policy specification makes use of variable context information; hence, the condition specification is based not just on the DTD, but also on the content of the elements to be checked to satisfy a condition. If the query were to be pre-checked, the conditions would have to be non-contextual. Since context information is used in processing the access request, the collection of documents has to be queried first and then checked. The general steps required for all operations involve checking compliance of the source and destination elements and document fragments with the respective DTD. Compliance with the DTD is important as it ensures data integrity.

Read: The “read” operation allows a read-only operation over the elements in the allowed element list. The steps involved in a “read” operation are:
a) The XPATH query is processed on the collection of XML documents.
b) The query results are checked against the list of allowed elements for the user (under the “read” operation and under the appropriate role). Document fragments in the query result whose root elements are not in the allowed element list for that role under the “read” operation are discarded. This step leaves a set of document fragments to be checked for further access control.
c) The resulting set of document fragments is checked against the conditions for each allowed element and each document fragment.
d) If the conditions are satisfied, the content of the allowed element is retained; otherwise, the content of the element is deleted. The document fragment structure is not altered by the “read” operation.
Figure 4 shows an example of the “read” operation.
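The “read” steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the limited XPath subset of `xml.etree.ElementTree` stands in for a full XPATH engine, and the function `read_operation`, the predicate-based `allowed` mapping, and the sample documents are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def read_operation(collection, query, allowed):
    """Sketch of steps a) to d). `allowed` maps an element name to a
    predicate taking (document_root, fragment) and returning True when
    the conditions for that element hold (an assumed representation)."""
    results = []
    for doc in collection:
        # a) Run the query over every document in the collection.
        for fragment in doc.findall(query):
            # b) Discard fragments whose root is not an allowed element.
            if fragment.tag not in allowed:
                continue
            result = copy.deepcopy(fragment)  # leave the collection intact
            # c) Check the conditions attached to the allowed element.
            if not allowed[fragment.tag](doc, fragment):
                # d) Conditions failed: delete content, keep structure.
                for el in result.iter():
                    el.text = None
            results.append(result)
    return results


# Hypothetical two-document collection and requester.
docs = [
    ET.fromstring("<user><userid>u1</userid><grades><grade>A</grade></grades></user>"),
    ET.fromstring("<user><userid>u2</userid><grades><grade>B</grade></grades></user>"),
]
requester_id = "u1"
allowed = {"grades": lambda doc, frag: doc.findtext("userid") == requester_id}

results = read_operation(docs, ".//grades", allowed)
for fragment in results:
    print(ET.tostring(fragment, encoding="unicode"))
```

In this sketch the requester sees the grade in its own document, while the fragment from the other document keeps its element structure but loses its content, as step d) requires.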

Figure 4: A sample ‘read’ operation

Write: The “write” operation allows the user to update the content of the queried element. The destination element should be present in the allowed element list for the user’s “write” operation. If the element name is not present in the allowed element list, no “write” operation is carried out. The steps involved in the “write” operation are:
a) The destination elements (document fragments having the element as root element) are retrieved using the XPATH [18] query provided by the user.
b) The “write” operation’s “allowedelement” list is checked against the results. If the element name of the document fragment root is not present, the “write” operation is abandoned, as the user has no write privilege over that element. If it is present in the list, the user has the write privilege; hence, the conditions are checked next.
c) The above step is carried out for all the document fragments, from the root element down to all the leaf elements.
d) If any condition prohibits the update of some element, the “write” operation is abandoned.
Figure 5 shows an example of the “write” operation.

Figure 5: A sample ‘write’ operation

Create: The “create” operation is similar to the “write” operation, with the difference that there are no destination elements to be overwritten. Instead, a new element or document fragment is added to the destination specified by the XPATH query. The steps involved in the “create” operation are:
a) Obtain the destination element where the new element will be created.
b) Check the “allowedelement” list for permissions. If allowed, proceed with the creation of the new element and its sub-elements.
c) For each element in the source document fragment, check its presence in the allowed element list. If it is not present, the “create” operation is abandoned.

Delete: The “delete” operation is similar to the “read” operation. The similarity lies in checking whether the operation is allowed over certain elements. If it is allowed, the check proceeds to whether the “delete” operation is allowed on the child elements. The process continues until all the elements have been checked for the “delete” operation. If deletion is denied on any of the child elements, the operation is abandoned.

3.3. Variable context information

The conditions specified in the access control policy can be very complex and are usually related to the access requester (an entity). It is desirable to specify conditions in a general manner and then individualize each condition for the respective entities, which requires a context to be maintained for the access. With variable context information, it is possible to specify context-sensitive conditions. The same access requests and the same conditions can be used for different entities representing a particular role. Context variables support a logical approach to condition specification. The following are some examples of variable context information:
a) $entityid$ -- ID of the entity making the request, similar to a user ID. The variable $entityid$ holds the value of the entity’s identification number. While the access request is processed, the variable is replaced by the actual identification number of the requesting entity. The variable can be used in specifying the conditions in the access control policy or in the list of variables to create more complex context variables.
b) $entityrole$ -- variable representing the role of the entity while making the access request.

Figure 6: DTD for variable list documents and a document instance

In the specification of a condition, an XPATH expression [18] is used with context variables. Before the XPATH expression is used to locate the required information in the collection of XML documents, the context variables are substituted with the values in the context of the requesting entity. Figure 6 shows the DTD used to define context variable lists and a sample variable list document. Variable context information can be conveniently used with XPATH expressions to simplify the access control policy specification. It allows conditions on access to elements to be specified in general for a particular role; when the access request is processed, the variable context gives the conditions the context of the requesting entity. This eliminates the need for a separate set of conditions for each individual entity playing a particular role. Examples of using variable context information to specify conditions in the access control policy are given later.
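The substitution step described above can be sketched in Python as follows. The `$name$` token syntax follows the variables named in the text; the function `substitute_context`, the sample XPATH template, and the context values are assumptions made for illustration.

```python
import re


def substitute_context(xpath_template, context):
    """Replace each $name$ token with its value from the requester's
    context. An unknown variable raises KeyError, so a condition is
    never evaluated with an unresolved placeholder."""
    def lookup(match):
        name = match.group(1)
        if name not in context:
            raise KeyError(f"unknown context variable: ${name}$")
        return context[name]

    return re.sub(r"\$(\w+)\$", lookup, xpath_template)


# Hypothetical context of the requesting entity and a condition template.
context = {"entityid": "u42", "entityrole": "student"}
template = "user[userid='$entityid$']/grades"
print(substitute_context(template, context))
```

The same template serves every entity playing the role; only the context differs per request. The sketch inserts values verbatim, so a real system would also need to escape quotes in the substituted values.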

3.4. Condition specification

The conditions indicate constraints on access to a particular allowed element for a specific operation. The presence of an element name in the allowed element list indicates that access to it is allowed for the particular role only if the conditions are satisfied. Conditions are specified in the access control policy document with the “condition” element. The condition specification mechanism for access control is very flexible. It can be used to specify both simple and complex conditions. As shown in Figure 7, AND signifies that all the conditions should be satisfied, and OR signifies that at least one of the conditions should be satisfied. Hence, the implied product of sums allows simple as well as complex condition specification.
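The product-of-sums combination can be sketched in Python as follows. The sketch assumes that the conditions are grouped so that the groups are joined by AND and the conditions within a group are joined by OR; the exact nesting in Figure 7 is not reproduced here, and the function `conditions_hold` and the sample checks are hypothetical.

```python
def conditions_hold(groups):
    """Evaluate a product of sums: every OR-group must contain at least
    one condition that holds. Each condition is a zero-argument callable
    returning a bool (an assumed representation)."""
    return all(any(check() for check in group) for group in groups)


# Hypothetical checks standing in for evaluated "condition" elements.
is_owner = lambda: False
is_teacher_of_course = lambda: True
pobox_exists = lambda: True

# (is_owner OR is_teacher_of_course) AND (pobox_exists)
print(conditions_hold([[is_owner, is_teacher_of_course], [pobox_exists]]))
```

With this representation, an empty list of groups evaluates to true (no conditions to satisfy), while an empty group evaluates to false.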

Figure 7: Logical combination of conditions

The condition type is indicated with the “conditiontype” attribute of the “condition” element. There are four types of conditions imposed over allowed elements.
(i) Prohibit:
- The “prohibit” condition type indicates prohibition of any element in the document fragment rooted at the allowed element.
- As noted earlier, the conditions are checked after the query is processed and the results are obtained. The root element of each resulting document fragment is an “allowedelement” to which these conditions apply. Hence, a “prohibit” condition can prohibit any element at any level.
- It is a unary condition with only a destination element specified. As shown in Figure 8, there is a “prohibit” condition on the “address” element, wherein the “streetnumber” is not allowed for the particular operation. Hence, the element is discarded, or the content of the element is removed if compliance with the DTD is required.
(ii) Equals:
- The “equals” condition type requires the source and destination elements in the condition to be equal.
- If they are equal, the condition is true; otherwise, it is false. As shown in Figure 8, the “equals” condition requires the value of the “stdaddress” element to be equal to the variable value to allow the “address” element to be accessed.



Figure 8: A sample condition specification

(iii) NotEquals: The “notequals” condition is similar to the “equals” condition except that it negates the “equals” condition.
(iv) Exists:
- The “exists” condition requires the destination elements to exist in order to allow the required operation on the “allowedelement” to which the condition applies.
- It is a unary condition. As shown in Figure 8, the “pobox” element should exist so that the “address” element is allowed for access.

We use the following two examples to further illustrate how flexibly access conditions can be specified in the access control policy.

Example 1: A student is not allowed to update his or her grades, but is allowed to view them.
grades $username$ $destination_document_username$
Note that $username$ and $destination_document_username$ can be XPATH expressions or variables pointing to the elements containing the user name.

Example 2: A teacher has permission to update the grades of the students in the courses he or she offers and also has permission to view the grades of other students.
grades grades