A Query Language for a Versioned Object

2 downloads 0 Views 822KB Size Report
features aforementioned. For instance, consider an example taken from an academic system. The curric-. XXI Simpósio Brasileiro de Banco de Dados. 73 ...
XXI Simpósio Brasileiro de Banco de Dados

A Query Language for a Versioned Object Oriented Database∗ ´ Rodrigo Machado1 , Alvaro Freitas Moreira1 , Renata de Matos Galante1 , Mirella M. Moro2 1

2

Instituto de Inform´atica – Universidade Federal do Rio Grande do Sul (UFRGS) P.O. Box 15.064, CEP 91.501-970, Porto Alegre, RS – Brazil

Dept of Computer Science & Engineering – University of California Riverside (UCR) 900 University Ave, Riverside, CA 92521 – USA {rma, afmoreira, galante}@inf.ufrgs.br, [email protected]

Abstract. Many applications require that all data updates be stored on and retrieved from a database. Such requirement is supported on object oriented databases through versioning. While most related work focuses on different aspects of versions concepts, design modeling and efficient processing of versions, there is yet to be a precise definition of a query language for database systems with versions control. Therefore, we define a query language (called VOQL, Versioned Object Query Language) for an object oriented database with versioning support. VOQL extends ODMG and OQL for managing the evolution of different elements of the data. Besides the language main features, we provide the base of a formal definition for VOQL. Finally, we validate the proposed definition by implementing an interpreter for the language.

1. Introduction The notion of versioning is routinely employed in the practice of programmers, software engineers and anyone involved with the production and the maintenance of any kind of documents (e.g. source code and specifications, user’s manual, etc). In these contexts, a version is typically a file, and a set of versions is handled at the operation system level by well known tools such as CVS (Concurrent Version System1 ) and RCS (Revision Control System2 ). Alternatively, versions can be managed by a database system, in which the versions’s granularity depends on the database model (e.g. relations, tuples, columns for relational model, and classes, attributes, relationships, methods for object oriented model). There is also a wide range of applications that employ versioning by adapting the main concepts to work on different scenarios, such as XML document management [Vagena et al. 2004], semantic web [Noy and Musen 2004], and data warehousing [Wrembel and Morzy 2005]. While most of these related work focuses on different aspects and applicabilities of the concepts, designing models, and efficient processing of versions, there is yet to be a precise definition of a query language for database systems with versioning support. ∗

Work supported by grants CNPq 481516/2004-2 and FAPERGS 01412264 http://ximbiot.com/cvs 2 http://www.gnu.org/software/rcs/rcs.html 1

72

XXI Simpósio Brasileiro de Banco de Dados Therefore, our main goal is to fill this gap by defining a query language for a database with versioning support. The language, named Versioned Object Query Language (VOQL), extends ODMG [Cattell et al. 2000] with new features for recovering object versions. These new features are presented through a series of examples and validated through an implementation of a VOQL interpreter. Additionally, we explain how the VOQL can be formally defined through a series of inference rules for its semantics and its type system. These rules work as a precise language reference for guiding its implementation process and are essential for proving that the language is safe (in the sense that evaluation of well-typed queries do not lead to execution errors). The complete formal definition of VOQL can be found in [Machado et al. 2006a, Machado et al. 2006b]. In this paper, we concentrate in explaining the intuition behind the most important rules. Finally, the presentation of VOQL is based on a generic database with versioning support. It is not our goal to propose yet another database model with such support. We continue in Section 2 by discussing relevant background information. Section 3 presents the main features of VOQL. Section 4 introduces an operational semantics for VOQL, while Section 5 defines its type system. Section 6 proves that the type system is safe with respect to the operational semantics. Section 7 presents an interpreter for VOQL. Section 8 discusses related work and Section 9 concludes the paper.

2. Background This section briefly presents the main features of a generic database with versioning support. We assume the existence of an object model based on ODMG [Cattell et al. 2000] and we extend it with features for specifying object versions. The schema for a database with versioning support is a collection of class definitions. A class defines a set of attributes, relationships, and method signatures. A relationship is defined explicitly. Transversal paths are declared in pairs, one for each direction of the relationship. Relationships are then accessed by the query language as attributes (in other words, from the semantics point of view, attributes and relationships receive the same treatment). The database management system is responsible for maintaining the referential integrity of relationships. For example, when an object that participates in a relationship is deleted, any transversal path to that object must be deleted as well. Maintaining referential integrity ensures that applications cannot dereference transversal paths that lead to nonexistent objects. Finally, the class behavior is specified as a set of method signatures. In ODMG, methods are defined in a host programming languages such as Java and C♯, and hence, we do not consider methods in our formal treatment. Once the database is active, objects are instantiated and versions can be created either implicitly (any update defines a new version) or explicitly (by a user). For each set of versions, there is a versioned object in charge of grouping all versions of the same object. Each versioned object has also a current version which, by default, is the most recently created (the user can also set any version as current). Again, it is not the focus of this paper to discuss how exactly this whole process works. What matters is that the database is capable of managing different versions for the same object following the basic features aforementioned. For instance, consider an example taken from an academic system. The curric-

73

XXI Simpósio Brasileiro de Banco de Dados ula usually undergo successive revisions, such as an update on the set of courses to be completed in order to obtain a degree. Students must follow the current curriculum, but it might be necessary to query past curriculum versions in order to grant course equivalences for instance. This is quite common in Brazilian universities, where even though a unique curriculum is active (referred as the current curriculum), several students can be following different curricula at the same time. Figure 1 illustrates the graphical representation of the extension of a class Curriculum (we refer to [Galante et al. 2005] for a complete description of this academic system). This extension has one versioned object with three versions of a Computer Science curriculum: the initial version - CS 2004, and two derived alternatives - CS 2005 and CS 2006.

Figure 1. Graphical representation of a versioned object of the class Curriculum with three versions (CS 2004, CS 2005 and CS 2006).

As illustrated, the versioned object’s versions are organized in a version derivation tree. A language for querying a database with such organization must provide support for collecting all versions of a versioned object, and for navigating among the nodes of the derivation tree (e.g. selecting the original and the last versions, and selecting the antecessors/successors of a specific version). Next section informally presents the VOQL language through a series of examples.

3. VOQL Query Language In this section we propose the Versioned Object Query Language (VOQL) as a query language for the specified database with versioning support. This language is a fragment of ODMG OQL [Cattell et al. 2000] extended with features for maintaining object versions. We consider only functional aspects of the query language, i.e. queries that do not create and/or delete objects. VOQL queries belong to the language defined by the grammar of Figure 2. Next, we focus our discussion on the new features presented by VOQL. Extended features. VOQL has variables of types boolean, integer, strings, and sets. When a query denotes a versioned object q, q.a accesses the attribute a of the versioned object’s current version. Additionally, q ->a results in a collection of all a-attributes of elements of q. The query q ->versions returns all versions of the versioned object represented by q. Besides the regular arithmetic, logical and relational operators, VOQL has unary operators that test the absolute position of a version (is leaf, is root) and binary operators that compare the relative position of two versions in the version derivation

74

XXI Simpósio Brasileiro de Banco de Dados q ∈ q ::= | | | | | | | | | | |

Query b i s {q1 , . . . , qn } x q1 bop q2 uop q1 q.a q->a q->versions if q1 then q2 else q3 select qa from q1 x1 , . . . , qn xn where qb

booleans integers strings sets, n ≥ 0 variables binary operations unary operations attribute access attribute access in a colection acess to all versions conditional selection, n ≥ 0

Figure 2. VOQL syntax.

tree of a versioned object (is succ, is pred). VOQL has the usual conditional and select-from-where constructs as well. In the remaining of this section we informally describe the principles of VOQL with emphasis on its versioning capabilities. We consider again an academic system with a fragment of its instances illustrated on the top half of Figure 3. The bottom half of Figure 3 has a series of examples of VOQL queries. In the figure, each query has a proper identifier that is used in the text. There are three classes named Person, Faculty, and Course. In this fragment, class Person has attributes for last name and social security number; Faculty extends Person with office number and title; Course has number, description, and units. There is a relationship (with transversals called teaches and is taught by) between classes Faculty and Course. Class Person has only one instance. Class Faculty has one versioned object with two versions. Finally, class Course has three different instances: Networks and Compilers with one instance each, and Introduction to Computer Science with three versions. Basic and versioned queries. Query 1 retrieves the set of courseNum of all courses named “Intro to CS” and is a plain query (with no versions). The variable Courses represents the extension of the class Course. The names of extents (previously declared within the database schema) are treated as global variables within queries. Note that the variable c is local to the query. Since no specific version is required in this query, c ranges over the current version in the extent Courses. Adding the word versions to the extent name changes the query range to consider all its versions. Therefore, query 2 retrieves the collection of course numbers of all versions of courses named “Intro to CS”. Similarly, it is possible to retrieve the original course numbers of all course instances. In VOQL this is achieved by query 3 that uses the operator is root for testing the position of versions within the version derivation tree. This query returns a collection of course numbers only from those courses versions that are root on the derivation tree of the versioned object. Note that courses CS114 and CS121 do not have any version. They

75

XXI Simpósio Brasileiro de Banco de Dados

Figure 3. Graphical representation of some versions from the academic system plus queries and their results.

belong to the results of this query because the first instance of an object is regarded as its first version and root of the derivation tree, by default. The ‘.’ operator. In OQL, the ‘.’ operator has a position-dependent semantics. When used in the statements select and where of a query, it implies conventional member access. For example, in query 4, the operator is applied directly to the variable c, which ranges over object identifiers. This query returns the set of courseNum of courses with more than 4 units. On the other hand, when used in the statement from, it expresses the creation of a collection of attributes from a collection of object identifiers. For example, in query 5, the ‘.’ operator is used in conjunction with the extent FacultyM. It generates a collection by reaching the teaches relationship of all elements within the extent and grouping them all. The result is different from query 4 because 5 only considers courses being taught (i.e. course CS121 is not returned since it has no faculty assigned to). In order to have a simpler context-independent, non-positional semantics, we distinguish these two different meanings with two operators: ‘.’ for member access and ‘→’ for iteration over collections. For example, query 5 is then formulated as query 6 in VOQL. The ambiguity in the interpretation of the ‘.’ operation (not mentioned in ODMG OQL) has been firstly pointed out in [Galante et al. 2003a]. This distinction became clear

76

XXI Simpósio Brasileiro de Banco de Dados while we were working on the formal semantics and the type system for VOQL.

4. Operational Semantics for VOQL The design of a query language starts with pragmatical considerations such as which data types will be supported and which operators will be available. Usually these considerations are determined by the associated data model. After that, the designer can make some experiments by implementing an interpreter for the core aspects of the language and by performing some tests with it. This section presents a formal definition of the semantics of VOQL. This formal semantics is a reference for guiding the implementation of an interpreter for the language and is essential for proving properties of the language (such as type safety presented in Section 6.). The formal description of query evaluation is presented in the style known as small-step operational semantics [Plotkin 2004]. The semantics consists of a set of inference rules of the form premise1 . . . premisen conclusion Each rule is read as “if the premises are true then the conclusion holds”. Premises and conclusion are of the form vee; voe; ve ⊢ q → q ′ , representing that the VOQL query q is reduced, in one step, to the VOQL query q ′ in the database state ( vee; voe; ve) . Premises can also be conditions that must be satisfied by the components of the database state. The components vee, voe and ve (extension, versioned object, and version environment respectively) are formal representations of the database state. The reduction process from a source query to a final value is a sequence of reduction steps in which each reduction step is governed by a semantic rule. Values. Before giving the set of semantic rules, we define the values of the language. At each step of the reduction process, a query is reduced/rewritten into an intermediate query before reaching a final value. Some of these intermediate queries may produce values that are not present in the source language, as for example version identifiers v id. A grammar for the VOQL values is as follows. v :: = | | | |

b i s v id {v1 , . . . , vn }

booleans integers strings version ids set of values

n≥0

A complete definition and analysis of how the database state is formalized and a presentation of the complete set of rules are out of the scope of this paper and can be found in [Machado et al. 2006a]. We proceed with an intuitive explanation of the most important semantic rules for VOQL. Basic Queries. Sets of queries {q1 , . . . qn } are evaluated from left to right. The premise of rule S ET evaluates the first non-value query form left to right in the set.

77

XXI Simpósio Brasileiro de Banco de Dados vee; voe; ve ⊢ qj → qj′ vee; voe; ve ⊢ {vii∈1...j−1 , qj , q i∈j+1...k } → {vii∈1...j−1 , qj′ , q i∈j+1...k }

(S ET)

The select-from-where construct is evaluated by first building the cartesian product of all extents mentioned in the from statement. From this product, only those objects that satisfy the condition specified within where are collected. The result of the query is given by the select statement that projects components of these objects. Rule S ELECT 1 reduces the first query in the from part. Then, if the first query in the from part is reduced to an empty set, the whole query reduces to the empty set by the rule S ELECT 2. vee; voe; ve ⊢ q1 → q1′ vee; voe; ve ⊢ select qa from q1 x1 , . . . , qn xn where qb → select qa from q1′ x1 , . . . , qn xn where qb

(S ELECT 1)

vee; voe; ve ⊢ select qa from {} x1 , . . . , qm xn where qb → {} (S ELECT 2) The cartesian product is given by S ELECT 3 that uses the substitution operation3 to produce all possible combinations of values for all variables within the from statement. Rules S ELECT 2 and S ELECT 3 are examples of rules that have no premises. Note that the query in the right side of S ELECT 3 forms another query using the union set operator. vee; voe; ve ⊢ select qa from {v1 , . . . , vm } x1 , . . . , qn xn where qb → ( select qa from q2 x2 , . . . , qn xn where qb) [ x1:: = v1] union ... union ( select qa from q2 x2 , . . . , qn xn where qb) [ x1:: = vm] (S ELECT 3) The next rules state that when there is nothing to evaluate in the from part, the evaluation proceeds with the where part. When the where part is false, the result of the select is the empty set. Otherwise, the whole select proceeds with a set with qa . vee; voe; ve ⊢ qb → qb′ vee; voe; ve ⊢ select qa from where qb → select qa from where qb′ (S ELECT 4) vee; voe; ve ⊢ select qa from where false → {}

(S ELECT 5)

vee; voe; ve ⊢ select qa from where true → {qa }

(S ELECT 6)

Versioned Objects. Rule VAR reduces an extent identifier to its associated collection of versioned object identifiers. Extents are sets of versioned object’s identifiers, and an extent environment vee is a mapping from extent identifiers to extents . 3

The notation q[x := v] denotes the query that results from the substitution of v for all free occurrences of x in q

78

XXI Simpósio Brasileiro de Banco de Dados vee( x) = {vo id 1 , . . . vo id 2 } vee; voe; ve ⊢ x → {vo id 1 , . . . vo id 2 }

(VAR)

A versioned object environment voe maps each versioned object identifier vo id to a tuple (c, versions, current, tree), i.e. the versioned object’s class name, its versions, its current version, and a relation representing the versions derivation tree respectively. The rule VO ID evaluates a reference to a versioned object to its current version according to the versioned object environment. voe( vo id) = ( c, versions, current, tree) vee; voe; ve ⊢ vo id → current

(VO ID)

Operators. Rules P RED 1 and P RED 2 state that the operands of is pred are evaluated from left to right. The rules for the other positional operators follow this same pattern and, hence, are omitted (rules for arithmetic, logical and relational operators are also omitted). vee; voe; ve ⊢ q1 → q1′ vee; voe; ve ⊢ q1 is pred q2 → q1′ is pred q2

(P RED 1)

vee; voe; ve ⊢ q2 → q2′ vee; voe; ve ⊢ v1 is pred q2 → v1 is pred q2′

(P RED 2)

A version derivation tree of a versioned object is a set of pairs of version identifiers. If a pair ( v id 1 , v id 2) belongs to this set, it means that the version v id 1 is an immediate predecessor of version v id 2 . Next, the premises of rules P RED 3 and P RED 4 check whether v id 1 is predecessor of v id 2 in the version derivation tree. voe( vo id) = ( c, versions, current, tree) ( v id 1 , v id 2) ∈ tree (P RED 3) vee; voe; ve ⊢ v id 1 is pred v id 2 → true voe( vo id) = ( c, versions, current, tree) ( v id 1 , v id 2) ∈ tree (P RED 4) vee; voe; ve ⊢ v id 1 is pred v id 2 → false The ‘.’ operator. As previously discussed, the ambiguity of the ‘.’ operator is solved by presenting two operators with distinguished meanings. First, rule D OT 1 evaluates a query like q.a by first reducing the component q. vee; voe; ve ⊢ q → q ′ vee; voe; ve ⊢ q.a → q ′ .a

(D OT 1)

Then, rule D OT 2 accesses the content of an object attribute. A version environment ve maps a version identifier to a pair (c,vstate) where c is the version’s class name and vstate is a mapping from the version’s attributes to their contents. vstate( a) = k ve( v id) = ( c, vstate) vee; voe; ve ⊢ v id .a → k

79

(D OT 2)

XXI Simpósio Brasileiro de Banco de Dados Second, rules A RROW 1 and A RROW 2 evaluate queries of the form q ->a. vee; voe; ve ⊢ q → q ′ vee; voe; ve ⊢ q ->a → q ′ ->a

(A RROW 1)

vee; voe; ve ⊢ v id i .a → ki i≥0 vee; voe; ve ⊢ {v id i }->a → {ki }

(A RROW 2)

Finally, rule V ERSIONS collects all versions of a versioned object by accessing the versioned object state. ( ci , versionsi , currenti , treei) ∈ Image( voe) v id i ∈ versionsi (V ERSIONS) n vee; voe; ve ⊢ {v id i }->versions → i=1 versionsi

5. Type System After presenting the semantic rules for VOQL, we now discuss the type system of the language. This type system is important because not all syntactically correct queries are 6 VOQL queries. For instance, the operator is succ requires operands of the same class. Therefore the following query, although grammatically correct, is not well-typed in the database context described in Figure 1. select c.courseNum from Courses c, FacultyM f where c is_succ f A type system consists of a set of rules that specify certain constraints that must be respected before the query is evaluated. Regarding the operational semantics, more details can be found in [Machado et al. 2006a]. We now present an intuitive explanation for some of the typing rules for VOQL. Then, we discuss the relevance of the property that the evaluation of well-typed queries does not lead to certain execution errors. The typing rule’s premises and conclusion have the form vsch; vee; voe; ve; Γ ⊢ q: σ where vee, voe and ve are extent, versioned object and version environments respectively (as explained in the previous section), vsch is a representation of a database schema, and Γ is an mapping from query variables to their types. For clarity, a database typing context vsch;vee;voe;ve;Γ is sometimes represented by K. A type judgement K ⊢ q : σ is read as “the query q has the type σ in database typing context K”. Types in VOQL are given by the following grammar. σ :: =

int | bool | string | c | set( σ)

where c is a metavariable ranging over class’s names. Basic queries. The environment Γ has the type of both local and global query variables (extent identifiers as defined in the schema):

80

XXI Simpósio Brasileiro de Banco de Dados Γ ( x) = σ vsch; vee; voe; ve; Γ ⊢ x: σ

(VAR)

Rule T-S ELECT types the queries q1 . . . qn in the from part from left to right adding the associated type assignments of variables x1 . . . xn to the type context Γ . vsch; vee; voe; ve; Γ ⊢ q1: set( σ1) vsch; vee; voe; ve; Γ , x1: σ1 ⊢ q2: set( σ2) .. . vsch; vee; voe; ve; Γ , x1: σ1 , . . . , xn−1: σn−1 ⊢ qn: set( σn) vsch; vee; voe; ve; Γ , x1: σ1 , . . . , xn: σn ⊢ qa: σa vsch; vee; voe; ve; Γ , x1: σ1 , . . . , xn: σn ⊢ qb: bool x1 to xn different x1 , . . . , xn ∈ Dom( Γ ) n≥0 vsch; vee; voe; ve; Γ ⊢ select qa from q1 x1 , . . . , qn xn where qb: set( σa) (T-S ELECT) Versioned objects. Rules T-VOID and T-V ID access the versioned object environment and the version environment to obtain the class name. voe( void) = ( c, vobjstate) K ⊢ void: c

(T-VOID)

ve( vid) = ( c, vstate) K ⊢ vid: c

(T-V ID)

Operators. Rule T-P RED simply requires that both operands of is pred be of the same class type c. The rules for other positional operators are similar. K ⊢ q1: c K ⊢ q2: c K ⊢ q1 is pred q2: bool

(T-P RED)

The ‘.’ operator. A schema is represented by a mapping vsch from class identifiers to class definitions. A class definition is then given by a triple ( s, e, ty) where s is the name of the superclass, e is the name of the class extent and ty is a map from the class attributes to their types. Rule T-D OT states that if q is of type c, and if the type of attribute a is σ based on the schema environment for c, then q.a is of type σ. K ⊢ q: c

vsch( c) = ( s, e, ty) K ⊢ q.a: σ

ty( a) = σ

(T-D OT)

Observe that the type of a query q ->a, given by rule T-A RROW, is set(σ), where a is of type σ according to the schema. K ⊢ q: set( c)

vsch( c) = ( s, e, ty) K ⊢ q->a: set( σ)

81

ty( a) = σ

(T-A RROW)

XXI Simpósio Brasileiro de Banco de Dados The rule T-V ERSIONS ensures that the construction versions will be applied only to collections of versioned objects. K ⊢ q: set( c) K ⊢ q->versions: set( c)

(T-V ERSIONS)

Subtyping. The subtyping relation for VOQL types, written σ ≤ σ ′ , is the reflexive and transitive closure of the relation defined by the next rules. We assume that the subclass hierarchy specified in the schema definition is represented by a relation written extends, as follows. c extends c′ c