Querying Clocked Databases

Mehmet A. Orgun (1) and Chuchang Liu (2)

(1) Department of Computing, Macquarie University, NSW 2109, Australia
(2) Information Technology Division, Defence Science and Technology Organisation, PO Box 1500, Salisbury, SA 5108, Australia

Abstract. We propose a temporal extension of Datalog which can be used to model and query temporal databases with relations based on multiple clocks. The extension, called Clocked Temporal Datalog, is based on a clocked temporal logic in which each predicate, and hence each formula, can be assigned a separate clock. A Clocked Temporal Datalog program consists of three parts: (1) a clock definition, (2) a clock assignment, and (3) a program body. The clock definition specifies all the available clocks. The clock assignment assigns to each predicate defined in the program body a clock from the clock definition. The meaning of the program body naturally depends on the provided clock definition and assignment. Therefore, a Clocked Temporal Datalog program intensionally models a clocked database in which each relation is defined over a clock. Programmable clock definitions are very flexible in specifying periodic as well as some non-periodic clocks, and in specifying relationships between clocks on the fly.

1 Introduction

While there is little reported research on deductive database systems for temporal data, temporal databases based on the relational model have been extensively studied in the literature. A comprehensive survey of temporal query languages is provided by Chomicki [5]. The recent status of research in temporal databases is summarized in Özsoyoğlu and Snodgrass [17]. Temporal extensions based on logic programming are considered by Chomicki and Imieliński [6], Chomicki [4], Baudinet et al. [2] and Böhlen and Marti [3]. These proposals are concerned with modeling and querying infinite temporal data in logic languages (such as extensions of Datalog) in which predicates are extended with explicit time parameters. A temporal extension of Datalog is considered by Orgun [15], based on the function-free subset of the temporal language Chronolog [16]. These languages are not designed to deal with multiple granularities and/or multiple clocks. However, they make the representation of infinite temporal information possible and often enable a more compact representation of finite information. They also improve the expressivity of query languages for temporal data.

One important issue is that the relations in a temporal database are not necessarily defined on the same granularity of time. Some events occur at irregular intervals, and it seems unnatural to force them all onto a prescribed notion of time; doing so would lead to semantic mismatches [9, 10]. Ladkin [10] recognized that distinct granularities cannot be mixed, and developed an algebra in which the granularity of the source timestamps is taken into account. Wiederhold, Jajodia and Litwin [20] also recognized the problem, and provided an algebra in which data with multiple granularities are converted to a uniform model of data based on time intervals. Gagne and Plaice [9] propose a non-standard temporal deductive database system in which time is modeled by pairs of numbers $\langle r, n \rangle$, where $r$ is a real number and $n$ is an integer. Their model is based on a dense model of time rather than a discrete one, and it is not clear how it can be used in practice. Dyreson and Snodgrass [7] extended SQL-92 to support mixed granularities with respect to a granularity lattice. Other recent works also extend the relational model and algebra to deal with multiple granularities; for instance, another calendar-based approach is proposed by Lee et al. [11].

In this paper, we first propose a model for clocked databases in which relations are defined over multiple time-lines or, more precisely, multiple histories. We then consider a deductive system for clocked databases, featuring a clocked temporal extension of Datalog. Clocked Temporal Datalog is based on TLC [12], a temporal logic which can be used to model predicates defined over multiple clocks. In TLC, each predicate, and hence each formula, can be assigned a clock, which is a subsequence of a discrete time-line modeled by the sequence of natural numbers. TLC does not assume a calendar-dependent partitioning of the time-line, and hence granularity conversion operators are not required. Our approach is therefore more restrictive in modeling granularity than some others reported in the literature; however, it does not require a predetermined granularity lattice, it involves programmable clock definitions, and it is grounded in temporal logic. Temporal logic [19] provides a clean framework in which the temporal properties of applications such as temporal databases can be formalized, studied, and then generalized and applied to other application domains. Clocked Temporal Datalog can also be used as a deductive front-end to clocked databases to enhance the expressivity of their query languages.
Through programmable clock definitions, temporal data in a clocked database can also be viewed and summarized at different granularities.

There are also other approaches to temporal databases based on temporal logic. For instance, Orgun [14] proposed a temporal algebra with algebraic counterparts of the temporal operators first, next and fby, which also includes temporal aggregation operators. Tuzhilin and Clifford [18] proposed a temporal algebra (called TA) as a basis for temporal relational completeness. The algebra is equivalent in expressive power to a temporal calculus based on a temporal logic with the temporal operators since and until. Gabbay and McBrien [8] considered a refinement of TA which is also based on a temporal logic with since and until. They introduce two linear recursive operators, namely since-product (S) and until-product (U), which closely resemble their counterparts in temporal logic. We do not consider these operators in Clocked Temporal Datalog because their properties in a logic programming setting have not yet been studied, and they do not have a straightforward operational semantics.

This paper is structured as follows. Section 2 proposes a model for clocked databases. Section 3 gives an introduction to TLC, including its clock calculus, syntax and semantics. Section 4 introduces Clocked Temporal Datalog. Section 5 presents the declarative semantics of Clocked Temporal Datalog programs and establishes the connection between clocked databases and programs.

2 A Model for Clocked Databases

A clocked database consists of three components: (1) a set of relation symbols, (2) a clock assignment, and (3) a relation mapping. The clock assignment assigns a clock (a subsequence of an assumed discrete time-line) to each relation symbol; the relation mapping assigns to each relation symbol a clocked relation defined over its clock. In the following, we denote the set of natural numbers $\{0, 1, 2, 3, \ldots\}$ by $\omega$.

Definition 1 (Clocks). A clock is a strictly increasing sequence of natural numbers. The global clock $gc$ is the sequence of natural numbers $\langle 0, 1, 2, 3, \ldots \rangle$. The empty clock is the empty sequence $\langle\rangle$. Let $ck$ be a clock. We write $t \in ck$ if $t$ occurs in $ck$ ($t$ is a moment on clock $ck$). For any given clock $ck$, the notation $ck(i)$ denotes the $i$th element of $ck$, counting from 0.

We now define an ordering relation on clocks as follows.

Definition 2 ($\sqsubseteq$). For any given clocks $ck_1$ and $ck_2$, we write $ck_1 \sqsubseteq ck_2$ if for all $t \in ck_1$, we have $t \in ck_2$. If $ck_1 \sqsubseteq ck_2$, then we also say that $ck_1$ is a sub-clock of $ck_2$.

It can be shown that the set of clocks, denoted by $CK$, with the ordering $\sqsubseteq$, is a complete lattice in which the global clock is the maximum element and the empty clock is the minimum element. Note that $CK$ contains uncountably many clocks, all of which are conceivably available to the user. We do not provide a calendar-dependent partitioning of the time-line at this stage; the only structure available in $CK$ is provided by the ordering $\sqsubseteq$. In particular, if $ck_1 \sqsubseteq ck_2$ for clocks $ck_1$ and $ck_2$, we say that $ck_2$ has a finer granularity than $ck_1$.

We now define two operations on clocks that correspond to the greatest lower bound (g.l.b.) and least upper bound (l.u.b.) of clocks with respect to $\sqsubseteq$. The least upper bound of two given clocks can be obtained by merging them; the greatest lower bound can be obtained by taking their common moments.

Definition 3 ($\sqcap$, $\sqcup$). Let $ck_1, ck_2 \in CK$. We define two operations on clocks as follows: $ck_1 \sqcap ck_2 \stackrel{\mathrm{def}}{=} \mathrm{g.l.b.}\{ck_1, ck_2\}$ and $ck_1 \sqcup ck_2 \stackrel{\mathrm{def}}{=} \mathrm{l.u.b.}\{ck_1, ck_2\}$.
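The sub-clock test and the g.l.b./l.u.b. operations on clocks are easy to prototype. The sketch below is an illustration only, not part of the paper: it assumes clocks are stored as strictly increasing Python tuples truncated at a hypothetical cut-off `HORIZON`, since infinite clocks cannot be materialized. Because all three operations are decided moment by moment, truncating both arguments at the same horizon truncates the result in the same way.

```python
from heapq import merge

HORIZON = 20  # hypothetical cut-off: only moments t < HORIZON are materialized


def make_clock(moments):
    """Normalize an iterable of naturals into a clock: a strictly increasing tuple."""
    return tuple(sorted({m for m in moments if 0 <= m < HORIZON}))


GC = make_clock(range(HORIZON))  # finite prefix of the global clock <0, 1, 2, ...>
EMPTY = ()                        # the empty clock <>


def is_subclock(ck1, ck2):
    """ck1 ⊑ ck2: every moment on ck1 also occurs on ck2."""
    return set(ck1) <= set(ck2)


def meet(ck1, ck2):
    """ck1 ⊓ ck2 (g.l.b.): the moments common to both clocks."""
    on_ck2 = set(ck2)
    return tuple(t for t in ck1 if t in on_ck2)


def join(ck1, ck2):
    """ck1 ⊔ ck2 (l.u.b.): merge both clocks, keeping each moment once."""
    merged = []
    for t in merge(ck1, ck2):
        if not merged or merged[-1] != t:
            merged.append(t)
    return tuple(merged)


even = make_clock(range(0, HORIZON, 2))   # <0, 2, 4, ..., 18>
mult3 = make_clock(range(0, HORIZON, 3))  # <0, 3, 6, ..., 18>
print(meet(even, mult3))                     # (0, 6, 12, 18)
print(is_subclock(meet(even, mult3), even))  # True
```

Here `even` and `mult3` are hypothetical example clocks; `meet` keeps only the moments they share, so its result is a sub-clock of both, while `join` yields the coarsest clock of which both are sub-clocks.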


Let $D$ be a domain of values of interest. Here we are not concerned with the actual type of each domain element, but the model can easily be extended to include types of elements. Let $D^n$ denote the $n$-fold Cartesian product of $D$, and $\mathcal{P}(D^n)$ the set of all subsets of $D^n$. We denote the set of all functions from set $X$ to set $Y$ by $[X \to Y]$.


Definition 4 (Clocked relations). Let $ck \in CK$. A clocked relation with arity $n$ is a map from $ck$ to $\mathcal{P}(D^n)$. The set of clocked relations is $\bigcup_{n \in \omega} \bigcup_{ck \in CK} [ck \to \mathcal{P}(D^n)]$.


Let $R$ be the set of relation symbols we are allowed to have in a clocked database. A clock assignment tells us the times at which each relation symbol has a defined value (be it the empty relation or a non-empty relation). We also write $r/n$ for a relation symbol $r$ with arity $n$.

Definition 5. A clock assignment $\alpha$ is a map from $R$ to $CK$, that is, $\alpha \in [R \to CK]$. The clock associated with a relation symbol $r/n \in R$ under $\alpha$ is denoted by $\alpha(r/n)$.

Definition 6 (Clocked database). Let $R$ be a set of relation symbols, and $\alpha$ a clock assignment. A clocked database $DB$ is a triple $\langle R, \alpha, \beta \rangle$ where $\beta$ assigns a clocked relation over $\alpha(r/n)$ to each $r/n \in R$. We write $\beta(r/n)$ to denote the clocked relation assigned to $r/n$ by $\beta$; $\beta(r/n)$ is in fact a map from $\alpha(r/n)$ to $\mathcal{P}(D^n)$. Intuitively, $\alpha$ tells us when a particular relation symbol has a defined value, while $\beta$ tells us the value associated with a relation symbol whenever it is defined.
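As an illustration only, a clocked database (relation symbols, clock assignment and relation mapping) can be prototyped in Python with dictionaries. The sketch assumes finite clocks stored as tuples, relation symbols written as hypothetical `(name, arity)` pairs for $r/n$, and string-valued tuples; `alpha` and `beta` are hypothetical names for the clock assignment and relation mapping. Returning `None` models "undefined", which is kept distinct from the defined-but-empty relation `frozenset()`.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Symbol = Tuple[str, int]                     # r/n written as (r, n)
Clock = Tuple[int, ...]                      # finite, strictly increasing moments
Row = Tuple[str, ...]
ClockedRelation = Dict[int, FrozenSet[Row]]  # moment -> set of n-tuples


class ClockedDatabase:
    """Sketch of a clocked database over finite clocks."""

    def __init__(self, symbols: Set[Symbol], alpha: Dict[Symbol, Clock],
                 beta: Dict[Symbol, ClockedRelation]):
        self.symbols = symbols  # R
        self.alpha = alpha      # clock assignment: r/n -> clock
        self.beta = beta        # relation mapping: r/n -> clocked relation
        for sym in symbols:
            name, arity = sym
            rel = beta[sym]
            # the clocked relation must be a map from exactly the moments of alpha(r/n)
            if set(rel) != set(alpha[sym]):
                raise ValueError(f"{name}/{arity} is not defined exactly on its clock")
            # every tuple at every moment must have arity n
            for t, rows in rel.items():
                if any(len(row) != arity for row in rows):
                    raise ValueError(f"{name}/{arity} has a wrong-arity tuple at {t}")

    def value(self, sym: Symbol, t: int) -> Optional[FrozenSet[Row]]:
        """The relation of sym at moment t, or None when t is not on alpha(sym)."""
        return self.beta[sym].get(t)


# Hypothetical example: a unary relation defined on a weekly clock.
meeting = ("meeting", 1)
alpha = {meeting: (0, 7, 14)}
beta = {meeting: {0: frozenset({("alice",)}), 7: frozenset(), 14: frozenset({("bob",)})}}
db = ClockedDatabase({meeting}, alpha, beta)
print(db.value(meeting, 7))  # frozenset(): defined, but empty
print(db.value(meeting, 3))  # None: moment 3 is not on the clock
```

The constructor rejects a relation mapping whose domain differs from the assigned clock, which mirrors the requirement that each relation be a map from its own clock.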

3 Temporal Logic with Clocks

To make the paper relatively self-contained, we give an introduction to the syntax and semantics of TLC. Most of the material presented in this section is from [12], with some modifications to extend TLC with the temporal operator fby. In the vocabulary of TLC, apart from variables, constants and predicate symbols, we also have the propositional connectives $\neg$, $\vee$ and $\wedge$, the universal quantifier $\forall$, three temporal operators (unary first and next, and binary fby), and the punctuation symbols "(" and ")". In TLC, terms are defined as usual. The other connectives $\to$ and $\leftrightarrow$ and the quantifier $\exists$ can be derived from the primitive connectives and the universal quantifier as usual. The intuitive meaning of the temporal operators is as follows: (1) first $A$: $A$ is true at the initial moment in time; (2) next $A$: $A$ is true at the next moment in time; and (3) $A$ fby $B$: $A$ is true at the initial moment in time and from then on $B$ is true. It should be kept in mind that these readings are relative to the given formula clocks (see below). We write next[$n$] for $n$ successive applications of next. If $n = 0$, then next[$n$] is the empty string.

3.1 Clock calculus

Just as in a clocked database, we use a clock assignment to assign a clock to each predicate symbol. Formally, we have the following definition:

Definition 7. A clock assignment $\alpha$ is a map from the set $SP$ of predicate symbols to the set $CK$ of clocks, that is, $\alpha \in [SP \to CK]$. The notation $\alpha(p)$ denotes the clock associated with a predicate symbol $p$ under $\alpha$.

We extend the notion of a clock to formulas as follows:

Definition 8. Let $A$ be a formula and $\alpha$ a clock assignment. The clock associated with $A$, denoted $\alpha^*(A)$, is defined inductively as follows:
- If $A$ is an atomic formula $p(e_1, \ldots, e_n)$, then $\alpha^*(A) = \alpha(p)$.
- If $A = \neg B$, $\mathbf{first}\,B$, $(\forall x)B$ or $(\exists x)B$, then $\alpha^*(A) = \alpha^*(B)$.
- If $A = B \wedge C$, $B \vee C$, $B \to C$ or $B \leftrightarrow C$, then $\alpha^*(A) = \alpha^*(B) \sqcap \alpha^*(C)$.
- If $A = \mathbf{next}\,B$, then (1) $\alpha^*(A) = \langle t_0, t_1, \ldots, t_{n-1} \rangle$ when $\alpha^*(B) = \langle t_0, t_1, \ldots, t_n \rangle$ is non-empty and finite; (2) $\alpha^*(A) = \alpha^*(B)$ when $\alpha^*(B)$ is infinite or empty.
- If $A = B\ \mathbf{fby}\ C$, then $\alpha^*(A) = \langle b_0, c_k, c_{k+1}, c_{k+2}, \ldots \rangle$, where $\alpha^*(B) = \langle b_0, b_1, \ldots \rangle$, $\alpha^*(C) = \langle c_0, c_1, \ldots, c_{k-1}, c_k, c_{k+1}, c_{k+2}, \ldots \rangle$ and $c_{k-1} \le b_0 < c_k$ for some $k \ge 0$.
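The clock calculus of Definition 8 can be sketched as a recursive function over formula trees. This is an illustration under stated assumptions, not the authors' implementation: formulas are hypothetical nested tuples, quantifiers are omitted, and every clock is assumed finite and given explicitly, because a finite prefix alone cannot tell whether a clock is infinite (which the next rule needs). Two cases the definition leaves open are handled by assumption: an empty left clock in fby yields the empty clock, and if no moment of the right-hand clock exceeds $b_0$, the result is just $\langle b_0 \rangle$.

```python
from bisect import bisect_right
from typing import Dict, Tuple

Clock = Tuple[int, ...]  # a finite clock: strictly increasing moments


def meet(ck1: Clock, ck2: Clock) -> Clock:
    on_ck2 = set(ck2)
    return tuple(t for t in ck1 if t in on_ck2)


def clock_of(formula, alpha: Dict[str, Clock]) -> Clock:
    """alpha*(A) for formulas such as ("fby", ("atom", "p"), ("next", ("atom", "q")))."""
    op = formula[0]
    if op == "atom":
        return alpha[formula[1]]
    if op in ("not", "first"):
        return clock_of(formula[1], alpha)
    if op in ("and", "or", "implies", "iff"):
        # the meet rule: defined only where every sub-formula is defined
        return meet(clock_of(formula[1], alpha), clock_of(formula[2], alpha))
    if op == "next":
        # finite non-empty clock: drop its last moment; () stays ()
        return clock_of(formula[1], alpha)[:-1]
    if op == "fby":
        ck_b = clock_of(formula[1], alpha)
        ck_c = clock_of(formula[2], alpha)
        if not ck_b:
            return ()  # assumption: this case is not covered by Definition 8
        b0 = ck_b[0]
        k = bisect_right(ck_c, b0)  # first k with ck_c[k] > b0, so ck_c[k-1] <= b0
        return (b0,) + ck_c[k:]
    raise ValueError(f"unknown operator: {op}")


alpha = {"p": (0, 2, 4, 6), "q": (1, 3, 5, 7)}
print(clock_of(("next", ("atom", "p")), alpha))                # (0, 2, 4)
print(clock_of(("fby", ("atom", "p"), ("atom", "q")), alpha))  # (0, 1, 3, 5, 7)
print(clock_of(("and", ("atom", "p"), ("atom", "q")), alpha))  # ()
```

`bisect_right` finds exactly the index $k$ with $c_{k-1} \le b_0 < c_k$. The last line shows the meet rule at work: $p$ and $q$ never tick together, so their conjunction is never defined.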

The $\sqcap$-rule says that the corresponding formulas have a defined value at a particular moment in time only if that moment occurs in the clocks of all of their sub-formulas. Note that the above clock calculus is an extension of the one given in [12] to deal with fby.

We now define the rank of a moment on a given clock.

Definition 9. Given a clock $ck = \langle t_0, t_1, t_2, \ldots \rangle$, we define the rank of $t_n$ on $ck$ to be $n$, written $\mathit{rank}(t_n, ck) = n$. Inversely, we write $t_n = ck(n)$, which means that $t_n$ is the moment in time on $ck$ whose rank is $n$.

The following definitions will be very useful in developing the declarative semantics of Clocked Temporal Datalog programs.

Definition 10. Temporal atoms are defined inductively as follows:
- Any atomic formula is a temporal atom.
- If $A$ is a temporal atom, then so are first $A$ and next $A$.

Definition 11. A temporal atom is fixed-time if it has an application of first followed by a number of applications of next; otherwise it is open. A formula is fixed-time if all atoms contained in it are fixed-time; otherwise it is open.

3.2 Semantics

Let $\mathcal{B} = \langle \{\mathit{false}, \mathit{true}\}, \le \rangle$ denote a Boolean algebra of truth values with the ordering $\mathit{false} \le \mathit{true}$ and the following standard operations:
- $\mathrm{comp} = \{\mathit{true} \mapsto \mathit{false}, \mathit{false} \mapsto \mathit{true}\}$ (complementation),
- $X \ast Y$ = the g.l.b. of $\{X, Y\}$ with respect to $\le$,
- $X + Y$ = the l.u.b. of $\{X, Y\}$ with respect to $\le$.

In TLC, at a given time $t \in \omega$, the value of a formula can be true, false or undefined, depending on the clocks of the predicate symbols appearing in it. We do not have a representation for undefined values; rather, we use partial mappings. The meaning of a predicate symbol $p$ is actually a clocked relation, i.e., a partial mapping from $\omega$ to $\mathcal{P}(D^n)$, where $n$ is the arity of $p$ and $D$ is the domain of discourse. For any $t \in \alpha(p)$, the image is naturally defined. A temporal interpretation together with a clock assignment assigns meanings to all the basic elements of TLC.


Definition 12. A temporal interpretation $I$ on a given clock assignment $\alpha$ of TLC comprises a non-empty set $D$, called the domain of the interpretation, over which the variables range, together with: for each variable, an element of $D$; for each term, an element of $D$; and for each $n$-ary predicate symbol $p/n$, a clocked relation in $[\alpha(p/n) \to \mathcal{P}(D^n)]$.


We denote the clocked relation represented by $p/n$ on $I$ over $\alpha$ by $I(p/n)$. To refer to the value of $p/n$ at a particular moment $t$, we use the notation $I(p/n)(t)$. In the following, the notation $[\![A]\!]^t_{\alpha,I}$ denotes the value of $A$ under interpretation $I$ on clock assignment $\alpha$ at moment $t$.

Definition 13. Let $I$ be a temporal interpretation on a given clock assignment $\alpha$ of TLC. For any formula $A$ of TLC, $[\![A]\!]^t_{\alpha,I} \stackrel{\mathrm{def}}{=} [\![A]\!]^t_{\alpha^*(A),I}$ whenever $t \in \alpha^*(A)$. The function $[\![A]\!]^t_{\alpha^*(A),I}$ is defined inductively as follows:


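(1) If $e$ is a term, then $[\![e]\!]^t_{\alpha,I} = I(e) \in D$.

(2) For any $n$-ary predicate symbol $p/n$ and terms $e_1, \ldots, e_n$, $[\![p/n(e_1, \ldots, e_n)]\!]^t_{\alpha^*(p/n(e_1, \ldots, e_n)),I} = \mathit{true}$ if $\langle [\![e_1]\!]^t_{\alpha,I}, \ldots, [\![e_n]\!]^t_{\alpha,I} \rangle \in I(p/n)(t)$, and $\mathit{false}$ otherwise.

(3) For any formula of the form $\neg A$, $[\![\neg A]\!]^t_{\alpha^*(\neg A),I} = \mathrm{comp}([\![A]\!]^t_{\alpha^*(A),I})$.

(4) For any formula of the form $A \wedge B$, $[\![A \wedge B]\!]^t_{\alpha^*(A \wedge B),I} = [\![A]\!]^t_{\alpha^*(A),I} \ast [\![B]\!]^t_{\alpha^*(B),I}$.

(5) For any formula of the form $A \vee B$, $[\![A \vee B]\!]^t_{\alpha^*(A \vee B),I} = [\![A]\!]^t_{\alpha^*(A),I} + [\![B]\!]^t_{\alpha^*(B),I}$.

(6) For any formula of the form $(\forall x)A$, $[\![(\forall x)A]\!]^t_{\alpha^*((\forall x)A),I} = \mathit{true}$ if $[\![A]\!]^t_{\alpha^*(A),I[d/x]} = \mathit{true}$ for all $d \in D$, where the interpretation $I[d/x]$ is just like $I$ except that the variable $x$ is assigned the value $d$; and $\mathit{false}$ otherwise.

(7) For any formula of the form $\mathbf{first}\,A$, $[\![\mathbf{first}\,A]\!]^t_{\alpha^*(\mathbf{first}\,A),I} = [\![A]\!]^s_{\alpha^*(A),I}$, where $s = \alpha^*(\mathbf{first}\,A)(0)$.

(8) For any formula of the form $\mathbf{next}\,A$, $[\![\mathbf{next}\,A]\!]^t_{\alpha^*(\mathbf{next}\,A),I} = [\![A]\!]^s_{\alpha^*(A),I}$, where $s = \alpha^*(A)(n+1)$ and $n = \mathit{rank}(t, \alpha^*(\mathbf{next}\,A))$.

(9) For any formula of the form $A\ \mathbf{fby}\ B$, $[\![A\ \mathbf{fby}\ B]\!]^t_{\alpha^*(A\,\mathbf{fby}\,B),I} = [\![A]\!]^t_{\alpha^*(A),I}$ if $\mathit{rank}(t, \alpha^*(A\ \mathbf{fby}\ B)) = 0$; otherwise, $[\![A\ \mathbf{fby}\ B]\!]^t_{\alpha^*(A\,\mathbf{fby}\,B),I} = [\![B]\!]^s_{\alpha^*(B),I}$, where $s = \alpha^*(B)(n-1)$ and $n = \mathit{rank}(t, \alpha^*(A\ \mathbf{fby}\ B))$.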
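For the propositional (0-ary) case, the clauses above can be sketched as an evaluator. As before, this is an illustration under assumptions rather than the authors' implementation: clocks are finite tuples, formulas are hypothetical nested tuples, truth values are Python booleans (so complementation, g.l.b. and l.u.b. become `not`, `and` and `or`), and evaluating a formula outside its clock raises an error because its value is undefined there. The clock calculus is repeated so the block runs on its own; `holds` checks that a formula is true at every moment of its clock.

```python
from bisect import bisect_right


def meet(ck1, ck2):
    on_ck2 = set(ck2)
    return tuple(t for t in ck1 if t in on_ck2)


def clock_of(f, alpha):
    """alpha*(f) for finite clocks, following the clock calculus."""
    op = f[0]
    if op == "atom":
        return alpha[f[1]]
    if op in ("not", "first"):
        return clock_of(f[1], alpha)
    if op in ("and", "or"):
        return meet(clock_of(f[1], alpha), clock_of(f[2], alpha))
    if op == "next":
        return clock_of(f[1], alpha)[:-1]
    if op == "fby":
        ck_b, ck_c = clock_of(f[1], alpha), clock_of(f[2], alpha)
        return (ck_b[0],) + ck_c[bisect_right(ck_c, ck_b[0]):] if ck_b else ()
    raise ValueError(f"unknown operator: {op}")


def value(f, t, alpha, interp):
    """The truth value of f at moment t, defined only for t on alpha*(f)."""
    ck = clock_of(f, alpha)
    if t not in ck:
        raise ValueError(f"{f} is undefined at moment {t}")
    op, n = f[0], ck.index(t)  # n = rank(t, alpha*(f))
    if op == "atom":    # clause (2), 0-ary case
        return interp[f[1]][t]
    if op == "not":     # clause (3)
        return not value(f[1], t, alpha, interp)
    if op == "and":     # clause (4)
        return value(f[1], t, alpha, interp) and value(f[2], t, alpha, interp)
    if op == "or":      # clause (5)
        return value(f[1], t, alpha, interp) or value(f[2], t, alpha, interp)
    if op == "first":   # clause (7): the initial moment of the clock
        return value(f[1], ck[0], alpha, interp)
    if op == "next":    # clause (8): the moment of rank n + 1 on A's own clock
        return value(f[1], clock_of(f[1], alpha)[n + 1], alpha, interp)
    if op == "fby":     # clause (9)
        if n == 0:
            return value(f[1], t, alpha, interp)
        return value(f[2], clock_of(f[2], alpha)[n - 1], alpha, interp)
    raise ValueError(f"unknown operator: {op}")


def holds(f, alpha, interp):
    """True iff f is true at every moment of its clock alpha*(f)."""
    return all(value(f, t, alpha, interp) for t in clock_of(f, alpha))


alpha = {"p": (0, 2, 4, 6), "q": (1, 3, 5, 7)}
interp = {"p": {0: True, 2: False, 4: True, 6: False},
          "q": {1: False, 3: True, 5: True, 7: False}}
p, q = ("atom", "p"), ("atom", "q")
print(value(("fby", p, q), 3, alpha, interp))  # True: q at alpha*(q)(1) = 3
print(holds(("next", p), alpha, interp))       # False
```

In the fby example, moment 3 has rank 2 on the clock $\langle 0, 1, 3, 5, 7 \rangle$, so the value of $q$ is read at rank 1 of $q$'s own clock, which happens to be moment 3.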


Let $\models_{I,\alpha} A$ denote the fact that $A$ is true under $I$ on clock assignment $\alpha$; in other words, $[\![A]\!]^t_{\alpha,I} = [\![A]\!]^t_{\alpha^*(A),I} = \mathit{true}$ for all $t \in \alpha^*(A)$. We also use the notation $\models_{\alpha} A$ to denote the fact that $\models_{I,\alpha} A$ for any temporal interpretation $I$ over clock assignment $\alpha$. In particular, if $\models_{I,\alpha} A$, then we say that the temporal interpretation $I$ on $\alpha$ is a model of the formula $A$, and we use $\models A$ to denote the fact that $\models_{I,\alpha} A$ for any interpretation $I$ and any clock assignment $\alpha$. Axioms and rules of inference of TLC can be found in [12]. The version of TLC presented in this paper would have additional axioms and rules to formalize the temporal operator fby.


4 Clocked Temporal Datalog

A Clocked Temporal Datalog program consists of three components: $P = P_c \bowtie P_a \bowtie P_b$, where $P_c$, $P_a$ and $P_b$ are the clock definition, the clock assignment and the program body of the program $P$. The symbol $\bowtie$ means "joining", that is, $P_c$, $P_a$ and $P_b$ jointly form the program $P$. The main point is that the clock assignment of $P_b$ is totally determined by $P_c$ and $P_a$.
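The point that the clock definition and the clock assignment alone fix the clock of every predicate in the program body can be illustrated with a small Python analogue. This is not the clock-definition syntax of Clocked Temporal Datalog: the clock definition is mimicked by hypothetical generator functions (one periodic, one non-periodic), the clock assignment by a dictionary from hypothetical predicate names to clock names, and clocks are cut off at a horizon so they can be printed.

```python
from itertools import count, takewhile
from typing import Callable, Dict, Iterator, Tuple


def every(step: int, start: int = 0) -> Callable[[], Iterator[int]]:
    """A periodic clock <start, start + step, start + 2*step, ...>."""
    return lambda: count(start, step)


def squares() -> Iterator[int]:
    """A non-periodic clock <0, 1, 4, 9, ...>."""
    n = 0
    while True:
        yield n * n
        n += 1


# Analogue of P_c: every available clock, by (hypothetical) name.
clock_definition: Dict[str, Callable[[], Iterator[int]]] = {
    "daily": every(1), "weekly": every(7), "sq": squares,
}
# Analogue of P_a: a clock from P_c for each (hypothetical) predicate of P_b.
clock_assignment: Dict[str, str] = {"price": "daily", "report": "weekly", "audit": "sq"}


def alpha(pred: str, horizon: int) -> Tuple[int, ...]:
    """alpha(pred) up to the horizon, computed from P_c and P_a alone."""
    moments = clock_definition[clock_assignment[pred]]()
    # clocks are strictly increasing, so we can stop at the first moment >= horizon
    return tuple(takewhile(lambda t: t < horizon, moments))


print(alpha("report", 22))  # (0, 7, 14, 21)
print(alpha("audit", 20))   # (0, 1, 4, 9, 16)
```

Nothing in `alpha` looks at the program body: changing a predicate's clock means editing only `clock_definition` or `clock_assignment`.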

The formal definition of program clauses is given below. We assume that some built-in Prolog predicates are also available in Clocked Temporal Datalog (e.g., the "is" predicate).

Definition 14 (Temporal units).
- A temporal atom is a temporal unit.
- If $A_1, \ldots, A_m$ and $B_1, \ldots, B_n$ (for $m, n \ge 1$) are temporal units, then so is $(A_1, \ldots, A_m)\ \mathbf{fby}\ (B_1, \ldots, B_n)$.

Definition 15 (Program clauses).
- A program clause in $P_c$ and $P_b$ is of the form A