
A Spatial SQL Extension for Continuous Field Querying

Robert Laurini
Research Center for Images and Information Systems, INSA of Lyon, 69621 Villeurbanne Cedex, France

Luca Paolino, Monica Sebillo, Genoveffa Tortora, Giuliana Vitiello
Dipartimento di Matematica e Informatica, University of Salerno, 84084 Fisciano (SA), Italy
{sebillo, tortora, gvitiello}@unisa.it

Abstract

In the last decade, growing interest has been devoted to the management of data referring to geographic scenarios. However, recent research has focused on discrete data and has disregarded continuous data because of their intrinsic complexity. In this paper, we introduce an extension of a spatial SQL which allows users to pose queries about both discrete and continuous data.

1. Introduction

The capability to handle and analyze spatial data is usually seen as the key characteristic that distinguishes a GIS from other information systems. In the last decade, many authors have designed extended database languages for handling geographic data, also in terms of temporal features (e.g. [2, 4, 5, 6, 7, 8]). However, such languages are limited to retrieving and computing discrete data and disregard continuous data, which are a fundamental component of spatial analysis. In this paper we present the Continuous Field SQL (CFSQL) language, an extension of the spatial SQL query language that is able to manage continuous data as well as discrete data. Such a language is a desirable tool for managing many situations related to geographic data. In fact, there are many fields, such as archaeology, urban management and tourism, where handling temperature, pressure, air pollution, etc. is a significant issue.

The remainder of this paper is organized as follows: Section 2 introduces some basic concepts underlying the definition of the SQL extension for continuous fields. Then, Section 3 describes the operators and functions that manipulate continuous data in the corresponding SELECT-FROM-WHERE query constructs.

2. Preliminaries

The properties characterizing continuous data can be described through a function that assigns to every position on the Earth a unique value representing the data intensity. Such a function is named a continuous field. In this paper, we refer to scalar continuous fields, $f : D \subseteq A \to V$, where $D \subseteq \mathbb{R}^2$ represents a subpart of the Earth's surface and $f(D) \subseteq V$ represents the phenomenon values. Moreover, the intensity of a continuous field can be considered constant only for a limited time period, which is determined by experience with the phenomenon. These considerations have led us to define the following general schema to represent continuous fields: CF = (D, F, T), where:
• D is the continuous field domain,
• F is the function representing the phenomenon,
• T is the time period during which the continuous field representation is valid.
D, F, and T are called continuous field attributes. By assigning a value to each attribute we define a continuous field instance cf. Moreover, in order to pose useful queries, we can bring together different continuous field instances and discrete data inside structures we call Multifields, which are formally defined as follows:

Proceedings of the 28th Annual International Computer Software and Applications Conference (COMPSAC’04) 0730-3157/04 $20.00 © 2004 IEEE

A set of continuous field instances and discrete data $cf_1 = (f_1, d_1, t_1), cf_2 = (f_2, d_2, t_2), \ldots, cf_n = (f_n, d_n, t_n), g_1, g_2, \ldots, g_m$ is a Multifield. A set of Multifields can be grouped into a table where each column represents either instances of the same phenomenon or instances of the same discrete data. Formally, a Multifield table is a set of Multifields
$cf_{11} = (f_{11}, d_{11}, t_{11}), \ldots, cf_{1n} = (f_{1n}, d_{1n}, t_{1n}), g_{11}, \ldots, g_{1m}$
$\ldots$
$cf_{s1} = (f_{s1}, d_{s1}, t_{s1}), \ldots, cf_{sn} = (f_{sn}, d_{sn}, t_{sn}), g_{s1}, \ldots, g_{sm}$
where, for each $1 \le i \le n$, the instances $cf_{1i}, \ldots, cf_{si}$ represent the same phenomenon and, for each $1 \le j \le m$, $g_{1j}, g_{2j}, \ldots, g_{sj}$ represent the same discrete data layer. The attributes previously defined can be used to derive further useful information, such as:
• Integral(cf) = $\iint_D f(x, y)\,dx\,dy$

In our language, the parameter of the WHERE clause is essentially composed of temporal and spatial operators, which verify the relationships described in [1] and [2]. In the first part of this section, we present an adaptation of Allen's operators for managing the validity time periods of continuous fields. Let us consider a continuous field cf with validity time period $T_{cf} = [t_s, t_e]$, where $t_s$ is the start and $t_e$ the end of the period, and a generic time period $T = [t_1, t_2]$. Allen's operators are then recast in CFSQL as follows:
• Before(cf, T) returns a Boolean value indicating whether the validity period of cf lies completely before T, that is $t_e < t_1$;
• Meets(cf, T) returns a Boolean value indicating whether $t_e = t_1$ holds;
• Overlaps(cf, T) returns a Boolean value indicating whether $t_s < t_1$ and $t_1 < t_e < t_2$ hold;
• FinishedBy(cf, T) returns a Boolean value indicating whether $t_s < t_1$ and $t_e = t_2$ hold;
• Contains(cf, T) returns a Boolean value indicating whether $t_s < t_1$ and $t_e > t_2$ hold;
• Starts(cf, T) returns a Boolean value indicating whether $t_s = t_1$ and $t_1 < t_e < t_2$ hold;
• Equals(cf, T) returns a Boolean value indicating whether $t_s = t_1$ and $t_e = t_2$ hold;
• StartedBy(cf, T) returns a Boolean value indicating whether $t_s = t_1$ and $t_e > t_2$ hold;
• During(cf, T) returns a Boolean value indicating whether $t_s > t_1$ and $t_e < t_2$ hold;
• Finishes(cf, T) returns a Boolean value indicating whether $t_1 < t_s < t_2$ and $t_e = t_2$ hold;
• OverlappedBy(cf, T) returns a Boolean value indicating whether $t_1 < t_s < t_2$ and $t_e > t_2$ hold;
• MetBy(cf, T) returns a Boolean value indicating whether $t_s = t_2$ holds;
• After(cf, T) returns a Boolean value indicating whether $t_s > t_2$ holds.
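The thirteen temporal operators can be sketched in Python as plain comparisons on interval endpoints. This is only an illustration, not the CFSQL implementation: the `Period` class, the lowercase function names and the `relation` helper are hypothetical, and MetBy and After follow Allen's standard definitions ($t_s = t_2$ and $t_s > t_2$, respectively).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """A time period [start, end] with start < end (hypothetical helper type)."""
    start: float
    end: float


# cf is the validity period [ts, te] of a continuous field, t is a generic period [t1, t2].
def before(cf, t):        return cf.end < t.start
def meets(cf, t):         return cf.end == t.start
def overlaps(cf, t):      return cf.start < t.start < cf.end < t.end
def finished_by(cf, t):   return cf.start < t.start and cf.end == t.end
def contains(cf, t):      return cf.start < t.start and cf.end > t.end
def starts(cf, t):        return cf.start == t.start and cf.end < t.end
def equals(cf, t):        return cf.start == t.start and cf.end == t.end
def started_by(cf, t):    return cf.start == t.start and cf.end > t.end
def during(cf, t):        return cf.start > t.start and cf.end < t.end
def finishes(cf, t):      return t.start < cf.start < t.end and cf.end == t.end
def overlapped_by(cf, t): return t.start < cf.start < t.end < cf.end
def met_by(cf, t):        return cf.start == t.end
def after(cf, t):         return cf.start > t.end


ALLEN = {
    "Before": before, "Meets": meets, "Overlaps": overlaps,
    "FinishedBy": finished_by, "Contains": contains, "Starts": starts,
    "Equals": equals, "StartedBy": started_by, "During": during,
    "Finishes": finishes, "OverlappedBy": overlapped_by,
    "MetBy": met_by, "After": after,
}


def relation(cf, t):
    """Return the names of all operators that hold between the two periods."""
    return [name for name, op in ALLEN.items() if op(cf, t)]
```

For two well-formed periods exactly one operator is expected to hold, so `relation` doubles as a quick consistency check on the definitions.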

• Area(cf) = $\iint_D dx\,dy$
• Density(cf) = $\dfrac{\mathrm{Integral}(cf)}{\mathrm{Area}(cf)}$
• Surface(cf) = $\iint_D \sqrt{1 + \left(\dfrac{\partial f}{\partial x}\right)^2 + \left(\dfrac{\partial f}{\partial y}\right)^2}\,dx\,dy$
Integral, Area, Density and Surface represent, respectively, the volume contained between the function and the reference plane, the area of the continuous field geometry, the ratio between them, and the area or the length of the function describing the continuous field. Such elements are named derived attributes.
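As a rough numerical illustration of the derived attributes, the sketch below approximates them with a midpoint rule. Several things here are assumptions rather than part of CFSQL: the `ContinuousField` class, the restriction of the domain D to an axis-aligned rectangle, the grid resolution `n`, and the use of the usual surface-area integral $\iint_D \sqrt{1 + f_x^2 + f_y^2}\,dx\,dy$ for Surface, with partial derivatives estimated by central differences.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ContinuousField:
    """Hypothetical stand-in for a continuous field instance."""
    domain: Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max): rectangle assumed
    f: Callable[[float, float], float]         # intensity function
    period: Tuple[float, float]                # validity time period


def _midpoints(cf, n):
    """Yield (x, y, cell_area) for the centres of an n-by-n grid over the domain."""
    x0, x1, y0, y1 = cf.domain
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    for i in range(n):
        for j in range(n):
            yield x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy, hx * hy


def integral(cf, n=100):
    """Volume between the function and the reference plane."""
    return sum(cf.f(x, y) * dA for x, y, dA in _midpoints(cf, n))


def area(cf):
    """Area of the (rectangular) domain."""
    x0, x1, y0, y1 = cf.domain
    return (x1 - x0) * (y1 - y0)


def density(cf, n=100):
    """Ratio between Integral and Area."""
    return integral(cf, n) / area(cf)


def surface(cf, n=100, h=1e-4):
    """Area of the graph of f, with central-difference partial derivatives."""
    total = 0.0
    for x, y, dA in _midpoints(cf, n):
        fx = (cf.f(x + h, y) - cf.f(x - h, y)) / (2 * h)
        fy = (cf.f(x, y + h) - cf.f(x, y - h)) / (2 * h)
        total += math.sqrt(1.0 + fx * fx + fy * fy) * dA
    return total


# Example field: a plane over the rectangle [0, 1] x [0, 2].
plane = ContinuousField((0.0, 1.0, 0.0, 2.0), lambda x, y: 20 + 2 * x + y, (0.0, 1.0))
```

For a planar field the midpoint rule is exact up to rounding, which makes `plane` a convenient sanity check before trying less regular functions.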

3. The Continuous Field SQL

The CFSQL language is based on previous work presenting a visual environment for continuous field querying, named Phenomena [3], which has been extended to deal with temporal continuous phenomena. A generic CFSQL statement has the structure SELECT … FROM … WHERE …. The FROM clause calculates the Cartesian product among the elements listed in it and stores the result in a Multifield table. As for the WHERE and the SELECT clauses, we describe their adaptation in the following sections.
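The behaviour of the FROM clause can be pictured with a minimal Python sketch. The function `from_clause` and the string labels standing in for continuous field instances and discrete layers are hypothetical; the only point illustrated is that each row of the resulting Multifield table combines one element from every listed source.

```python
from itertools import product


def from_clause(*sources):
    """Cartesian product of the listed sources; each row plays the role of one Multifield."""
    return [tuple(row) for row in product(*sources)]


# Hypothetical instances: two temperature fields, two pressure fields, one discrete layer.
temperature = ["temp_jan", "temp_jul"]
pressure = ["pres_jan", "pres_jul"]
regions = ["region_A"]

table = from_clause(temperature, pressure, regions)
```

Each column of `table` holds instances of a single source, which matches the column-wise reading of a Multifield table given in Section 2.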

3.1. The WHERE clause

Once a FROM clause has been applied, discarding some rows might be necessary. To this aim, a WHERE clause can be specified, which takes a condition on the rows as its parameter.

Sometimes it is useful to compare two validity time periods in order to verify which relationship holds between them. In this case, the notation op(cf, cf'), where op is one of the previous operators and cf' is a continuous field with validity time period T', indicates the corresponding operator op(cf, T'). Like the temporal relationships, spatial relationships can be used to verify conditions


regarding continuous fields and therefore to select the fields or multifields which satisfy them. As before, g and cf indicate generic discrete data and a continuous field, respectively, and d denotes the domain of cf. Moreover, we refer to the interior, boundary and exterior of a geometry according to the definitions in [2].
• Touches(cf, g) returns true if the only common points between d and g lie in the union of the boundaries of d and g;
• Crosses(cf, g) returns true if the intersection of d and g results in a value whose dimension is less than the maximum dimension of d and g, the intersection value includes points interior to both d and g, and the intersection value is not equal to either;
• Within(cf, g) returns true if d is completely contained in g;
• Contains(cf, g) returns true if g is completely contained in d;
• Overlaps(cf, g) returns true if the intersection of d and g results in a value of the same dimension as d and g but is different from both d and g;
• Disjoint(cf, g) returns true if the intersection of d and g is the empty set;
• Equals(cf, g) returns true if Contains and Within are both true;
• Distance(cf, g) returns the shortest distance between d and g.
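To make the spatial predicates concrete, the following sketch evaluates them on a deliberately simplified case: both the field domain d and the discrete geometry g are closed axis-aligned rectangles with positive area. The `Rect` class and all function names are hypothetical. Crosses is omitted because, under the definition above, it cannot hold between two geometries of the same dimension whose intersection contains interior points of both.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Closed axis-aligned rectangle with x0 < x1 and y0 < y1 (simplifying assumption)."""
    x0: float
    y0: float
    x1: float
    y1: float


def disjoint(d, g):
    """No common points at all."""
    return d.x1 < g.x0 or g.x1 < d.x0 or d.y1 < g.y0 or g.y1 < d.y0


def _interiors_meet(d, g):
    """True when the open interiors share at least one point."""
    return d.x0 < g.x1 and g.x0 < d.x1 and d.y0 < g.y1 and g.y0 < d.y1


def touches(d, g):
    """Common points exist, but only on the boundaries."""
    return not disjoint(d, g) and not _interiors_meet(d, g)


def within(d, g):
    """d is completely contained in g."""
    return g.x0 <= d.x0 and d.x1 <= g.x1 and g.y0 <= d.y0 and d.y1 <= g.y1


def contains(d, g):
    """g is completely contained in d."""
    return within(g, d)


def equals(d, g):
    """Contains and Within hold at the same time."""
    return within(d, g) and contains(d, g)


def overlaps(d, g):
    """Interiors intersect, but neither rectangle contains the other."""
    return _interiors_meet(d, g) and not within(d, g) and not contains(d, g)


def distance(d, g):
    """Shortest distance between the two rectangles (0 when they share points)."""
    dx = max(g.x0 - d.x1, d.x0 - g.x1, 0.0)
    dy = max(g.y0 - d.y1, d.y0 - g.y1, 0.0)
    return math.hypot(dx, dy)
```

Real geometries need a proper computational-geometry library; rectangles are used here only so that every predicate reduces to a few coordinate comparisons.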

3.2. The SELECT clause

Once the selection has been performed, several functions can be applied in order to extract subparts of continuous fields or to aggregate them. In CFSQL these operations are divided into three categories, namely intensity, spatial and aggregate functions.

Intensity Functions. In order to extract the subparts of a continuous field where a particular condition on the surface holds, the following set of functions has been defined.
• Min(cf) returns the point set where cf attains its lowest local minimum;
• Max(cf) returns the point set where cf attains its highest local maximum;
• Concave(cf) returns a continuous field containing the cf subparts whose domains correspond to the regions where the function is concave;
• Convex(cf) returns a continuous field containing the cf subparts whose domains correspond to the regions where the function is convex;
• Saddle(cf) returns a continuous field containing the cf subparts whose domains correspond to the regions where the function is a saddle;
• Gradient(cf, Condition) returns a continuous field containing the cf subparts whose domains correspond to the regions where the gradient of the function satisfies the condition Condition;
• GetValue(cf, Condition) returns a continuous field containing the cf subparts whose domains correspond to the regions where the function f satisfies the condition Condition.
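The two condition-based intensity functions can be sketched on a sampled field. This is a simplified illustration under stated assumptions: the field is represented as a dictionary from grid cell indices to intensity values (the `Grid` alias is hypothetical), conditions are passed as Python predicates rather than CFSQL strings, and the Gradient condition is applied to the gradient magnitude estimated by finite differences.

```python
import math
from typing import Callable, Dict, Tuple

Grid = Dict[Tuple[int, int], float]  # sampled field: cell index -> intensity (assumed layout)


def get_value(cf: Grid, condition: Callable[[float], bool]) -> Grid:
    """Keep the cells whose intensity satisfies the condition."""
    return {cell: v for cell, v in cf.items() if condition(v)}


def _partial(cf: Grid, cell, step, spacing):
    """Finite difference along one axis, central where both neighbours exist."""
    i, j = cell
    di, dj = step
    fwd = cf.get((i + di, j + dj))
    bwd = cf.get((i - di, j - dj))
    here = cf[cell]
    if fwd is not None and bwd is not None:
        return (fwd - bwd) / (2 * spacing)
    if fwd is not None:
        return (fwd - here) / spacing
    if bwd is not None:
        return (here - bwd) / spacing
    return 0.0  # isolated cell: no slope information available


def gradient(cf: Grid, condition: Callable[[float], bool], spacing: float = 1.0) -> Grid:
    """Keep the cells whose gradient magnitude satisfies the condition."""
    out = {}
    for cell, v in cf.items():
        gx = _partial(cf, cell, (1, 0), spacing)
        gy = _partial(cf, cell, (0, 1), spacing)
        if condition(math.hypot(gx, gy)):
            out[cell] = v
    return out


# Example: a temperature field rising by 2 degrees per cell along the first axis.
temperature = {(i, j): 15.0 + 2.0 * i for i in range(5) for j in range(3)}
warm = get_value(temperature, lambda v: v >= 20)
```

Both functions return a field of the same kind as their input, restricted to a smaller domain, which is what allows their results to be passed on to further operators.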

Spatial Functions. Like intensity functions, spatial functions extract continuous field subparts. In this case, however, the conditions are posed on the continuous field domains, that is:
• Interior(cf, g) returns a continuous field containing the cf subparts whose domains correspond to the intersection between the original domain and the interior of g;
• Exterior(cf, g) returns a continuous field containing the cf subparts whose domains correspond to the intersection between the original domain and the exterior of g;
• Boundary(cf, g) returns a continuous field containing the cf subparts whose domains correspond to the intersection between the original domain and the boundary of g.

Aggregate Functions. Sometimes it is useful to derive collections of values from a dataset by extracting information from query results. For example, determining the highest acclivity, calculating the mean temperature and summarizing the effects of electromagnetic fields yield data of interest in many studies related to the monitoring of environmental resources. Basically, this set of operations corresponds to the aggregate functions, which are usually applied only to alphanumeric results. The aim of this section is to extend them in order to derive information from query results on continuous fields. In CFSQL two kinds of aggregate functions exist, the column aggregate functions and the row aggregate functions. A column aggregate function aggregates continuous field instances representing the same phenomenon, while a row aggregate function aggregates continuous field instances belonging to different phenomena or discrete data. The column aggregate function set is made up of eleven functions, namely Minimum, Maximum, Mean, Sum, Count, Intersection, Union and Difference, the last three having two different signatures. Formally, they are defined as follows. Let cft be a continuous field table, which is composed of the continuous fields $cf_1, \ldots, cf_n$. Let Feature be


one among the derived attributes Integral, Area, Density and Surface, and let O be an operation. Then:
• Min(Feature, cft) = $cf_m \in cft$ such that $\mathrm{Feature}(cf_m) \le \mathrm{Feature}(cf_i)$ for each $i$, $1 \le i \le n$;
• Max(Feature, cft) = $cf_m \in cft$ such that $\mathrm{Feature}(cf_m) \ge \mathrm{Feature}(cf_i)$ for each $i$, $1 \le i \le n$;
• Mean(Feature, cft) = $r \in \mathbb{R}$ such that $r = \dfrac{\sum_{i=1}^{n} \mathrm{Feature}(cf_i)}{n}$;
• Sum(Feature, cft) = $r \in \mathbb{R}$ such that $r = \sum_{i=1}^{n} \mathrm{Feature}(cf_i)$;
• Count(cft) = $i \in$ Integer such that $i = n$;
• Intersection(O, cft) = cf such that $d = \bigcap_i \mathrm{Area}(cf_i)$ and, for each $P_{xy}$ belonging to d, $P_{xy} = O_i\, f_i(x, y)$;
• Union(O, cft) = cf such that $d = \bigcup_i \mathrm{Area}(cf_i)$ and, for each $P_{xy}$ belonging to d, $P_{xy} = O_i\, f_i(x, y)$;
• Difference(O, cft) = cf such that d is the difference between the areas and, for each $P_{xy}$ belonging to d, $P_{xy} = O_i\, f_i(x, y)$;
• Intersection(Area, cft) = $g \in$ Geometry such that $g = \bigcap_i \mathrm{Area}(cf_i)$;
• Union(Area, cft) = $g \in$ Geometry such that $g = \bigcup_i \mathrm{Area}(cf_i)$;

• Difference(Area, cft) = $g \in$ Geometry such that g is the difference between the areas.

Sometimes it is useful to aggregate features belonging to different continuous fields, for example to find the regions where the temperature is higher than 20 °C and, at the same time, the pressure is lower than 850 mb. Operations of this kind are performed by using the row aggregate functions, which are defined as follows. Let $mf = cf_1, cf_2, \ldots, cf_n, g_1, g_2, \ldots, g_m$ be a Multifield, then:
• Intersection(mf) returns the intersection of the areas of the continuous fields;
• Union(mf) returns the union of the areas of the continuous fields;
• Difference(mf) returns the difference of the areas of the continuous fields.
Thus, the regions where the temperature is higher than 20 °C and the pressure is lower than 850 mb at the same time can be found by specifying the following statement.
SELECT (intersection(Temperature.getValue(“>=20”), Pressure.getValue(“