The SH-tree

The SH-tree: A Super Hybrid Index Structure for Multidimensional Data

Dang Tran KHANH, Josef KÜNG, Roland WAGNER
Institute for Applied Knowledge Processing (FAW), University of Linz, Austria
{khanh, jkueng, rwagner}@faw.uni-linz.ac.at

Abstract. Feature-vector-based similarity search is becoming increasingly important in database systems. Consequently, many multidimensional index techniques have been introduced to the database research community. These index techniques fall into two main classes: SP (space partitioning)/KD-tree-based and DP (data partitioning)/R-tree-based. Recently, a hybrid index structure has been proposed that combines both SP/KD-tree-based and DP/R-tree-based techniques to form a new, more efficient index structure. However, weaknesses still exist in the techniques above. In this paper, we introduce a novel and flexible index structure for multidimensional data, the SH-tree (Super Hybrid tree). Theoretical analyses show that the SH-tree is a good combination of both techniques with respect to both representation and search algorithms. It overcomes their shortcomings and makes use of their positive aspects to facilitate efficient similarity searches.

Keywords: similarity search, multidimensional index, bounding sphere (BS), minimum bounding rectangle (MBR), super hybrid tree (SH-tree).

1 Introduction

Feature-based similarity search has a long history of development that is still in progress. Its applications include multimedia databases [33], time-series databases [32], CAD/CAM systems [34], medical image databases [27], etc. In these large databases, feature spaces are usually indexed using multidimensional data structures. Since Morton introduced space-filling curves in 1966, many index structures have been developed. A survey that summarizes the history of multidimensional access methods from 1966 to 1996 is presented in [1]. This summary and two recent publications [2, 19] show that multidimensional index techniques can be divided into two main classes:

Index structures based on space partitioning (SP-based), or KD-tree-based, such as the kDB-tree [6], hB-tree [7], LSD-tree and LSDh-tree [8, 9], Grid File [10], BANG file [11], GNAT tree [29], mvp-tree [35], SKD-tree [28], etc.

Index structures based on data partitioning (DP-based), or R-tree-based, consisting of the R-tree and its improved variants [12, 13, 14], X-tree [15], SS-tree [5], TV-tree [3], SR-tree [4], M-tree [20], etc.

The remaining techniques, which cannot be categorized under the above schema, are called dimensionality reduction index techniques [19], such as the Pyramid technique [16, 17], the UB-tree [18] and space-filling curves (see [1] for a survey).

Recently, the Hybrid tree [2, 19] (see footnote 1), a hybrid technique, has been proposed. It is formed by combining both SP-based and DP-based techniques. For detailed explanations of the classification, see [1, 2, 19].

This paper is organized as follows: Section 2 discusses the motivations that led us to introduce the SH-tree. Section 3 discusses the structure and advanced aspects of the SH-tree. Section 4 presents update operations and query algorithms for the SH-tree. Section 5 gives conclusions and future work.

2 Motivations

The SR-tree [4] has shown its superiority over the R*-tree and the SS-tree by dividing the feature space into regions that have both small volume (using bounding rectangles, BRs) and short diameter (using bounding spheres, BSs). Nevertheless, the SR-tree suffers from the fan-out problem: its fan-out is only one third of that of the SS-tree and two thirds of that of the R*-tree [4]. The low fan-out causes SR-tree based searches to read more nodes, which reduces query performance. This problem does not occur in the KD-tree based index techniques, whose fan-out is constant for an arbitrary number of dimensions.

Recently, the Hybrid tree [2, 19] has been introduced. It makes use of positive characteristics of both SP-based and DP-based index techniques. It relies on a KD-tree based layout for internal nodes and employs bounding regions (BRs) as hints for pruning while traversing the tree. To avoid accessing unnecessary data pages, the Hybrid tree also applies a dead space eliminating technique that codes actual data regions (CADR) [9]. Although the CADR technique partly alleviates the unnecessary disk access problem, it is still not a highly efficient solution to the entire problem. It strongly depends on the number of bits used to code the actual data region and, in some cases, it brings no benefit regardless of how many bits are used to code the space. Figures 1a and 1b show such examples in 2-dimensional space: the whole region is coded irrespective of how many bits are used. Figure 1c shows an example where the benefit from coding the actual data region is small, especially for range queries, because of the high ratio of dead space remaining in the coded data region. Besides, when new objects are located outside the bounds of the feature space already indexed by the Hybrid tree, the encoded live space (ELS) [19] must be recomputed from scratch.
Furthermore, the SP/KD-tree based index techniques, such as the Hybrid tree and the LSDh-tree, commonly partition the space recursively into two subspaces along a single dimension until the data objects in each subspace can be stored in a single data page. This way of partitioning quickly destroys clusters of data, because objects stored in the same page may be "far away" from each other in the real space. This problem can significantly affect search performance by increasing the number of disk accesses per range query [1]. In contrast, the DP/R-tree based index techniques, such as the SS-tree and the SR-tree, try to keep objects that are near each other in the feature space in the same data page.

Footnote 1: The idea for the representation of internal nodes is similar to the one introduced by Ooi et al. in 1987 for the Spatial KD-tree [28].

To alleviate these problems and to take advantage of the inherent strengths of the SR-tree (and of the R-tree based techniques as a whole), while also introducing some novel features, we present the SH-tree in the next section. In the SH-tree, the fan-out problem is overcome by employing the KD-tree representation for the partitions of internal nodes. The data cluster problem mentioned above is alleviated by keeping an SR-tree-like structure for the representation of balanced and leaf nodes of the SH-tree (cf. section 3.1). Section 3 details these ideas.

Fig. 1. Some problems with coding actual data region (panels a, b, c; legend: coded data space, dead space, coded dead space)

3 The SH-tree

This section introduces the SH-tree. We discuss how the multidimensional space is split into subspaces and present the special hybrid structure of the SH-tree.

3.1 Partitioning Multidimensional Space in the SH-tree

Because the SH-tree is intended not only for point data objects but also for extended data objects, we do not require the space partitioning to be overlap-free. Allowing overlap makes it easy to handle objects that cross a selected split position and avoids the storage utilization problem. The former issue was described for the SKD-tree [28]; the latter affected the kDB-tree, which shows slow performance even in 4-dimensional feature vector spaces [21].

There are three kinds of nodes in the SH-tree: internal, balanced and leaf nodes. Each internal node i has the structure <d, lo, up, other_info>, where d is the split dimension, lo represents the left (lower) boundary of the right (higher) partition, up represents the right (higher) boundary of the left (lower) partition, and other_info consists of additional information such as the number of data objects in its left and right children. While up=lo means that the partitions do not overlap, up>lo indicates that they overlap. This structure is similar to the ones introduced in the SKD-tree [28] and the Hybrid tree [2]. The supplemental information also gives hints for developing a cost model for nearest neighbor search in high-dimensional space, for query selectivity estimation, etc. Moreover, let BRi denote the bounding rectangle of internal node i. The BR of its left child is defined as BRi∩(d ≤ up), where ∩ denotes geometric intersection. Similarly, the BR of its right child is defined as BRi∩(d ≥ lo). This allows us to apply algorithms used in the DP/R-tree based techniques to the SH-tree.

Balanced nodes are just above leaf nodes and they are not hierarchical (figure 2). Each of them has a structure similar to that of an internal node of the SR-tree. This is a specific characteristic of the SH-tree. It partly preserves data clusters, makes the height of the SH-tree smaller and exploits the SR-tree's superior aspects. Moreover, it shows that SH-trees are not simply binary in shape, as the KD-tree based techniques are. They are also multi-way trees, like the R-tree based index techniques:

BN: <B1, B2, …, Bn> (minBN_E ≤ n ≤ maxBN_E)
Bi: <BS, MBR, num, pointer>

A balanced node consists of entries B1, B2, …, Bn (minBN_E ≤ n ≤ maxBN_E), where minBN_E and maxBN_E are the minimum and maximum number of entries in the node. Each entry Bi keeps information about a leaf node in four components: a bounding sphere BS, a minimum bounding rectangle MBR, the number of objects in the leaf node num, and a pointer to the leaf node pointer. Computing the MBS (minimum BS) of a given set of objects is not feasible in a high-dimensional space, since the time complexity is exponential in the number of dimensions [25]. Therefore, the SH-tree uses MBRs together with BSs that are not necessarily minimal. See [4] for the calculation formula of the BS.
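The node layouts described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names `left_count`, `right_count`, `center`, `radius`, `mbr_lo`, `mbr_hi` and the helper `make_entry` are hypothetical, the leaf is assumed to hold point objects, and the bounding sphere is taken to be centered at the centroid with the distance to the farthest point as radius, which is one simple non-minimal choice and not necessarily the exact formula of [4].

```python
from dataclasses import dataclass
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class InternalNode:
    """Internal node <d, lo, up, other_info>."""
    d: int            # split dimension (1-based, as in the text)
    lo: float         # lower boundary of the right (higher) partition
    up: float         # upper boundary of the left (lower) partition
    left_count: int   # other_info: objects under the left child (assumed field)
    right_count: int  # other_info: objects under the right child (assumed field)

    def overlaps(self) -> bool:
        # up == lo means disjoint partitions, up > lo means they overlap.
        return self.up > self.lo


@dataclass
class BalancedEntry:
    """Entry Bi = <BS, MBR, num, pointer> describing one leaf node."""
    center: Point     # center of the bounding sphere
    radius: float     # radius of the bounding sphere
    mbr_lo: Point     # lower corner of the MBR
    mbr_hi: Point     # upper corner of the MBR
    num: int          # number of objects stored in the leaf
    pointer: int      # hypothetical page id of the leaf


def make_entry(points: Sequence[Point], pointer: int) -> BalancedEntry:
    """Summarize a non-empty set of point objects stored in one leaf."""
    n, dims = len(points), len(points[0])
    # Assumed BS: centroid as center, farthest point as radius (not minimal).
    center = tuple(sum(p[i] for p in points) / n for i in range(dims))
    radius = max(dist(center, p) for p in points)
    mbr_lo = tuple(min(p[i] for p in points) for i in range(dims))
    mbr_hi = tuple(max(p[i] for p in points) for i in range(dims))
    return BalancedEntry(center, radius, mbr_lo, mbr_hi, n, pointer)
```

Keeping both the sphere and the rectangle in each entry mirrors the SR-tree idea that the region of a leaf is the intersection of the two.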

Fig. 2. A possible partition of a data space and corresponding mapping to the SH-tree (legend: internal node, balanced node, leaf node, MBR, BS, overlapping space; internal-node labels shown: d=1 lo=6 up=6; d=2 lo=3 up=4; d=2 lo=8 up=8; d=2 lo=5 up=6; d=1 lo=3 up=3)

Each leaf node of the SH-tree has the same structure as that of the SS-tree, because the SR-tree [4] is designed only for point objects whereas the SH-tree is intended for both points and extended objects:

LN: <L1, L2, …, Lm> (minO_E ≤ m ≤ maxO_E)
Li: <obj, info>

A leaf node consists of entries L1, L2, …, Lm (minO_E ≤ m ≤ maxO_E), where minO_E and maxO_E are the minimum and maximum number of entries in a leaf. Each entry Li consists of a data object obj and a structure info holding information such as a feature vector, the radius that bounds the object's extent in the feature space, the object's MBR, etc. If the objects in the database are complex, then obj is an identifier instead of the real data object. In addition, if the SH-tree is applied only to point data objects, each Li is similar to a leaf entry of the SR-tree, i.e., the object together with its feature vector. In this case, the other information about the objects is no longer needed: for example, the parameter radius is always equal to zero and the MBR is the point itself.

Figure 2 shows a possible partition of a feature space and its corresponding mapping to the SH-tree. Assume we have a 2-dimensional feature space D with a size of (0,0,10,10). With (d, lo, up)=(1,6,6), the BRs of the left and right children of internal node 1 are BR2=D∩(d ≤ 6)=(0,0,6,10) and BR3=D∩(d ≥ 6)=(6,0,10,10), respectively. For internal node 2, (d, lo, up)=(2,3,4), so BR4=BR2∩(d ≤ 4)=(0,0,6,4), BR5=BR2∩(d ≥ 3)=(0,3,6,10), and so on. The BR information is not stored in the SH-tree; it is computed when necessary.

Furthermore, the storage utilization of the SH-tree must ensure that each balanced node is filled with at least minBN_E entries and each data page contains at least minO_E objects. Therefore, each subspace corresponding to a balanced node holds N data objects, where N satisfies the following condition:

$$\mathrm{minO\_E} \times \mathrm{minBN\_E} \le N \le \mathrm{maxO\_E} \times \mathrm{maxBN\_E} \tag{1}$$
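The on-demand computation of child BRs and condition (1) can be illustrated with a short Python sketch. The function names `child_brs` and `capacity_ok` are hypothetical, and the rectangle encoding (all lower bounds followed by all upper bounds, as in the (0,0,10,10) example) is an assumption made for this sketch.

```python
from typing import Tuple

Rect = Tuple[float, ...]  # (low_1, ..., low_k, high_1, ..., high_k)


def child_brs(br: Rect, d: int, lo: float, up: float) -> Tuple[Rect, Rect]:
    """Return the (left, right) child BRs of an internal node <d, lo, up>.

    d is the 1-based split dimension, as in the text.
    """
    k = len(br) // 2
    lows, highs = list(br[:k]), list(br[k:])
    left_highs = highs.copy()
    left_highs[d - 1] = min(highs[d - 1], up)   # BR ∩ (d <= up)
    right_lows = lows.copy()
    right_lows[d - 1] = max(lows[d - 1], lo)    # BR ∩ (d >= lo)
    return tuple(lows + left_highs), tuple(right_lows + highs)


def capacity_ok(n: int, min_o_e: int, max_o_e: int,
                min_bn_e: int, max_bn_e: int) -> bool:
    """Check condition (1) for the N objects below one balanced node."""
    return min_o_e * min_bn_e <= n <= max_o_e * max_bn_e


# The example from the text: feature space D = (0, 0, 10, 10).
D = (0, 0, 10, 10)
br2, br3 = child_brs(D, d=1, lo=6, up=6)
br4, br5 = child_brs(br2, d=2, lo=3, up=4)
print(br2, br3, br4, br5)
```

Because the children are derived from the parent BR and the triple (d, lo, up), only the triple needs to be stored in each internal node.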

3.2 The Extended Balanced SH-tree

For almost all index techniques based on the KD-tree, the tree structure is not balanced (e.g., the LSD/LSDh-tree, the SKD-tree). This means that some leaf nodes are farther away from the root than others. The experiments of [29] have shown that a good balance is not crucial for the performance of the index structure. In this section, we introduce a new concept for the balance problem in the SH-tree: extended balance. The motivation is to retain acceptable performance of the index structure while reducing the maintenance cost of exact balance. Suppose that p, b, b_min and b_max denote the number of leaf nodes, the number of balanced nodes, and the minimum and maximum number of balanced nodes in the SH-tree, respectively. The following inequality holds:

$$b_{\min} = \left\lceil \frac{p}{\mathrm{maxBN\_E}} \right\rceil \le b \le \left\lceil \frac{p}{\mathrm{minBN\_E}} \right\rceil = b_{\max} \tag{2}$$

The SH-tree's height h satisfies the following inequality:

$$1 + \lceil \log_2 b_{\min} \rceil \le h \le \lceil \log_2 b_{\max} \rceil + 1 \tag{3}$$

Inequality (3) is used to evaluate whether the SH-tree is "balanced" or not. The meaning of balance here is loose: it does not require that the path length from the root to every leaf node be equal. We call this extended balance. If the height hl of each leaf node in the SH-tree satisfies (3), i.e. $1 + \lceil \log_2 b_{\min} \rceil \le hl \le \lceil \log_2 b_{\max} \rceil + 1$, then the SH-tree is called an extended balanced tree (EBT); otherwise it is not a balanced tree. The extended balance concept generalizes the conventional balance concept: if inequality (3) becomes $1 + \lceil \log_2 b_{\min} \rceil = h = \lceil \log_2 b_{\max} \rceil + 1$, then an EBT becomes a conventional balanced tree (CBT). If minBN_E=2 and maxBN_E=3, the SH-tree in figure 2 is neither a CBT nor an EBT; it is not a balanced tree. Inequality (3) can also be extended as follows:

$$1 + \lceil \log_2 b_{\min} \rceil - x \le h \le \lceil \log_2 b_{\max} \rceil + 1 + x \tag{4}$$

or a more general form:

$$1 + \lceil \log_2 b_{\min} \rceil - x \le h \le \lceil \log_2 b_{\max} \rceil + 1 + y \tag{5}$$
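As a numerical illustration of inequalities (2) to (5), the bounds can be computed with a few lines of Python. The function names are hypothetical, and integer arithmetic is used for the ceilings to avoid floating-point rounding; the values p=16, minBN_E=2 and maxBN_E=3 are the ones used in the paper's own example.

```python
from typing import Optional, Tuple


def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive integers."""
    return -(-a // b)


def ceil_log2(n: int) -> int:
    """Ceiling of log2(n) for an integer n >= 1."""
    return (n - 1).bit_length()


def balanced_node_bounds(p: int, min_bn_e: int, max_bn_e: int) -> Tuple[int, int]:
    """Return (b_min, b_max) from inequality (2)."""
    return ceil_div(p, max_bn_e), ceil_div(p, min_bn_e)


def height_bounds(p: int, min_bn_e: int, max_bn_e: int,
                  x: int = 0, y: Optional[int] = None) -> Tuple[int, int]:
    """Return the (lower, upper) bounds on the height h.

    x = y = 0 gives inequality (3); y omitted gives (4); otherwise (5).
    """
    if y is None:
        y = x
    b_min, b_max = balanced_node_bounds(p, min_bn_e, max_bn_e)
    return 1 + ceil_log2(b_min) - x, ceil_log2(b_max) + 1 + y


print(balanced_node_bounds(16, 2, 3))   # b_min and b_max
print(height_bounds(16, 2, 3))          # bounds of inequality (3)
print(height_bounds(16, 2, 3, x=1))     # bounds of inequality (4) with x = 1
```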

In (4) and (5), x and y are acceptable "errors". These parameters give more flexibility to the SH-tree, but they must be selected carefully to avoid creating a tree that is too unbalanced. An SH-tree that does not satisfy (3) but satisfies (4) or (5) is called a loosely extended balanced tree (LEBT). For example, for the SH-tree in figure 2, (3) becomes 4 ≤ h ≤ 4 (here b_min=⌈16/3⌉=6 and b_max=8). If the SH-tree satisfies this condition, it is a CBT (and also an EBT). We can relax this condition with x=1 and obtain from (4) the new condition 3 ≤ h ≤ 5. With respect to the new condition, the above SH-tree can be considered an LEBT. The parameter x (and y) in (4) (and (5)) depends on many attributes, such as p, minBN_E, maxBN_E and so on. If x is suitably chosen, the maintenance cost of the SH-tree is substantially decreased without affecting the querying performance. In general, if the SH-tree fails to satisfy (4), it needs to be reformed. The reformation can entirely reorganize the SH-tree (also called dynamic bulk loading) or suitably change the splitting algorithm. Henrich has presented a hybrid split strategy for KD-tree based access structures [22]. It relies on a weighted average of the split positions calculated using two split strategies, data dependent and distribution dependent. Note that the dynamic reformation operation usually incurs substantial costs in both I/O accesses and CPU time. An efficient algorithm for SH-tree reformation is still an open problem.

3.3 Splitting Nodes in the SH-tree

In the context of dynamic databases, where the SH-tree is created incrementally and data objects can be added or deleted during that process, we present the splitting of leaf nodes and of balanced nodes in the SH-tree.

Leaf nodes splitting. The boundary of a leaf node in the SH-tree is the geometric intersection of its MBR and BS, but the BS is isotropic and is thus not suitable for choosing the split dimension.
Therefore, the choice of the split dimension depends on the MBR. This problem is solved in the same way as in the Hybrid tree, including overlap-free splitting. The selected split dimension must minimize the number of disk accesses. Without loss of generality, assume that the space is d-dimensional and that the extent of the MBR along the i-th dimension is e_i, i = 1, …, d. Let the range query Q be a bounding box with length r along each dimension. Following the proof in [4], the split dimension is k if

$$\frac{r}{e_k + r}$$

is the minimum. Therefore, the split dimension k is chosen such that its extent in the MBR is the maximum, i.e. e_k = max(e_i), i = 1, …, d. The next step is to select the split position. First, we check whether it is possible to split in the middle without violating the utilization constraint. If this is impossible, we distribute the data items equally into two nodes. This also handles the special case shown for the hB-tree [7]. Figure 3 shows an example of this case in two-dimensional space.

Fig. 3. (Axes: x, y.) Assume the split dimension x is chosen and the minimum data object number of each partition is three. There is no suitable split position if we apply the way of the Hybrid tree as described in [2]. In this case and other similar cases, the SH-tree distributes data items equally into two nodes.
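The leaf split rule described above (largest MBR extent as split dimension, middle split if the utilization constraint allows it, otherwise an equal distribution) can be sketched in Python. This is a minimal illustration assuming point objects and 0-based dimension indices; the names `choose_split_dimension` and `split_leaf` are hypothetical, and the fallback is implemented here as a median split along the chosen dimension.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def choose_split_dimension(points: Sequence[Point]) -> int:
    """Pick the dimension k with the largest MBR extent e_k (0-based)."""
    dims = len(points[0])
    extents = [max(p[i] for p in points) - min(p[i] for p in points)
               for i in range(dims)]
    return max(range(dims), key=extents.__getitem__)


def split_leaf(points: Sequence[Point],
               min_entries: int) -> Tuple[int, List[Point], List[Point]]:
    """Split an overfull leaf into two overlap-free halves."""
    k = choose_split_dimension(points)
    lo = min(p[k] for p in points)
    hi = max(p[k] for p in points)
    mid = (lo + hi) / 2
    # First attempt: split in the middle of the MBR along dimension k.
    left = [p for p in points if p[k] <= mid]
    right = [p for p in points if p[k] > mid]
    if len(left) < min_entries or len(right) < min_entries:
        # Utilization constraint violated: distribute the items equally.
        ordered = sorted(points, key=lambda p: p[k])
        half = len(ordered) // 2
        left, right = ordered[:half], ordered[half:]
    return k, left, right


pts = [(0, 0), (1, 1), (2, 0), (9, 1), (10, 0), (3, 1)]
print(split_leaf(pts, min_entries=2))  # middle split is acceptable
print(split_leaf(pts, min_entries=3))  # falls back to equal distribution
```

With a minimum of three objects per partition, the middle split would leave only two objects on one side, so the sketch falls back to the equal distribution, which is the situation Fig. 3 describes.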

Balanced nodes splitting. Because a balanced node has a structure similar to that of the internal nodes of the SR-tree and the R*-tree, the internal node splitting algorithm of the R*-tree [24] can be applied to split overfull balanced nodes of the SH-tree. In the SH-tree, however, if the sibling of an overfull balanced node is also a balanced node and is not yet full, then an entry of the overfull balanced node can be shifted to the sibling to avoid a split. This method also increases the storage utilization [36, 9]. Thus, the modified splitting algorithm for balanced nodes can be described concisely: first, try to avoid a node split as just discussed; if that fails, employ a split algorithm similar to that of the R*-tree. Note that, in the SH-tree, a balanced node split does not cause splits to propagate upwards or downwards, a phenomenon called cascading splits [7] that occurs in the kDB-tree [6].
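The shift-before-split policy for balanced nodes can be sketched as follows. Several parts are assumptions made only for illustration: the classes `Entry` and `BalancedNode`, the function `handle_overflow`, the choice of which entry to shift (here simply the last one), and the fallback split, which sorts entries by MBR center along one axis and halves them as a stand-in for the R*-tree split algorithm of [24].

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entry:
    mbr_lo: Tuple[float, ...]
    mbr_hi: Tuple[float, ...]


@dataclass
class BalancedNode:
    entries: List[Entry] = field(default_factory=list)


def handle_overflow(node: BalancedNode, sibling: Optional[BalancedNode],
                    max_bn_e: int, axis: int = 0) -> Optional[BalancedNode]:
    """Resolve an overfull balanced node.

    Returns None if no split was needed, otherwise the newly created node.
    """
    if len(node.entries) <= max_bn_e:
        return None
    # Step 1: try to avoid the split by shifting one entry to the sibling.
    if sibling is not None and len(sibling.entries) < max_bn_e:
        sibling.entries.append(node.entries.pop())  # assumed choice of entry
        return None
    # Step 2: split. Stand-in for the R*-tree split: halve along one axis.
    ordered = sorted(node.entries,
                     key=lambda e: (e.mbr_lo[axis] + e.mbr_hi[axis]) / 2)
    half = len(ordered) // 2
    node.entries = ordered[:half]
    return BalancedNode(ordered[half:])
```

A real implementation would pick the entry to shift and the split axis by geometric criteria; the sketch only shows the control flow of "shift if possible, otherwise split".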

4 The SH-tree Operations

4.1 Insertion

Let NDO be a new data object to insert into the SH-tree. First, the SH-tree is traversed from the root to locate the leaf node w to which NDO will belong. The best candidate is the node whose MBR is closest to NDO (see footnote 2). Ties are broken based on the number of data objects in the nodes. If there is an empty entry in this leaf, NDO is inserted. Otherwise, the leaf overflows, and one object of this leaf can be redistributed to a sibling that is not yet full, to make space for NDO. This idea is the same as that of [36], but it does not recursively go upward: the siblings here are located locally in the same balanced node. In fact, the predefined constant l of the algorithm in [36] corresponds to the current entry number (CEN) of the balanced node (minBN_E ≤ CEN ≤ maxBN_E). The parameter CEN of the SH-tree's redistribution algorithm differs from one balanced node to another; this is a difference from the algorithm presented in [36]. If a split is still necessary, it can propagate upwards by at most one level.

Figure 4 illustrates the split propagation in the SH-tree. Assume that leaf node P1 is selected for inserting an NDO and that P1 already has maxO_E entries. Moreover, suppose that the redistribution also fails. Consequently, P1 is split into P1’ and P1”. Because maxBN_E=2 (minBN_E=1) in this example, the balanced node B1 is then split into B1’ and B1”. Finally, a new internal node N is created. The split process stops and does not propagate further to the upper level (the root node R in this example).

Footnote 2: The distance metric used here is MINDIST, described in [26].

Fig. 4. Split in the SH-tree (node labels: R, N, B1, B1’, B1”, P1, P1’, P1”)
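The leaf selection step of the insertion can be illustrated with a small Python sketch of MINDIST between a point and an MBR, in its squared form, which gives the same ranking as the unsquared distance. The function names are hypothetical, the new object is assumed to be a point, and breaking ties in favor of the leaf with fewer objects is an assumption, since the text only states that ties depend on the object numbers.

```python
from typing import Sequence, Tuple

Leaf = Tuple[Sequence[float], Sequence[float], int]  # (mbr_lo, mbr_hi, num)


def mindist_sq(point: Sequence[float], mbr_lo: Sequence[float],
               mbr_hi: Sequence[float]) -> float:
    """Squared minimum distance from a point to a rectangle."""
    total = 0.0
    for p, lo, hi in zip(point, mbr_lo, mbr_hi):
        nearest = min(max(p, lo), hi)   # clamp p into [lo, hi]
        total += (p - nearest) ** 2
    return total


def choose_leaf(point: Sequence[float], leaves: Sequence[Leaf]) -> int:
    """Return the index of the leaf whose MBR is closest to the point.

    Ties are broken by the smaller number of objects (assumed direction).
    """
    return min(range(len(leaves)),
               key=lambda i: (mindist_sq(point, leaves[i][0], leaves[i][1]),
                              leaves[i][2]))


leaves = [((0, 0), (3, 4), 10), ((6, 6), (8, 8), 4), ((7, 3), (9, 4), 2)]
print(choose_leaf((5, 5), leaves))
```

A point inside an MBR has distance zero to it, so leaves that already cover the new object are always preferred.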

4.2 Deletion

After determining which leaf node contains the object and removing the object, the leaf may become under-full, i.e., the number of objects kept in this leaf is less than minO_E. There are several solutions to this problem, as discussed in [23]. An under-full leaf node can be merged with whichever sibling needs the least enlargement, or its objects can be scattered among sibling nodes. Both can cause node splitting; the latter in particular can lead to propagated splitting, e.g., the splitting of balanced nodes. The R-tree [23] employs a re-insertion policy instead of the two above. The SR-tree, the SS-tree, the R*-tree and the Hybrid tree also employ this policy. We propose a new algorithm for the under-full leaf problem called eliminate-pull-reinsert. The algorithm is similar to the eliminate-and-reinsert policy. However, because reinsertion can cause splits of leaf and balanced nodes, if the leaf node is under-full after deleting the object, we first apply a "pull" strategy that takes one object from a sibling, provided that the sibling still satisfies the utilization constraints. This relies on the idea of section 4.1, but in the opposite direction: the under-full leaf here "pulls" one object from the sibling, whereas the overflowing leaf in section 4.1 "shifts" one object to the sibling. If the pull policy does not solve the problem, the objects of the under-full leaf node are reinserted. Note that the pull policy, too, involves only the siblings located in the same balanced node.

4.3 Search

The search operations of the SH-tree are similar to those of the SR-tree for the balanced and leaf nodes, and similar to those of the R-tree for the internal nodes. Because of space limitations, we do not present them here. For a detailed discussion, see [31].
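Returning to the deletion policy of section 4.2, the eliminate-pull-reinsert steps can be sketched in Python as follows. Leaves are modeled as plain lists, `reinsert` is a hypothetical callback standing in for the insertion algorithm of section 4.1, and pulling the last object of the first suitable sibling is an arbitrary choice made for this sketch; a real implementation would presumably pull an object that is close to the under-full leaf.

```python
from typing import Callable, List


def delete_object(leaf: List, siblings: List[List], obj,
                  min_o_e: int, reinsert: Callable[[object], None]) -> str:
    """Delete obj from leaf and repair an under-full leaf if necessary.

    siblings are the other leaves of the same balanced node.
    Returns which step resolved the deletion.
    """
    leaf.remove(obj)                      # eliminate
    if len(leaf) >= min_o_e:
        return "deleted"
    for sib in siblings:                  # pull
        if len(sib) > min_o_e:            # sibling stays valid after the pull
            leaf.append(sib.pop())
            return "pulled"
    orphans = list(leaf)                  # reinsert
    leaf.clear()
    for o in orphans:
        reinsert(o)
    return "reinserted"
```

The sketch assumes the leaf satisfied the utilization constraint before the deletion, so a single pulled object is enough to repair it.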

5 Conclusions and Future Work

In this paper, we introduced the SH-tree for indexing multidimensional data. The SH-tree is a flexible multidimensional index structure that supports similarity searches in information systems. It combines the SR-tree and the KD-tree based techniques and carries positive aspects of both the KD-tree and the R-tree families. While the fan-out problem of the SR-tree is overcome by employing a KD-tree-like representation for the partitions of internal nodes, the SH-tree still takes advantage of the SR-tree by using balanced nodes, which are the same as the internal nodes of the SR-tree. Moreover, the tree operations of the SH-tree are similar to those of the R-tree family, with many modifications to adapt them to the new structure. We also introduced a new concept for the SH-tree, the extended balanced tree (EBT). It implies that SH-trees do not need to be exactly balanced: the querying performance does not deteriorate, while the maintenance cost of the tree balance is reduced. As part of future work, we intend to compare the SH-tree to the SR-tree, the LSDh-tree and some other prominent multidimensional index structures such as the X-tree, SS-tree, M-tree, etc. We also plan to deploy the SH-tree for indexing features in the research project VASIS [30] at the FAW institute.

References

1. V. Gaede, O. Günther. Multidimensional Access Methods. ACM Computing Surveys, Vol. 30, No. 2, June 1998.
2. K. Chakrabarti, S. Mehrotra. The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. Proc. of the 15th International Conference on Data Engineering, 1999. IEEE Computer Society.
3. King-Ip Lin, H.V. Jagadish, C. Faloutsos. The TV-Tree: An Index Structure for High-Dimensional Data. VLDB Journal, Vol. 3, No. 1, January 1994.
4. N. Katayama, S. Satoh. The SR-Tree: An Index Structure for High Dimensional Nearest Neighbor Queries. Proc. of the ACM SIGMOD International Conference on Management of Data, 1997.
5. D.A. White, R. Jain. Similarity Indexing with the SS-Tree. Proc. of the 12th International Conference on Data Engineering, 1996. IEEE Computer Society.
6. J.T. Robinson. The k-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes. Proc. of the ACM SIGMOD International Conference on Management of Data, 1981.
7. D.B. Lomet, B. Salzberg. The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance. ACM Trans. on Database Systems, Vol. 15, No. 4, Dec. 1990.
8. A. Henrich, H.W. Six, P. Widmayer. The LSD Tree: Spatial Access to Multidimensional Point and Nonpoint Objects. Proc. of the 15th VLDB, August 1989.
9. A. Henrich. The LSDh-tree: An Access Structure for Feature Vectors. Proc. of the 14th International Conference on Data Engineering, 1998. IEEE Computer Society.
10. J. Nievergelt, H. Hinterberger, K.C. Sevcik. The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. on Database Systems, Vol. 9, No. 1, March 1984.
11. M. Freeston. The BANG file: A new kind of grid file. Proc. of the ACM SIGMOD Annual Conference on Management of Data, 1987.
12. A. Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. Proc. of the ACM SIGMOD Conference, 1984.
13. T.K. Sellis, N. Roussopoulos, C. Faloutsos. The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. Proc. of the 13th VLDB, September 1987.
14. N. Beckmann, H.P. Kriegel, R. Schneider, B. Seeger. The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990.
15. S. Berchtold, D.A. Keim, H.P. Kriegel. The X-tree: An Index Structure for High-Dimensional Data. Proc. of the 22nd VLDB, September 1996.
16. S. Berchtold, C. Böhm, H.P. Kriegel. The Pyramid Technique: Towards Breaking the Curse of Dimensionality. Proc. of the ACM SIGMOD International Conference on Management of Data, June 1998.
17. J. Küng, J. Palkoska. An Incremental Hypercube Approach for Finding Best Matches for Vague Queries. Proc. of the 10th International Workshop on Database and Expert Systems Applications, DEXA 99. IEEE Computer Society.
18. R. Bayer. The Universal B-Tree for Multidimensional Indexing. Technical Report TUM-I9637, November 1996. (http://mistral.informatik.tu-muenchen.de/results/publications/)
19. K. Chakrabarti, S. Mehrotra. High Dimensional Feature Indexing Using Hybrid Tree. Technical Report, Department of Computer Science, University of Illinois at Urbana Champaign. (http://www-db.ics.uci.edu/pages/publications/1998/TR-MARS-98-14.ps)
20. P. Ciaccia, M. Patella, P. Zezula. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. Proc. of VLDB 1997.
21. D. Greene. An Implementation and Performance Analysis of Spatial Data Access Methods. Proc. of the 5th International Conference on Data Engineering, 1989. IEEE Computer Society.
22. A. Henrich. A Hybrid Split Strategy for k-d-Tree Based Access Structures. Proc. of the Fourth ACM Workshop on Advances in Geographic Information Systems, 1997.
23. A. Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD, Proc. of Annual Meeting, June 1984.
24. N. Beckmann, H.P. Kriegel, R. Schneider, B. Seeger. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. Proc. of the ACM SIGMOD International Conference on Management of Data, 1990.
25. R. Kurniawati, J.S. Jin, J.A. Shepherd. The SS+-tree: An Improved Index Structure for Similarity Searches in a High-Dimensional Feature Space. SPIE Storage and Retrieval for Image and Video Databases V, San Jose, CA, 1997.
26. N. Roussopoulos, S. Kelley, F. Vincent. Nearest Neighbor Queries. Proc. of the ACM SIGMOD International Conference on Management of Data, 1995.
27. F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, Z. Protopapas. Fast Nearest Neighbor Search in Medical Image Databases. Proc. of VLDB 1996.
28. B.C. Ooi, K.J. McDonell, R. Sacks-Davis. Spatial kd-Tree: A Data Structure for Geographic Databases. Proc. of COMPSAC 87, Tokyo, Japan.
29. S. Brin. Near Neighbor Search in Large Metric Spaces. Proc. of VLDB 1995.
30. FAW Institute, Johannes Kepler University Linz. VASIS: Vague Searches in Information Systems. (http://www.faw.at/cgi-pub/e_showprojekt.pl?projektnr=10)
31. D.T. Khanh, J. Küng, R. Wagner. The SH-tree: A Super Hybrid Index Structure for Multidimensional Data. Technical Report, VASIS Project. (http://www.faw.uni-linz.ac.at)
32. C. Faloutsos, M. Ranganathan, Y. Manolopoulos. Fast Subsequence Matching in Time-Series Databases. ACM SIGMOD International Conference on Management of Data, 1994.
33. T. Seidl, H.P. Kriegel. Efficient User-Adaptable Similarity Search in Large Multimedia Databases. Proc. of VLDB 1997.
34. S. Berchtold, H.P. Kriegel. S3: Similarity Search in CAD Database Systems. Proc. of the ACM SIGMOD International Conference on Management of Data, 1997.
35. T. Bozkaya, M. Ozsoyoglu. Indexing Large Metric Spaces for Similarity Search Queries. ACM Transactions on Database Systems, Vol. 24, No. 3, September 1999.
36. A. Henrich. Improving the Performance of Multi-Dimensional Access Structures Based on kd-Trees. Proc. of the 12th International Conference on Data Engineering, 1996.