Locality preserving dictionaries: theory & application to clustering in databases

Vijayshankar Raman
CS Division, UC Berkeley, Berkeley, CA 94720
[email protected]
Phone: (510) 642-1863   Fax: (510) 642-5615

To appear in ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Philadelphia, June 1999.

Abstract

We discuss strategies for building locality preserving dictionaries (LPDs), in which all data items within a range lie together, within a space that is a small function of the number of items in the range. We describe an approach in which the memory space is partitioned and items are placed in sorted order, with judiciously placed gaps between them, resulting in efficient insert, delete, and search operations. We adapt our algorithms to the particular application of storing database relations on disk via LPDs. By providing a natural clustering mechanism for data in a sorted order, instead of simply clustering data at a page granularity, LPDs provide much better I/O performance than traditional clustered indexes on range searches, as well as on access of data in sorted order. Analytical studies of LPDs and clustered B-Trees show that using LPDs results in range searches and sorted-order accesses that are 5 to 13 times faster than with a clustered B-Tree, at the expense of a 0 to 75% overhead in storage needs and up to a 28% overhead in insert/delete costs.

1 Introduction

Avoidance of random disk I/O is a key issue in database management systems (DBMSs), because sequential reads are much faster than random accesses, whose cost is dominated by seek time. This is not a temporary phenomenon, but instead reflects the general trend in the improvement of storage technology; over time, random I/Os are only getting slower relative to sequential reads. According to [PK98], rotational delay and seek times halve once every 10 years, whereas read bandwidth doubles every 2 years. DBMSs try to tackle this problem by clustering data so as to minimize the number of I/Os. This clustering is typically done based on the sort order of some field at the granularity of a page, and a (clustered) index is used for range searches. However, very little effort is made to cluster data across page boundaries. For instance, in a B-Tree index, the leaf pages are not stored consecutively on disk. This ad-hoc, intra-page clustering has very bad I/O performance, since reading random pages is about 10 times slower than sequential reads [GG97]. Increasing the page size is not a solution, since that will increase the cost of each update and hence slow transaction processing.

In this paper we present locality preserving dictionaries (LPDs), in which all data items within any (one-dimensional) range lie together, irrespective of the range size. Due to this stronger clustering, both range searches and accesses in sorted order involve no random seeks, and are not constrained by access latency. Our analysis shows that LPDs outperform clustered B-Trees by a factor of 5 to 10 for range searches, with an insertion overhead of about 28% and a space overhead of 75%. We believe that the space overhead is not serious because disks are getting cheaper and denser: disk capacities double, and disk costs halve, every 18 months [PK98].

The simplest incarnation of a locality preserving (LP) data structure is a sorted array, which unfortunately has expensive inserts and deletes. We tackle this problem by judiciously placing extra spaces between the items. While most inserts can be handled by "nearby" spaces, repeated inserts in the same region will fill up the space and require a reorganization. We prove that for any distribution of values and any order of inserts/deletes, reorganization is rare, and its cost is amortized effectively. We partition space recursively into segments of increasing size and pad each segment with extra space so as to get good performance for inserts and deletes. This space partitioning and padding technique turns out to be a general framework, and we have found two different approaches, with different time complexities for the operations and different space constraints for locality, and tradeoffs between them. Given a structure with n items, our best method gives an amortized time complexity of O((1/ε) log^2 n) for inserts and deletes and O(log n) for search, with an O(r^{1+ε}) locality constraint for a range of r items, where ε can be made as close to 0 as desired. The data movement or I/O complexity of inserts and deletes is only O((1/ε) log n).

An important feature of our data structure is that it allows a range to be defined via any total order on the items, and the locality we obtain for a range is in terms of the number of items in the range. This is in contrast to approaches based on hashing (see for instance [LS96]) that provide locality in terms of the size of the range in the value domain. Our data structure is more robust to skewed data distributions, where different ranges in the data domain have different densities. To our knowledge, this is the first LP data structure that provides this notion of locality with sublinear inserts/deletes.
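The gapped-array idea can be made concrete with a toy sketch (our own illustration in Python, not the segmented structure developed in Section 2; for brevity it only looks for a gap to the right of the insertion point):

    GAP = None  # marker for an empty cell

    def insert_with_gaps(arr, item):
        """Insert item into arr, a sorted array padded with GAP cells,
        shifting items only as far as the nearest gap on the right."""
        # Find the leftmost non-gap cell holding a value >= item.
        pos = next((k for k, v in enumerate(arr)
                    if v is not GAP and v >= item), len(arr))
        # Find the nearest gap at or to the right of that position.
        gap = next((k for k in range(pos, len(arr)) if arr[k] is GAP), None)
        if gap is None:
            raise OverflowError("no gap to the right; reorganization needed")
        # Shift the run [pos, gap) right by one cell, then place the item.
        arr[pos + 1 : gap + 1] = arr[pos:gap]
        arr[pos] = item

    cells = [10, GAP, 20, 30, GAP, 40, GAP]
    insert_with_gaps(cells, 25)   # shifts only 30, into the nearby gap
    print(cells)                  # [10, None, 20, 25, 30, 40, None]

An insert into a sparse region moves almost nothing; only repeated inserts into the same dense region exhaust the nearby gaps and force a reorganization, which is what the amortized analysis has to charge for.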

1.1 Application of LPDs to database clustering

The immediate application of LPDs is to store relations on disk. Disk I/O has a fairly large per-request cost [GK97], and random I/O imposes a high seek cost. Traditionally, B-Trees are used to index data on the disk, and this data is clustered at a page granularity to minimize I/O. An LPD allows a much stronger notion of clustering, because a range search involving X pages can be performed as a single large (X-page) I/O; with a clustered B-Tree, it may involve X separate page I/Os. Also, an LPD provides guaranteed locality in the face of dynamic inserts and deletes, whereas a typical B-Tree will use a "fill factor" to constrain the amount of data on a page (so as to allow inserts and deletes without too many index page splits/merges). Another problem with B-Trees is that they need periodic rebuilding after many updates have weakened the clustering [ZS96]. In contrast, LPDs self-maintain their clustering in the face of any sequence of inserts and deletes. A rough cost model for the range-search comparison is sketched below.

Another advantage of LPDs is that they maintain items in sorted order (albeit with spaces between them). In DBMSs, many operations such as sort-merge join, group-by, and order-by need to read items in sorted order. While reading tuples in sorted order from an LPD involves only a sequential scan on disk, reading tuples in sorted order from a clustered B-Tree index will in general involve random page I/Os, so sorting is typically done as a separate operation. Sorting is expensive, and it also ruins the ability to pipeline the query execution.
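As a back-of-the-envelope illustration of the single-large-I/O advantage (our own sketch; the device parameters are assumed round numbers, not figures from the analysis in Section 6):

    # Rough I/O cost model for an X-page range search.
    SEEK_MS = 10.0      # assumed seek + rotational delay per request
    TRANSFER_MS = 1.0   # assumed time to transfer one page sequentially

    def btree_range_ms(x_pages):
        # Worst case: each leaf page needs its own random I/O.
        return x_pages * (SEEK_MS + TRANSFER_MS)

    def lpd_range_ms(x_pages, space_density=1.75):
        # One seek, then a sequential scan; the scan reads extra cells
        # because the LPD pads the data (space_density > 1).
        return SEEK_MS + x_pages * space_density * TRANSFER_MS

    for x in (10, 100):
        print(x, btree_range_ms(x) / lpd_range_ms(x))
    # prints ratios of about 4x for x = 10 and about 6x for x = 100,
    # the same order of magnitude as the 5-13x reported in the paper.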

1.2 Related Work

There is some recent work on non-expansive hashing, both in one and multiple dimensions. Linial and Sasson [LS96] give a family of hash functions that ensure that any two integers p and q are mapped onto places f(p) and f(q) such that |f(p) - f(q)| <= |p - q|. Their scheme needs O(n^2) space where there are n integers to be hashed. However, their notion of locality is different: the distance between two items is bounded by the difference in their values, not by the actual number of items between them. Therefore LP hashing is not appropriate for dictionary operations against skewed data distributions, where different ranges in the data domain can have different densities. Linial and Sasson pose supporting our notion of locality (in terms of the number of items in the interval) as an open problem. They also provide schemes that need only O(n^{1+ε}) space (for any ε > 0), but these split the space into O(log(1/ε)) tables, and locality is preserved only within tables. Recently, Dasgupta, Nayak and Ostrovsky [DNO98] have found a scheme where the space needed is only O(n) and all operations take only O(1) time, but the locality holds only in expectation. Indyk et al. [IMRV97] extend the results of [LS96] to multiple dimensions.

Hennie and Stearns [HS66] use a construction similar in spirit to our padding approach for bubbling, to prove a tape-reduction theorem. They use a clever organization of k items within 2k cells that allows all items in the structure to be moved to the right, one item at a time, in an efficient manner: n movements take O(n log n) steps.

1.3 Outline of the paper

In Section 2 we describe the space partitioning structures and the design choices we made. We describe the algorithms for dictionary operations in Section 3. We then generalize our LPD to use any amount of padding in Section 4. We adapt our algorithms to the problem of clustering relations on disk in Section 5, and present a preliminary analytical study of the performance of algorithms on B-Trees and LPDs in Section 6. We briefly look at alternative space partitioning structures in Section 7. Conclusions and possible future work are outlined in Section 8.

2 An LP space-partitioning technique

In this section we present a technique to partition space in an array, and to place items in it with padding, so as to preserve locality and also support insert, search, and delete operations efficiently.

2.1 Design guidelines

We maintain items in the array in sorted order. This allows us to easily search for a required item, and also makes it easy to preserve locality. We segment the LP data structure, with each segment being (recursively) another LP data structure. Segmentation allows us to (a) incrementally grow a segment by adding subsegments to it, and (b) on an insert, transform the shifting of all the items to the right (or left) into bubbling the extra item through the segments on the right. We define bubbling as inserting a smallest item at the beginning of a segment and removing the highest item out of the segment (a similar definition applies to the insertion of a highest element); a toy model of bubbling is sketched below. By making one segment consist of a sequence of several smaller-capacity segments, we can do inserts quickly.

In order to avoid shifting all the items on a bubbling, we need to pad the items with extra spaces, to "cushion" the effect. Periodic reorganization will be needed when the space gets filled up. In order to limit the frequency of reorganization, the amount of spacing we add must be a function of the capacity of the array.
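The following toy model (our own Python sketch, with sorted lists standing in for segments) shows how an item bubbles rightward: each segment absorbs the incoming item at its small end and evicts its largest item into the next segment, so every segment keeps its size and global sorted order is preserved:

    from bisect import insort

    def bubble_right(segments, x):
        """Push x into the leftmost segment; each segment then emits
        its largest item, which becomes the incoming item for the next
        segment to the right. Returns the item that falls off the end."""
        for seg in segments:       # each seg is a sorted list
            insort(seg, x)         # insert the incoming item in order
            x = seg.pop()          # evict the largest to keep the size fixed
        return x

    runs = [[1, 4], [6, 9], [12, 15]]
    overflow = bubble_right(runs, 3)
    print(runs, overflow)          # [[1, 3], [4, 6], [9, 12]] 15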

2.2 Formal Description

A Locality Preserving Dictionary (LPD) is a contiguous memory segment S_i (see Figures 1 and 2) that is recursively composed of a symmetric sequence of subsegments S_{i-1} ... S_1 S_0 S_0 S_1 ... S_{i-1}. S_0 is defined to consist of a single memory cell. Data items are stored in ascending order in a segment, and span a contiguous substring of the subsegments S_lt ... S_0 S_0 ... S_rt; i.e., all subsegments S_{i-1} ... S_{lt+1} and S_{rt+1} ... S_{i-1} are empty, the inner subsegments S_{lt-1} ... S_0 S_0 ... S_{rt-1} are filled to capacity, and S_lt, S_rt are partially filled. We also stipulate that lt must be in the left half of the sequence of subsegments, and that rt must be in the right half.

The maximum number of data items that S_i can hold is defined as its capacity and is denoted by cap(i). The number of memory cells occupied by S_i is denoted by space(i). S_0 is defined to have a capacity of 1, so space(0) = cap(0) = 1. For i > 0, S_i comprises the cells of two copies of each smaller subsegment, so space(i) = 2 Σ_{0<=j<i} space(j) = 2·3^{i-1}. Similarly, for i > 0, S_i has a capacity of half the combined capacity of its subsegments: cap(i) = (1/2) Σ_{0<=j<i} 2·cap(j) = Σ_{0<=j<i} cap(j) = 2^{i-1}.

Figure 1: Data arrangement in the dictionary (a segment S_i drawn as the sequence S_{i-1} ... S_lt ... S_0 S_0 ... S_rt ... S_{i-1}; S_lt and S_rt are partly filled, the subsegments between them are fully filled, and the outer subsegments provide space padding).

Figure 2: Space partitioning in the dictionary (S_3 drawn as the sequence S_2 S_1 S_0 S_0 S_1 S_2; the capacities shown are cap(0) = 1, cap(1) = 1, cap(2) = 2, cap(3) = 4).
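As a sanity check on these definitions (our own sketch, using the recurrences for cap and space as reconstructed in Section 2.2), the following code verifies the closed forms and reproduces the capacities listed in Figure 2:

    def space(i):
        """Cells in S_i: two copies of each of S_0 .. S_{i-1}."""
        return 1 if i == 0 else 2 * sum(space(j) for j in range(i))

    def cap(i):
        """Capacity of S_i: half the combined capacity of its subsegments."""
        return 1 if i == 0 else sum(cap(j) for j in range(i))

    for i in range(1, 6):
        assert space(i) == 2 * 3 ** (i - 1)   # closed form from the text
        assert cap(i) == 2 ** (i - 1)         # matches Figure 2's capacities
        print(i, cap(i), space(i), space(i) / cap(i))  # density 2 * 1.5**(i-1)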

Figure 3: The metadata tree (the node for S_i stores [lt, rt, min, max]; its children are the nodes for the partially filled subsegments S_lt and S_rt, each storing its own [lt', rt', min, max], recursively down to the S_0 leaves).

2.3 Space Density

We define the space density of a (contiguous) range of data items as the space occupied by the range divided by the number of items in the range. If the entire database is mapped onto a structure S_i (we explain the details in Section 5), its space density is space(i)/cap(i) = 2·1.5^{i-1}. Although this appears excessive, we will see that after generalizing to arbitrary fill factors (Section 4) this overhead is only about 2.5 for current disk sizes. Denote this global space density by SD. It is easy to see that, since the items are arranged in sorted order, the space density of any given range is on average O(SD) (the proof is given in Appendix B). The sketch below illustrates how quickly this density grows with i.
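A quick computation (our own illustration, assuming the closed form cap(i) = 2^{i-1} reconstructed in Section 2.2) shows why the ungeneralized density is impractical for large relations, motivating the fill-factor generalization of Section 4:

    import math

    def min_level(n):
        """Smallest i with cap(i) = 2**(i-1) >= n items."""
        return 1 + max(0, math.ceil(math.log2(n)))

    def space_density(i):
        return 2 * 1.5 ** (i - 1)

    n = 10**9                    # a billion-record relation, say
    i = min_level(n)             # i = 31
    print(i, space_density(i))   # 2 * 1.5**30, roughly a 380,000x overhead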

insertmin(S_i):
  if S_lt is not full, do insertmin(S_lt);
  else if S_lt is full and lt < i - 1, set lt <- lt + 1 and do insertmin(S_lt);
  else Reorganize(S_i) and then repeat.

deletemax(S_i): This is similar, except that we reorganize when rt = 0 and S_0 would become empty after the delete (this would violate the criterion that rt must be in the right half).
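A toy realization of insertmin (our own Python sketch, modeling only the left half of a segment; Reorganize is stubbed out and all names are ours) shows the lt cursor advancing into ever larger subsegments as minima arrive:

    class Segment:
        """Left half of a segment S_i: subsegments S_0 .. S_{i-1}
        and a cursor lt, following the insertmin rules above."""
        def __init__(self, i):
            self.i = i
            self.items = []    # used only at the base level S_0
            if i > 0:
                self.subs = [Segment(j) for j in range(i)]  # S_0 .. S_{i-1}
                self.lt = 0    # leftmost partly filled subsegment

        def full(self):
            if self.i == 0:
                return len(self.items) == 1      # S_0 holds one item
            return self.lt == self.i - 1 and self.subs[self.lt].full()

        def insertmin(self, x):
            if self.i == 0:
                assert not self.items, "S_0 full: Reorganize would run here"
                self.items.append(x)
            elif not self.subs[self.lt].full():
                self.subs[self.lt].insertmin(x)
            elif self.lt < self.i - 1:
                self.lt += 1                     # grow into a larger subsegment
                self.subs[self.lt].insertmin(x)
            else:
                raise RuntimeError("Reorganize(S_i) and repeat")

    s = Segment(4)              # left-half capacity 1 + 1 + 2 + 4 = 8
    for x in range(8, 0, -1):   # repeatedly insert a new minimum
        s.insertmin(x)          # 8 inserts succeed with no reorganization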

Time Complexity: Assign a potential function of 2 Σ_{0<=j<i-1} ...