Storing and Indexing Spatial Data in P2P Systems - DBLab

1

Storing and Indexing Spatial Data in P2P Systems Verena Kantere, Timos Sellis, Spiros Skiadopoulos

Abstract— The P2P paradigm has become very popular for storing and sharing information in a totally decentralized manner. Until recently, research has focused mainly on P2P systems that host one-dimensional data. Nowadays, the need for P2P applications with multi-dimensional data is emerging. However, most existing indexing and search techniques are not suitable for such applications: most indices for multi-dimensional data have been developed for centralized environments, while, existing distributed indices for P2P networks aim at one-dimensional data. Our focus is on structured P2P systems that share spatial information. We present S PATIAL P2P, a totally decentralized indexing and searching framework that is suitable for spatial data. S PATIAL P2P supports P2P applications in which spatial information of various sizes can be dynamically inserted or deleted, and peers can join or leave. The proposed technique preserves well locality and directionality of space. Index Terms—

I. I NTRODUCTION The Peer-to-Peer (P2P) paradigm has become very popular for storing and sharing information in a totally decentralized manner. Typically, a P2P system is a distributed environment formed by autonomous peers that operate in an independent manner. Each peer stores a part of the available information and maintains links (indices) to other peers. P2P systems • provide a method to distribute the available information to peers, • guarantee the retrieval of any information that exists in the system, • achieve a reasonable index size for any peer • achieve a reasonable search path for any search performed in the system, • maintain a low cost for updating peer indices when peers join or leave and • achieve data and searching load balancing, i.e., there are no peers overloaded with stored data and there are no traffic bottlenecks in the network. Until recently, research has focused mostly on P2P systems that handle one-dimensional data like strings and numbers. However, the need for P2P applications with multidimensional data is emerging. P2P systems hosting information about multi-dimensional data pose additional requirements that stem from the particularities of such data. In centralized multi-dimensional applications information is stored This work has been funded by the project PENED 2003. The project is cofinanced 75% of public expenditure through EC - European Social Fund, 25% of public expenditure through Ministry of Development - General Secretariat of Research and Technology and through private sector, under measure 8.3 of OPERATIONAL PROGRAMME ”COMPETITIVENESS” in the 3rd Community Support Programme. V. Kantere is with the Nat’l Tech. Univ. of Athens. T. Sellis is with the Nat’l Tech. Univ. of Athens. S. Skiadopoulos is with the Univ. of Peloponnese.

according to its multi-dimensional extent using an indexing structure (like R-trees [5]). Typically, these structures, preserve the locality and the directionality of multi-dimensional information. Informally, locality implies that neighboring multidimensional information is also stored in neighboring nodes of the index structure while directionality implies that the index structure preserves orientation. For instance, if directionality is preserved then the information that lies north of a A is also stored north of A in the index structure. The notions of locality and directionality are very important; if an index structure preserves locality and directionality then searching in the index becomes actually searching in space. In the literature there exist two different approaches that handle multi-dimensional data in P2P systems. The first approach proposes the use of a distributed version of some centralized multi-dimensional index (similarly to [9], [22]). The main challenges of these approaches is to avoid the bottlenecks that occur due to the heavy load of the peer storing the root of the tree (since every search has to pass through this peer). The second approach maps the multi-dimensional data into one-dimension and uses a Distributed Hash Table (DHT) based method ([4], [2], [11], [17]). Specifically, onedimensional DHT techniques use a distance metric in order to define the locality of the one-dimensional data. Then, peers use this distance in order to store data and to index other peers. Typically, peers store data and index peers that are closer to them according to the ordering. Furthermore, search is performed based on distance proximity and the available indexing information in each peer. In order to handle multi-dimensional data in P2P systems, there are similar requirements. There is a necessity of a storing, an indexing and a searching framework that can capture the locality and directionality of the multidimensional space. Existing works (e.g., [2], [11], [17]), first, order multidimensional data using a space-filling curve (e.g., z-order) and, then, define the distance according to this ordering. The main challenge of these approaches is to preserve the important properties of multi-dimensional space (like locality and directionality). Unfortunately, space filling curves do not preserve locality and directionality. For instance, two multi-dimensional regions that are close in the multi-dimensional space are not always close in the z-ordering curve. These locality problems of the one-dimensional ordering are inherited to the indexing and the searching strategy of the P2P network. In this paper, we propose S PATIAL P2P, a totally decentralized framework for spatial (i.e., two-dimensional) data that conforms to the autonomy principle of the P2P paradigm. S PA TIAL P2P provides storing, indexing and searching services for spatial data in a P2P network. S PATIAL P2P exploits existing experience in the field of DHT’s and can be built on top of

2

any one-dimensional DHT. Thus, S PATIAL P2P distributes the spatial information to peers and guarantees the retrieval of any spatial area that exists in the system with low space and time complexity bounds. Additionally, S PATIAL P2P efficiently handles changes in the spatial information held in the system and in the network structure caused by joining or leaving peers without the need of load balancing or restructuring. Finally, and contrary to existing DHT based approaches, S PATIAL P2P is able to capture the locality and directionality of the space. To achieve these goals S PATIAL P2P undertakes a different approach and does not use a global ordering as the space-filling curve of one-dimensional DHT methods does. In S PATIAL P2P each peer p stores data and indexes peers that are closer (to p) in the multi-dimensional space. This avoids the locality drawbacks of the space-filling curves. Moreover, in S PATIAL P2P the indexed peers are carefully chosen in such a way that also the directionality of the multi-dimensional space is preserved. Summarizing, the S PATIAL P2P framework: 1) is a totally and inherently distributed P2P network, 2) can be built on top of any one-dimensional DHT, 3) provides efficient hashing, indexing and searching services for spatial data in a P2P network, 4) guarantees the retrieval of any information that exists in the system, 5) achieves a small index size for any peer and a small search path for any search performed in the system, 6) requires a low cost for updating peer indices when peers join or leave, 7) achieves data and searching load balancing without restructuring the network and 8) preserves the locality and the directionality of space. In S PATIAL P2P, our main focus is the spatial twodimensional space, but our techniques can also be applied to any multi-dimensional space. The rest of this paper is structured as follows. Section II presents in detail the fundamentals of the S PATIAL P2P framework. Section III studies various issues concerning S PATIAL P2P. Section IV presents an experimental study of S PATIAL P2P. Section V surveys related work. We offer conclusions in Section VI. II. T HE S PATIAL P2P FRAMEWORK In this section we present the S PATIAL P2P framework. S PATIAL P2P uses a grid to partition the space and defines an appropriate distance. Section II-A considers the cells of the grid while Section II-B considers areas in the grid. In both cases, we discuss how spatial information is assigned to peers, how we search for spatial information, how new peers join the system and how peers leave the system. A. Basic grid We partition the space using a x×y grid. Each cell C in the grid is identified by its x- and y-coordinate. Grid cells form the smallest granule of information stored in the P2P system. A 7 × 7 grid is presented in Fig. 1. Distance metric. To measure the distance between two cells in the basic grid we can use any of the traditional metrics

(e.g., Manhattan or Euclidean distance). In this work, we use an alternative metric [10]. Consider two cells C = (cx , cy ) and C ′ = (c′x , c′y ) of the basic grid. Their relative distance denoted by D(C, C ′ ) is defined as follows: D(C, C ′ ) = (d1 , d2 ) where

d1 = max {|cx − c′x |, |cy − c′y |} d2 = min{|cx − c′x |, |cy − c′y |}

Intuitively, distance D(C, C ′ ) is formed by the maximum and the minimum difference of the coordinates of C and C ′ . For instance, for the basic grid cells A(1, 2) and B(3, 6) of Fig. 1, we have D(A, B) = (4, 2). Distance D encapsulates both the Euclidean and the Manhattan distance. Let D(C, C ′ ) = (d1 , d2 ) be the D distance of cells C and C ′ . Then, if E and M denote the Euclidean ′ and the p Manhattan distance of C and C respectively, we have 2 2 E = d1 + d2 and M = d1 + d2 . Intuitively, distance D(C, C ′ ) represents the notion of how many cells apart are the two cells C, C ′ if we move on a straight line, combined with a weight about their spatial distance. This is very useful in a framework such as S PA TIAL P2P since it considers moving (i.e. searching) on a grid using locality and directionality as guidelines. Definition 1: Consider two distances D = (d1 , d2 ) and D′ = (d′1 , d′2 ). Addition and comparison are defined as follows. D + D′ = (d1 + d′1 , d2 + d′2 ) d1 < d′1 D < D′ iff d1 = d′1 and d2 < d′2 Theorem 1: Distance D is a distance metric. Proof: D is a distance metric iff it satisfies isolation, symmetry and triangular inequality. It is easy to verify that the above properties hold. 1) Distributed hashing of basic grid cells: Similarly to DHT approaches, we require that the space of peer identifiers is the same as the space of data (in our case grid cell) identifiers and that peers store the information corresponding to their identifier [20]. Thus, a peer is identified by a (x, y) pair and stores the spatial information that corresponds to the grid cell (x, y) (provided that this information exists in the system). Since peers are identified by (x, y) pairs, D can be trivially defined between cells and peers. In P2P systems, we need a method that assigns to the available peers the information about the grid cells. This procedure is called hashing [20], [15]. Hashing in S PATIAL P2P requires that a cell A for which there is no peer with the same identifier is stored in a peer with the closest (to cell A) identifier with respect to the D distance. Ties (i.e., cases where cell A has the same D distance from more than one peers) are handled according to the ordering of Fig. 2. To form the ordering, we partition the cells around a given cell A into 8 groups as shown in Fig. 2. For instance, Group 8 is formed by the light gray cells in the right of cell A. If cell A has the same D distance with more than one peer, we assign it to the peer that lies in the group of A with the smallest number. For instance, in Fig. 2, we have ′ • D(A, p) = D(A, p ) = (2, 1) and

3

B

6

6

5

A

4

1

p1

p

3 1

2

3

4

5

Fig. 3.

s3

Proof: Let A be a grid cell. It is easy to verify that, for any two cells C and C ′ such that D(A, C) = D(A, C ′ ) and C and C ′ lie in the same group of A it follows that C ≡ C ′.

Thus, the proposition holds. 2) Links to other peers: In a P2P network each peer has to maintain links to other peers. Such links should guarantee that existing information can always be reached when searched from any peer of the network following a path of satisfiable length. In S PATIAL P2P, similarly to the DHT-based approaches, each peer maintains links to successors and to indexed peers. Successors in S PATIAL P2P have a dual role. First, they guarantee the connectivity of the network. Thus, successors assure that there is always a path between two arbitrary peers in the network. Moreover, successors preserve the locality and the directionality of the two-dimensional space in the P2P network. This allows to apply traditional spatial query evaluation techniques [16]. Unfortunately, a search that is entirely based on successors is not efficient since it may result in long search paths. To speed up searching, a peer also knows the location of additional peers. Such peers are called indexed peers. In the following, we describe the procedure that selects successors and indexed peers. Successors. Intuitively, we require that each peer p has 4 successors s1 , . . . , s4 where each successor lies in a different quadrant (as in Fig. 4a) More formally, successor s1 , is the closest peer to p with respect to the D distance that lies in Groups 1 or 2 of p (see Fig. 2). Similarly, s2 (respectively s3 and s4 ) is the closest peer to p that lies in Groups 3 or 4 (respectively 5 or 6, and 7 or 8). Ties between peers are handled similarly to hashing.

s4 p

s2

... ...

s1 (a)

Fig. 4.

...

The following proposition demonstrates the correctness of our hashing method. Proposition 1: The proposed hashing method is unambiguous and correct.

•

Illustration of Examples 1, 2 and 3

peers p and p′ lie in groups 4 and 5,

thus A is assigned to peer p. Example 1: Fig. 3 shows how a 8 × 8 grid is hashed to peers p1 , . . . , p8 . For instance, p6 stores A and all the light gray cells in the upper-left corner of the grid.

•

(0, 0)

Ordering of grid cells

A basic grid

...

•

p7

2

6

Fig. 2.

p2

p8

1

0 0

p6

A

A

2

p3

8

p′

3

Fig. 1.

p5

5

4

y/x

p4

7

(b)

Successors and indexed peers of p

Notice that we could guarantee that the P2P is connected with only two successors. Four successors are required to preserve the locality and directionality of the two-dimensional space in the P2P network. Indexed peers. To speed up searching, we require that each peer p(x, y) also stores the location of a set of indexed peers having progressively increasing distance from p (similarly to Fig. 4b). Formally, the indexed peers of a peer p(x, y) are all peers p′ (x′ , y ′ ) such that D(p, q) = (2i , 0), 0 ≤ i. If an indexed peer p′i (x′ , y ′ ) is not present in the network, then p indexes the closest peer to p′i (x′ , y ′ ) with respect to the D distance. Ties between peers are handled similarly to hashing. In other words, to find the indexed peers of a peer p(x, y) we locate the closest peer to each cell with coordinates (x ± 2i , y) or (x, y ± 2i ). Notice that the D distance between these cells and p is exactly (2i , 0). The proposed two-dimensional indexing approach is inspired by the one-dimensional Chord method [20]. However, with minor adaptations, other DHTs can also be used. Let us now present an example. Example 2: Consider the grid and the peers p1 , . . . , p8 of Fig. 3. The successors s1 , . . . , s4 of peer p1 (1, 3) are peers p7 , p8 , p3 and p2 respectively. For instance, p7 is the closest to p1 peer that lies in the Groups 1-2. Additionally, p1 also indexes the following peers: p2 since it is the closest peer to (2, 3) = (1 + 20 , 3) and (3, 3) = (1 + 21 , 3) p7 since it is the closest peer to (5, 3) = (1 + 22 , 3) p8 since it is the closest peer to (0, 3) = (1 − 20 , 3), (1, 1) = (1, 3 − 21 ) and (1, 2) = (1, 3 − 20 ) p3 since it is the closest peer to (1, 4) = (1, 3 + 20 ) and (1, 5) = (1, 3 + 21 ) p4 since it is the closest peer to (1, 7) = (1, 3 + 22 ) 3) System operations: We elaborate on the operations offered by S PATIAL P2P. Specifically, we present a searching

4

Algorithm S EARCH Input: A peer p that queries information about cell A. Output: The information concerning cell A. Method:

Algorithm R ANGE S EARCH Input: A peer p that queries information about a set of cells C. Output: The information concerning C. Method:

If A is stored in p Then Return the corresponding information

For each cell c ∈ C, If c is stored in p Then

ElseIf p has a successor or an indexed peer q with the same co-

ordinates with A Then Return S EARCH(q, A) Else Return S EARCH(q, A) where q is the closest peer to A

among the successors and the indexed peers of p. Ties are handled similarly to hashing. Fig. 5.

Add to result the information about c EndFor For each cell c ∈ C If there is an indexed peer or a successor p′

such that c = p′ Then Add S EARCH(p′ , c) to result

Algorithm S EARCH

EndFor

algorithm and we briefly discuss how a peer joins and leaves the P2P overlay. Searching. Consider a peer p seeking information about cell A(ax , ay ). The searching process is summarized in Algorithm S EARCH (Fig. 5). The searching process is illustrated in the following example. Example 3: Consider the grid and the peers p1 , . . . , p7 of Fig. 3. Assume that peer p1 searches information about cell A(5, 6). Thus, p1 executes S EARCH(p1 , A). As we have seen in Examples 1 and 2, p1 does not store A and there is not a successor nor an indexed peer of p1 with the same co-ordinates as A. Thus, since p4 is the closest to A peer indexed by p1 , the Else statement of S EARCH(p1 , A) algorithm executes S EARCH(p4 , A). In the same manner, since p6 is the closest (to A) successor of p4 , S EARCH(p4 , A) executes S EARCH(p6 , A) which returns the result. In Section IV, we will demonstrate the efficiency of the S EARCH algorithm. Range queries. Range queries take as input a set of cells on the grid and return the information that is stored in these cells. A naive method to evaluate such queries is to use the S EARCH algorithm for each cell of the given set. A more efficient method for range queries is presented in Fig. 6. Let p be a peer requesting information about a set of cells C. Algorithm R ANGE S EARCH first gathers all information about C stored in p (first For statement). Then, the algorithm gathers all information about C stored in the successors and the indexed peers of p (second For statement). Then, R ANGE S EARCH algorithm searches for the remaining cells C ′ ⊆ C. To this end, the algorithm considers all indexed peers and successors p′ of p and all cells c ∈ C ′ , selects the pair (p′ , c) that minimizes D(p′ , c) and executes R ANGE S EARCH(p′ , C ′ ). The second For loop of R ANGE S EARCH algorithm can be further optimized as follows. When visiting a successor on an indexed peer we gather all information concerning C (and not only information of the requested cell). By doing so, we do not have to visit this peer multiple times. Joining the network. Let us consider a new peer p that wants to join the network. Peer p first finds a peer pB already in the network. pB is called the bootstrapping peer and is responsible

Let C ′ be all the cells in C that we have not found their

information Consider all indexed peers and successors p′ of p and all c ∈ C. Select the pair p′ , c, such that D(p′ , c) is minimized. Add R ANGE S EARCH(p′ , C ′ ) to result Fig. 6.

Algorithm R ANGE S EARCH

to find the successors of p (by initiating searches for them). Then, p uses its own successors to find its indexed peers and to locate the cell information that it should store. Alternatively, to speed up the searching process, the joining peer p can inherit the indices of its successors or very closeby peers. As we will demonstrate in Section IV, this method performs significantly less searches. Finally, p announces its presence in the network, i.e., existing peers become aware of the presence of p and p is eligible to be used for indexing and searching. Leaving the network. The peer p that wants to leave the system follows a reverse procedure. First p notifies its successors which initiate a search request in order to find the peer that should replace p in their successor list. Also, p assigns all its stored data to appropriate peers. Then, p notifies its indexed peers and leaves the system. The notified indexed peers delete p from their own index and refresh the corresponding index entry. B. Areas in the grid Spatial information often spans into more than one cell of the grid. Thus, it is very useful to be able to represent areas that are formed by groups of cells. To this end, S PATIAL P2P is able to store information about any group of cells forming a rectangular area in the grid. Information about an area of arbitrary size can be stored to the rectangular area that corresponds to its minimum bounding box. Definition 2: A rectangular area A in a grid can be represented by a quadruple (ax , ay , a′x , a′y ) where (ax , ay ) (respectively (a′x , a′y )) is the lower-left (respectively the upper-right) cell of area A. The size of an area A, denoted by α(A), is the square root of the number of A cells, i.e., q α(A) = (a′x − ax )(a′y − ay )

5

holds. The center of area A is denoted by κ(A), i.e.,

A4

a′ − ax a′y − ay κ(A) = ( x , ). 2 2 From this point onwards, unless specifically stated, the term

d1 = 2 ·

Mx + θ|α(A) − α(A′ )| α(A) + α(A′ )

d2 = 2 ·

My + θ|α(A) − α(A′ )| α(A) + α(A′ )

and (Mx , My ) = D(κ(A), κ(A′ )) (i.e., the D distance of the centers of A and A′ ) and θ is a parameter. Addition and comparison for Dz can be defined similarly to D. Theorem 2: Distance Dz is a distance metric. Proof: It is easy to verify that Dz satisfies isolation, symmetry and triangular inequality. Thus, Dz is a distance metric. Distance Dz (A, A′ ) takes into account ′ • the distance D(κ(A, κ(A )) between the centers of areas ′ A and A and ′ ′ • the size α(A) and α(A ) of areas A and A . For instance, for areas with the same size (i.e., where α(A) = α(A′ )), Dz is proportional to the D distance. Parameter θ comes into play when measuring the distance of areas of different sizes. In such cases, θ prescribes how much should the size difference of the areas influence their relative distance. For large values of θ, the distance between two areas A and A′ is mostly determined by the α(A) − α(A′ ) factor. This means that areas with similar size are considered closer. For small values of θ, the distance between two areas A and A′ is mostly determined by the α(A) + α(A′ ) factor. This means that larger areas are considered closer to every other areas. Since distance Dz will be used in hashing, indexing and searching, the value of θ is critical for the even loading of peers. After an extended experimental study we have decided that θ = 5 · n (where n is the size of the basic grid) achieves a nice trade-off between how the D distance and the areas sizes affect Dz . In Section III, we elaborate more on the θ parameter value and in Section IV, we will demonstrate that Dz preserves the locality of the space and achieves even load balancing among the network peers. 1) Distributed hashing of areas: In order to hash areas to peers, we require that the space of peer identifiers is the same as the space of area identifiers, i.e, a peer is also identified by a quadruple. Thus, we may treat peers as areas. Similarly to the basic grid, hashing in S PATIAL P2P requires that an

p4

A3

area refers to rectangular areas. To handle areas, we appropriately extend the distance metric and the hashing, indexing and searching methods of the basic grid. Distance metric. To measure the distance between areas we define the distance metric Dz that is based on distance D. Consider two areas A and A′ . Their distance Dz (A, A′ ) is a pair (d1 , d2 ) where

p2

p1

p1

A1

A2

p2

p0

(a) Fig. 7.

p3

(b)

Illustration of Examples 4 and 5

area A for which there is no peer with the same identifier is stored in a peer p such that Dz (A, p) is minimum (i.e., p is the closest (to area A) peer with respect to the Dz distance). To handle ties (i.e., cases where cell A(ax , ay , a′x , a′y ) has the same Dz distance from more than one peers) we choose a p(px , py , p′x , p′y ) such that: •

• •

κ(p) lies in the group (see Fig. 2) of κ(A) with the minimum number (i.e., the center of p lies in the group of area’s A center with the smallest number), |α(A) − α(p)| is minimum (i.e., the size of p is closer to the size of A) and D((ax , ay ), (px , py )) is minimum (i.e., the D distance of the lower-left points of A and p is minimum).

Example 4: Consider the grid and the peers p1 and p2 of Fig. 7a. The areas stored in the system are A1 , . . . , A4 . Peer p1 stores areas A3 and A4 , since Dz (p1 , A3 ) < Dz (p2 , A3 ) and Dz (p1 , A4 ) < Dz (p2 , A4 ). Similarly, p2 stores A1 and A2 . Observe that p1 and p2 store the areas that are either of equal or of slightly different size. Proposition 2: The proposed hashing method is unambiguous and correct. Proof: Since hashing is performed with regard to the center of areas, the proof is reduced to the proof of proposition 1. 2) Links to other peers: Similarly to the basic grid, we require that each peer maintains successors and indexed peers.

σ5 σ3

Larger size than p σ4

p σ2

Same size as p σ1

σ6

Fig. 8.

Smaller size than p

Successors and indexed peers of p

Successors Each peer has 6 successors σ1 , . . . , σ6 defined as follows (see also Fig. 8). σ1 is the closest peer to p with respect to Dz from the

6

following set of peers {(qx , qy , qx′ , qy′ )| px + px′ ≤ qx + qx′ ∧ qy + qy′ < py + py′ }. The counterpart of σ1 in the basic grid is successor s1 . Intuitively, σ1 is the closest peer to p among the peers that their center lies in Groups 1 or 2 of κ(p). σ2 (respectively σ3 , . . . , σ4 ) is the closest peer to p among the peers that their center lies in Groups 3 or 4 (respectively 5 or 6, and 7 or 8) of κ(p). σ5 (respectively σ6 ) is the closest peer to p among the peers that are larger (respectively smaller) in size than p. Ties between peers are handled similarly to hashing. The two newly introduced successors σ5 , σ6 are necessary in order to guarantee the connectivity of the overlay if there are peers that correspond to areas of various sizes. Indexed peers To speed up searching, we require that each peer p(px , py , px′ , py′ ) indexes all peers q(qx , qy , qx′ , qy′ ) such that: 1) α(q) = α(p) and κ(q) = κ(p) ± (α(p) · 2i , 0) hold, or 2) α(q) = α(p) and κ(q) = κ(p) ± (0, α(p) · 2i ) hold, or 3) α(q) = α(p) and κ(q) = κ(p) ± (2i , 0) hold, or 4) α(q) = α(p) and κ(q) = κ(p) ± (0, 2i ) hold, or 5) α(q) = α(p) ± 2i and κ(q) = κ(p) hold. where 0 ≤ i. If an indexed peer q is not present in the network, then p indexes the closest peer with respect to Dz distance, instead. Ties between peers are handled similarly to hashing. Apart from indexing areas of the same size, a peer indexes areas of bigger and smaller sizes as well. Thus, independently of the search area size, it can achieve short search paths in the overlay. Example 5: Consider the grid and the peers p0 , . . . , p4 of Fig. 7b. Peer p0 has one successor of the same size, σ4 = p2 . p0 does not have successors σ1 , σ2 , σ3 , since there are no peers residing in Groups 1-2, 3-4 and 5-6, respectively. Also it has one successor of bigger and one of smaller size, namely σ5 = p4 and σ6 = p1 . Furthermore, p0 does not index any peers additionally to its successors (for example peer p3 is not indexed), since its successors are the closest peers to all areas that should be indexed. 3) System operations: Algorithms S EARCH and R ANGE S EARCH can be easily extended to handle areas as well. Joining and leaving the network are performed as in Section II-A. The interested reader is referred to [10] for more details. III. D ISCUSSION A. The θ parameter As mentioned in Section II-B, the θ parameter tunes the relationship between the area size similarity and the relative distance of areas. The goal of the relative distance metric is two-fold: • areas should be stored in peers that correspond to areas that are close to them (in absolute terms) and preferably of similar size

areas should be evenly distributed in peers if the area and peer distributions are similar. These two goals can be achieved by fine-tuning the value of θ parameter. Actually, the θ value should be such, that the area size should play a bigger role in relative locality as closer (in absolute terms) the considered areas are. In other words, assume there are three areas A1 , A2 , A3 with respective sizes α1 , α2 , α3 . Moreover, assume that A2 and A3 are co-centered and α1 = α2 6= α3 . The absolute distance of A1 from A2 and A3 is the same. Since A1 , A2 have the same size, the relative distance of A1 from A2 is smaller than that of A1 from A3 . However, we want this relative distance difference to be bigger as closer the areas are (and the opposite). Extended experimental study [10] showed that the best θ value is 5n, for a n × n grid. This value guarantees that the area size role is bigger as the areas are closer. This means that big areas are generally closer but not too close to all areas and small areas are either very close or very far from other areas, depending on their absolute distance. For example, the biggest area, i.e. the area that covers the whole n × n grid, has a relative distance that is half of the biggest distance Fig. 9 and Fig. 10 show graphically the relative versus the absolute distance of the left-most, lowest cell of a grid from all other square areas that can be defined on the grid. Fig. 9 also shows the area size. Obviously, for the 20×20 grid the relative distance of the biggest area, which covers the whole grid, is half the maximum relative distance of the lowest left-most cell from any other area. Also, this grid cell is relatively farthest away from the right-most upper grid cells than any other area. Fig. 10a and Fig. 10b show the experimental results for the relative distance for the 20 × 20 but also the 15 × 15 grid, keeping θ = 100 = 5 · 20 > 5 · 15. Obviously, the bigger grid areas are considered relatively farther away from the lowest left-most grid cell in Fig. 10b than in Fig. 10a. This means that the peers that correspond to small areas tend to be more loaded with data in the second than in the first case. Fig. 10c shows that the grid is more balanced for a 15 × 15 grid with θ = 70 ≈ 5 · 15. In Section IV, we present an experiment that proves the achievement of load balancing for θ = 5 · n. •

B. Complexity To study the complexity of our approach we consider a basic grid of size n × n and assume that the available information spans into arbitrary areas of the grid (similarly to Section IIB). It is easy to verify that the number of peers in the network is O(n3 ). The number of links to other peers (successors and indexed peer) that a given peer maintains is O(log n). More specifically, if the available information spans into small size areas (less than n2 /2), the average number of links to other peers is 3 log n. In the case where the available information spans into areas of larger size, the respective number is log n. Intuitively, this happens because an area mostly indexes areas of the same size and for larger sizes there are fewer areas in a fixed grid. Finally, the length of a searching path is O(log n) and is always bounded by 3 log n. Just for comparison reasons the indexing and searching complexity bound of Chord is log N , where N is the number

7

Dz values

Dz values

250

Dz values

160 n = 20 and θ = 100

♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ n = 15 and θ = 100 ♦ ♦ ♦ ♦ ♦ ♦ 140 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 120 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 100 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦♦ ♦ ♦♦ ♦♦ ♦ ♦ ♦ 80 ♦ ♦♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 60 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 40 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 20 ♦ ♦ 0♦

♦

♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 200 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 150 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 100 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦♦ ♦ ♦♦ ♦♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 50 ♦♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 0♦

0

50

100 150 D values

200

250

0

20

40

60

(a) Fig. 10.

80 100 D values

120

140

(b)

160

160 140 120 100 80 60 40 20 20

♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦

n = 15 and θ = 70

40

60

80 100 D values

120

140

(c)

2D graphs of the relative distance of the lowest left-most grid cell from any other area

experimental results show that by inheriting additionally some of the indices of successors’ successors can achieve almost a ♦ n = 20 and = 100 ♦ Dz values ♦ ♦ nearly constant average index updating cost. Actually, we show ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ experimentally in Section IV that a joining peer can reduce ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 250 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ the cost of index updating up to 90% in the general case and ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ 200 ♦♦ ♦♦ ♦♦ ♦♦ ♦♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ 150 ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ up to almost 100% for sparse overlays, just by inheriting the ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦♦ ♦♦ ♦♦ ♦♦ ♦♦ ♦ ♦ ♦♦ ♦♦ ♦♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦♦ ♦ ♦ ♦♦ ♦♦ ♦♦ ♦♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ 100 ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦♦ ♦ ♦ ♦ ♦♦ ♦ ♦♦ ♦ ♦♦ ♦ ♦ ♦♦ ♦ ♦♦ ♦♦ ♦♦ ♦♦ ♦♦ ♦ ♦ ♦♦ ♦ ♦♦ ♦♦ ♦♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦ ♦♦ ♦ ♦ index of one successor for each one of its 6 successors, (i.e. ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦♦ ♦♦♦♦♦♦♦♦♦ ♦ ♦ ♦♦ ♦♦♦ ♦♦ ♦♦ ♦ ♦ ♦ ♦♦♦♦ ♦♦ ♦ ♦ ♦ ♦ ♦ ♦♦♦ ♦ ♦♦ ♦♦♦♦♦♦♦ 50 ♦ ♦♦♦ ♦ ♦♦♦♦♦♦♦ ♦♦ by acquiring the indices of 12 close-by peers with O(1) cost. ♦ 0 ♦ As far as the leave operation is concerned the cost can also 250 be reduced significantly because the leaving peer notifies all 200 0 2 4 6 its index entries. The latter may replace the respective index 150 8 10 100 D values entry with the closest peer to the leaving one. However, worst Size of area 12 14 16 50 18 20 0 case leaving cost remains O(log2 n). C. Replication Fig. 9. 3D graph of the relative distance of the lowest left-most grid cell from any other area in a 20 × 20 grid for θ = 100.

of peers. In S PATIAL P2P, the number of peers is O(n3 ). Thus, if we use Chord to index cells and areas of a grid we may expect a complexity bound of 3 log n for both indexing and searching. Thus, for the number of indexed peers, S PATIAL P2P has the same complexity bounds as Chord in the worst case but has better complexity bounds on the average case. Also, the length of the search path has the same complexity bounds as in the worst case. But, as the experimental study demonstrates (Section IV), the expected (average) searching complexity of S PATIAL P2P is lower from Chord. For a thorough complexity analysis the reader is referred to [10]. The S PATIAL P2P framework also achieves low costs for the join and leave operations. If a joining peer p looks up in the overlay each one of its index entries, then the cost of join is O(log2 n). If we follow the alternative join method where the joining peer p inherits the indices of its successors the join cost remains O(log2 n). But, in this case, we may expect a dramatical reduction in the average case. We show experimentally that inheriting the successors indices reduces the cost of the index updating up to 80% (see Section IV). Moreover, the

In situations that some peers are overloaded in terms of processing, other peers can become their replicas in the system. In such a case replica-peers index the rest of the fellow replicas, in order to share the processing load when the latter exceeds their processing limits. Peers that should index the host of an area can maintain an entry for any respective replica randomly (for example whichever they come across first during the bootstrapping procedure). In this way, after the network is stabilized respective replicas are indexed equally by other peers and, thus, searches for the replicated area are equally distributed among the replica-peers. D. Node failure Node failure is handled in a straightforward way in the S PATIAL P2P framework. Periodically, every peer p exchange messages with its successors and its indexed peers to reassure the presence of each other. When a peer is not responding to these messages for a certain amount of time, we assume that this peer is disconnected from the network. Thus, p initiates the stabilization procedure [20] to replace the disconnected peer. IV. E XPERIMENTAL EVALUATION In this section, we present an experimental evaluation of the proposed technique using a simulation that implements the

160

8

hashing, indexing and searching algorithms of S PATIAL P2P. Methodology. As discussed in Section V many competitive methods rely on one-dimensional DHT techniques that use space filling curves to identify multi-dimensional data. Besides, the other category of methods that try to distribute centralized index trees to P2P overlays are not so promising. Thus, we compare the S PATIAL P2P framework implementation against the original widely approved Chord method [20] appropriately modified to store spatial data. Nevertheless, in Section V, we compare S PATIAL P2P to the work that most efficiently distributes a centralized index. To store spatial information in Chord, we map the spatial areas to identifiers using the z-ordering method. We refer to this method as zChord. To fairly compare Chord and S PATIAL P2P, we have provided the same index size. We experiment on the following two kinds of queries. • Queries about single areas. In such queries each peer searches for a specific area. These queries are similar to point queries and they can be evaluated using the S EARCH algorithm. • Queries about multiple areas. In such queries each peer poses a query window and searches information about all areas that fall in the query window. These are similar to range queries and they can be evaluated using the R ANGE S EARCH algorithm. For point queries, we study how the search path is affected by the size of the query area, the network density and the distance between the peer that asks for an area and the area itself. For range queries, we measure the search path length and the number of visited peers that did not provide useful information for the query (false drops) for sparse and dense networks. Datasets. We test S PATIAL P2P and zChord on networks having from 5000 to 20000 peers; however, for the scalability experiments, we tested the framework with up to 80000 nodes. For each network size, we execute 10.000 queries and average the results. The size of the grid varies from 32 × 32 to 142×142. We have used a uniform distribution for the location of peers and for the location and the size of stored areas. We have also tested skewed data distributions but the results were similar, thus, they are omitted. In the following, we present the most important results of the experimental study. Network density. As discussed earlier, the search complexity of the S PATIAL P2P framework depends on the size of the basic grid whereas the complexity of zChord depends on the number of network peers. Fig. 11a shows the search path length of S PATIAL P2P and zChord for networks of sizes up to 80000 using always a 64 × 64 grid. The results show that the search path length does not increase more than 0.5 hops for S PATIAL P2P whereas the path length of zChord increases approximately 2 hops. Note that the increase of the path length of zChord is much bigger when we keep the index size equal to log N . Proximity search. We tested S PATIAL P2P in order to prove experimentally that the proposed framework preserves locality of areas/peers. Fig. 11b shows the results for queries asking

for areas that are in a constrained distance from the querying peer on a 32 × 32 grid. We varied this distance from 1 to 32 thus requesting areas that are as close as adjacent to the area corresponding to the querying peer but also areas that are as far as possible from the querying peer. The results show that S PATIAL P2P preserves locality very well, so that even for pretty distant requested areas the search path length is only 2 hops long. zChord also preserves locality, but only for areas that are very close. This is due to the space-filling curve used to pre-process data. Summarizing this experiment, on the worst case, the search paths that S PATIAL P2P achieves are 30% shorter that the respective paths of zChord and on the best case search paths are 70% shorter. Area size. We conducted an experiment to observe the effect of the size of the requested areas on the search path length. Fig. 11c shows that S PATIAL P2P favors big areas, for which the path length is considerably shorter than for very small areas. This is an advantage, since big areas encompass more spatial information and are more probable to be searched. Also, this feature facilitates range query processing, since big parts of range queries can be easily retrieved. The behavior of zChord is the same for any size of requested area, as expected. Range queries. We conducted experiments in order to observe the performance of S PATIAL P2P for range queries. We used networks of 5000 and 10000 peers in a 128 × 128 grid. We focused our experiments on the worst situations, i.e., networks that store only basic grid cells and not at all bigger areas. Thus, both S PATIAL P2P and zChord had to search for all the grid cells covered by the requested range areas. We produced range queries asking for areas using a square window with size that varies between 4 and 30. We measured the number of hops of the searching process. Fig. 12a and Fig. 13a show the total length of the search path for networks having 5000 and 10000 peers respectively. We also measure the number of false drops. Note that we consider as false drops all the hops of a search process where the visiting peer does not store useful to the specific search process information. Of course, many of the false drops are unavoidable hops, that are necessary in order to reach a target peer. Fig. 12b and Fig. 13b show the number of false drops for networks having 5000 and 10000 peers respectively. Fig. 12c and Fig. 13c show the ratios of false drops over the total number of searching hops for networks having 5000 and 10000 peers respectively. In all our experiments S PATIAL P2P outperforms zChord. S PATIAL P2P, to answer range queries, visits far fewer peers than zChord. Also, the percentage of false drops of S PA TIAL P2P is around 30% whereas the same percentage for zChord around 60%. Finally, the percentage of false drops decreases while the range increases, for S PATIAL P2P whereas the opposite holds for zChord. This nice behavior of S PA TIAL P2P is due to the fact that S PATIAL P2P preserves locality well. Furthermore, we tested S PATIAL P2P for the general case that stores arbitrary sized areas. Thus, range queries may retrieve spatial areas of any size. Experiments in Fig. 14 are performed on a 32 × 32 grid. Fig. 14a and Fig. 14b

9

Number of hops

Number of hops

Number of hops

7

8 S PATIAL P2P zChord

7.5

+

♦ +

+

♦

+

+

2 ♦

♦

5

♦

30

40

50

60

70

80

♦

♦

3

0

♦

5 10 15 20 25 30 35 Proximity of requested area to the requesting peer

(a)

0

5

10

15 Area size

(b)

20

25

30

(c)

Single area queries (Point queries) Total number of hops

Total number of hops

800 S PATIAL P2P zChord

700

+

♦ +

600 500 400

♦ +

300 200

♦

100 0

+ ♦

+ ♦

+ ♦

0

5

10 15 20 Range of query

25

30

500 450 400 350 300 250 200 150 100 50 0

Fig. 12.

Ratio of total number of hops vs. false drops +

♦ +

S PATIAL P2P zChord

0.65 S PATIAL P2P zChord +

0.6 0.55

+

♦ +

+

+

+

0.5 0.45 +

+ ♦

+ ♦

0

0.35 0.3

♦

♦

5

♦

0.4 ♦

+

♦

♦

♦

♦

0.25


(a)

25

30

5

0


(b)

25

30

(c)

Range queries for the worst case: Sparse networks of 5000 peers Total number of hops

Total number of hops

1800

Ratio of total number of hops vs. false drops

1200 ♦ +

S PATIAL P2P zChord

1600

0.65 ♦ +

S PATIAL P2P zChord

+

1000

1400

+

S PATIAL P2P zChord + +

0.6

+

+

♦ + +

0.55

1200

800

1000

0.5

600

800

0.45

+

600

♦

0.4

+

400

♦

0.35

400 + ♦

+ ♦

0 0

5

200

♦

+

200

+

♦

0 10 15 20 Range of query

25

30

+ ♦

+ ♦

0

5

♦

♦

0.3

♦

♦

♦

♦

0.25


(a) Fig. 13.

♦

2.5

Number of network nodes ·103

Fig. 11.

♦

3.5

♦

0 20

+

♦

+

♦

4.5 10

+

4

♦

1

♦

♦ +

5 4.5

3

5.5

+ S PATIAL P2P zChord

+

+

+

4

6

+

6 5.5

5

+

6.5

♦ +

S PATIAL P2P zChord

6

+

7

6.5

25

30

0

5


(b)

♦

25

30

(c)

Range queries for the worst case: Dense networks of 10000 peers

(respectively Fig. 14c and Fig. 14d) refer to networks having 5000 (respectively 10000) peers. Fig. 14a and Fig. 14c show the total number of hops while Fig. 14b and Fig. 14d show the results for the false drops. As expected the results for the average case are much better than the respective of the worst case. While the network becomes more dense there are relatively more false drops for the average case than for the worst one. The reason is that, in dense networks, in the worst case, the range query process is actually wandering in a highly connected and indexed neighborhood of peers of similar levels

where almost all of them have parts of the sought information. In the average case, the range process may still have to traverse many levels of peers, since the fragmentation of the initial area may produce very diverse, in size, parts.

levels one level all levels

sparse networks 4,46 4.64

dense networks 4,29 4,58

TABLE I E XPERIMENT ON Dz

10

Total number of hops (networks of 5000 peers)

False drops (networks of 5000 peers)

120

35 average case worst case

100

♦ +

average case worst case

+

30

+

♦ +

25

80

20 60 15 40

10

+

0

+ ♦

5+ ♦

♦

♦

+

♦

20

♦

0

4

6


14

16

4

6


(a) Total number of hops (networks of 10000 peers) average case worst case

False drops (networks of 10000 peers) +

♦ +

40

+

average case worst case

35

140

25

100

20

80

15

60

+

40

♦

♦

4

6

+ ♦

10 ♦

20 +

5+ ♦

♦

0


14

16

4

(c) Fig. 14.

♦ +

30

120

0

16

(b)

180 160

14

6


14

16

(d)

Range queries for the average and worst case in sparse and dense networks

Testing D distance We experimented on random single area queries for networks that included peers and data only from the basic grid and networks that included peers and data from all grid levels. Table I presents the average search path length for sparse (having 10000 peers) and dense networks (having 20000 peers) with peers/areas on one and all grid levels and shows that Dz with θ = 5 · n. Thus, since the measured search path lengths are similar, this value of θ results in a good distance metric. Table II shows the results of experiments on load balancing for 64 × 64 grids. The results correspond to networks of sizes 10000 to 80000. For instance, for networks having 10000 peers, most of the peers (7544) store just one area and there are only 2 peers storing 8 areas. Summarizing, we observe that from very sparse to very dense networks peers are very evenly loaded with stored information and as the networks are more dense the load is even more balanced. Thus, Dz is proved to be a good metric for the data hash function. The cost of index updating. Finally, we studied the overall cost of updating the peer indices due to join operations. We performed tests for both sparse and dense overlays on a basic 32×32 grid. Figures 15 and 16 present the results for overlays that comprise only areas of the same size, (therefore there are no perpendicular successors), and overlays that comprise areas of all sizes, respectively. The graphs in Fig. 15a and Fig. 16a show the results if the indices of the successors are inherited, (i.e. the indices of 6 and 4 at most peers are inherited for

Number of areas 1 2 3 4 5 6 7 8 9

10000 7544 1653 564 173 50 10 4 2 0

20000 13442 4301 1554 493 150 44 14 1 1

40000 34820 4355 701 104 18 2 0 0 0

80000 78454 1488 47 6 4 0 0 1 0

TABLE II E XPERIMENT FOR LOAD BALANCING

overlays with areas of one or all sizes, respectively), over the cost if search operations are performed for all index entries. The graphs in Fig. 15b and Fig. 16b show the results if the indices of the successors are inherited as well as the index of one successor per successor is inherited, (i.e. the indices of 12 and 8 at most peers are inherited for overlays with areas of one or all sizes, respectively), over the cost if search operations are performed for all index entries. For completeness, each graph presents the result of the reduced cost, W , over: •

•

the cost of constructing the correct index at the joining time, C, (b) the cost of the index as it is initialized using the inherited indices, I and the cost of constructing the final index, when all the peers have joined, F .

The results show that the cost of index updating is reduced significantly by just inheriting the successors’ indices to the

11

% of wrong fingers %W/C %W/I %W/F

0.25

% of wrong fingers + ♦

0.3 ♦ +

+ ♦

0.14 %W/C %W/I %W/F

0.12

+ ♦

0.1

♦ + + ♦

0.2

0.08

+ ♦

0.15

+ ♦

0.06

0.04

0.1

+ ♦

0.02

+ ♦

0.05 200 300 400 500 600 700 800 900 1000 1100 Number of nodes

0 200 300 400 500 600 700 800 900 1000 1100 Number of nodes

(a) Fig. 15.

(b)

Updating indices: all areas are of the same size % of wrong fingers

% of wrong fingers

0.35

0.18 %W/C %W/I %W/F

0.3

♦ +

+ ♦

0.25

+ ♦

0.2

+ ♦

0.15

0.14 0.12

+ ♦

♦ +

+ ♦

0.1 0.08

%W/C %W/I %W/F

0.16

+ ♦

0.06

♦ 0.1 +

♦ 0.04 +

0.05 0.02 2000 3000 4000 5000 6000 7000 8000 90001000011000 2000 3000 4000 5000 6000 7000 8000 90001000011000 Number of nodes Number of nodes

(a) Fig. 16.

(b)

Updating indices: areas of all sizes

newly joined peer. The reduction is about 90% for sparse overlays and about 75% for dense overlays. The cost reduction for sparse overlays is bigger for those that comprise areas of the same size, than for areas of all sizes; the opposite holds for dense overlays. It is more possible for sparse than for dense overlays that the successors’indices can provide most of the index entries of the newly joined peer, since all peers towards a direction are of the same size. Moreover, in dense overlays it is possible to acquire planar index entries from indices of perpendicular successors, which are not possible to acquire from indices of planar successors. Furthermore, the results show that by an increase of 50% of the peers that provide their indices (and this can be performed with constant cost) we achieve to reduce the index updating cost to about 100% for sparse overlays and more about 90% for dense overlays. Overall, the S PATIAL P2P framework proves to be significantly efficient as far as the index updating is concerned. V. R ELATED WORK The solution to the indexing and search problem of onedimensional data in P2P networks has proved to be the DHT paradigm [20], [15], [3], [24], [13]. However, the problem of handling multidimensional data in structured P2P systems has not been addressed until recently. In the past two years, interesting works have appeared towards this direction. Some of the first attempts towards managing multi-dimensional content in P2P systems, such as SCRAP [4], MAAN [2], CISS

[11], SQUID [18] and SkipIndex [23], propose partitioning the data using a space-filling curve and indexing them using a one-dimensional DHT method. In [4] a more elaborated version of SCRAP is presented: MURK partitions space with the kd-tree method. Since space partitioning is static and decided beforehand, the above proposals are weak in dynamic environments where data distribution changes over time. Built on MURK, ProBe [17] is a system that supports range queries on multidimensional data, focusing on load balancing: virtual peers are used to transfer content from one neighbor to another. The whole setting is fundamentally static, so that not much can be done for dynamic content reallocation. Moreover, psearch [21] performs well for retrieving related documents, but does not focus on range queries. A noticeable work that does deal with multi-dimensional queries is Mercury [1]. However, Mercury does not deal with proximity in the multi-dimensional space and, furthermore, uses onedimensional indices. Mercury replicates each data record to many peers, in order be able to retrieve the record when querying any dimension involved. The authors in [19] use skip graphs to connect and index the overlay network and z-ordering in order to partition space. This approach is an improvement against those described earlier: z-ordering is used to encode areas of various sizes and the connectivity and indexing of skip graphs allows for reallocation of data when peers join or leave the network, or data distribution changes.

12

Another approach to the problem is to construct distributed indices for multi-dimensional data. However, most such indices have been developed for centralized environments [12]. Thus, existing works towards this direction try to distribute traditional multi-dimensional indices in P2P networks. Starting with spatial data, the authors in [14] and in [22] propose the distribution of hierarchical centralized spatial indices in a P2P network. In [22], [6] the authors propose the distribution of a centralized Quad-tree index to the peers; they try to elude the root bottleneck problem by statically imposing a limit to the upper tree level allowed to be indexed in the P2P system. Moreover, since the Quad-tree stores data only in the leaves, the peers do not hold entire areas bigger than the finest granularity; thus, queries asking for such areas will have to visit a set of peers rather than only one, in order to acquire the respective information. Recently, in [9] VBI-tree proposes the hashing and distribution of multi-dimensional data based on a tree structure. Actually, space is partitioned and indexed using a tree-like centralized index, such as R-tree or M-tree. Then, tree nodes (both inner nodes and leaves) are distributed to overlay peers. Peers are linked so that ancestors are neighbors with their descendants. Beyond vertical indexing, indexing is also performed horizontally in the tree: peers index their cousins according to a logarithmic distance (as it is done using the Chord method). Overall, the novelty of this work lies in the fact that it overcomes the traffic bottleneck due to searching from the root, or in general from ancestor to descendant, and it avoids to adopt the solution of reducing the problem to one-dimensional space in order to use existing DHTs. The VBI-tree is based on BATON [8] which is an earlier proposal of distributing a centralized index in a P2P overlay. BATON and therefore the index that VBI-tree distributes is a binary tree; moreover, load balancing is actually performed with network restructuring using rotation operations as in the centralized AVL tree. The combination of these two characteristics deprives the VBItree of easily adapting to changes in data distribution that are fast and/or big; such changes are possible to occur in really dynamic environments. The reason is that when data/peers are added or removed the height of the tree changes rapidly, since every parent has only two children. Thus, network restructuring, which is an expensive operation, may be often necessary in order to keep the tree balanced. Recently, BATON is extended to BATON* [7]; BATON* considers centralized indexes like B-trees with more than two children. As the fanout degree of the tree increases, the size of the index on each peer changes. The authors in [7] focus on the cost of updating the peer indices. As explained in Section I both the VBI-tree and BATON are not compliant with the P2P paradigm, since peers do not have any kind of autonomy in which data they store and which peers they neighbor (and therefore which data they have easy access to). Actually, this is due to the fact that they are not inherently distributed, but they try to distribute a centralized index; thus, a joining peer has to take a specific place in the distributed tree that depends on the joining order of peers.

Comparison of S PATIAL P2P with other techniques. As we have discussed above there is a variety of methods that have been proposed for handling multidimensional data in P2P structured overlays. In Fig. 17 we present a representative comparison with the most recent such method, the VBI-tree [9]. We note that, in contrast with S PATIAL P2P, the focus of the VBI-tree is the scalability of the framework operations (i.e. search, join etc) over multiple dimensions of data, and not the efficiency of the operations for 2-dimensional, and therefore, spatial data. Yet, it is in general the most competitive existing method. For comparison, we have also included the results for zChord. We compare the methods for the most important operation of P2P overlays, i.e. searching for discrete data. Of course, all methods have a O(log N ) cost for N peers. We limit our comparison to the basic search operation since the VBI-tree has different characteristics and focuses on different aspects of the problem of handling data in P2P structured overlays. Actually, it focuses on keeping a small cost while the dimensions of data increase and does not keep locality of data, as S PATIAL P2P does. We have considered the VBI-tree for two dimensions and for data with small sizes1 . The VBI-tree has a noticeable worse cost than zChord for all N s. Thus, zChord which is the most straightforward solution for handling multidimensional data in P2P overlays, is much more efficient than the complex framework of the VBI-tree. Evidently, S PATIAL P2P has a much lower cost, (at least 30 − 40%), of the two compared methods for any number of overlay nodes. Specifically, the VBI-tree [9] achieves search paths of 8 hops long for a uniform data distribution in networks with only 10000 peers, whereas S PATIAL P2P reduces these paths to 4 hops. More importantly, S PATIAL P2P has low search paths even for much larger networks. For instance, as shown in Fig. 11a for a networks of 80000 peers, search path is about 5 hops. Number of hops 8.5 ♦

8

♦

♦

7.5 ♦

7 6.5 ♦

6 5.5

♦ +

♦ ♦

+

+

+

+

+

+

5 4.5

♦

♦

+

+ +

4

3.5 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 Number of nodes VBI-tree

♦

zChord

+

S PATIAL P2P

Fig. 17. Comparison of VBI-tree, BATON, Chord, zChord and S PATIAL P2P

1 Note that we give an advantage to VBI-tree using data with small size, since, as mentioned in [9], the search paths are longer when the internal nodes of the tree hold data of big sizes.

13

VI. C ONCLUSIONS We have presented the S PATIAL P2P framework for handling spatial data in a P2P network. S PATIAL P2P provides efficient storing, indexing and searching services by preserving the locality and directionality. As a result, S PATIAL P2P performs exceptionally well for point and range query operations. In the future, we intend to adjust and test the S PATIAL P2P framework for data of higher dimensions. R EFERENCES [1] A. Bharambe, M. Agrawal, and S. Seshan, “Mercury: Supporting scalable multi-attribute range queries,” in Proc. of SIGCOMM, 2004. [2] M. Cai, M. Frank, J. Chen, and P. Szekely, “Maan: A multiattribute addressable network for grid information services,” in Proc. of Grid, 2003. [3] P. Druschel and A. Rowstron, “Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems,” in Proc. of Middleware, 2001. [4] P. Ganesan, B. Yang, and H. Garcia-Molina, “One torus to rule them all: Multi-dimensional queries in p2p systems,” in Proc. of WebDB, 2004. [5] A. Guttman, “R-trees: A dynamic index structure for spatial searching.” in Proc. of SIGMOD, 1984, pp. 47–57. [6] A. Harwood and E. Tanin, “Hashing spatial content over peer-to-peer networks,” in Proc. of ATNAC, 2003. [7] H. Jagadish, B. Ooi, K.-L. Tan, Q. Vu, and R. Zhang, “Speeding up search in peer-to-peer networks with a multi-way tree structure,” in Proc. of SIGMOD, 2006. [8] H. Jagadish, B. Ooi, and Q. Vu, “BATON: a balanced tree structure for peer-to-peer networks,” in Proc. of VLDB, 2005. [9] H. Jagadish, B. Ooi, Q. Vu, R. Zhang, and A. Zhou, “VBI-tree: A peerto-peer framework for supporting multi-dimensional indexing schemes,” in Proc. of ICDE, 2005. [10] V. Kantere, “Query management in p2p overlays,” Ph.D. dissertation, School of Electrical and Computer Engineering, 2007, forthcoming work. [11] J. Lee, H. Lee, S. Kang, S. Choe, and J. Song., “CISS: An efficient object clustering framework for DHT-based peer-to-peer applications,” in Proc. of DBISP2P, 2004. [12] Y. Manolopoulos, A. Nanopoulos, A. Papadopoulos, and Y. Theodoridis, R-Trees: Theory and Applications, ser. Series in Advanced Information and Knowledge Processing. Springer, 2005. [13] P. Maymounkov and D. Mazieres, “Kademlia: A peer-to-peer information system based on the xor metric,” in Proc. of IPTPS, 2002. [14] A. Mondal, Yilifu, and M. Kitsuregawa, “P2PR-tree: An R-tree-based spatial index for peer-to-peer environments,” in Proc. of DBISP2P, 2004. [15] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, “A scalable content addressable network,” in Proc. of SIGCOMM, 2001. [16] P. Rigaux, M. Scholl, and A. Voisard, Spatial Data Bases. Morgan Kaufman, 2001. [17] O. Sahin, S. Antony, D. Agrawal, and A. E. Abbadi, “PRoBe: Multidimensional range queries in p2p networks,” in Proc. of WiSe, 2005. [18] C. Schmidt and M. Parashar, “Flexible information discovery in decentralized distributed systems,” in Proc. of HPDC, 2003. [19] Y. Shu, B. Ooi, K.-L. Tan, and A. Zhou, “Supporting multi-dimensional range queries in peer-to-peer systems,” in Proc. of P2P, 2005. [20] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications,” in Proc. of SIGCOMM, 2001. [21] C. Tang, Z. Xu, and S. Dwarkadas, “Peer-to-peer information retrieval using self-organizing semantic overlay networks,” in Proc. of SIGCOMM, 2003. [22] E. Tanin, A. Harwood, and H. Samet, “A distributed quadtree index for peer-to-peer settings,” in Proc. of ICDE, 2005. [23] C. Zhang, A. Krishnamurthy, and R. Y. Wang, “Skipindex: Towards a scalable peer-to-peer index service for high dimensional data,” Technical Report (TR-703-04), 2004. [24] B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. Kubiatowicz, “Tapestry: A resilient global-scale overlay for service deployment,” JSAC, 2003.