Some Geometric Lower Bounds

Hank Chien^1 * and William Steiger^2 ** ***

^1 Computer Science, Harvard University, Cambridge, MA
^2 Computer Science, Rutgers University, New Brunswick, NJ

Abstract. Dobkin and Lipton introduced the connected components argument to prove lower bounds in the linear decision tree model for membership problems, for example the element uniqueness problem. In this paper we apply the same idea to obtain lower bound statements for a variety of problems, each having the flavor of element uniqueness. In fact one of these problems is a parametric version of element uniqueness which asks, given n inputs a_1, ..., a_n and a query x ≥ 0, whether there is a pair of inputs satisfying |a_i − a_j| = x; the case x = 0 is element uniqueness. Then we apply some of these results to establish the fact that "search can be easier than uniqueness"; specifically we give two examples (one is the planar ham-sandwich cut) where finding or constructing a geometric object - known to exist - is less complex than answering the question about whether that object is unique. Finally we apply some of these results, along with a reduction argument, to get a nontrivial lower bound for the complexity of the least median of squares regression problem in the plane.

1 Introduction and Summary

Given inputs a_1, ..., a_n, the element uniqueness question is to decide whether or not the inputs are distinct. This is really a membership problem asking whether a = (a_1, ..., a_n) ∈ S_n, the set of points in R^n with distinct coordinates. Dobkin and Lipton [4] showed that every linear decision tree that solves the membership problem for a set S must have depth Ω(log(C(S))), where C(S) denotes the number of path-connected components of S. In the element uniqueness problem, C(S_n) is easily seen to be at least n!, showing the obvious O(n log n) algorithm to be optimal. More importantly, it established a method of utilizing topological arguments in deducing lower bounds on the complexity of computation. This point was emphasized by Steele and Yao [15], who extended the Dobkin-Lipton lower bound for membership problems to algebraic decision trees of fixed degree, and Ben-Or [1] was able to improve these bounds.

* Research supported by a Research Experiences for Undergraduates (REU) supplement to the NSF center grant to DIMACS.
** Research supported in part by NSF grant CCR-9111491.
*** The author expresses gratitude to the NSF DIMACS Center at Rutgers.

Bjorner, Lovasz, and Yao [2] used more refined topological properties to obtain lower bounds for membership problems whose sets are not covered by the previous methods, for example the k-equal-inputs problem.

In the present paper we prove lower bound statements for a variety of problems, each having the flavor of element uniqueness. Many of the familiar lower bounds in computational geometry (e.g. convex hull [5], slope selection [3]) depend on reduction to sorting, or to element uniqueness. Some of the results given here seem to require direct topological reasoning. Although we only use the simple connected components arguments of Dobkin and Lipton, we still think that these lower bounds are interesting in themselves and that they may be of some use. Because of space limitations, some of the details will be left for the final paper.

In Section 2 we establish some preliminary results that are either used later or are of independent interest. One is for a parametric version of element uniqueness, PEU. Given a_1, ..., a_n and x ≥ 0, the problem is to decide whether there is a pair i ≠ j for which |a_i − a_j| = x. Element uniqueness is the case x = 0, which has complexity Θ(n log n). Notice that if x ≥ max(a_i) − min(a_i), then we may discover that fact and then answer the query in linear time. Similarly if x is greater than all but a constant number of the absolute differences. On the other hand, in Section 2 we prove

Theorem 1. Suppose that for some constant c > 0

    #{i < j : x ≤ |a_i − a_j|} ≥ c · C(n, 2).    (1)

Then any decision tree for PEU must have depth Ω(n log n).
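For reference, the matching O(n log n) upper bound for PEU is immediate; a minimal sketch (ours, not from the paper): sort the input once and, for each element, binary-search for a partner exactly x away.

```python
from bisect import bisect_left

def peu(a, x):
    """Is there a pair i != j with |a[i] - a[j]| == x?  O(n log n).
    For x == 0 this is exactly element uniqueness."""
    b = sorted(a)
    n = len(b)
    for i, v in enumerate(b):
        j = bisect_left(b, v + x, i + 1)   # a partner, if any, lies to the right
        if j < n and b[j] == v + x:
            return True
    return False
```

Theorem 1 says that whenever condition (1) holds, no decision tree can do asymptotically better than this.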

It is interesting to ask about possible super-linear lower bounds g(n) for PEU if c in (1) is replaced by a sequence c_n → 0. Section 2 also has lower bounds for two other comparison tasks, the k-gap ranking and selection problem, and the even/odd partition problem. In addition there are lower bounds for two problems about line arrangements. Given n lines ℓ_1, ..., ℓ_n in general position in the plane, the j-th level of the arrangement, λ_j, is the set of points on some line of the arrangement which have j − 1 lines above them (w.r.t. y-coordinate). A natural question that arises in many geometric algorithms is whether a given line ℓ_i is ever part of the j-th level. We prove

Lemma 3. The level membership query "ℓ_i ∈ λ_j?" has complexity Ω(n log n).

Next, given a set S of n elements, we consider search problems that seek an element x ∈ S which has property P. It is known in advance in these problems that |{x ∈ S : x has P}| ≥ 1. The uniqueness question asks whether |{x ∈ S : x has P}| > 1. We establish lower bounds for the uniqueness problem that exceed upper bounds on the search problem. This separation justifies the title of Section 3: "Search Can be Easier Than Uniqueness". Perhaps the most interesting example of this phenomenon concerns ham-sandwich cuts in R^2. Given n red points R = {r_1, ..., r_n} and m blue points B = {b_1, ..., b_m} in R^2 (w.l.o.g. both m and n are taken as odd), a ham-sandwich cut is a pair r_i, b_j such that the line incident to them has equal numbers of red points on each side and also equal numbers of blue points on each side. The ham-sandwich theorem [5] states that the set H of ham-sandwich cuts in R × B is not empty. Lo and Steiger, by giving an optimal algorithm, proved that the search problem for R × B is linear.
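Verifying a proposed cut is already a linear-time task; a sketch of the check implicit in the definition above (ours, not the algorithm of [8]):

```python
def is_ham_sandwich_cut(r, b, red, blue):
    """Check whether the line through red point r and blue point b has
    equally many red points on each open side, and likewise for blue.
    red and blue are lists of (x, y) pairs; runs in O(m + n)."""
    (x1, y1), (x2, y2) = r, b

    def side(p):
        # sign of the cross product: +1 left of the directed line r -> b, -1 right
        s = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
        return (s > 0) - (s < 0)

    red_left = sum(1 for p in red if side(p) > 0)
    red_right = sum(1 for p in red if side(p) < 0)
    blue_left = sum(1 for p in blue if side(p) > 0)
    blue_right = sum(1 for p in blue if side(p) < 0)
    return red_left == red_right and blue_left == blue_right
```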

Proposition 1 (Lo and Steiger [8]). The complexity of finding a ham-sandwich cut is Θ(m + n).

In Section 3 we show that the uniqueness question is more difficult. Assuming w.l.o.g. that m ≤ n,

Theorem 2. The complexity of deciding if |H| > 1 is Ω(n log n).

In fact, even knowing |H| > 1, we can give an Ω(n log n) lower bound for finding a second ham-sandwich cut. The last three statements all pertain to the unit cost RAM model where each arithmetic operation and each binary comparison has cost 1.

Section 4 considers the least median of squares (LMS) regression problem. Let P_i = (x_i, y_i), i = 1, ..., n, be n given data points in the plane. The LMS regression problem asks for the minimizer (m*, b*) of the function

    g(m, b) = median{ |y_i − (m x_i + b)| : i = 1, ..., n };    (2)

this function measures how well the points are fit by the line y = mx + b. The minimizer defines the line y = m* x + b* that "best" fits the points according to the criterion in (2). Proposed by Rousseeuw [11], it is important in robust statistics because of its high (50%) breakdown point: informally, in order to make an arbitrarily large change in (m*, b*), at least half of the data points must be perturbed. Steele and Steiger [14] showed that least median of squares regression is really a discrete optimization problem and they gave an O(n^3) algorithm to compute it. Souvaine and Steele [13] improved this to an O(n^2) time, O(n^2) space algorithm and finally, Edelsbrunner and Souvaine [7], using a clever adaptation of topological sweep, gave an O(n^2) time, O(n) space algorithm to compute the LMS line. Despite the fact that (1) Edelsbrunner and Souvaine asked for a matching

Ω(n^2) lower bound and (2) Rousseeuw and Leroy [12] actually conjectured that the Souvaine-Steele algorithm had optimal time complexity, no non-trivial lower bound was known. Here we can only prove

Theorem 3. The RAM cost to find the LMS regression line for n points is Ω(n log n).

As we point out in Section 4, there are some compelling reasons to believe that Ω(n^{1+ε}) steps are necessary to compute the LMS line.

2 Preliminary Results

In this section we prove some simple results which we think are interesting in their own right and which may be used later on. The first concerns the even/odd partition problem in which an input a = (a_1, ..., a_n) is given, n = 2m (even). The task is to find (but not sort) the elements of odd rank. This problem arose in trying to prove Theorem 3. If we knew the permutation π for which a_{π(1)} ≤ ··· ≤ a_{π(n)}, then the set O = {a_{π(1)}, a_{π(3)}, ..., a_{π(2m−1)}} of elements with odd ranks would give the partition we seek, so O(n log n) is clearly an upper bound. Since there are fewer than 2^n partitions, the information theory lower bound for this task is only linear. Nevertheless

Lemma 1. Every decision tree for the even/odd partition problem has depth at least Ω(n log n).

Proof: Let a = (a_1, ..., a_n) be an input with distinct coordinates. Each node in a decision tree asks about the sign of a linear function of the a_i's. Therefore each leaf of the tree is reached by inputs that are in the same intersection of halfspaces (i.e., convex set). Each leaf of the tree returns a set of n/2 indices 1 ≤ j_1 < j_2 < ··· < j_{n/2} ≤ n pointing to the elements of odd rank. We claim that inputs that are sorted by permutations σ ≠ τ, both having j_1, ..., j_{n/2} as their odd (or even) entries, must go to different leaves. If not, there are points p ≠ q, respectively sorted (component-wise) by permutations σ and τ, that are in the same connected component of R^n. Let ℓ(t) = p + t(q − p) be a point on the line joining p and q and let π(t) be the permutation that sorts the coordinates of ℓ(t). As t varies from 0 to 1, π(t) changes from σ to τ via a sequence of at most n^2/2 transpositions. Immediately after the first one, say at t_1, ℓ(t_1^+) is a point for which j_1, ..., j_{n/2} is not the set of odd (or the even) entries of π(t_1^+), and in fact there are infinitely many such points (e.g., if there is another transposition at t_2 > t_1, all points ℓ(t), t ∈ (t_1, t_2), have different even/odd partitions than p). □
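For contrast with Lemma 1, the O(n log n) sorting upper bound mentioned above amounts to a few lines (a sketch, ours; indices are 0-based):

```python
def odd_rank_indices(a):
    """Indices of the elements of odd rank (1st, 3rd, 5th smallest, ...),
    obtained by sorting; O(n log n), which Lemma 1 shows is optimal for
    linear decision trees."""
    order = sorted(range(len(a)), key=lambda i: a[i])   # the permutation pi
    return sorted(order[0::2])
```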

Next we consider ranking and selection of k-gaps, problems that arose naturally in proving Theorem 3. Given a = (a_1, ..., a_n), the pair a_i, a_j forms a k-gap if |Rank(a_i) − Rank(a_j)| = k; its size is δ_ij = |a_i − a_j|. Any of the n − k k-gaps can be found in linear time and clearly, all can be found and sorted in time O(n log n). If there are a linear number of them, this is the optimal complexity for ranking and selection:

Lemma 2. If k ≤ cn for some constant c < 1, the depth of a decision tree to rank or select k-gaps is Ω(n log n).

Proof: First we show that deciding whether the smallest k-gap is unique is easier than four selections and some linear-time computations. An algorithm that selects the smallest k-gap returns integers i, j which signify that i and j, the ranks of a_i and a_j, satisfy |i − j| = k and that the other n − k − 1 k-gaps have size at least δ = |a_i − a_j|. Consider a_i < a_j, the other possibility being similar. δ is unique if and only if (i) the minimum k-gap (if any) of {a_r : a_r ≤ a_i} is larger than δ; (ii) the minimum k-gap (if any) of {a_r : a_r ≥ a_j} is larger than δ; (iii) the minimum (k−1)-gap of {a_r : a_r ≠ a_i and i − k < Rank(a_r) < i + k} is larger than δ; (iv) the minimum (k−1)-gap of {a_r : a_r ≠ a_j and j − k < Rank(a_r) < j + k} is larger than δ. The four sets can be computed in linear time, after which the four selections are enough to answer the uniqueness query.

Now consider n = 2k and p = (p_1, ..., p_n) where p_i = i, i = 1, ..., k + 1, and p_{k+i} = k + i + ε, i = 2, ..., k. The components are the inputs to a k-gap uniqueness algorithm. For each permutation σ = (σ_1, ..., σ_{k+1}) of the first k + 1 integers, the point p_σ = (p_{σ_1}, ..., p_{σ_{k+1}}, p_{k+2}, ..., p_n) is the center of an open ball whose points all go to the same leaf of any decision tree that answers the uniqueness question. It is a YES leaf because the coordinates of p_σ have one k-gap of size δ = k and the rest have size k + ε. Note that the largest (k−1)-gap has size k − 1 + ε. Suppose σ ≠ σ′ are two permutations with σ_1 = σ′_1 = 1 and σ_{k+1} = σ′_{k+1} = k + 1. It is straightforward to verify that there must be a value of t ∈ (0, 1) for which t p_σ + (1 − t) p_{σ′} has two k-gaps of size δ. Therefore the points p_σ and p_{σ′} must belong to different connected components, which implies that the tree has depth Ω(n log n). □
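For comparison with Lemma 2, the sorting upper bound for listing and selecting k-gaps is equally short (a sketch, ours):

```python
def k_gap_sizes(a, k):
    """Sizes of all n - k k-gaps: differences between elements whose ranks
    differ by exactly k.  Sorting dominates, O(n log n)."""
    b = sorted(a)
    return [b[i + k] - b[i] for i in range(len(b) - k)]

def select_k_gap(a, k, r):
    """The r-th smallest k-gap (r is 1-based), again in O(n log n)."""
    return sorted(k_gap_sizes(a, k))[r - 1]
```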

Certainly if k ≥ n − c for some constant c, all k-gaps may be found and sorted in linear time. On the other hand, it is interesting to consider the case when n − k = c_n → ∞.

Now we turn to a parametric version of the element uniqueness problem (PEU): decide whether, given inputs a_1, ..., a_n and a query x ≥ 0, there is a pair with |a_i − a_j| = x.

Theorem 1. Suppose that for some constant c > 0, #{i < j : x ≤ |a_i − a_j|} ≥ c · C(n, 2). Then any decision tree for PEU must have depth at least Ω(n log n).

Proof: (Reduction from set disjointness [1].) Let S = {s_1, ..., s_m} and T = {t_1, ..., t_n} be the inputs for the problem "S ∩ T = ∅?", n ≥ m. In time O(m + n) we compute min S, max S, min T, max T, and the medians of both sets. We may assume min S < min T or else, in time O(m + n), discard all t_i < min S; similarly, we suppose max S > max T. We can also assume that the median of S is less than the median of T. Let Δ = max S − min S, and define T′ = T + 2Δ. We solve PEU for A = S ∪ T′ using x = 2Δ. If a difference |a_i − a_j| = x, one of the elements was in S, the other in T′. Also t′_i − s_j = 2Δ if and only if t_i = s_j, so [1] implies that PEU is Ω((m + n) log n). Finally, because of the condition on the medians, x is smaller than at least mn/4 of the pairwise differences in A. If mn/4 > c · C(m + n, 2) for some c ∈ (0, 1) (see (1)), then n > 2cm and the PEU lower bound for the set A of size m + n is Ω((m + n) log c(m + n)). □
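The reduction in the proof is easy to state as code; a sketch (ours) that calls any routine `peu(a, x)` for the parametric query (such as the one sketched in Section 1) and omits the median balancing, which only matters for the counting part of the argument:

```python
def disjoint_via_peu(S, T, peu):
    """Decide whether S and T are disjoint with a single PEU query, following
    the construction in the proof of Theorem 1."""
    lo, hi = min(S), max(S)
    T = [t for t in T if lo <= t <= hi]    # discarded elements cannot lie in S
    if not T:
        return True
    delta = hi - lo
    if delta == 0:                         # S is a single repeated value
        return False                       # every retained t equals that value
    shift = 2 * delta
    A = list(S) + [t + shift for t in T]
    # |a_i - a_j| = shift exactly when some retained t equals some s in S:
    # within-S and within-T' differences are at most delta, and a cross
    # difference t + 2*delta - s equals 2*delta iff t = s.
    return not peu(A, shift)
```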

It is interesting to consider the complexity of PEU when the function c · C(n, 2) in Theorem 1 is replaced by one which constrains x less stringently, for example c n^2 / log n. The present proof seems to give no clue.

The last two results of this section concern line arrangements in the plane. We are given (m_1, b_1), ..., (m_n, b_n), the slopes and intercepts of n lines ℓ_1, ..., ℓ_n

in general position. The lines partition the plane into a complex of convex cells called the arrangement. The j-th level of the arrangement, λ_j, is defined as the closure of the set of all points which lie on a unique line of the arrangement and which have exactly j − 1 lines above them. In geometric algorithms it is often necessary to test whether a certain line ℓ_i is part of the j-th level. The level membership query asks if ℓ_i ∈ λ_j. It is easy to see that if Rank(m_i) = k, the answer is YES for all j ∈ [k, n − k + 1] and this may be decided in linear time. On the other hand if j ∉ [k, n − k + 1], the intersections of ℓ_i with each of the other lines may be computed, the x-coordinates sorted, and then we can compute every level that ℓ_i meets in linear time. In fact this algorithm is optimal.
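The O(n log n) procedure just described - intersect ℓ_i with the other lines, sort the crossings, and walk along ℓ_i - can be sketched as follows (ours), with each line given as a (slope, intercept) pair and slopes assumed distinct:

```python
def levels_met(lines, i):
    """Return the set of levels that line i is part of.  lines[j] = (m_j, b_j);
    a point of the arrangement is on level j if it has exactly j - 1 lines
    above it.  O(n log n) because of the sort."""
    m_i, b_i = lines[i]
    others = [lines[j] for j in range(len(lines)) if j != i]
    # lines above line i as x -> -infinity: exactly those with smaller slope
    above = sum(1 for (m, b) in others if m < m_i)
    levels = {above + 1}
    # crossings with the other lines, ordered by x-coordinate
    crossings = sorted(((b - b_i) / (m_i - m), m) for (m, b) in others)
    for _, m in crossings:
        above += -1 if m < m_i else 1   # a line above goes below, or vice versa
        levels.add(above + 1)
    return levels
```

The level membership query is then answered by testing `j in levels_met(lines, i)`.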

Lemma 3. Any RAM algorithm for the level membership query must make Ω(n log n) comparisons in the worst case.

Proof: Consider problems with 2n + 1 inputs (m_1, b_1), ..., (m_{2n+1}, b_{2n+1}), regarded as a point in R^{4n+2}. Our "canonical" input will have slopes m_i = 1, m_{n+i} = −1, and intercepts b_i = −i, b_{n+i} = i + ε, i = 1, ..., n. Line ℓ_{2n+1} has equation y = 0 (see Fig. 1). The level membership query for line ℓ_{2n+1} and level n ("is the x-axis ever above the median level?") has answer NO for ε > 0.

Fig. 1. The canonical line arrangement.

For any permutation σ of the first n integers, the point P_σ = (m_1, b_1, ..., m_n, b_n, m_{n+σ_1}, b_{n+σ_1}, ..., m_{n+σ_n}, b_{n+σ_n}) describes the canonical input, but with the last n line numbers permuted according to σ; the answer is still NO. For permutations σ ≠ τ the points P_σ and P_τ belong to different connected components. If not, for each t ∈ (0, 1), the point P(t) = t P_σ + (1 − t) P_τ is an input in the same component, by convexity. However as t varies from 0 to 1, the lines forming levels n + 1 through 2n will become permuted, starting with the ordering given by τ and ending with the ordering given by σ. A line that is at different levels at t = 0 and t = 1 will "move" at a constant rate between these positions, in a parallel fashion, as t varies through (0, 1). Some line, say ℓ_{n+j}, must move downwards. For some t ≤ 1/(n + 1) it will have decreased by exactly ε and will still not have crossed any other line as long as ε < 1/(n + 1). This point, P(t), is an input where the answer is YES, a contradiction. Therefore the tree must have n! NO leaves. □

The final geometric lower bound is for the slope selection problem [3]. n points P_i = (x_i, y_i), i = 1, ..., n, are given, along with a rank k, 1 ≤ k ≤ C(n, 2). It is required to compute the pair i ≠ j so that the slope (y_i − y_j)/(x_i − x_j) is the k-th smallest amongst the slopes of the C(n, 2) lines determined by the given points. The question was settled by

Proposition 2 (Cole, Salowe, Steiger, Szemeredi [3]). The RAM complexity of slope selection is Θ(n log n).

The lower bound is a reduction from element uniqueness, and the upper bound, an elaborate algorithm. If we now consider a restricted class of inputs where the points are presented in order of increasing x-coordinate (i.e., x_1 < ··· < x_n), the optimality of the algorithm presented in [3] is open (because now element uniqueness of the x-coordinates may be checked in linear time). However we can prove

Lemma 4. Even if a slope selection problem is known to have x_1 < ··· < x_n, every RAM algorithm makes Ω(n log n) comparisons in the worst case.

Proof: This is again a reduction from element uniqueness. Given a_1, ..., a_n we want to decide if they are distinct. We compute M = max(a_i) and define the 2n points P_i = (i, a_i − M − 1), i = 1, ..., n, and P_{n+i} = (n + i, M + 1 − a_i), i = 1, ..., n. The idea is that if the line through P_i and P_j has positive (negative) slope then the line through P_{n+i} and P_{n+j} has negative (positive) slope. The only other possibility is that both are zero (if and only if a_i = a_j). In addition the first n points have negative y-coordinates and the last n have positive y-coordinates, so every slope determined by one point from each group is positive. Since x_i < x_j when i < j, we can use any algorithm to select the k-th smallest slope determined by points with increasing x-coordinate. We use k = (n^2 − n)/2 = C(n, 2). The selected slope is negative if and only if the a_i are distinct. □
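The construction in this proof, with a deliberately naive selection step standing in for the optimal algorithm of [3] (a sketch, ours):

```python
def distinct_via_sorted_slope_selection(a):
    """Decide element uniqueness through slope selection on points given in
    increasing x-order, following the proof of Lemma 4.  The selection here
    simply lists all slopes; any selection routine for x-sorted points could
    be plugged in instead."""
    n = len(a)
    M = max(a)
    pts = [(i + 1, a[i] - M - 1) for i in range(n)] + \
          [(n + i + 1, M + 1 - a[i]) for i in range(n)]      # x-coordinates increase
    slopes = sorted((q[1] - p[1]) / (q[0] - p[0])
                    for idx, p in enumerate(pts) for q in pts[idx + 1:])
    k = n * (n - 1) // 2                                      # k = C(n, 2)
    return slopes[k - 1] < 0    # k-th smallest slope is negative iff inputs distinct
```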

3 Search Can be Easier Than Uniqueness

We study two search problems exemplifying the above heading. Both concern equi-partitioning of planar point sets. Given a set S of n points in the plane, a line ℓ is said to bisect S if both |S ∩ ℓ^+| ≤ n/2 and |S ∩ ℓ^−| ≤ n/2; i.e., each open halfspace has at most n/2 points of S. If n is odd, a bisecting line must be incident with a point of S. For the task of finding a bisecting line, we may assume n odd since otherwise, just delete any point p; a line that now bisects S \ {p} (odd) also bisects S. In the bisecting diameter problem [6] S consists of n (unsorted) points θ_1, ..., θ_n on the circumference of the unit circle. The task is to find a diameter d that bisects S. Let H ⊆ S be the set of inputs θ_i such that the diameter d_i, incident with θ_i, bisects S. It is easy to prove (i) |H| ≥ 1 (existence); (ii) if the inputs are sorted by radial angle (O(n log n)), all θ_i ∈ H may be found in time O(n) (slow algorithm, cost O(n log n)); (iii) a θ_i ∈ H can be found in Θ(n) (fast search). The uniqueness question is to decide if |H| > 1.
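Concretely, whether the diameter through a given input point bisects S can be checked in O(n) (a sketch, ours, with the points given by their angles); the search result (iii) says a θ_i passing this test can be found in Θ(n) overall.

```python
from math import pi

def diameter_bisects(theta, i):
    """Given angles theta[0..n-1] of points on the unit circle, decide in O(n)
    whether the diameter through point i bisects the set: each open half-plane
    bounded by that diameter must contain at most n//2 of the points."""
    n = len(theta)
    left = right = 0
    for t in theta:
        d = (t - theta[i]) % (2 * pi)
        if d == 0 or d == pi:          # on the diameter itself
            continue
        if d < pi:
            left += 1
        else:
            right += 1
    return left <= n // 2 and right <= n // 2
```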

The following result separates the search and uniqueness problems for bisecting diameters.

Lemma 5. Every decision tree for the bisecting diameter uniqueness problem has depth at least Ω(n log n).

The proof depends on the lower bound for set equality [10] and is omitted. The bisecting diameters problem may be somewhat artificial. The computation of a ham-sandwich cut, however, is a partitioning problem which has some real applicability. In its dual version, we are given n red lines R = {r_1, ..., r_n} and m blue lines B = {b_1, ..., b_m} in general position in R^2. W.l.o.g. assume both n = 2k + 1 and m = 2j + 1 are odd and take m ≤ n. It is required to find a ham-sandwich cut, namely a pair r_i, b_j such that the point r_i ∩ b_j has k red lines above and below and also j blue lines above and below. These points form the set H = λ^R_{k+1} ∩ λ^B_{j+1} of intersections of the red and blue median levels. The ham-sandwich theorem says that |H| ≥ 1 and the Lo-Steiger theorem (Proposition 1) showed how to find an element of H in linear time. Here we prove


Fig. 2. Uniqueness of a ham-sandwich cut.

Theorem 2. Given n red and m blue lines, a RAM algorithm to decide if |H| > 1 must make Ω((m + n) log(m + n)) comparisons in the worst case.

Proof: (sketch, based on Lemma 3) Consider the 2k + 1 red lines and 2j + 1 blue lines shown in Figure 2. The blues have j positive slopes and j + 1 negative slopes, so the blue median level is the heavy line, ending with b_{2j+1}. There are k red lines with positive slopes and k with negative slopes. The x-axis is r_{k+1}, a red line and also the median level of the reds outside the vertical strip containing the red vertices. Once p = r_{k+1} ∩ b_{2j+1} is discovered and b_{2j+1} is discarded, the level membership query "r_{k+1} ∈ λ_{j+1}?" for b_1, ..., b_{2j}, r_{k+1} determines the existence of any other ham-sandwich cut. □

4 Least Median of Squares Regression

Given P_1 = (x_1, y_1), ..., P_n = (x_n, y_n), n data points in general position in the plane, the least median of squares regression problem seeks a minimizer of the function g(m, b) = median{ |y_i − (m x_i + b)| : i = 1, ..., n }. Steele and Steiger [14] studied the combinatorics of this optimization problem by first noting that the line ℓ_{m,b} with equation y = mx + b partitions the n points into sets of small, median-sized, and big residuals:

    S_{m,b} = {i : |y_i − (m x_i + b)| < g(m, b)}
    M_{m,b} = {i : |y_i − (m x_i + b)| = g(m, b)}
    B_{m,b} = {i : |y_i − (m x_i + b)| > g(m, b)}
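Written out, the objective and the residual partition are as follows (a sketch, ours; we take the median of the n absolute residuals to be the (⌊n/2⌋ + 1)-th smallest, one common convention):

```python
def lms_objective(points, m, b):
    """g(m, b): the median absolute residual of the line y = m*x + b,
    with 'median' taken as the (n//2 + 1)-th smallest absolute residual."""
    resid = sorted(abs(y - (m * x + b)) for (x, y) in points)
    return resid[len(points) // 2]

def residual_partition(points, m, b):
    """The index sets S, M, B of small, median-sized and big residuals."""
    g = lms_objective(points, m, b)
    r = [abs(y - (m * x + b)) for (x, y) in points]
    S = [i for i, v in enumerate(r) if v < g]
    M = [i for i, v in enumerate(r) if v == g]
    B = [i for i, v in enumerate(r) if v > g]
    return S, M, B
```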

They characterized local minima of g by proving

Proposition 3 (Steele and Steiger [14]). The pair (m, b) is a local minimizer of g if and only if
1. |M_{m,b}| = 3;
2. there are i, j, k ∈ M_{m,b} such that −[y_j − (m x_j + b)] = y_i − (m x_i + b) = y_k − (m x_k + b);
3. |B_{m,b}| − |S_{m,b}| ≤ 1.

... 1/16. It is hard to imagine that a correct algorithm would not have to examine a fixed fraction of them. In view of Proposition 3, each local minimizer has the slope of the line joining a pair of the points. We are trying to reduce the problem of minimizing a function of h(n) inputs, n = o(h(n)), to an LMS regression problem for n points.
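That observation suggests a simple, far-from-optimal baseline, sketched here only to make the search space concrete (ours; it uses the same median convention as the sketch above and assumes points in general position, so that Proposition 3 applies): try each pairwise slope and, for that slope, pick the intercept by solving the one-dimensional LMS problem on the residuals.

```python
def lms_brute_force(points):
    """Candidate-slope search for the LMS line.  For each slope through a pair
    of points, the best intercept is the midpoint of the shortest interval
    containing n//2 + 1 of the residuals y_i - m*x_i.  Roughly O(n^3 log n)."""
    n = len(points)
    h = n // 2 + 1
    best = None          # (m, b, g)
    for i in range(n):
        for j in range(i + 1, n):
            (x1, y1), (x2, y2) = points[i], points[j]
            if x1 == x2:
                continue                     # skip vertical candidate lines
            m = (y2 - y1) / (x2 - x1)
            r = sorted(y - m * x for (x, y) in points)
            # shortest window of h consecutive residuals
            t = min(range(n - h + 1), key=lambda s: r[s + h - 1] - r[s])
            b = (r[t] + r[t + h - 1]) / 2
            g = (r[t + h - 1] - r[t]) / 2    # the median absolute residual at (m, b)
            if best is None or g < best[2]:
                best = (m, b, g)
    return best
```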

References

1. M. Ben-Or. "Lower Bounds for Algebraic Computation Trees". Proc. 15th STOC, (1983) 80-86.
2. A. Bjorner, L. Lovasz, and A. Yao. "Linear Decision Trees: Volume Estimates and Topological Bounds". Proc. 24th STOC, (1992) 170-177.
3. R. Cole, J. Salowe, W. Steiger, and E. Szemeredi. "An Optimal Time Algorithm for Slope Selection". SIAM J. Comp. 18, (1989) 792-810.
4. D. Dobkin and R. Lipton. "On the Complexity of Computations under Varying Sets of Primitives". Lecture Notes in Computer Science 33, 110-117, H. Bradhage, Ed., Springer-Verlag, 1975.
5. H. Edelsbrunner. Algorithms in Combinatorial Geometry. Springer-Verlag, Berlin, 1987.
6. H. Edelsbrunner. Personal communication, 1986.
7. H. Edelsbrunner and D. Souvaine. "Computing Least Median of Squares Regression Lines and Guided Topological Sweep". J. Amer. Statist. Assoc. 85 (1990) 115-119.
8. Chi-Yuan Lo and W. Steiger. "An Optimal-Time Algorithm for Ham-Sandwich Cuts in the Plane". Second Canadian Conference on Computational Geometry, (1990) 5-9.
9. Chi-Yuan Lo, J. Matousek, and W. Steiger. "Algorithms for Ham-Sandwich Cuts". Discrete and Comp. Geom. 11, (1994) 433-452.
10. F.P. Preparata and M.I. Shamos. Computational Geometry. Springer-Verlag, New York, NY, 1985.
11. P. Rousseeuw. "Least Median of Squares Regression". J. Amer. Statist. Assoc. 79 (1984) 871-880.
12. P. Rousseeuw and A. Leroy. Robust Regression and Outlier Detection. John Wiley, New York, 1987.
13. D. Souvaine and M. Steele. "Efficient Time and Space Algorithms for Least Median of Squares Regression". J. Amer. Statist. Assoc. 82 (1987) 794-801.
14. M. Steele and W. Steiger. "Algorithms and Complexity for Least Median of Squares Regression". Discrete Applied Math. 14, (1986) 93-100.
15. M. Steele and A. Yao. "Lower Bounds for Algebraic Decision Trees". J. Algorithms 3, (1982) 1-8.