Colored stochastic dominance problems

Colored stochastic dominance problems Jie Xue˚ [email protected]

Yuan Li˚ [email protected]

arXiv:1612.06954v2 [cs.CG] 3 Jan 2017

Abstract In this paper, we study the dominance relation under a stochastic setting. Let S be a set of n colored stochastic points in Rd , each of which is associated with an existence probability. We investigate the problem of computing the probability that a realization of S contains inter-color dominances, which we call the colored stochastic dominance (CSD) problem. We propose the first algorithm to solve the CSD problem for d “ 2 in Opn2 log2 nq time. On the other hand, we prove that, for d ě 3, even the CSD problem with a restricted color pattern is #P-hard. In addition, even if the existence probabilities are restricted to be 21 , the problem remains #P-hard for d ě 7. A simple FPRAS is then provided to approximate the desired probability in any dimension. We also study a variant of the CSD problem in which the dominance relation is considered with respect to not only the standard basis but any orthogonal basis of Rd . Specifically, this variant, which we call the free-basis colored stochastic dominance (FBCSD) problem, considers the probability that a realization of S contains inter-color dominances with respect to any orthogonal basis of Rd . We show that the CSD problem is polynomial-time reducible to the FBCSD problem in the same dimension, which proves the #P-hardness of the latter for d ě 3. Conversely, we reduce the FBCSD problem in R2 to the CSD problem in R2 , by which an Opn4 log2 nq time algorithm for the former is obtained.

1

Introduction

A point p P Rd is said to dominate another point q P Rd (denoted by p ą q) if the coordinate of p is greater than or equal to the coordinate of q in every dimension. The dominance relation is an important notion in multi-criteria decision-making, and has been well-studied in computational geometry, database, optimization, and other related areas. In the last decades, many problems regarding dominance relation have been proposed and investigated, e.g., deciding the dominance-existence of a dataset, counting the number of dominance pairs, reporting the dominance pairs, etc. The dominance relation is traditionally studied on certain datasets, in which the information of each data point is exactly known. However, in many real-world applications, due to noise and limitation of devices, the data obtained may be imprecise or not totally reliable. In this situation, uncertain datasets (or stochastic datasets), in which the data points are allowed to have some uncertainties, can better capture the features of real data, and is thus more preferable than certain ones. Motivated by this, in this paper, we investigate several problems regarding the dominance relation on stochastic datasets. Following [4, 5, 7, 8], the stochastic datasets to be considered in this paper are provided with existential uncertainty: each data point has a certain (known) location but an uncertain existence depicted by an associated existence probability. Given such a stochastic dataset in Rd , there is a natural question regarding the dominance relation: how likely does a realization of the dataset (i.e., a random sample of the points with the corresponding existence probabilities) contain dominance pairs? This question can be formulated as the problem of computing the probability that a realization contains dominance pairs. Also, one may consider a bichromatic version of the problem: the given stochastic dataset consists of bichromatic (say red and blue) points and when computing the probability, the dominance pairs of interest are those formed by one red point and one blue ˚ Dept. of Computer Science and Engg., Univ. of Minnesota — Twin Cities, 4-192 Keller Hall, 200 Union St. SE, Minneapolis, MN 55455, USA

1

point (similarly to the red/blue dominance problems in conventional setting). In this paper, we combine and generalize the above two problems into a so-called colored stochastic dominance (CSD) problem. Let S be a colored stochastic dataset in Rd (i.e., each data point is associated with a color and an existence probability). The CSD problem considers the probability that a realization of S contains inter-color dominances, i.e., dominance pairs formed by two points of different colors. The dominance relation is not preserved even under isometric transformations of Rd . However, feature transformations are in fact commonly used for analyzing complex data in many real-world applications. Therefore, in some situations, people are interested in the dominance relation not only in the original feature space but also under (proper) transformations of the feature space. For example, given a set of points in Rd , one may ask whether there always exists dominance pairs under any isometric transformation of Rd , or equivalently, with respect to any orthogonal basis of Rd (see Section 3 for a formal definition). Motivated by this, we also investigate a variant of the CSD problem, which we call the free-basis colored stochastic dominance (FBCSD) problem. Let S be a colored stochastic dataset in Rd . The FBCSD problem considers the probability that a realization of S contains inter-color dominances with respect to any orthogonal basis of Rd . Our results. (n is the number of the points and d is the dimension which is fixed.) 1. We solve the CSD problem for d “ 2 in Opn2 log2 nq time. 2. We prove that, for d ě 3, even the CSD problem with a restricted color pattern is #P-hard. Furthermore, even if the existence probabilities of the points are restricted to be 12 , the problem remains #P-hard for d ě 7. 3. We provide a simple FPRAS for the CSD problem in any dimension. 4. We show that the CSD problem is polynomial-time reducible to the FBCSD problem in the same dimension, which implies the #P-hardness of the latter for d ě 3. 5. We solve the FBCSD problem for d “ 2 in Opn4 log2 nq time. Related work. The classical study regarding the dominance relation can be found in many works such as [6, 9, 10]. Recently, there have been a few works considering the dominance relation on stochastic datasets [1, 11, 19]; however, their main focus are the skyline (or dominance-maxima) problems under locational uncertainty, which is quite different from the problems studied in this paper. Besides problems regarding the dominance relation, many other fundamental geometric problems have also been investigated under stochastic settings in recent years, e.g., closest pair [8], minimum spanning tree [7], convex hull [3, 14], linear separability [5, 18], nearest neighbor search [2, 13], range-max query [4], etc. Basic notions and preliminaries. We give the formal definitions of some basic notions used in this paper. A colored stochastic dataset S in Rd is represented by a 3-tuple S “ pS, cl, πq, where S Ă Rd is the point set, cl : S Ñ N is the coloring (or coloring function) indicating the colors of the points, and π : S Ñ r0, 1s is the function indicating the existence probabilities of the points, i.e., each point a P S has the color (label) clpaq and the existence probability πpaq. A realization of S refers to a random sample R Ď S where each point a P S is sampled with probability πpaq. For any A Ď S, an inter-color dominance in A with respect to cl (or simply an inter-color dominance in A if the coloring cl is unambiguous) refers to a pair pa, bq with a, b P A such that clpaq ‰ clpbq and a ą b. A sub-dataset of S is a colored stochastic dataset S 1 “ pS 1 , cl1 , π 1 q where S 1 Ď S, cl1 “ cl|S 1 , π 1 “ π|S 1 . A bipartite graph is represented as G “ pV Y V 1 , Eq, where V, V 1 are the two parts (of vertices) and E is the edge set. All the detailed proofs of our lemmas and theorems are presented in Appendix A.

2

The colored stochastic dominance problem

Let S “ pS, cl, πq be a colored stochastic dataset in Rd with S “ ta1 , . . . , an u. Define ΛS as the probability that a realization of S contains inter-color dominances. Set ΓS “ 1 ´ ΛS , which is the probability that a realization of S contains no inter-color dominances. The goal of the CSD problem is to compute ΛS (or ΓS ). 2

2.1

An algorithm for d “ 2

The na¨ıve method for solving the CSD problem is to enumerate all subsets of S and “count” those containing inter-color dominances. However, it requires exponential time, as there are 2|S| subsets of S to be considered. In this section, we show that the CSD problem in R2 can be solved much more efficiently. Specifically, we propose an Opn2 log2 nq-time algorithm to compute ΓS . For simplicity, we assume that the points in S have distinct x-coordinates and y-coordinates (if this is not the case, we can first “regularize” S by Lemma 15). When computing ΓS , we need to consider the realizations which contain no A Z(A) inter-color dominances. As we will see, in the case of d “ 2, these realizations have good properties, which allows us to solve the problem efficiently in a recursive way. For any point a P R2 , we use xpaq (resp., ypaq) to denote the x-coordinate (resp., y-coordinate) of a. Suppose the points a1 , . . . , an P S are already sorted such that xpa1 q ă ¨ ¨ ¨ ă xpan q. For convenience of exposition, we add a dummy point a0 to S with xpa0 q ă xpa1 q and ypa0 q ą ypai q for all i P t1, . . . , nu. The color clpa0 q is defined to be different from clpa1 q, . . . , clpan q, and πpa0 q “ 1. Note that including a0 does not change ΓS . For a subset A “ tai1 , . . . , air u of S with i1 ă ¨ ¨ ¨ ă ir , we Figure 1: Illustrating define ZpAq “ H if A is monochromatic, and otherwise ZpAq “ tai1 , . . . , ail u such A and ZpAq. that clpail q ‰ clpail`1 q “ ¨ ¨ ¨ “ clpair q. In other words, ZpAq is the subset of A obtained by dropping the “rightmost” points of the same color as air ; see Figure 1. We have the following important observation. Lemma 1 A realization R of S contains no inter-color dominances iff ZpRq contains no inter-color dominances and for any a P ZpRq, b P RzZpRq it holds that ypaq ą ypbq. With this in hand, we then consider how to compute ΓS . For a nonempty subset A Ď S, we define the signature sgnpAq of A as a pair pi, jq such that ai , aj P A and ai (resp., aj ) has the greatest x-coordinate (resp., smallest y-coordinate) among all points in A. Let Ei,j be the event that a realization R of S contains no inter-color dominances and satisfies sgnpRq “ pi, jq. Note that if a realization R contains no inter-color dominances, then either R “ ta0 u or some Ei,j happens for i, j P t1, . . . , nu. So we immediately have ΓS “

n ź

p1 ´ πpai qq `

i“1

n ÿ n ÿ

PrrEi,j s.

i“1 j“1

Now the problem is reduced to computing all PrrEi,j s. Instead of working on the events tEi,j u directly, we 1 consider a set of slightly different events tEi,j u defined as follows. For p P t0, . . . , nu, set Sp “ ta0 , . . . , ap u, 1 and we use Sp to denote the sub-dataset of S with point set Sp Ď S. Define Ei,j as the event that a realization R of Si contains no inter-color dominances and satisfies sgnpRq “ pi, jq. It is quite easy to see the equations n ź 1 PrrEi,j s “ PrrEi,j s¨ p1 ´ πpat qq. t“i`1 1 PrrEi,j s.

Set F pi, jq “ We show how to compute all F pi, jq recursively by applying Lemma 1. Observe that F pi, jq “ 0 if xpai q ă xpaj q (equivalently, i ă j) or ypai q ă ypaj q or clpai q ‰ clpaj q. Thus, it suffices to compute all F pi, jq with i ě j, ypai q ě ypaj q, clpai q “ clpai q (we say the pair pi, jq is legal if these three conditions hold). Let pi, jq be a legal pair. Trivially, for i “ j “ 0, we have F pi, jq “ 1. So suppose i, j ą 0. Let R be a realization of Si . To compute F pi, jq, we consider the signature sgnpZpRqq under 1 1 the condition that Ei,j happens. First, when Ei,j happens, we always have ZpRq ‰ H, because R at least contains a0 , ai , aj (possibly i “ j) and clpa0 q ‰ clpai q “ clpaj q. Therefore, in this case, sgnpZpRqq is defined and must be a legal pair pi1 , j 1 q for some i1 , j 1 P t0, . . . , i ´ 1u. It follows that F pi, jq can be computed by considering for each such pair pi1 , j 1 q the probability that R contains no inter-color dominances and sgnpRq “ pi, jq, sgnpZpRqq “ pi1 , j 1 q, and then summing up these probabilities. Note that if sgnpRq “ pi, jq and sgnpZpRqq “ pi1 , j 1 q, then i1 ă j and clpi1 q ‰ clpiq. In addition, if R contains no inter-color dominances, then we must have ypai q ă ypaj 1 q by Lemma 1. As such, we only need to consider the legal pairs pi1 , j 1 q 3

satisfying i1 ă j, ypai q ă ypaj 1 q, clpi1 q ‰ clpiq (we denote the set of these pairs by Ji,j ). Fixing such a pair pi1 , j 1 q P Ji,j , we investigate the corresponding probability. By the definition of ZpRq and Lemma 1, we observe that if R contains no inter-color dominances and sgnpRq “ pi, jq, sgnpZpRqq “ pi1 , j 1 q, then ‚ R X Si1 contains no inter-color dominances and sgnpR X Si1 q “ pi1 , j 1 q; ‚ R X pSi zSi1 q includes ai and aj , but does not include any point at for t P ti1 ` 1, . . . , iu satisfying clpat q ‰ clpai q or ypat q ă ypaj q or ypaj 1 q ă ypat q. Conversely, one can also verify that if a realization R of Si satisfies the above two conditions, then R contains no inter-color dominances (by Lemma 1) and sgnpRq “ pi, jq, sgnpZpRqq “ pi1 , j 1 q (note that ZpRq “ RXSi1 ). Therefore, the probability that R contains no inter-color dominances and sgnpRq “ pi, jq, sgnpZpRqq “ pi1 , j 1 q ˚ ˚ ˚ is just the product F pi1 , j 1 q ¨ πi,j ¨ Πi,j,i1 ,j 1 , where πi,j “ πpai q ¨ πpaj q if i ‰ j and πi,j “ πpai q if i “ j, and 1 Πi,j,i1 ,j 1 is the product of all p1 ´ πpat qq for t P ti ` 1, . . . , iu satisfying clpat q ‰ clpai q or ypat q ă ypaj q or ypaj 1 q ă ypat q. Based on this, we can compute F pi, jq as ÿ ` ÿ ` ˘ ˘ ˚ ˚ F pi, jq “ F pi1 , j 1 q ¨ πi,j ¨ Πi,j,i1 ,j 1 “ πi,j ¨ F pi1 , j 1 q ¨ Πi,j,i1 ,j 1 . (1) pi1 ,j 1 qPJi,j

pi1 ,j 1 qPJi,j

The straightforward way to compute each F pi, jq takes Opn3 q time, which results in an Opn5 q-time algorithm for computing ΓS . Indeed, the runtime of the above algorithm can be drastically improved to Opn2 log2 nq, by properly using dynamic 2D range trees with some tricks. We provide some brief ideas here and defer the details to Appendix B. A legal pair pi, jq is composed of two points, ai and aj . We examine all such pairs and compute the corresponding F p¨, ¨q values in a fashion by first enumerating ai from left to right and then aj from bottom to top (when ai is fixed). When we are about to compute F pi, jq, we need a data structure that stores F pi1 , j 1 q ¨ Πi,j,i1 ,j 1 as the weight for each legal pair pi1 , j 1 q and also supports range sum queries (as only those pi1 , j 1 q P Ji,j are of interest). A 2D range tree seems to fit here because each legal pair pi, jq can be uniquely represented by a 2D point pxpai q, ypaj qq and we can thus associate on it the corresponding weight. On the other hand, such a range tree must be dynamic since the weights of legal pairs keep varying from time to time. We show, in Appendix B.1, how to carefully design such a data structure, and more importantly, how to use it to efficiently and correctly update those weights throughout the entire computation of F pi, jq’s. As such, we conclude the following. Theorem 2 The CSD problem for d “ 2 can be solved in Opn2 log2 nq time.

2.2

Hardness results in higher dimensions

In this section, we prove the #P-hardness of the CSD problem for d ě 3. Indeed, our hardness result is even stronger, which applies to restricted versions of the CSD problem. As introduced in Section 1, we may have two specializations of the CSD problem, in one all data points have distinct colors, in the other data points are bichromatic. We want our hardness result to cover these two specializations. To this end, we need to introduce a notion called color pattern. A partition of a positive integer p is defined as a multi-set ∆ of positive integers whose summation is p. In a colored stochastic dataset S “ pS, cl, πq, the coloring cl naturally induces a partition of n “ |S| given by the multi-set t|cl´1 ppq| ą 0 : p P Nu, which we denote by ∆pSq. Let P “ p∆1 , ∆2 , . . . q be an infinite sequence where ∆p is a partition of p. We say P is a color pattern if it is “polynomial-time uniform”, i.e., one can compute ∆p for any given p in time polynomial in p. In addition, P is said to be balanced if p ´ max ∆p “ Ωppc q for some constant c ą 0 (here max ∆p denotes the maximum in the multi-set ∆p ). Then we define the CSD problem with respect to a color pattern P “ p∆1 , ∆2 , . . . q as the (standard) CSD problem with the restriction that the input dataset S “ pS, cl, πq must satisfy ∆pSq “ ∆n where n “ |S|. Besides specializing the CSD problem using color pattern, we may also make assumptions for the existence probabilities of the points. An important case is that all points have the same existence probability equal to 21 . In this case, each of the 2n subsets of S occurs as a realization of S with the same probability 2ń , and computing ΛS (or ΓS ) is equivalent to counting the subsets of S satisfying the desired properties. Our hardness result is presented in the following theorem. 4

Theorem 3 Let P be any balanced color pattern. Then the CSD problem with respect to P is #P-hard for d ě 3. In addition, even if the existence probabilities of the points are all restricted to be 12 , the CSD problem with respect to P remains #P-hard for d ě 7. Note that our result above implies the hardness of both the distinct-color and bichromatic specializations. The former can be seen via a balanced color pattern P “ p∆1 , ∆2 , . . . q with ∆p “ t1, . . . , 1u (i.e., a multiset consisting of p 1’s), while the latter can be seen via a balanced color pattern P “ p∆1 , ∆2 , . . . q with p`1 ∆p “ t p2 , p2 u for even p and ∆p “ t p´1 2 , 2 u for odd p. The proof of Theorem 3 is nontrivial, so we break it into several stages. 2.2.1

Relation to counting independent sets

For a colored stochastic dataset S “ pS, cl, πq, define GS “ pS, ES q as the (undirected) graph with vertex set S and edge set ES “ tpa, bq : a, b P S with clpaq ‰ clpbq and a ą bu. Since the edges of GS one-toone correspond to the inter-color dominances in S, it is clear that a subset A Ď S contains no inter-color dominances iff A corresponds to an independent set of GS . If πpaq “ 21 for all a P S, then we immediately have the equation ΓS “ Ind pGS q{2n , where Ind pGS q is the number of the independent sets of GS . This observation intuitively tells us the hardness of the CSD problem, as independent-set counting is a well-known #P-complete problem. Although we are still far away from proving Theorem 3 (because for a given graph G it is not clear how to construct a colored stochastic dataset S such that GS – G), it is already clear that we should reduce from some independent-set-counting problem. Regarding independent-set counting, the strongest known result is the following theorem obtained by Xia et al. [17], which will be used as the origin of our reduction. Theorem 4 Counting independent sets for 3-regular planar bigraphs is #-P complete. For a graph G “ pV, Eq, we say a map f : V Ñ Rd is a dominance-preserving embedding (DPE) of G to Rd if it satisfies the condition that pu, vq P E iff f puq ą f pvq or f pvq ą f puq. We define the dimension dimpGq of G as the smallest number d such that there exists a DPE of G to Rd (if such a number does not exist, we say G is of infinite dimension). We have seen above the relation between independent-set counting and the CSD problem with existence probabilities equal to 21 . Interestingly, with general existence probabilities, the CSD problem can be related to a much stronger version of independent-set counting, which we call cardinality-sensitive independent-set counting. Definition 5 Let c be a fixed integer. The c-cardinality-sensitive independent-set counting (c-CSISC) problem is defined as follows. The input consists of a graph G “ pV, Eq and a c-tuple Φ “ pV1 , . . . , Vc q of disjoint subsets of V . The task of the problem is to output, for every c-tuple pn1 , . . . , nc q of integers where 0 ď ni ď |Vi |, the number of the independent sets I Ď V of G satisfying |I X Vi | “ n śi cfor all i P t1, . . . , cu. We denote the desired output by Ind Φ pGq, which can be represented by a sequence of i“1 p|Vi | ` 1q integers. Note that the 0-CSISC problem is just the conventional independent-set counting. Lemma 6 Given any graph G “ pV, Eq with a DPE f : V Ñ Rd and a c-tuple Φ “ pV1 , . . . , Vc q of disjoint subsets of V , one can construct in polynomial time a colored stochastic dataset S “ pS, cl, πq in Rd with cl injective such that (1) GS – G and (2) Ind Φ pGq can be computed in polynomial time if ΓS is provided. In particular, the c-CSISC problem for a class G of graphs is polynomial-time reducible to the CSD problem in Rd , provided an oracle that computes for any graph in G a DPE to Rd . Indeed, Lemma 6 has a more general version which reveals the hardness of (general) stochastic geometric problems, see Appendix C for details. Another ingredient to be used in the proof of Theorem 3 is a lemma regarding color pattern. Lemma 7 Let P “ p∆1 , ∆2 , . . . q be a balanced color pattern. Given a colored stochastic dataset S “ pS, cl, πq in Rd with cl injective, if GS is a bipartite graph, then one can construct in polynomial time another colored stochastic dataset S 1 “ pS 1 , cl1 , π 1 q in Rd satisfying (1) ΓS 1 “ ΓS , (2) S Ď S 1 , (3) π 1 paq “ 21 for any a P S 1 zS, (4) xS 1 y is an instance of the CSD problem with respect to P. 5

2.2.2

#P-hardness for d ě 3

v

v0

In this section, we prove the first statement of Theorem 3, by providing a reduction from the independent-set counting problem for 3-regular planar bipartite graphs. Let G “ pV Y V 1 , Eq be a 3-regular planar bipartite graph. Suppose |V | “ |V 1 | “ n v v0 (note that we must have |V | “ |V 1 | for G is 3-regular), and then |E| “ 3n. Instead 2λ of working on G directly, we first pass to a new graph, which seems to have a lower dimension. Set λ “ 100n2 . We define G˚ as the graph obtained from G by inserting Figure 2: Inserting 2λ new vertices to each edge of G, i.e., replacing each edge of G with a chain of new vertices to each 2λ new vertices (see Figure 2). With an abuse of notation, V and V 1 are also used edge of G. to denote the corresponding subsets of the vertices of G˚ . Note that G˚ is also bipartite, in which V and V 1 belong to different parts. We use U (resp., U 1 ) to denote the set of the inserted vertices of G˚ which belong to the same part as V (resp., V 1 ). Then the two parts of G˚ are V Y U and V 1 Y U 1 . For each edge e P E of G, we denote by Ue (resp. Ue1 ) the set of the λ vertices in U (resp., U 1 ) which are inserted to the edge e. It is not surprising that the independent sets of G are strongly related to those of G˚ . Indeed, as we will show, counting independent sets for G can be done by solving the 4-CSISC instance xG˚ , pV, V 1 , U, U 1 qy. Define Ind p,p1 as the number of the independent sets I of G such that |I X V | “ p, |I X V 1 | “ q. Also, define Ind ˚p,p1 ,q,q1 as the number of the independent sets I ˚ of G˚ such that |I ˚ X V | “ p, |I ˚ X V 1 | “ p1 , |I ˚ X U | “ q, |I ˚ X U 1 | “ q 1 . Lemma 8 For any p, p1 P t0, . . . , nu, we have Ind p,p1 “ Ind ˚p,p1 ,3λp,3λn´3λp . In particular, Ind pGq “

n ÿ n ÿ

Ind i,j “

i“0 j“0

n ÿ n ÿ

Ind ˚i,j,3λi,3λn´3λi .

i“0 j“0

Now it suffices to reduce the 4-CSISC instance xG˚ , pV, V 1 , U, U 1 qy to an instance xSy of the CSD problem in R3 with respect to a given balanced color pattern P. Due to Lemma 6 and 7, the only thing we need for the reduction is a DPE of G˚ to R3 . Therefore, our next step is to show dimpG˚ q ď 3 and construct explicitly a DPE of G˚ to R3 (in polynomial time), which is the most non-obvious part of the proof. Recall that the two parts of G˚ are V YU and V 1 YU 1 . The DPE that we are going to construct makes the image of each vertex in V 1 Y U 1 dominates the images of its adjacent vertices in V Y U . We first consider the embedding for the part V Y U . Our basic idea is to map the vertices in V Y U to the plane H : x ` y ` z “ 0 in R3 . Note that by doing this we automatically prevent their images from dominating each other. However, the locations of (the images of) these vertices on H should be carefully chosen so that later we are able to further embed the part V 1 Y U 1 (to R3 ) to form a DPE. Basically, we map V Y U to H through two steps. In the first step, the vertices in V Y U are mapped to R2 via a map ϕ : V Y U Ñ R2 to be constructed. Then in the second step, we properly project R2 onto H via another map ψ : R2 Ñ H. By composing ψ and ϕ, we obtain the desired map ψ ˝ ϕ : V Y U Ñ H, which gives us the embedding for V Y U . To construct ϕ, we need a notion about graph drawing. Let K “ pZ ˆ Rq Y pR ˆ Zq Ă R2 be the grid. An orthogonal grid drawing (OGD) of a (planar) graph is a planar drawing with image in the grid K such that the vertices are mapped to the grid points Z2 . Note that an OGD draws the edges of the graph as (non-intersecting) orthogonal curves in R2 consisting of unit-length horizontal/vertical segments each of which connects two adjacent grid points (see Figure 3). We will apply the following result from [16]. Theorem 9 For any t-vertex planar graph of (maximum) degree 3, one can compute in polynomial time an OGD with image in K X Q3t where Qi denotes the square r1, is ˆ r1, is Ă R2 . Consider the original 3-regular planar bipartite graph G “ pV Y V 1 , Eq. By applying the above theorem, we can find an OGD g for G with image in K X Q6n . For each vertex v P V Y V 1 of G, we denote by gpvq the image of v in R2 under the OGD g. Also, for each edge e “ pv, v 1 q P E of G, we denote by gpeq the image of e under g, which is an orthogonal curve in R2 connecting gpvq and gpv 1 q. With the OGD g in hand, we construct the map ϕ as follows. For all v P V , we simply define ϕpvq “ gpvq. To determine ϕpuq for u P U , 6

v1 v1

v6

v6

v2

v2 v3

v5

v3

v4

v4

v5

Figure 3: An orthogonal grid drawing. we consider the vertices in Ue for each edge e P E of G separately. Suppose e “ pv, v 1 q and Ue “ tu1 , . . . , uλ u where u1 , . . . , uλ are sorted in the order they appear on e (from v to v 1 ). Consider the curve gpeq. Since g is an OGD, gpeq must consist of unit-length horizontal/vertical segments (each of which connects two grid points). The total number m of these unit segments is upper bounded by p6nq2 as gpeq Ă K X Q6n . Now we pick a set Pe of λ (distinct) points on gpeq as follows. ‚ The m ´ 1 grid points in the interior of gpeq are included in Pe (see Figure 4a). ‚ On each unit vertical segment of gpeq, we pick the point with distance 0.3 from the bottom endpoint and include it to Pe (see Figure 4b). ‚ On the unit segment of gpeq adjacent to gpv 1 q, we pick the point with distance 0.01 from gpv 1 q and include it to Pe (see Figure 4c). ‚ Note that the number of the above three types of points is at most 2m ď 72n2 ă λ. To make |Pe | “ λ, we then arbitrarily pick more (distinct) points on gpeq which have distances at least 0.4 to any grid point, and add them to Pe . Suppose Pe “ tr1 , . . . , rλ u where r1 , . . . , rλ are sorted in the order they appear on the curve gpeq (from gpvq to gpv 1 q). We then define ϕpui q “ ri . We do the same thing for every edge e P E of G. In this way, we determine ϕpuq for all u P U and complete defining the map ϕ. The next step, as mentioned before, is to project R2 onto H. The projection map ψ : R2 Ñ H is defined as ψ : px, yq ÞÑ px ` y, y ´ x, ´2yq. Then the composition ψ ˝ ϕ : V Y U Ñ H gives us the first part of our DPE. The remaining task is to embed the part g(v 0 )

g(v 0 )

g(v 0 )

g(v)

(a)

(b)

(c)

Figure 4: The construction of Pe . 1

1

3

V Y U to R , which completes the construction of our DPE. We must guarantee that the image of each vertex w1 P V 1 Y U 1 dominates and only dominates the images of the vertices in V Y U adjacent to w1 . To achieve this, we first establish an important property of the map ψ ˝ ϕ : V Y U Ñ H constructed above. For a finite set A of points in Rd , we define a point maxpAq P Rd as the coordinate-wise maximum of A, i.e., the i-th coordinate of maxpAq is the maximum of the i-th coordinates of all points in A, for all i P t1, . . . , du. Lemma 10 For each vertex w1 P V 1 Y U 1 , let Adjw1 Ď V Y U be the set of the vertices adjacent to w1 in G˚ , and Aw1 “ pψ ˝ ϕqpAdjw1 q Ă R3 be the set of the corresponding images under ψ ˝ ϕ. Then for any w P V Y U and w1 P V 1 Y U 1 , the point maxpAw1 q P R3 dominates pψ ˝ ϕqpwq iff w P Adjw1 . Once the above property is revealed, the construction of the map V 1 Y U 1 Ñ R3 is quite simple: we just map each vertex w1 P V 1 Y U 1 to the point maxpAw1 q P R3 . Now we complete constructing the embedding of G˚ to R3 , and need to verify it is truly a DPE. Lemma 10 already guarantees that the image of each w1 P V 1 Y U 1 dominates (the images of) the vertices in Adjw1 (i.e., the vertices in V Y U that are adjacent to w1 ) but does not dominate (the images of) any other vertices in V Y U . So it suffices to show that the 7

images of the vertices in V 1 Y U 1 do not dominate each other. Let w11 , w21 P V 1 Y U 1 be two distinct vertices, and assume that maxpAw11 q dominates maxpAw21 q. Then we must have maxpAw11 q dominates the points in Aw21 . By Lemma 10, this implies that Adjw21 Ď Adjw11 . However, as one can easily see from the structure of G˚ , it never happens that Adjw21 Ď Adjw11 unless w11 “ w21 . Thus, we conclude that the map constructed is a DPE of G˚ to R3 . With the DPE in hand, by applying Lemma 6 and 7, the first statement of Theorem 3 is readily proved. 2.2.3

#P-hardness for d ě 7 with half existence probabilities

In this section, we prove the second statement of Theorem 3. When the existence probabilities are restricted to be 21 , we are no longer able to apply the tricks used in the previous section, as the reduction from the CSISC problem (Lemma 6) cannot be done under such a restriction. This is the reason for why we have to “loosen” the dimension to 7 in this case. As we have seen, for a colored stochastic dataset S “ pS, cl, πq with πpaq “ 12 for all a P S, computing ΓS is totally equivalent to counting independent sets for GS . Therefore, we complete the proof by establishing a more direct reduction from independent-set counting for 3-regular planar bipartite graphs, which constructs directly a DPE of the input graph to R7 . However, it is non-obvious that any 3-regular planar bipartite graph G has dimension at most 7 and how to construct a DPE of G to R7 in polynomial time. To prove this, we introduce a new technique based on graph coloring. Indeed, we consider a more general case in which the graph G is an arbitrary bipartite graph. The graph coloring to be used is slightly different from the conventional notion, which we call halfcoloring. Let G “ pV Y V 1 , Eq be a bipartite graph. For any two distinct vertices u, v P V , we define u „ v if there exists a vertex in V 1 adjacent to both u and v. Definition 11 A k-halfcoloring of G on V is a map h : V Ñ t1, . . . , ku. The halfcoloring h is said to be discrete if hpuq ‰ hpvq for any u, v P V with u „ v, to be semi-discrete if it satisfies the condition that for any distinct u, v, w P V with u „ v and v „ w, hpuq, hpvq, hpwq are not all the same. Symmetrically, we may also define halfcoloring on V 1 . We may relate halfcoloring to the conventional graph coloring as follows. Define G1 “ pV, E 1 q with E 1 “ tpu, vq : u „ v in Gu. Clearly, a discrete k-halfcoloring of G on V corresponds to a (conventional) k-coloring of G1 satisfying that no two adjacent vertices share the same color, i.e., the subgraph of G1 induced by each color form an independent set of G1 . Similarly, a semi-discrete k-halfcoloring of G on V corresponds to a k-coloring of G1 satisfying that the subgraph of G1 induced by each color consists of connected components of sizes at most 2. If h is a k-halfcoloring of G on V , then for each v 1 P V 1 we denote by χh pv 1 q the number of the colors “adjacent” to v 1 (the color i is said to be adjacent to v 1 if there is a vertex v P V adjacent to v 1 with hpvq “ i). Our technical result is the following theorem, which establishes a relation between halfcoloring and graph dimension. Theorem 12 Let G “ pV YV 1 , Eq be a bipartite graph. If there exists a semi-discrete k-halfcoloring h : V Ñ t1, . . . , ku of G (on V ), then dimpGq ď 2k. In addition, if χh pv 1 q ă k for all v 1 P V 1 , then dimpGq ď 2k ´ 1. Furthermore, with h in hand, one can compute in polynomial time a DPE of G to R2k , or R2k´1 in the latter case. We then apply the halfcoloring technique to show that dimpGq ď 7 for any 3-regular planar bipartite graph G, which will give us a proof for the second statement of Theorem 3. To achieve this, the only missing piece is the following observation. Lemma 13 Every 3-regular planar bipartite graph has a discrete 4-halfcoloring, which can be computed in polynomial time. Now it is quite straightforward to prove the second statement of Theorem 3. Let G be a 3-regular planar bipartite graph. By combining Theorem 12 and Lemma 13, we can compute a DPE of G to R7 in polynomial time. By taking the images of the vertices of G under the DPE, we obtain a set S of points in R7 . Using the point set S, we further construct a colored stochastic dataset S “ pS, cl, πq by choosing an injection 8

cl : S Ñ N and defining πpaq “ 12 for any a P S. It is clear that GS – G and thus Ind pGq “ 2|S| ΓS . Then by applying Lemma 7, we can compute another colored stochastic dataset S 1 “ pS 1 , cl1 , π 1 q such that ΓS 1 “ ΓS and π 1 paq “ 21 for any a P S 1 , and more importantly, xS 1 y is an instance of the CSD problem with respect to P. With this reduction, the second statement of Theorem 3 is proved. Our conclusion that any 3-regular planar bipartite graph has dimension at most 7 is of independent interest, which results in an implication in order dimension theory [15] (see Appendix D).

2.3

A simple FPRAS

In this section, we describe a simple FPRAS (i.e., fully polynomial-time randomized approximation scheme) for approximating ΛS in any dimension. Recall that a FPRAS is a randomized algorithm which takes the input of the problem with an additional parameter ε ą 0, and computes an ε-approximation of the answer in polynomial (in both the size of the problem and 1{ε) time with high probability (say at least 2{3). A natural idea to design a FPRAS for approximating ΛS is to randomly generate a large number of realizations of S, and estimate ΛS using the proportion of the number of the realizations containing intercolor dominances to the total number of the realizations. However, since we are only allowed to generate polynomial number of realizations, this method does not guarantee to produce an ε-approximation of ΛS with high probability. For instance, if ΛS “ 2ń , then the estimation of ΛS obtained by generating polynomial number of realizations would be 0 with probability almost 1 (as one can easily verify using union bound). Interestingly, by slightly making some changes to this simple method, we can truly obtain a FPRAS for computing ΛS . Our FPRAS works as follows. Suppose the points a1 , . . . , an are already sorted by their existence probabilities from large to small, i.e., πpa1 q ě ¨ ¨ ¨ ě πpan q. Instead of estimating ΛS directly, what we do is to estimate a set of conditional probabilities and use them to compute an estimation of ΛS . For any i, j P t1, . . . , nu with i ă j, we define Ei,j as the event that a realization R of S includes ai , aj and any other points in R have indices smaller than i. Then we immediately have ΛS “

n´1 ÿ

n ÿ

PrrEi,j s ¨ Cond i,j ,

(2)

i“1 j“i`1

where Cond i,j is the conditional probability that a realization of S contains inter-color dominances under the condition that Ei,j happens. The probabilities PrrEi,j s can be straightforwardly computed. But we are not able to exactly compute Cond i,j in polynomial time, so we try to estimate them by randomly generating realizations. For p P t0, 1, . . . , nu, set Sp “ ta1 , . . . , ap u, and we use Sp to denote the sub-dataset of S with point set Sp Ď S. We randomly generate N “ 10n5 {ε2 realizations of Sp for each p P t0, 1, . . . , nu. Let Rp,q be the q-th realization of Sp . Naturally, we compute an estimation Est i,j for each Cond i,j as Est i,j “

N ÿ σpRi´1,k Y tai , aj uq , N k“1

where σpRq “ 1 if R contains inter-color dominances and σpRq “ 0 otherwise. Then we can apply Equation 2 to compute an estimation Λ of ΛS , simply by replacing each Cond i,j with its estimation Est i,j . It is quite surprising that Λ is, with high probability, an ε-approximation of ΛS (note that each Est i,j is not necessarily an ε-approximation of Cond i,j with high probability). The following theorem completes the discussion. Theorem 14 We have p1 ´ εqΛS ă Λ ă p1 ` εqΛS with probability at least 2{3.

3

The free-basis colored stochastic dominance problem

Let S “ pS, cl, πq be a colored stochastic dataset in Rd with S “ ta1 , . . . , an u. By naturally generalizing the conventional dominance relation, one can define dominance relation with respect to a specific orthogonal basis

9

of Rd . For two distinct points p, q P Rd and an orthogonal basis B “ pb1 , . . . , bd q of Rd , we say p dominates q with respect to B (denoted by p ąB q) if xbi , py ě xbi , qy for every i P t1, . . . , du, where x¨, ÿ is the inner product. With this generalized definition, the conventional dominance relation is just dominance relation with respect to the standard basis E “ pe1 , . . . , ed q of Rd . Define Λ˚S as the probability that a realization of S contains inter-color dominances with respect to any orthogonal basis of Rd . Set ΓS˚ “ 1 ´ Λ˚S , which is the probability that a realization of S contains no inter-color dominances with respect to some orthogonal basis of Rd . The goal of the FBCSD problem is to compute Λ˚S (or ΓS˚ ).

3.1

Reduction from the CSD problem

In this section, we show that the (standard) CSD problem in Rd is polynomial-time reducible to the FBCSD problem in the same dimension, which implies the latter is #P-hard for d ě 3. Given a colored stochastic dataset S “ pS, cl, πq in Rd as an instance of the CSD problem, our reduction tries to construct another colored stochastic dataset S 1 “ pS 1 , cl1 , π 1 q in Rd such that Λ˚S 1 “ ΛS . The intuition of our reduction is the following. First, consider the given colored stochastic dataset S. Clearly, we have Λ˚S ď ΛS , as every realization of S counted in Λ˚S is also counted in ΛS . The reason for why Λ˚S may be smaller than ΛS is that perhaps some realization contains inter-color dominances with respect to the standard basis E of Rd but does not contain inter-color dominances with respect to some other basis. To handle this, our basic idea is to add a set Ψ of (colored) auxiliary points with existence probabilities 1 to S, that is, we want S 1 “ S Y Ψ with π 1 pbq “ 1 for all b P Ψ (and π 1 paq “ πpaq, cl1 paq “ clpaq for all a P S). The goal of adding these auxiliary points is to guarantee that a subset A Ď S contains inter-color dominances with respect to the standard basis E iff A Y Ψ Ď S 1 contains inter-color dominances with respect to any orthogonal basis. Note that as long as Ψ has this property, it obviously holds that Λ˚S 1 “ ΛS . Therefore, the critical part of our reduction is to construct such a set Ψ with the desired property. We achieve this through several steps. First of all, we need to make the point set S “regular”. Formally, we say a (finite) point set X Ă Rd is regular if X Ă t1, 2, . . . , |X|ud and any two distinct points x, x1 P X have distinct coordinates in all dimensions. It is easy to see that one can always “regularize” a point set without changing the dominance relation (with respect to E) among the points. Lemma 15 Given a set S “ ta1 , . . . , an u Ă Rd of distinct points, one can construct in Opn log nq time a regular set Snew “ tˆ a1 , . . . , a ˆn u Ă Rd such that a ˆ i ąE a ˆj iff ai ąE aj . Now we may assume S is regular. To construct Ψ , we need to introduce some notions. Definition 16 Let B “ pb1 , . . . , bd q be an orthogonal basis of Rd . We define the cone CB of B as # + # + d d ÿ ÿ CB “ βi bi : β1 , . . . , βd ě 0 Y βi bi : β1 , . . . , βd ď 0 Ă Rd . i“1

i“1

Also, we define the projective cone PC B Ă Pd´1 as the image of CB zt0u in Pd´1 under the obvious quotient map Rd zt0u Ñ Pd´1 . For a point x P Rd with x ‰ 0, we denote by x its image in Pd´1 under the quotient map Rd zt0u Ñ Pd´1 . The notion of (projective) cone defined above gives us another way to view dominance relations with respect to an orthogonal basis. Consider two distinct points p, q P Rd , and an orthogonal basis B of Rd . It is easy to see that p, q form a dominance with respect to B (i.e., p ąB q or q ąB p) iff p ´ q P CB , or equivalently, p ´ q P PC B . Another notion we need is a metric on any projective space Pk . Definition 17 For two points l, l1 P Pk , we define angpl, l1 q P r0, π2 s to be the angle between l and l1 as lines in Rk`1 through the origin (there are two supplementary angles, take the smaller one which is in r0, π2 s). It is easy to see that angp¨, ¨q defines a metric on Pk . The following lemmas establish some geometric properties of the projective cone and the ang-metric, which will be helpful for constructing Ψ . 10

Lemma 18 Let B be an orthogonal basis of Rd , and l be a point in Pd´1 . If l R PC B , then there exists x P PC B perpendicular to l, i.e., angpl, xq “ π2 . Lemma 19 For any orthogonal basis B of Rd , any point x P PC B , and any real number ε P p0, π2 s, there ε ε exists y P PC B with angpx, yq ă ε such that the 3? -ball at y, i.e., the set tz P Pd´1 : angpz, yq ď 3? u, is d d contained in PC B . Lemma 20 Let l be a point in Pd´1 and ε ě ξ ą 0 be two real numbers. Then one can compute m “ Opε{ξ d´1 q points l1 , . . . , lm P Pd´1 in Opmq time such that (1) angpl, li q ą π2 ´ ε for all i P t1, . . . , mu and (2) for any y P Pd´1 with angpl, yq ą π2 ´ ε, there exists some li satisfying angpli , yq ă ξ. With the above lemmas in hand, we now describe the construction of Ψ . We look at all pairs pa, a1 q of points in S such that clpaq ‰ clpa1 q and a ąE a1 . For each such pair pa, a1 q, we do the following. ε Set l “ a ´ a1 P Pd´1 , ε “ arcsinp ?1dn q, and ξ “ 3? . By applying Lemma 20 with l, ε, ξ, we compute d d´1 d´2 d´1 m “ Opε{ξ q “ Opn q points l1 , . . . , lm P P satisfying the conditions (1) and (2) in the lemma. In addition, we observe the following. ‚ li R PC E for all i P t1, . . . , mu. ‚ For any orthogonal basis B of Rd , if l R PC B , then there exists some li P PC B . To see the first observation, recall that S is already regular. Since a ąE a1 and S is regular, l can be represented by homogeneous coordinates rα1 : ¨ ¨ ¨ : αd s with α1 , . . . , αd P t1, . . . , n ´ 1u. Based on this, one can easily verify that angpl, l1 q ă arccosp ?1dn q for any l1 P PC E . But we have angpl, li q ą π2 ´ε “ arccosp ?1dn q by Lemma 20. Thus, li R PC E . To see the second observation, let B be an orthogonal basis of Rd with l R PC B . By Lemma 18, there exists x P PC B with angpl, xq “ π2 . Then by Lemma 19, there exists y P PC B ε -ball at y is contained in PC B . Since angpl, xq “ π2 and angpx, yq ă ε, such that angpx, yq ă ε and the 3? d π we have angpl, yq ą 2 ´ ε. Therefore, according to the condition (2) in Lemma 20, there must exist some ε ε li such that angpli , yq ă ξ. Recall that ξ “ 3? , so li is in the 3? -ball at y and hence in PC B . These two d d observations will be used later to verify that Ψ satisfies the desired property. Now we continue to discuss the construction of Ψ . We have computed m points l1 , . . . , lm P Pd´1 for a specific pair pa, a1 q. We do the same thing for all pairs pa, a1 q of points in S with clpaq ‰ clpa1 q and a ąE a1 . After this, we obtain M “ Opn2 mq “ Opnd q points in Pd´1 (with an abuse of notation, we denote them by l1 , . . . , lM ). The set Ψ we construct consists of 2M points b1 , . . . , bM , b11 , . . . , b1M P Rd where bi , b1i correspond to li for i P t1, . . . , M u. We set the coordinates of each bi in Rd to be pí, . . . , í, n ` iq. Then we choose location for each b1i in Rd such that kb1i ´ bi k2 ă 0.1 and b1i ´ bi “ li (there are infinitely many choices, we arbitrarily pick one of them). Finally, we need to define the coloring of the points in Ψ , i.e., cl1 pbq for all b P Ψ . We arbitrarily color the points in Ψ under the only restriction that bi and b1i must have different colors, i.e., cl1 pbi q ‰ cl1 pb1i q, for all i P t1, . . . , M u. It suffices to verify the property that A Ď S contains inter-color dominances with respect to E iff A Y Ψ Ď S 1 contains inter-color dominances with respect to any orthogonal basis. To see the “if” part, let A Ď S be a subset such that A Y Ψ Ď S 1 contains inter-color dominances with respect to any orthogonal basis. Since S Ă r1, nsd (as S is regular) and li R PC E for all i P t1, . . . , M u (as observed above), the points in Ψ do not dominate each other and do not form dominance with any points in S, with respect to E. But by assumption, A Y Ψ contains inter-color dominances with respect to E. So the inter-color dominances must be formed by the points in A, i.e., A contains inter-color dominances with respect to E. To see the “only if” part, let A Ď S be a subset containing inter-color dominances with respect to E. Suppose a, a1 P A are two points such that clpaq ‰ clpa1 q and a ąE a1 . Consider an orthogonal basis B of Rd , and we must show that A Y Ψ contains inter-color dominances with respect to B. If a ąB a1 or a1 ąB a, then we are done. Otherwise, recall that we have m points in tl1 , . . . , lM u which are chosen for the pair pa, a1 q (assume they are l1 , . . . , lm without loss of generality). By our observation above, one of these m points must be in PC B , say l1 P PC B . Then the two points b1 , b11 P Ψ form an inter-color dominance with respect to B. By the above construction, we obtain a colored stochastic dataset S 1 “ pS Y Ψ, cl1 , π 1 q in Rd satisfying ˚ ΛS 1 “ ΛS . Clearly, this reduction can be done in polynomial time. Thus, the FBCSD problem is #P-hard for d ě 3. In fact, with some efforts, one can make this result stronger by considering the FBCSD problem with respect to a balanced color pattern. 11

Theorem 21 Let P 1 “ p∆11 , ∆12 , . . . q be a balanced color pattern. Then for any fixed d, there exists a balanced color pattern P “ p∆1 , ∆2 , . . . q such that the CSD problem in Rd with respect to P is polynomialtime reducible to the FBCSD problem in Rd with respect to P 1 . In particular, the FBCSD problem in Rd with respect to P 1 is #P-hard for d ě 3.

3.2

Reduction to the CSD problem for d “ 2

In this section, we study the FBCSD problem for d “ 2 and show that an instance of the FBCSD problem in R2 can be reduced to Opn2 q instances of the CSD problem in R2 . By combining this reduction with our algorithm given in Section 2.1, we directly obtain an Opn4 log2 nq-time algorithm for the FBCSD problem in R2 . For simplicity of exposition, we assume that S is in general position in R2 , i.e., no three points are collinear. We try to compute ΓS˚ . When computing ΓS˚ , we need to consider the realizations of S which contain no inter-color dominances with respect to some orthogonal basis of R2 (these realizations are said to be good ). We first establish a criterion for testing whether a realization is good. Recall that for a nonzero point x P Rd , the notation x denotes the image of x in Pd´1 under the quotient map Rd zt0u Ñ Pd´1 . For a subset A Ď S, we define LA “ tai ´ aj : ai , aj P A and clpai q ‰ clpaj qu Ă P1 . For two points l, l1 P P1 , we denote by θpl, l1 q the angle between l and l1 whose counterclockwise boundary is l and clockwise boundary is l1 (when talking about angle we regard l and l1 as lines in R2 through the origin). Then we have the following observation. Lemma 22 A realization R of S is good iff LR “ H or there exists a unique l P LR such that θpl, l1 q ą π2 for any l1 P LR not equal to l.

a1

Note that LR “ H iff R is monochromatic. Based on the above lemma, we a2 now define a notion called witness pair as follows. Let R be a good but not a8 monochromatic realization of S. Then by Lemma 22, there exists a unique l P LR a5 such that θpl, l1 q ą π2 for any l1 P LR not equal to l. According to the definition a4 of LR , we must have l “ ai ´ aj for some ai , aj P R with clpai q ‰ clpaj q. Note that the choice of ai , aj is not necessarily unique (though l is unique). Let Y be the set of all pairs pai , aj q with ai , aj P R satisfying clpai q ‰ clpaj q and Figure 5: An examl “ ai ´ aj . We claim that there exists a unique pair pai˚ , aj ˚ q P Y such that ple of witness pair. for any pai , aj q P Y we have j ˚ ě j. The existence is obvious, so it suffices to l “ a1 ´ a8 “ a2 ´ a5 . show the uniqueness. Indeed, if pai , aj q and pai1 , aj q are two pairs in Y , then witpRq “ pa1 , a8 q. the points ai , ai1 , aj must be collinear in R2 . However, because of the general position assumption for S (and hence for R), we must have i “ i1 . It follows that for any j P t1, . . . , nu there is at most one pair pai , aj q P Y , which further implies the uniqueness of pai˚ , aj ˚ q. We define the pair pai˚ , aj ˚ q as the witness pair of R, denoted by witpRq. See Figure 5 for an example. Now it is clear that ΓS˚ “ Pr mono `

n ÿ n ÿ

Pr i,j ,

i“1 j“1

where Pr mono is the probability that a realization R of S is monochromatic, and Pr i,j is the probability that R is good (but not monochromatic) with witpRq “ pai , aj q. It is easy to compute Pr mono in linear time. The problem remaining is how to compute Pr i,j for all i, j P t1, . . . , nu. Fix a pair pi˚ , j ˚ q. Obviously, if clpai˚ q “ clpaj ˚ q, we immediately have Pr i˚ ,j ˚ “ 0. So suppose clpai˚ q ‰ clpaj ˚ q. We try to reduce the the task of computing Pr i˚ ,j ˚ to an instance of the CSD problem in R2 . Let b1 “ pai˚ ´ aj ˚ q{kai˚ ´ aj ˚ k2 be a unit vector of R2 , and b2 be another unit vector obtained by rotating b1 clockwise with angle π2 . Clearly, B “ pb1 , b2 q is an orthogonal basis of R2 . We define n points a11 , . . . , a1n P R2 as follows. Let δ be a small enough real number such that for any i, j P t1, . . . , nu we have |xb2 , ai y ´ xb2 , aj y| ą δ unless xb2 , ai y “ xb2 , aj y. Consider a specific index p P t1, . . . , nu. If p ď j ˚ and there exists q ď j ˚ satisfying clpap q ‰ clpaq q, ap ąB aq , xb2 , ap y “ xb2 , aq y, then we set the coordinates of a1p in R2 to be pxb2 , ap y ´ δ, xb1 , ap yq. Otherwise, we set the coordinates of a1p to be pxb2 , ap y, xb1 , ap yq. Based 12

on this, we can construct a colored stochastic dataset S 1 “ pS 1 , cl1 , π 1 q in R2 by defining S 1 “ ta11 , . . . , a1n u, cl1 pa1i q “ clpai q for all i P t1, . . . , nu, and π 1 pa1i˚ q “ π 1 pa1j ˚ q “ 1, π 1 pa1i q “ πpai q for all i P t1, . . . , nuzti˚ , j ˚ u. We observe the following equation, which allows us to compute Pr i˚ ,j ˚ by solving the instance xS 1 y of the CSD problem in R2 . Lemma 23 Pr i˚ ,j ˚ “ πpai˚ q ¨ πpaj ˚ q ¨ ΓS 1 . In this way, an instance of the FBCSD problem in R2 is reduced to Opn2 q instances of the CSD problem in R2 . By plugging in our Opn2 log2 nq algorithm for solving the CSD problem in R2 , we have the following result. Theorem 24 The FBCSD problem in R2 can be solved in Opn4 log2 nq time.

References [1] P. Afshani, P.K. Agarwal, L. Arge, K.G. Larsen, and J.M. Phillips. (approximate) uncertain skylines. In Proc. of the 14th ICDT, pages 186–196. ACM, 2011. [2] P.K. Agarwal, B. Aronov, S. Har-Peled, J.M. Phillips, K. Yi, and W. Zhang. Nearest neighbor searching under uncertainty II. In Proc. of the 32nd PODS, pages 115–126. ACM, 2013. [3] P.K. Agarwal, S. Har-Peled, S. Suri, H. Yıldız, and W. Zhang. Convex hulls under uncertainty. In Algorithms-ESA, pages 37–48. Springer, 2014. [4] P.K. Agarwal, N. Kumar, S. Sintos, and S. Suri. Range-max queries on uncertain data. In Proc. of the 35th SIGMOD/PODS, pages 465–476. ACM, 2016. [5] M. Fink, J. Hershberger, N. Kumar, and S. Suri. Hyperplane separability and convexity of probabilistic point sets. In Proc. of the 32nd SoCG. ACM, 2016. [6] H.N. Gabow, J.L. Bentley, and R.E. Tarjan. Scaling and related techniques for geometry problems. In Proc. of the 16th STOC, pages 135–143. ACM, 1984. [7] P. Kamousi, T.M. Chan, and S. Suri. Stochastic minimum spanning trees in euclidean spaces. In Proc. of the 27th SoCG, pages 65–74. ACM, 2011. [8] P. Kamousi, T.M. Chan, and S. Suri. Closest pair and the post office problem for stochastic points. Computational Geometry, 47(2):214–223, 2014. [9] H-T. Kung, F. Luccio, and F.P. Preparata. On finding the maxima of a set of vectors. Journal of the ACM, 22(4):469–476, 1975. [10] C.H. Papadimitriou and M. Yannakakis. Multiobjective query optimization. In Proc. of the 20th SIGMOD/PODS, pages 52–59. ACM, 2001. [11] J. Pei, B. Jiang, X. Lin, and Y. Yuan. Probabilistic skylines on uncertain data. In Proc. of the 33rd Intl. Conf. on VLDB, pages 15–26. VLDB Endowment, 2007. [12] N. Robertson, D.P. Sanders, P. Seymour, and R. Thomas. Efficiently four-coloring planar graphs. In Proc. of the 28th STOC, pages 571–575. ACM, 1996. [13] S. Suri and K. Verbeek. On the most likely Voronoi Diagram and nearest neighbor searching. In ISAAC, pages 338–350. Springer, 2014. [14] S. Suri, K. Verbeek, and H. Yıldız. On the most likely convex hull of uncertain points. In Algorithms– ESA, pages 791–802. Springer, 2013.

13

[15] W.T. Trotter. Combinatorics and partially ordered sets: Dimension theory, volume 6. JHU Press, 2001. [16] L.G. Valiant. Universality considerations in vlsi circuits. IEEE Transactions on Computers, 100(2):135– 140, 1981. [17] M. Xia and W. Zhao. #3-regular bipartite planar vertex cover is #P-complete. In Intl. Conf. on TAMC, pages 356–364. Springer, 2006. [18] J. Xue, Y. Li, and R. Janardan. On the separability of stochastic geometric objects, with applications. In Proc. of the 32nd SoCG. ACM, 2016. [19] W. Zhang, X. Lin, Y. Zhang, M.A. Cheema, and Q. Zhang. Stochastic skylines. ACM TODS, 37(2):14, 2012.

14

Appendix A A.1

Missing proofs Proof of Lemma 1

To see the “if” part, assume that ZpRq contains no inter-color dominances and ypaq ą ypbq for any a P ZpRq, b P RzZpRq. In this case, any two points in ZpRq cannot form an inter-color dominance. Also, any two points in RzZpRq cannot form an inter-color dominance for RzZpRq is monochromatic. It suffices to show that any a P ZpRq and b P RzZpRq cannot form an inter-color dominance. By assumption, we have ypaq ą ypbq. But by the definition of ZpSq, we also have xpaq ă xpbq. Thus, a and b do not dominate each other. To see the “only if” part, assume R contains no inter-color dominances. Since ZpRq is a subset of R, it also contains no inter-color dominances. Let a P ZpRq and b P RzZpRq be two points. As argued before, we have xpaq ă xpbq. If clpaq ‰ clpbq, then it is clear that ypaq ą ypbq (otherwise pa, bq forms an inter-color dominance). The only remaining case is clpaq “ clpbq. Since a P ZpRq, by the definition of ZpRq, we may find a point o P ZpRq such that xpaq ă xpoq ă xpbq and clpoq ‰ clpaq “ clpbq. If ypaq ă ypbq, then either ypaq ă ypoq or ypoq ă ypbq, i.e., either pa, oq or po, bq forms an inter-color dominance. Because R contains no inter-color dominances, we must have ypaq ą ypbq.

A.2

Proof of Lemma 6

Suppose |V | “ tv1 , . . . , vn u. We construct the colored stochastic dataset S “ pS, cl, πq as follows. Define S “ ta1 , . . . , an u where ai “ f pvi q P Rd and set clpai q “ i (so cl is injective). Let S1 , . . . , Sc be the (disjoint) subsets of S corresponding to V1 , . . . , Vc respectively, i.e., Si “ taj : vj P Vi u. Without loss of generality, we cí`1 may assume S1 , . . . , Sc are all nonempty. For all points a P Si , we define πpaq “Ť4ń (note that this real c number can be represented in polynomial length). Then for all points a P Szp i“1 Si q, we define πpaq “ 21 . With S constructed above, we already have GS – G, since f is a DPE and all the points in S have distinct colors. It suffices to show how to “recover” Ind Φ pGq from ΓS . Equivalently, we have to compute, for every c-tuple φ “ pn1 , . . . , nc q of integers where 0 ď ni ď |Si |, the number of the subsets A Ď S containing no inter-color dominances and satisfying |A X Si | “ ni for all i P t1, . . . , cu (we use Aφ to denote the collection of these subsets). For each c-tuple φ “ pn1 , . . . , nc q with 0 ď ni ď |Si |, we notice that any A P Aφ occurs as a realization of S with probability Pφ “

1 2n´m

c ˆ ź i“1

˙ ni ˆ 1´ ncí`1 1

4

1 4ncí`1

˙|Si |ńi ,

řc śc where m “ i“1 |Si |. Set N “ i“1 p|Si | ` 1q, then we have in total N c-tuples φ1 , . . . , φN (of integers) to be considered (N is polynomial in n as c is constant). Suppose φ1 , . . . , φN are already sorted in lexicographical order from small to large. Our first key observation is that Pφi ą 2n Pφi`1 for all i P t1, . . . , N ´ 1u. To see this, assume φi “ pn1 , . . . , nc q and φi`1 “ pn11 , . . . , n1c q. Note that φ1 , . . . , φN are sorted in lexicographical order, so there exists k P t1, . . . , cu such that nj “ n1j for all j ă k and n1k “ nk ` 1. Then it is easy to see that c´k`1 c ź c´j`1 1 ´ 4ń Pφ i ě p4ń q|Sj | . Pφi`1 4ńc´k`1 j“k`1 řc If k “ c, we already have Pφi ą 2n Pφi`1 . For the case of k ă c, since j“k`1 |Sj | ď n ´ 1, the above inequality implies that c´k`1 c´k p1 ´ 4ń q ¨ 4´pn´1q¨n Pφ i ě ą 2n . Pφi`1 4ńc´k`1

15

With this observation in hand, we now consider how to compute |Aφi | for all i P t1, . . . , N u from ΓS . It is clear that N ÿ ΓS “ Pφi ¨ |Aφi |. i“1

řN

řN For j P t1, . . . , N u, we set γj “ i“j`1 Pφi ¨ |Aφi |. By the facts that Pφi ą 2n Pφi`1 and i“1 |Aφi | ď 2n , we can deduce Pφi ą γi for all i P t1, . . . , N u. Then we are ready to compute |Aφ1 |, . . . , |AφN | in order. Since Pφ1 ą γ1 , |Aφ1 | must be the greatest integer that is smaller than or equal to ΓS {Pφ1 , and hence can be immediately computed. Suppose now |Aφ1 |, . . . , |Aφm´1 | are already computed, and we consider |Aφm |. Via |Aφ1 |, . . . , |Aφm´1 | and ΓS , we may compute γm´1 . Because Pφm ě γm , |Aφm | must be the greatest integer that is smaller than or equal to γm´1 {Pφm , and hence can be computed directly. In this way, we are able to compute all |Aφ1 |, . . . , |AφN | and equivalently Ind Φ pGq (in polynomial time). The statements in the lemma follow readily.

A.3

Proof of Lemma 7

Since P is balanced, we can find an constants c ą 0 such that n ´ max ∆n ě nc for any sufficiently large n. Suppose GS “ pV Y V 1 , Eq where |V | “ n and |V 1 | “ n1 . We may write S “ ta1 , . . . , an`n1 u where a1 , . . . , an correspond to the vertices in V and an`1 , . . . , an`n1 correspond to those in V 1 . Because cl is injective (i.e., the points in S are of distinct colors), we have that a1 , . . . , an do not dominate each other, and the same holds for an`1 , . . . , an`n1 . Set N “ maxt2n ` n1 , pn1 q1{c u. Now we construct S 1 “ pS 1 , cl1 , π 1 q as follows. First, we pick a set A of N ´ pn ` n1 q points in Rd which do not dominate each other and do not form dominances with any points in S. Set S 1 “ S Y A, so S Ď S 1 and |S 1 | “ N . The points in A are used as dummy points, and can never influences ΓS 1 (since they are not involved in any dominances). With a little bit abuse of notation, we also use a1 , . . . , an`n1 to denote the non-dummy points in S 1 . We then define π 1 as π 1 paq “ πpaq for a P S and π 1 paq “ 21 for a P A. It suffices to assign colors to the points in S 1 , i.e., define the coloring function cl1 . Since we want xS 1 y to be an instance of the CSD problem with respect to P, the coloring cl1 must induce the partition ∆N of N . Suppose ∆N “ tr1 , . . . , rk u (as a multi-set) where řl řk r1 ě ¨ ¨ ¨ ě rk . Let l be the smallest integer such that i“1 ri ě n. It is easy to see that i“l`1 ri ě n1 . Indeed, if l “ 1, then we have m ÿ ri “ N ´ max ∆N ě N c ě n1 i“2

řl řk by assumption. In the case of l ą 1, we have that i“1 ri ă 2n and thus i“l`1 ri ą N ´ 2n ě n1 . This fact implies that we are able to define the coloring function cl1 with image t1, . . . , ku such that (1) there are exactly ri points in S 1 mapped to the color i by cl1 , (2) cl1 paq P t1, . . . , lu for any a P ta1 , . . . , an u, (3) cl1 paq P tl ` 1, . . . , mu for any a P tan`1 , . . . , an`n1 u. With this cl1 , we have that cl1 pai q ‰ cl1 paj q for any i P t1, . . . , nu and j P tn ` 1, . . . , n ` n1 u. Therefore, if two points ai , aj P S form an inter-color dominance in with respect to cl, then they also form an inter-color dominance with respect to cl1 , and vice versa. Since the dummy points in A can never contribute inter-color dominances, we have ΓS 1 “ ΓS , which completes the proof.

A.4

Proof of Lemma 8

Fixing p, p1 P t0, . . . , nu, we denote by I the collection of the independent sets I of G such that |I X V | “ p, |I X V 1 | “ p1 . Also, we denote by I ˚ the collection of the independent sets I ˚ of G˚ such that |I ˚ X V | “ p, |I ˚ X V 1 | “ p1 , |I ˚ X U | “ 3λp, |I ˚ X U 1 | “ 3λn ´ 3λp. It suffices to establish an one-to-one correspondence between I and I ˚ . Let I P I be an element. If e “ pv, v 1 q P E is an edge of G (where v P V and v 1 P V 1 ), we say e is of Type-1 if v P I (and hence v 1 R I), otherwise of Type-2. Recall that for each e P E, Ue (resp., Ue1 ) denotes the set of the λ vertices in U (resp., U 1 ) which are inserted to the edge e. Now let I ˚ be the set consists of the vertices in I, the vertices in Ue for all Type-1 edges e, and the vertices in Ue1 for all Type-2 edges e. Clearly, I ˚ is an independent set of G˚ . Furthermore, by the definition of I and the fact that G 16

is 3-regular, we know that G has 3p Type-1 edges and 3n ´ 3p Type-2 edges. It follows that |I ˚ X V | “ p, |I ˚ X V 1 | “ p1 , |I ˚ X U | “ 3λp, |I ˚ X U 1 | “ 3λn ´ 3λp. Thus, I ˚ P I ˚ . By mapping I to I ˚ , we obtain a map from I to I ˚ , which is obviously injective. To see it is surjective, let I ˚ P I ˚ be an element. Set I “ I ˚ X pV Y V 1 q. We claim that I P I and I is mapped to I ˚ by our map defined above. First, since I ˚ is an independent set of G˚ , we must have |I ˚ X pUe Y Ue1 q| ď λ for any edge e “ pv, v 1 q P E of G (with equality only if at least one of v and v 1 is in I). But |I ˚ X pU Y U 1 q| “ 3λn “ λ|E|, which implies |I ˚ X pUe Y Ue1 q| “ λ for all e P E. It follows that for every edge e “ pv, v 1 q P E, v and v 1 are not included in I simultaneously, i.e., I is an independent set of G. In addition, |I X V | “ |I ˚ X V | “ p, |I X V 1 | “ |I ˚ X V 1 | “ p1 . Therefore, I P I. To see I is mapped to I ˚ , we apply again the fact that |I ˚ X pUe Y Ue1 q| “ λ for any e P E. Based on this, we further observe that for any e P E, either Ue Ď I ˚ or Ue1 Ď I ˚ (since I ˚ is an independent set of G˚ ). As before, we say an edge e “ pv, v 1 q P E (with v P V and v 1 P V 1 ) is of Type-1 if v P I, otherwise of Type-2. Note that if an edge e P E is of Type-1, we must have Ue Ď I ˚ (and then I ˚ X Ue1 “ H). Since G has 3p Type-1 edges, |I ˚ X U | ě 3λp. But in fact |I ˚ X U | “ 3λp as I ˚ P I ˚ . So the only possibility is that Ue Ď I ˚ (and I ˚ X Ue1 “ H) for all Type-1 edges e and Ue1 Ď I ˚ (and I ˚ X Ue “ H) for all Type-2 edges e. As a result, I is mapped to I ˚ and |I| “ |I ˚ |, completing the proof.

A.5

Proof of Lemma 10

The “if” part is obvious, because maxpAq clearly dominates every point in A for any (finite) A Ă Rd with |A| ě 2 (note that |Aw1 | ě 2 for any w1 P V 1 Y U 1 ). It suffices to prove the “only if” part. For a point p P R3 , we denote by Hp the set of the points on the plane H which are dominated by p. We first observe that if Hp ‰ H, then the preimage ψ ´1 pHp q of Hp under ψ (which is a region in R2 ) must be a (closed) right-angled isosceles triangle in R2 whose hypotenuse is horizontal (we call this kind of triangles standard triangles). To see this, assume p “ pxp , yp , zp q and Hp ‰ H (this is equivalent to saying xp ` yp ` zp ą 0). Then ψ ´1 pHp q consists of all the points px, yq P R2 satisfying x ` y ď xp , y ´ x ď yp , y ě ´zp {2, and hence is a standard triangle. Furthermore, it is easy to see that if p “ maxpAq for a finite set A Ă H with |A| ě 2, then Hp ‰ H and ψ ´1 pHp q is the minimal standard triangle containing ψ ´1 pAq (by “minimal” we mean that any standard triangle containing ψ ´1 pAq is a superset of ψ ´1 pHp q, both as subsets of R2 , see Figure 6). Therefore, we only need to show that for any vertex w1 P V 1 Y U 1 , the minimal standard triangle containing ψ ´1 pAw1 q “ ϕpAdjw1 q does not contain ϕpwq for any vertex w P pV Y U qzAdjw1 . We consider two cases, w1 P V 1 and w1 P U 1 .

Figure 6: The minimal standard triangle in R2 containing a set of points.

s ri

ri−1

Figure 7: The case that s is horizontal.

17

ri−1 s ri

s s0

Figure 8: The case that s is vertical. In the case of w1 P V 1 , Adjw1 consists of three vertices (for G is 3-regular) in U , say w1 , w2 , w3 . Recall that g is the OGD of G used in constructing the map ϕ. By retrospecting our construction of ϕ, we see that each of ϕpw1 q, ϕpw2 q, ϕpw3 q has distance 0.01 from gpw1 q. On the other hand, one can easily verify that for any vertex w P pV Y U qzAdjw1 , ϕpwq is “far away” from gpw1 q (more precisely, with distance at least 0.3). Therefore, the minimal standard triangle containing ϕpw1 q, ϕpw2 q, ϕpw3 q does not contain ϕpwq for any vertex w P pV Y U qzAdjw1 . In the case of w1 P U 1 , we may assume w1 P Ue1 for some edge e “ pv, v 1 q P E of G. Then Adjw1 consists of two vertices in tvu Y Ue , say w1 , w2 . Recall that Pe is the set of the λ points chosen on the curve gpeq for sake of defining ϕpuq for u P Ue . As before, we suppose Pe “ tr1 , . . . , rλ u where r1 , . . . , rλ are sorted in the order they appear on the curve gpeq (from gpvq to gpv 1 q). For convenience, set r0 “ gpvq. Then we may assume ϕpw1 q “ ri´1 and ϕpw2 q “ ri for some i P t1, . . . , λu. Let s “ ri´1 ri be the segment in R2 with endpoints ri´1 and ri , and 4 be the minimal standard triangle containing ri´1 and ri . Since all the grid points on gpeq are included in Pe , s must be a horizontal or vertical segment contained in gpeq. Furthermore, the interior of s does not contain ϕpwq for any vertex w P V Y U and in particular does not contain any grid points. We discuss two cases separately: s is horizontal and s is vertical. Recall that K “ pZ ˆ Rq Y pR ˆ Zq Ă R2 is the grid. If s is horizontal, then 4 is just the standard triangle having s as its hypotenuse (see Figure 7). In this case, we have 4 X K “ s, which implies that 4 does not contain ϕpwq for any vertex w P pV Y U qztw1 , w2 u. For the case that s is vertical, assume that ri´1 is the top endpoint and ri is the bottom one. Then ri´1 is the right-angled vertex of 4, and ri is the midpoint of the hypotenuse of 4. If ri is not a grid point, we again have 4 X K “ s and thus we are done (see the left part of Figure 8). If ri is a grid point, the distance between ri´1 and ri must be 0.3, by our construction of Pe . In this situation, 4 X K consists of s and a horizontal segment s1 of length 0.6 which is the hypotenuse of 4 (see the right part of Figure 8). We claim that ϕpwq is not on s1 for any vertex w P pV Y U qztw2 u. Indeed, by our construction of ϕ, if ϕpwq is in the interior of some unit horizontal segment, then ϕpwq is either with distance 0.01 from gpv 1 q for some v 1 P V 1 or with distance at least 0.4 from any grid point. In each of the cases, ϕpwq is “far away” from ri (more precisely, with distance at least 0.4). But any point on s1 has distance at most 0.3 from ri . Therefore, ϕpwq is not on s1 . It immediately follows that 4 does not contain ϕpwq for any vertex w P pV Y U qztw1 , w2 u, which completes the proof.

A.6

Proof of Theorem 12

Suppose n “ |V Y V 1 |. Let h : V Ñ t1, . . . , ku be a semi-discrete k-halfcoloring of G (on V ). We show dimpGq ď 2k by explicitly constructing a DPE f : V Y V 1 Ñ R2k of G. For i P t1, . . . , ku, we define Vi “ h´1 ptiuq Ď V (i.e., Vi consists of the vertices in V colored with color i by h) and define Gi as the subgraph of G with the vertex set Vi Y V 1 . We first construct k functions f1 , . . . , fk : V Y V 1 Ñ R2 , and then obtain the DPE f by identifying R2k with pR2 qk and “combining” the functions f1 , . . . , fk , i.e., setting f pvq “ pf1 pvq, . . . , fk pvqq for all v P V Y V 1 . Fixing p P t1, . . . , ku, we describe the construction of fp . Suppose the graph Gp consists of m connected components. For each i P t1, . . . , mu, let Ci be the set of the vertices in the i-th connected component of Gp . Also, for each i P t1, . . . , mu, let Bi “ tpx, yq P R2 : i ´ 1 ă x ă i, m ´ i ă y ă m ´ i ` 1u 18

be an open box in R2 (see the left part of Figure 9). The function fp to be constructed maps the vertices in Ci to points in Bi as follows. Since h is semi-discrete, we know that |Ci X V | ď 2. If |Ci X V | “ 0, then Ci only contains an isolated vertex v 1 P V 1 , and we set fp pv 1 q to be an arbitrary point in Bi . If |Ci X V | “ 1, let v be the only vertex in Ci X V and suppose Ci X V 1 “ tv11 , . . . , vr1 u. In this case, we set fp pv11 q, . . . , fp pvr1 q to be a sequence of r points in Bi with increasing x-coordinates and decreasing y-coordinates, and fp pvq to be an arbitrary point in Bi dominated by all of fp pv11 q, . . . , fp pvr1 q. See the middle part of Figure 9 for an intuitive illustration for this case. If |Ci X V | “ 2, let v1 , v2 be the two vertices in Ci X V and again suppose Ci X V 1 “ tv11 , . . . , vr1 u. We may assume that the vertices in Ci X V 1 adjacent to v1 (resp., v2 ) are exactly v11 , . . . , vα1 (resp., vβ1 , . . . , vr1 ) for some α, β P t1, . . . , ru with α ě β (if not, one can easily relabel the points to achieve this). Again, we set fp pv11 q, . . . , fp pvr1 q to be a sequence of r points in Bi with increasing x-coordinates and decreasing y-coordinates. Then we set fp pv1 q to be a point in Bi which is dominated by exactly fp pv11 q, . . . , fp pvα1 q, and set fp pv2 q to be a point in Bi which is dominated by exactly fp pvβ1 q, . . . , fp pvr1 q. Note that we can definitely find such two points, since fp pv11 q, . . . , fp pvr1 q have increasing x-coordinates and decreasing y-coordinates. In addition, by carefully determining the locations of fp pv1 q and fp pv2 q in Bi , we may further require that fp pv1 q and fp pv2 q do not dominate each other. See the right part of Figure 9 for an intuitive illustration for this case. After considering all Ci , the function fp is defined for all vertices in Vp Y V 1 (which is the vertex set of Gp ). So it suffices to define fp on V zVp . For each v P V zVp , we simply set fp pvq to be an arbitrary point in the box rŃ, Ń ` 1s ˆ rŃ, Ń ` 1s for a sufficiently large integer N ą 10n (recall that n “ |V Y V 1 |), which completes the construction of fp . We observe that fp has the following properties. (1) For any v P V and w P Vp , fp pvq č fp pwq. (2) For any v 1 P V 1 , fp pv 1 q is not dominated by any point in the image of fp . (3) For any v P Vp and v 1 P V 1 , fp pv 1 q ą fp pvq iff v and v 1 are adjacent in G. We do the same thing for all p P t1, . . . , ku and obtain the functions f1 , . . . , fk . As mentioned before, we then define f : V Y V 1 Ñ R2k as f pvq “ pf1 pvq, . . . , fk pvqq. We now prove that f is a DPE of G. First, for any v P V , we claim that f pvq does not dominate any point in the image of f . Indeed, f pvq č f pv 1 q for any v 1 P V 1 , since f1 pv 1 q is not dominated by any point in the image of f1 by the property (2) above. Also, f pvq č f pwq for any w P V , since fp pvq č fp pwq for p “ hpwq by the property (1) above. Second, for any v 1 P V 1 , we have that f pv 1 q is not dominated by any point in the image of f , simply because f1 pv 1 q is not dominated by any point in the image of f1 by the property (2) above. Finally, consider two vertices v P V and v 1 P V 1 . We claim that f pv 1 q ą f pvq iff v and v 1 are adjacent in G. If v and v 1 are adjacent, then fi pv 1 q ą fi pvq for all i P t1, . . . , ku by the property (3) above, and hence f pv 1 q ą f pvq. If v and v 1 are not adjacent, then fp pv 1 q č fp pvq for p “ hpvq by the property (3) above, and hence f pv 1 q č f pvq. In sum, we have f pv 1 q ą f pvq iff v P V , v 1 P V 1 , v and v 1 are adjacent in G. Therefore, f is a DPE of G to R2k . Clearly, f can be constructed in polynomial time if the k-halfcoloring h is provided, which completes the proof of the first part of the theorem. Next, we prove the second part of the theorem. Again, let h : V Ñ t1, . . . , ku be a semi-discrete khalfcoloring of G (on V ). Suppose χh pv 1 q ă k for all v 1 P V 1 . If k “ 1, then χh pv 1 q “ 0 for all v 1 P V 1 , which implies that G has no edges and thus the statement is trivial (any constant map f : V Y V 1 Ñ R is a DPE of G). So assume k ě 2. We show dimpGq ď 2k ´ 1 by explicitly constructing a DPE f : V Y V 1 Ñ R2k´1 of fp (v10 ) fp (vβ0 )

fp (v10 ) Bi

B1 B2

Bi fp (vα0 ) fp (vr0 )

fp (vr0 ) B3 ..

. Bm

fp (v)

fp (v1 ) fp (v2 )

Figure 9: A local structure of fp in the box Bi .

19

G. In the same way as before, we define the functions f1 , . . . , fk : V Y V 1 Ñ R2 . But we need a different way 1 to define f . To this end, we first construct k ´ 1 functions f11 , . . . , fk´1 : V Y V 1 Ñ R2 based on f1 , . . . , fk as 1 follows. Fixing p P t1, . . . , k ´ 1u, we describe the construction of fp . For all v P V zVk , we set fp1 pvq “ fp pvq. For all v P Vk , we set fp1 pvq “ fk pvq ´ pn, nq, that is, if fk pvq “ px, yq P R2 then fp1 pvq “ px ´ n, y ´ nq. Now consider the vertices in V 1 . If a vertex v 1 P V 1 is “adjacent” to the color p (recall that v 1 is said to be “adjacent” to the color p if there exists v P V adjacent to v 1 with hpvq “ p), then we set fp1 pv 1 q “ fp pv 1 q, otherwise 1 fp1 pv 1 q “ fk pv 1 q ´ pn, nq. By doing this for all p P t1, . . . , k ´ 1u, we complete constructing f11 , . . . , fk´1 . 1 1 1 2k´2 However, if we “combine” f1 , . . . , fk´1 , we only obtain a map V Y V Ñ R which is not guaranteed to be a DPE. So the last ingredient needed for defining f is a function ρ : V Y V 1 Ñ R. The definition of ρ is quite simple. We set ρpvq “ 1 for all v P V zVk , and ρpvq “ 3 for all v P Vk . For v 1 P V 1 , if v 1 is “adjacent” to the color k or χh pv 1 q “ 0, then we set ρpv 1 q “ 4, otherwise ρpv 1 q “ 2. Finally, f : V Y V 1 Ñ R2k´1 is defined 1 by identifying R2k´1 with pR2 qk´1 ˆ R and “combining” the functions f11 , . . . , fk´1 , ρ, i.e., setting 1 f pvq “ pf11 pvq, . . . , fk´1 pvq, ρpvqq

for all v P V Y V 1 . We need to verify that f is truly a DPE of G to R2k´1 . First, we show that for any v P V , f pvq does not dominate any point in the image of f . Let v P V be a vertex. We consider two cases, v P V zVk and v P Vk . In the case of v P V zVk , we first notice that f pvq č f pwq for any w P Vk Y V 1 , simply because ρpvq ă ρpwq. To see this f pvq č f pwq for any w P V zVk , set p “ hpwq ‰ k. Then fp1 pvq “ fp pvq does not dominate fp1 pwq “ fp pwq by the property (1) above, and hence f pvq č f pwq. In the case of v P Vk , we first claim that f pvq č f pwq for any w P V . If w R Vk , then by setting p “ hpwq ‰ k we have fp1 pvq “ fk pvq ´ pn, nq does not dominate fp1 pwq “ fp pwq, which implies f pvq č f pwq. If w P Vk , then f11 pvq “ fk pvq ´ pn, nq does not dominate f11 pwq “ fk pwq ´ pn, nq since fk pvq č fk pwq by the property (1) above, which also implies f pvq č f pwq. It suffices to show that f pvq č f pv 1 q for any v 1 P V 1 . Indeed, we have either f11 pv 1 q “ f1 pv 1 q or f11 pv 1 q “ fk pv 1 q ´ pn, nq. In each case, f11 pvq “ fk pvq ´ pn, nq does not dominate f11 pv 1 q (the former case is obvious and the latter case follows from the property (2) above). Thus f pvq č f pv 1 q. Second, we show that for any v 1 P V 1 , f pv 1 q is not dominated by any point in the image of f . Let v 1 P V 1 be a vertex. By the argument above, it suffices to verify that f pw1 q č f pv 1 q for any w1 P V 1 . If v 1 is “adjacent” to some color p P t1, . . . , k ´ 1u, then we are done because fp1 pv 1 q “ fp pv 1 q is not dominated by fp1 pw1 q for any w1 P V 1 . Suppose v 1 is not “adjacent” to any color in t1, . . . , k ´ 1u. In this case, we must have ρpv 1 q “ 4 and fi1 pv 1 q “ fk pv 1 q ´ pn, nq for all i P t1, . . . , k ´ 1u. We first notice that f pw1 q č f pv 1 q for any w1 P V 1 such that χh pw1 q ą 0 and w1 is not “adjacent” to the color k, simply because ρpw1 q “ 2 ă ρpv 1 q. Then we consider the case that w1 P V 1 is “adjacent” to the color k or χh pw1 q “ 0. By the assumption χh pw1 q ă k, we know that w1 cannot be “adjacent” to all the k colors. In other words, if w1 is “adjacent” to the color k or χh pw1 q “ 0, w1 must miss some color in t1, . . . , k ´ 1u. Without loss of generality, we may assume w1 is not “adjacent” to the color 1. Thus, f11 pv 1 q “ fk pv 1 q ´ pn, nq is not dominated by f11 pw1 q “ fk pw1 q ´ pn, nq by the property (2) above, and hence f pw1 q č f pv 1 q. Finally, we show that for any v P V and v 1 P V 1 , f pv 1 q ą f pvq iff v and v 1 are adjacent in G. Let v P V and 1 v P V 1 be two vertices. If v and v 1 are adjacent in G, one can easily verify (by checking various cases) that ρpv 1 q ą ρpvq and fi1 pv 1 q dominates fi1 pvq for all i P t1, . . . , k ´ 1u, which implies f pv 1 q ą f pvq. Now suppose v and v 1 are not adjacent in G. We consider two cases, v P V zVk and v P Vk . In the case of v P V zVk , set p “ hpvq ‰ k. Then fp1 pvq “ fp pvq. Besides, we have either fp1 pv 1 q “ fp pv 1 q or fp1 pv 1 q “ fk pv 1 q ´ pn, nq. For the former, fp1 pv 1 q č fp1 pvq follows from the property (3) above, while for the latter fp1 pv 1 q č fp1 pvq follows obviously. Thus, f pv 1 q č f pvq. In the case of v P Vk , we have fi1 pvq “ fk pvq ´ pn, nq for all i P t1, . . . , ku and ρpvq “ 3. If v 1 is not “adjacent” to the color k and χh pv 1 q ą 0, then ρpv 1 q “ 2 ă ρpvq and hence f pv 1 q č f pvq. If v 1 is “adjacent” to the color k or χh pv 1 q “ 0, then as argued before v 1 must miss some color in t1, . . . , k ´1u. Without loss of generality, we may assume w1 is not “adjacent” to the color 1. Thus, f11 pv 1 q “ fk pv 1 q ´ pn, nq does not dominate f11 pvq “ fk pvq ´ pn, nq by the property (3) above, which implies f pv 1 q č f pvq. In sum, two vertices in G share a common edge iff their images under f form a dominance. Therefore, f is a DPE of G to R2k´1 . It is clear that the construction of f can be done in polynomial time if the k-halfcoloring h is provided. 20

A.7

Proof of Lemma 13

Let G “ pV Y V 1 , Eq be a 3-regular planar bipartite graph. As before, we define the graph G1 “ pV, E 1 q by setting E 1 “ tpa, bq : a „ b in Gu. Then a discrete k-halfcoloring of G on V corresponds to a (conventional) k-coloring of G1 satisfying that no two adjacent vertices share the same color. We first show that G1 is planar. Fix a planar drawing ϕ of G. Let v 1 P V 1 be a vertex. Since G is 3-regular, v 1 must be adjacent to three vertices v1 , v2 , v3 P V . We now delete v 1 as well as its three adjacent edges from G and add three new edges pv1 , v2 q, pv2 , v3 q, pv3 , v1 q to G. We claim that the resulting graph is still planar. Indeed, in the drawing ϕ, after we remove ϕpv 1 q and its adjacent edges, ϕpv1 q, ϕpv2 q, ϕpv3 q will share a common face, which is the one previously containing ϕpvq. So we can draw the edges pv1 , v2 q, pv2 , v3 q, pv3 , v1 q inside this face along with the image of the deleted edges (see Figure 10). In this way, we keep deleting the vertices in V 1 (as well as the adjacent edges) and adding new edges. In this process, the planarity of the graph always keeps. Until all the vertices in V 1 are deleted, the resulting graph, which is still planar, is nothing but G1 , as two vertices u, v P V are connected (in the resulting graph) iff u „ v in G. By applying the well-known Four Color Theorem, we know that G1 is 4-colorable. Furthermore, to find a 4-coloring for G1 can be done in quadratic time using the approach in [12]. As a result, a discrete 4-halfcoloring of G can be computed in polynomial time, completing the proof. v1

v1

v0 v2

v2 v3

v3

Figure 10: Deleting a vertex and adding three new edges.

A.8

Proof of Theorem 14

We show that for any i, j P t1, . . . , nu with i ă j, PrrEi,j s ¨ |Est i,j ´ Cond i,j | ă

ε ΛS n2

(3)

with probability 1 ´ Opeń q. As long as this is true, by using union bound, we can immediately conclude that |Λ ´ ΛS | ă εΛS with probability at least 2{3, which completes the proof. Consider a realization R of Si´1 . Clearly, the probability that R Y tai , aj u contains inter-color dominances is nothing but Cond i,j . Therefore, by Hoeffding’s inequality and the definition of Est i,j , we have that ” 2 4 ε ı Pr |Est i,j ´ Cond i,j | ě 2 ď 2e´2N ε {n “ 2e´2n . n If PrrEi,j s ď ΛS , we are done because the above already implies that Inequality 3 holds with probability 1 ´ Opeń q. So assume PrrEi,j s ą ΛS . Note that πpai q ¨ πpaj q ě PrrEi,j s, which implies πpai q ¨ πpaj q ą ΛS . We claim that Cond i,j “ 0. It suffices to show that for any realization R of Si´1 , R Y tai , aj u contains no inter-color dominances. Let ap , aq P R Y tai , aj u be two distinct points. Assume clpap q ‰ clpaq q and ap ą aq . Then we must have ΛS ě πpap q ¨ πpaq q because a realization of S does contain inter-color dominances if it includes both ap and aq . However, recall that πpa1 q ě ¨ ¨ ¨ ě πpan q. Thus, πpap q ¨ πpaq q ě πpai q ¨ πpaj q ą ΛS , which gives us a contradiction. Since Cond i,j “ 0, Est i,j is for sure 0. It follows that Inequality 3 holds with probability 1 in this case. As a result, p1 ´ εqΛS ă Λ ă p1 ` εqΛS with probability at least 2{3.

21

A.9

Proof of Lemma 15

Fixing p P t1, . . . , du, we determine the p-th coordinates of a ˆ1 , . . . , a ˆn as follows. For all i P t1, . . . , nu, define a triple φi “ pγi , σi , iq where γi is the p-th coordinate of ai and σi is the sum of the d coordinates of ai . Then we sort all φi in lexicographic order from small to large, and suppose φi1 , . . . , φin is the resulting sorted sequence. We have φi1 ă ¨ ¨ ¨ ă φin under lexicographic order, since there exist no ties. Now we simply set the p-th coordinates of a ˆ i1 , . . . , a în to be 1, . . . , n respectively. In this way, we obtain the new set Snew “ tˆ a1 , . . . , a ˆn u Ă Rd in Opn log nq time (note that d is assumed to be constant). It is clear that Snew is regular. We verify that Snew satisfies the desired property. Assume ai ąE aj . Then in each dimension, the coordinate of ai is greater than or equal to the coordinate of aj . In addition, the sum of the d coordinates of ai is greater than that of aj . Therefore, in all dimensions, the coordinates of a î are greater than the coordinates of a ˆj , i.e., a ˆ i ąE a ˆj . Assume ai čE aj . Then there exists p P t1, . . . , du such that the p-th coordinate of ai is smaller than the p-th coordinate of aj . By definition, the p-th coordinate of a î is also smaller than the p-th coordinate of a ˆj . Therefore, a ˆ i čE a ˆj .

A.10

Proof of Lemma 18

Without loss of generality, we may assume B “ E. Let rr1 : ¨ ¨ ¨ : rd s be the homogeneous coordinates of l. Since l R PC B , we may find rp and rq such that rp ą 0 and rq ă 0. Now we define r11 , . . . , rd1 P R by setting rp1 “ ´rq , rq1 “ rp , and ri1 “ 0 for any i R tp, qu. Consider the point x “ rr11 : ¨ ¨ ¨ : rd1 s P Pd´1 . Note that rp1 and rq1 are nonzero so that x is well-defined. Since r11 , . . . , rd1 are nonnegative, we have x P PC B . řd Furthermore, we know that angpl, xq “ π2 , because i“1 ri ri1 “ 0.

A.11

Proof of Lemma 19

Without loss of generality, we may assume B “ E. Let rr1 : ¨ ¨ ¨ : rd s be the homogeneous coordinates of x řd such that i“1 ri2 “ 1. Since x P PC B , the coordinates can be chosen such that r1 , . . . , rd are nonnegative. Consider the point y “ rr11 : ¨ ¨ ¨ : rd1 s P Pd´1 where ri1 “ ri ` ?εd . It is clear that y is well-defined and in PC B . Set θ “ angpx, yq. To see θ ă ε, we note that řd p i“1 ri ri1 q2 pd ´ γ 2 qε2 2 2 ? sin θ “ 1 ´ cos θ “ 1 ´ řd , “ 1 2 d ` 2ε dγ ` dε2 i“1 pri q ? řd where γ “ i“1 ri ě 1. Therefore, sin2 θ ă ε2 {p1 ` ε2 q and sin θ ă ε{ 1 ` ε2 , which implies θ ă ε. It ε ε -ball at y is contained in PC B . Equivalently, we want that angpz, yq ą 3? for suffices to show that the 3? d d ř d d d 2 any z P P zPC B . Let z “ rs1 : ¨ ¨ ¨ : sd s be a point in P zPC B and assume i“1 si “ 1. We have that ´ř ¯2 d 1 r s i i“1 i . cos2 pangpz, yqq “ řd 1 2 i“1 pri q Because z P Pd zPC B , there must exist some p, q such that sp ą 0 and sq ă 0. Since r11 , . . . , rd1 ą 0 and řd 2 i“1 si “ 1, we have that ´ř ¯2 ´ř ¯2 d d řd 1 1 1 2 2 r s r |s | ´ η2 i“1 i i i“1 i i i“1 pri q ´ η ď ď , řd ř ř d d 1 2 1 2 1 2 i“1 pri q i“1 pri q i“1 pri q where η “ mint|rp1 |, |rq1 |u. It follows that sin2 pangpz, yqq ě řd

η2

1 2 i“1 pri q

Therefore, angpz, yq ě sinpangpz, yqq ą

ε ? . 3 d

22

ě

ε2 {d ε2 ą . 2 p1 ` εq 9d

A.12

Proof of Lemma 20

By taking ε ą π2 , the statement in the theorem implies that for any ξ ą 0, one can compute m “ Op1{ξ d´1 q points l1 , . . . , lm P Pd´1 in Opmq time such that mini angpli , yq ă ξ for any y P Pd´1 . With this observation, we complete the proof by applying induction on the dimension. In P1 , the statement is quite obvious. Without loss of generality, we may assume l “ r0 : 1s. Set γ “ tε{ξu and m “ 2γ ` 1. Then one can simply take the m points rcospiξq : sinpiξqs for all i P t´γ, . . . , 0, . . . , γu as l1 , . . . , lm . The two desired properties of l1 , . . . , lm can be readily verified. Now suppose the theorem holds in Pk´1 , and we consider the case in Pk . Similarly, we may assume l “ r0 : ¨ ¨ ¨ : 0 : 1s P Pk . As argued at the beginning, our induction 1 k´1 hypothesis implies that we can compute m1 “ Op1{ξ k´1 q points l11 , . . . , lm in Opm1 q time such that 1 P P mini angpli1 , yq ă ξ{2 for any y P Pk´1 . We then use these m1 points to achieve our construction in Pk as follows. For any real number α P r0, 1q, we define the inclusion map fα : Pk´1 Ñ Pk as c  „ t , fα : rr1 : ¨ ¨ ¨ : rk s ÞÑ r1 : ¨ ¨ ¨ : rk : 1 ´ α2 řk where t “ i“1 ri2 (note that fα is well-defined). Set γ “ t2ε{ξu and m “ p2γ ` 1qm1 “ Opε{ξ k q. Also, set αi “ sinpiξ{2q for i P t´γ, . . . , 0, . . . , γu. Then we take the m points fαi plj1 q for all i P t´γ, . . . , 0, . . . , γu and all j P t1, . . . , m1 u as l1 , . . . , lm . It suffices to show that l1 , . . . , lm satisfy the two desired conditions. Clearly, for any i “ t´γ, . . . , 0, . . . , γu and j “ t1, . . . , m1 u, we have that angpl, fαi plj1 qq “ π2 ´ iξ{2 ą π2 ´ ε. řk`1 To verify the condition (2), let y “ rr1 : ¨ ¨ ¨ : rk`1 s be a point in Pk where i“1 ri2 “ 1. Suppose angpl, yq ą π2 ´ ε, so |rk`1 | ă sin ε. If rk`1 ě 0, we define p as the largest integer in t0, . . . , γu such that sinppξ{2q ď rk`1 , otherwise define p as the smallest integer in t´γ, . . . , 0u such that sinppξ{2q ě rk`1 . Set y 1 “ rr1 : ¨ ¨ ¨ : rk s P Pk´1 . Then by assumption, there exists some q P t1, . . . , m1 u such that angplq1 , y 1 q ă ξ{2. We claim that angpfαp plq1 q, yq ă ξ. Indeed, we consider the point fαp py 1 q P Pk . We have angpfαp plq1 q, fαp py 1 qq ď angplq1 , y 1 q ă ξ{2. Also, we have angpfαp py 1 q, yq “ | arcsinprk`1 q ´ pξ{2| ă ξ{2. Therefore, angpfαp plq1 q, yq ă ξ, which implies that the points l1 , . . . , lm satisfy the condition (2). The induction argument then completes the proof.

A.13

Proof of Theorem 21

To prove the result, we first determine some constants. Since P 1 is balanced, there is a constant c1 ă 1 such that N ´ max ∆1N ě N c1 for any sufficiently large N . Recall that our construction of the auxiliary point set Ψ satisfies |Ψ | “ 2M “ Opnd q where n “ |S|. So we can find a constant c2 such that |S Y Ψ | ď c2 nd . We construct the desired balanced color pattern P “ p∆1 , ∆2 , . . . q as follows. For an integer p ą 0, in order to determine ∆p , set q “ pc2 pd q2{c1 . We consider two cases, |∆1q | ě c2 pd and |∆1q | ă c2 pd . In the case of |∆1q | ě c2 pd , we define ∆p “ t1, . . . , 1u, i.e., a multi-set consisting of p 1’s. In the case of |∆1q | ă c2 pd , we p`1 define ∆p “ t p2 , p2 u if p is even and ∆p “ t p´1 2 , 2 u if p is odd. This completes the construction of P. We claim that the CSD problem in Rd with respect to P is polynomial-time reducible to the FBCSD problem in the same dimension with respect to P 1 . Let S “ pS, cl, πq be a colored stochastic dataset in Rd such that xSy is an instance of the CSD problem with respect to P. Suppose |S| “ n and set N “ pc2 nd q2{c1 . We want to construct another colored stochastic dataset S 1 “ pS 1 , cl1 , π 1 q in Rd with |S 1 | “ N such that Λ˚S 1 “ ΛS and xS 1 y is an instance of the FBCSD problem with respect to P 1 . As before, we first construct the auxiliary point set Ψ “ tb1 , . . . , bM , b11 , . . . , b1M u. By our assumption, we have |S Y Ψ | “ n ` 2M ď c2 nd ă N . In order to have |S 1 | “ N , we then arbitrarily choose a set D of N ´ pn ` 2M q dummy points in Rd (these points can be chosen arbitrarily as we will assign them existence probabilities 0 later) and set S 1 “ S Y Ψ Y D. The existence probabilities of the points in S 1 are defined as $ & πpaq if a P S, 1 if a P Ψ, π 1 paq “ % 0 if a P D. It suffices to define the coloring cl1 of S 1 . Since we need xS 1 y to be an instance of the FBCSD problem with respect to P 1 , cl1 must induce the partition ∆1N . Besides, it should be guaranteed that cl1 paq “ clpaq 23

for all a P S and cl1 pbi q ‰ cl1 pb1i q for i P t1, . . . , M u (as observed previously, Λ˚S 1 “ ΛS as long as we have this). We consider two cases, |∆1N | ě c2 nd and |∆1N | ă c2 nd . In the case of |∆1N | ě c2 nd , we have ∆n “ t1, . . . , 1u by definition and therefore all the points in S have distinct colors (under the coloring cl). Note that |S Y Ψ | ď c2 nd ď |∆1N |. As such, one can easily find a coloring cl1 inducing ∆1N which assigns distinct colors to the points in S Y Ψ and satisfies cl1 paq “ clpaq for all a P S (note that the coloring on D is “free”, so we can easily make cl1 induces ∆1N ). This cl1 completes our reduction. In the n`1 case of |∆1N | ă c2 nd , we have that ∆n “ t n2 , n2 u if n is even and ∆n “ t n´1 2 , 2 u if n is odd. Without 1 loss of generality, we may assume that clpSq “ t1, 2u. Suppose ∆N “ tr1ř , . . . , rm u where m ă c2 nd and m d d r1 ě ¨ ¨ ¨ ě rm . We claim that r1 ě r2 ě c2 n . Indeed, if r2 ă c2 n , then i“2 ri ă mc2 nd ă pc2 nd q2 and 1{c1 d 2{c1 d 2{c1 hence N ď pN ´ r1 q ă pc2 n q , contradicting the fact that N “ pc2 n q . With this observation, we try to construct cl1 with cl1 pS 1 q “ t1, . . . , mu such that cl1 assigns color i to exactly ri points in S 1 . We define cl1 paq “ clpaq for all a P S, cl1 pbi q “ 1 and cl1 pb1i q “ 2 for i P t1, . . . , M u. Note that by doing this we do not “exhaust” the colors 1 and 2, because r1 ě r2 ě c2 nd ě |S Y Ψ |. So we can carefully determine cl1 paq for all a P D such that exactly ri points in S 1 have color i. By the defined cl1 , we completes our reduction and the proof.

A.14

Proof of Lemma 22

We first consider the “if” part. If LR “ H, then R is monochromatic and hence is good. If there exists l P LR such that θpl, l1 q ą π2 for any l1 P LR not equal to l, one can slightly rotate l clockwise to obtain l0 P P1 such that θpl0 , l1 q ą π2 for any l1 P LR . Suppose the homogeneous coordinates of l0 is rα : βs with α2 ` β 2 “ 1. Take the orthogonal basis B “ pb1 , b2 q of R2 with b1 “ pα, βq and b2 “ pβ, ´αq. Since θpl0 , l1 q ą π2 for any l1 P LR , we know that R contains no inter-color dominances with respect to B. To see the “only if” part, let R be a good realization of S. Suppose R contains no inter-color dominances with respect to some orthogonal basis B “ pb1 , b2 q of R2 (assume b2 is in the clockwise direction of b1 with angle π2 ). Let b P P1 be the point corresponding to b1 (i.e., b is the image of b1 under the obvious quotient map S 1 Ñ P1 ). If LR “ H, we are done. So assume LR ‰ H. Define l P LR as the point which minimizes θpl, bq. We claim that θpl, l1 q ą π2 for any l1 P LR not equal to l. Let l1 P LR be a point not equal to l. If θpl, l1 q ď π2 , then either θpl1 , bq ă θpl, bq or l1 P PC B (recall that PC B is the projective cone of B defined in Section 3.1). The former contradicts the definition of l while the latter contradicts the fact that R contains no inter-color dominances with respect to B.

A.15

Proof of Lemma 23

First, we observe (Observation 1 hereafter) that a realization R of S is good with witpRq “ pai˚ , aj ˚ q iff (1) ai˚ , aj ˚ P R and (2) for any ai , aj P R with clpai q ‰ clpaj q and ai ąB aj , we have maxpi, jq ď j ˚ and xb2 , ai y “ xb2 , aj y. To see the “if” part, assume R satisfies the conditions (1) and (2). Since ai˚ , aj ˚ P R, we know that ai˚ ´ aj ˚ P LR . Set l “ ai˚ ´ aj ˚ . The condition (2) guarantees that θpl, l1 q ą π2 for any l1 P LR not equal to l. Thus, by Lemma 22, R is good (but not monochromatic since ai˚ , aj ˚ P R). Furthermore, it is easy to see that the conditions (1) and (2) also guarantee witpRq “ pai˚ , aj ˚ q. To see the “only if” part, assume R is good with witpRq “ pai˚ , aj ˚ q. By the definition of witness pair, we immediately have ai˚ , aj ˚ P R. Again, set l “ ai˚ ´ aj ˚ . Let ai , aj P R be two points such that clpai q ‰ clpaj q and ai ąB aj . By the definition of B, we have θpl, l1 q ď π2 for l1 “ ai ´ aj . According to Lemma 22, it implies that l “ l1 , i.e., xb2 , ai y “ xb2 , aj y. Besides, we must have maxpi, jq ď j ˚ , otherwise pai˚ , aj ˚ q is not the witness pair of R. Second, we observe (Observation 2 hereafter) that our construction of S 1 satisfy the following property. Let i, j P t1, . . . , nu be any indices such that clpai q ‰ clpaj q, or equivalently, cl1 pa1i q ‰ cl1 pa1j q. Then we have a1i ąE a1j iff (1) ai ąB aj and (2) xb2 , ai y ą xb2 , aj y or maxpi, jq ą j ˚ . To see the “if” part, assume ai ąB aj . In this case, we have ypa1i q “ xb1 , ai y ě xb1 , aj y “ ypa1j q. If xb2 , ai y ą xb2 , aj y, then xpa1i q ě xb2 ´ δ, ai y ą xb2 , aj y ě xpa1j q so that a1i ąE a1j . If xb2 , ai y “ xb2 , aj y and maxpi, jq ą j ˚ , we also have xpa1i q “ xb2 , ai y ą xb2 , aj y “ xpa1j q (recall the general position assumption) so that a1i ąE a1j . To see the “only if” part, first assume ai čB aj . In this case, we have either ypa1i q ă ypa1j q or xpa1i q ă xpa1j q, which 24

implies a1i čE a1j . Now assume ai ąB aj , xb2 , ai y ď xb2 , aj y, and maxpi, jq ď j ˚ . Because ai ąB aj , it must be the case that xb2 , ai y “ xb2 , aj y and xb1 , ai y ą xb1 , aj y. By our construction, we have xpa1i q “ xb2 , ai y´δ. But xpa1j q “ xb2 , aj y (recall the general position assumption). Thus, a1i čE a1j . With the above two observations, we prove the equation Pr i˚ ,j ˚ “ πpai˚ q ¨ πpaj ˚ q ¨ ΓS 1 . Define a natural one-to-one correspondence µ : S Ñ S 1 as µpai q “ a1i . First, it is clear that for any subset A Ď S including ai˚ , aj ˚ , the probability that A occurs as a realization of S is equal to the product πpai˚ q ¨ πpaj ˚ q ¨ PrrµpAqs, where PrrµpAqs the probability that µpAq occurs as a realization of S 1 . Let R be a realization of S. We claim that R is good with witpRq “ pai˚ , aj ˚ q iff a1i˚ , a1j ˚ P µpRq and µpRq contains no inter-color dominances (with respect to E). To see the “if” part, assume a1i˚ , a1j ˚ P µpRq and µpRq contains no inter-color dominances with respect to E. Then ai˚ , aj ˚ P R. Let ai , aj P R be two points such that clpai q ‰ clpaj q and ai ąB aj . Since µpRq contains no inter-color dominances with respect to E, Observation 2 above implies that maxpi, jq ď j ˚ and xb2 , ai y ď xb2 , aj y (the latter further implies xb2 , ai y “ xb2 , aj y since ai ąB aj ). Thus, by Observation 1 above, R is good with witpRq “ pai˚ , aj ˚ q. To see the “only if” part, assume R is good with witpRq “ pai˚ , aj ˚ q. Then Observation 1 implies ai˚ , aj ˚ P R and hence a1i˚ , a1j ˚ P µpRq. Let ai , aj P R be two points such that clpai q ‰ clpaj q. If ai čB aj , then by Observation 2 we have a1i čE a1j . If ai ąB aj , then Observation 1 implies that maxpi, jq ď j ˚ and xb2 , ai y “ xb2 , aj y. By using Observation 2, we also have a1i čE a1j . Therefore, µpRq contains no inter-color dominances with respect to E. This argument shows that µ induces a one-to-one correspondence between the good realizations of S and the realizations of S 1 which include a1i˚ , a1j ˚ and contain no inter-color dominances (with respect to E). Note that π 1 pa1i˚ q “ π 1 pa1j ˚ q “ 1, hence a realization of S 1 for sure includes a1i˚ , a1j ˚ . As a result, we have Pr i˚ ,j ˚ “ πpai˚ q ¨ πpaj ˚ q ¨ ΓS 1 .

25

B

Solving the CSD problem for d “ 2 in Opn2 log2 nq time

In this section, we give the details of our improved algorithm using 2D range trees. Formally, the 2D range tree, T , used in this paper is built on a fixed collection of planar points and maintains the weights of these points. It supports the following three operations: • QuerypT , rq returns the sum of weights of all the points in the query range r. • UpdatepT , p, wq updates the weight of point p to w. • MultiplypT , r, δq multiples by a factor of δ the weight of every point in the range r. Note that this operation is revertible and the inverse of MultiplypT , r, δq is MultiplypT , r, 1{δq. With a careful implementation, see Appendix B.1, all three operations can run in Oplog2 nq time. Two more notations are defined. For a legal pair pi, jq, we use pi, jqŒ (resp. pi, jqÔ ) to represent the point pxpai q, ypaj qq (resp. pxpaj q, ypai qq); see Figure 11. Also, let Quadppq denote the northwest open quadrant of point p, i.e., p´8, xppqq ˆ pyppq, 8q. We now give the complete solution shown in Algorithm 1 followed by the correctness analysis. Algorithm 1 Computing ΓS in Opn2 log2 nq time. 1: procedure Compute-ΓS (S) Ź Recall S “ pS, cl, πq. 2: Sort all points in S such that xpa1 q ă ¨ ¨ ¨ ă xpan q. 3: Let T be the 2D range tree built on tpi, jqŒ : pi, jq is legalu with initial weights 0. 4: Let Tk be the 2D range tree built on tpi, jqŒ : pi, jq is legal and clpai q “ clpaj q “ ku with initial weights 0, for śnevery color k. 5: prod “ i“1 p1 ´ πpai qq 6: ΓS Ð prod 7: UpdatepT , a0 , 1q Ź This implies F p0, 0q “ 1. Also, no need to update Tclpa0 q . 8: for i Ð 1 to n do 9: prod Ð prod ¨ p1 ´ πpai qq´1 10: k Ð clpai q 11: MultiplypT , Quadpaj q, p1 ´ πpaj qq´1 q and MultiplypTk , Quadpaj q, p1 ´ πpaj qq´1 q for all j P t1, . . . , iu such that clpaj q “ k. 12: Let p`1 , . . . , ì q be a permutation of p1, . . . , iq such that ypa`1 q ă ¨ ¨ ¨ ă ypaì q. 13: for j Ð `1 to ì do 14: if pi, jq is a legal pair then Ź This implies that clpaj q “ k. 15: F pi, jq Ð QuerypT , Quadppi, jqÔ qq ´ QuerypTk , Quadppi, jqÔ q ˚ 16: F pi, jq Ð F pi, jq ¨ πi,j 17: ΓS Ð ΓS ` F pi, jq ¨ prod 18: MultiplypT , Quadpaj q, 1 ´ πpaj qq 19: MultiplypTk , Quadpaj q, 1 ´ πpaj qq 20: end if 21: end for 22: Revert all Multiply operations executed in Line 11, 18, 19. 23: UpdatepT , pi, jqŒ , F pi, jqq and UpdatepTk , pi, jqŒ , F pi, jqq for every j P t1, . . . , iu such that pair pi, jq is legal. 24: MultiplypT , p´8, xpai qq ˆ R, 1 ´ πpai qq 25: MultiplypTk , p´8, xpai qq ˆ R, 1 ´ πpai qq 26: end for 27: return ΓS 28: end procedure

26

(i, j)-

ai

aj

(i, j)&

Figure 11: Illustrating pi, jqŒ and pi, jqÔ for a legal pair pi, jq. Correctness analysis. We compute F pi, jq for each legal pair pi, jq by first enumerating i from 1 to n and then j in an order such points are visited from bottom to top; see the nested loop at Line 8 and 13. For now, assume the fact, which we prove later, that the inner j-loop correctly computes F pi, jq for all legal pairs pi, jq when i is fixed. We then have the following lemma. 1 1 1 Lemma 25 At the śbeginning of the i-th iteration (Line 10), the weight of pi , j qŒ in T , such that i ă i, is 1 1 equal to F pi , j q ¨ pPSXk p1 ´ πppqq, where k denotes the open strip pxpai1 q, xpai qq ˆ R. (See Figure 12a.)

Proof. This statement is trivially true for i “ 1 as all the weights in T are equal to zero except that F p0, 0q “ 1. Assume the statement is true for the i-th iteration, we show it also holds for the pi ` 1q-th iteration. First, we can safely consider T unchanged throughout Line 10-22 because although Line 11 and 18 modifies T , these side-effects are reverted immediately in Line 22. After the inner j-loop is done, by our early assumption, we obtain the value of F pi, jq for every legal pair pi, jq when i is fixed. These values are not currently stored in T but are needed for the next iteration. Thus, we update the weight of each pi, jqŒ P T to F pi, jq, as stated in Line 23. We also need to multiply the factor p1 ´ πpai qq to the weight of each pi1 , j 1 qŒ P T that is to the left of ai because ai will be included in the strip as we proceed from i to i ` 1. This is handled by Line 24. As such, the statement is maintained for the pi ` 1q-th iteration, which completes the proof. l With Lemma 25 in hand, we now give the proof of our aforementioned statement, as restated in Lemma 26. Lemma 26 Line 15-16 correctly computes F pi, jq. ř ˚ Proof. Recall that F pi, jq “ πi,j ¨ pi1 ,j 1 qPJi,j F pi1 , j 1 q ¨ Πi,j,i1 ,j 1 . By Lemma 25, at the beginning of the i-th ś round, the weight of each pi1 , j 1 qŒ P T , where i1 ă i, is equal to F pi1 , j 1 q¨ pPSXk p1 ´ πppqq. This product is off ś from the ideal one, F pi1 , j 1 q¨Πi,j,i1 ,j 1 , by a factor of pPS piq Xl p1 ´ πppqq, where S piq “ tp P S : clppq “ clpai qu and l denotes the box pxpai1 q, xpai qq ˆ rypaj q, ypaj 1 qs; see Figure 12b. To cancel this factor, we observe that N ź ź ź p1 ´ πppqq “ p1 ´ πppqq p1 ´ πppqq, pPS piq Xl

pPS piq X[1

pPS piq X[2

where [1 and [2 respectively denote the three-sided rectangle pxpai1 q, xpai qqˆp´8, ypaj 1 qs and pxpai1 q, xpai qqˆ p´8, ypaj qq; see Figure 12c and 12d. The former product ([1 ) is canceled in Line 11, and the latter ([2 ) is gradually accumulated back via pj ´ 1q calls of Line 18 as a`1 , . . . , a`j´1 are all below a`j . Thus, the weight of each pi1 , j 1 qŒ P T is equal to F pi1 , j 1 q ˆ Πi,j,i1 ,j 1 right before F pi, jq gets evaluated. Finally, the range query in Line 15 sums up the weight of every pi1 , j 1 qŒ P T such that pi1 , j 1 q P Ji,j . (Note that the subtraction in Line 15 is needed because QuerypT , Quadppi, jqÔ qq also counts the probabilities of those legal pairs that have the same color as clpai q.) Therefore, the value of F pi, jq is correctly computed after Line 16. l Though Lemma 25 and 26 are cross-referencing, one can easily figure out that this is not a circular reasoning and is indeed a valid proof. Also, both lemmas can directly apply to Tk ’s as we always query/update T and Tk ’s in the same way. Finally, all F pi, jq’s are computed and added up into ΓS , which completes the correctness proof of the entire algorithm. The overall runtime of Algorithm 1 is Opn2 log2 nq since there are Opn2 q range queries/updates, each of which takes Oplog2 nq time. The space occupied by T , denoted by |T |, is Opn2 log n2 q “ Opn2 log nq as there 27

are Opn2 q legal pairs. Similarly, let nk be the number of points in color k, and then Tk costs Opn2k log nk q space. Assume there are K colors in total. We have n1 `¨ ¨ ¨`nK “ n and thus |T1 |`¨ ¨ ¨`|TK | “ Opn2 log nq. The overall space complexity is Op|T | ` |T1 | ` ¨ ¨ ¨ ` |TK |q “ Opn2 log nq. k ai0

ai0

ai0

aj 0 ai

(i0 , j 0 )& aj

(i, j)&

aj 0

aj 0

(i0 , j 0 )&

ai

(i0 , j 0 )&

aj

(i, j)&

ai0

u1 ai aj

aj 0 ai

(i0 , j 0 )&

(i, j)&

aj

(i, j)& u2

(a) Points in k.

(b) Points in l.

(c) Points in [1 .

(d) Points in [2 .

Figure 12: Illustrating Lemma 26. Orange color is only used to highlight each range and does not represent the color of each point. Dashed (resp. solid) boundaries are exclusive (resp. inclusive).

B.1

Implementation details of our 2D range tree

In this section, we discuss how to implement an augmented 2D range tree, T , to dynamically support Query, Update, and Multiply in Oplog2 mq time, where m is the input size. We first describe how to implement a dynamic 1D range tree, T1D , built on the y-coordinates of a set of planar points, P , to support the three operations, where the range used in Query and Multiply is a 1D interval. The leaves, sorted by increasing y-coordinates, of T1D are points in P with initial weight equal to 0. In addition, in each internal node, u, we store two fields, sumpuq and mul puq, where the former is the sum of weights in the subtree rooted at u and the latter is the multiplication-factor that needs to be applied to all the nodes in the subtree. For simplicity we use the notion sumpuq to denote the weight of u if it is a leaf. Also, set sumpuq “ 0 and mulpuq “ 1 initially. Given a query/update range, we first identify Oplog mq canonical nodes, C, of T1D via a recursive downphase traversal. We then aggregate or modify the data in each canonical node. Finally, we refine the fields of those nodes along the path from every canonical node up to the root, as the recursion gradually terminates. In the down-phase, when a non-leaf node u is visited, we call the following Push method to revise sumpuq based on mulpuq and then push the factor further to its two children. In the up-phase, we apply the Combine method to each node to readjust the sum. Between the down and up phase, we perform one of the following three operations. • Add up sumpuq for every u P C for QuerypT1D , pq. • Update sumpuq to w for the only element u P C for UpdatepT1D , p, wq. • Multiply mul puq by a factor of δ for every u P C for MultiplypT1D , p, δq. Finally, we build our 2D range tree, T , on the x-coordinates of the given input. For each node u P T , we build an aforementioned 1D range tree w.r.t. the set of points in u. We also store at u a similar tag indicating the multiplication factor that needs to be applied to the 1D range tree stored at u as well as all u’s descendants. Given a 2D range query, we do a down-phase traversal identifying Oplog mq canonical nodes of T . For each visited node u during the traversal, we should apply the multiplication tag to the 1D tree stored at u and push it further to u’s two children. This takes Oplog mq time. Then, for every canonical node u, we spend another Oplog mq time querying the 1D range tree stored at u, as stated above. Therefore, all three operations can be done in Oplog2 mq time, and T occupies Opm log mq space.

28

Algorithm 2 Implementation details of Push and Combine. 1: procedure Push(u) Ź Only called in the down-phase. 2: sumpuq Ð sumpuq ¨ mul puq 3: if u is not a leaf then 4: mul plchild puqq Ð mul plchild puqq ¨ mulpuq 5: mul prchild puqq Ð mul prchild puqq ¨ mulpuq 6: end if 7: mul Ð 1 8: end procedure 9: procedure Combine(u) Ź Only called in the up-phase and we must have mul puq “ 1. 10: sumpuq Ð sumplchild puqq ` sumprchild puqq 11: end procedure

B.2

Handling range-multiplication/division with a factor of zero

One may notice that the implementation above contains a flaw for MultiplypT , r, δq when δ “ 0 because the inverse of this operation does not exist as 1{0 is undefined. We can overcome this issue by adding in each node a zero-counter and counting the number of zero factors separately. That is, if Multiply multiplies a factor of zero, we increment the zero-counter of each canonical node instead of modifying sum and mul fields; if Multiply divides a factor of zero, we decrement the corresponding zero-counters. Also, when a Query is triggered, we simply return zero for those canonical nodes whose zero-counter is positive. This solves the problem without increasing the runtime of all three operations.

29

C

A generalization of Lemma 6

In this section, we extend Lemma 6 to a general result revealing the hardness of stochastic geometric problems (under existential uncertainty). Many stochastic geometric problems focus on computing the probability that a realization of the given stochastic dataset has some specific property, e.g., [3, 5, 8, 18] and this paper. This kind of problems can be abstracted and generalized as follows. Let C be a category of geometric objects (e.g., points, lines, etc.), and P be a property defined on finite sets of objects in C. Definition 27 We define the P-probability-computing problem as follows. The input is a stochastic dataset S “ pS, πq where S is a set of objects in C and π : S Ñ p0, 1s is the function defining existence probabilities for the objects. The goal is to compute the probability that a realization of S has the property P. Example 1. Let C be the category of points in Rd , and P be the property that the convex-hull of the set of points (in Rd ) contains a fixed point q P Rd . In this case, the P-probability-computing problem is the convex-hull membership probability problem [3]. Example 2. Let C be the category of bichromatic points in Rd , and P be the property that the set of bichromatic points is linearly separable. In this case, the P-probability-computing problem is the stochastic linear separability problem [5, 18]. Example 3. Let C be the category of points in Rd , and P be the property that the closest-pair distance of the set of points is at most a fixed threshold `. In this case, the P-probability-computing problem is the stochastic closest-pair problem [8]. By generalizing Definition 5, we may also consider an abstract notion of the cardinality-sensitive-counting problem. Definition 28 Let c be a fixed constant. We define the P-cardinality-sensitive-counting problem as follows. The input consists of a set S of objects in C and a c-tuple pS1 , . . . , Sc q of disjoint subsets of S. The goal is to compute, for every c-tuple pn1 , . . . , nc q of integers where 0 ď ni ď |Si |, the number of the subsets S 1 Ď S which have the property P and satisfy |S 1 X Si | “ ni for all i P t1, . . . , cu. Example 4. Consider the following problem: given a set S of bichromatic (red/blue) points in Rd , compute the number of the linearly separable subsets S 1 Ď S which contain an equal number of red and blue points. This is clearly a restricted version of the P-cardinality-sensitive-counting problem, where C is the category of bichromatic points in Rd and P is the property that the set of bichromatic points is linearly separable. The following theorem, which generalizes Lemma 6, implies that the P-probability-computing problem is at least as “hard” as the P-cardinality-sensitive-counting problem. The proof is (almost) the same as that of Lemma 6 (see Appendix A.2). Theorem 29 The P-cardinality-sensitive-counting problem is polynomial-time reducible to the P-probabilitycomputing problem for any P.

30

D

Implication in order dimension theory

In Section 2.2.3, we achieved the result that dimpGq ď 7 for any 3-regular planar bipartite graph G. In this section, we establish an implication of this result in order dimension theory [15]. Let X be a finite set and ăP be a partial order on X. A set tă1 , . . . , ăt u of linear orders (or total orders) on X is said to be a realizer of ăP if t č ăP “ ăi , i“1

that is, for any x, y P X, x ăP y iff x ăi y for all i P t1, . . . , tu. The order dimension dimpăP q of ăP is defined as the least cardinality of a realizer of ăP (see for example [15]). The partial order ăP can be represented by a transitive directed graph GăP “ pX, Eq where E “ txx, yy : x ăP yu. The comparability graph HăP of ăP is defined as the underlying undirected graph of GăP , i.e., HăP “ pX, E 1 q where E 1 “ tpx, yq : x ăP yu. Our result implies the following. Corollary 30 Let pX, ăP q be a partial ordered set. If the comparability graph HăP is 3-regular planar bipartite, then dimpăP q ď 7. Proof. Suppose HăP “ pX1 Y X2 , Eq, which is 3-regular planar bipartite. We must construct a realizer of ăP of size at most 7. Without loss of generality, we may assume HăP is connected (otherwise we could work on each connected components separately). Using our result in Section 2.2.3, we have dimpHăP q ď 7, so there exists a DPE f : X1 Y X2 Ñ R7 of HăP . Let Y be the image of f . By Lemma 15, we may further assume that Y is regular in R7 (see Section 3.1 for definition). Now define a directed graph Gf “ pX, Ef q as Ef “ txx, yy : f pyq ą f pxqu. It is clear that Gf is transitive and HăP is the underlying undirected graph of Gf . Since HăP is connected, the edges in Gf must be all directed from X1 to X2 or all directed from X2 to X1 (otherwise Gf is not transitive). On the other hand, HăP is also the underlying undirected graph of GăP (defined above) and GăP is also transitive. Therefore, either Gf “ GăP or Gf and GăP are “reverses” of each other (Gf is the same as GăP except the orientations of the edges are reversed). If Gf “ GăP , we define a set tă1 , . . . , ă7 u of linear orders on X as x ăi y iff the i-th coordinate of f pxq is smaller than i-th coordinate of f pyq. If Gf and GăP are reverses of each other, we define tă1 , . . . , ă7 u as x ăi y iff the i-th coordinate of f pxq is greater than i-th coordinate of f pyq. Since Y is regular, ă1 , . . . , ă7 are truly linear Ş7 orders on X. It suffices to verify that ăP “ i“1 ăi . We only verify for the case of Gf “ GăP , the other case is similar. Suppose x ăP y. Then xx, yy is an edge of GăP and also an edge of Gf . By the definition of Gf , we have f pyq ą f pxq, which implies x ăi y for all i P t1, . . . , 7u. Suppose x ăi y for all i P t1, . . . , 7u. Then f pyq ą f pxq. Hence, xx, yy is an edge of Gf and also an edge of GăP . It follows that x ăP y. l

31