Removing Randomness in Parallel Computation Without a Processor Penalty

Michael Luby[*]
International Computer Science Institute
Berkeley, California

Abstract

We develop some general techniques for converting randomized parallel algorithms into deterministic parallel algorithms without a blowup in the number of processors. One of the requirements for the application of these techniques is that the analysis of the randomized algorithm uses only pairwise independence. Our main new result is a parallel algorithm for coloring the vertices of an undirected graph using at most Δ + 1 distinct colors in such a way that no two adjacent vertices receive the same color, where Δ is the maximum degree of any vertex in the graph. The running time of the algorithm is O(log³ n log log n) using a linear number of processors on a concurrent read, exclusive write (CREW) parallel random access machine (PRAM).[1] Our techniques also apply to several other problems, including the maximal independent set problem and the maximal matching problem. The application of the general technique to these last two problems is mostly of academic interest because parallel algorithms that use a linear number of processors and have better running times have been previously found [Israeli, Shiloach 86], [Goldberg, Spencer 87].

1 Introduction

Some techniques for removing randomness from parallel algorithms are presented in the work of [Karp, Wigderson 84] and developed into more general techniques in [Luby 85]. In both cases, the techniques are used to convert a randomized NC algorithm for the maximal independent set problem into a deterministic NC algorithm. [Alon, Babai, Itai 86] apply these techniques to other problems in parallel computation. Here is the approach distilled into a couple of paragraphs.

[*] Research partially supported by NSERC of Canada operating grant A8092 and by NSF operating grant CCR-9016468.
[1] See [Fortune, Wyllie 78] for a description of a PRAM.

Suppose a randomized algorithm has the following properties. The randomization in the algorithm is described in terms of n independent random variables. The probabilistic analysis is described in terms of a benefit function mapping n-tuples of values for the random variables into R. The benefit function is defined as the sum of auxiliary functions, where each auxiliary function depends on at most two of the n random variables. The analysis shows that the randomized algorithm is guaranteed to run fast and produce a good solution if the benefit of the chosen n-tuple is at least the average benefit for a randomly chosen n-tuple.

A deterministic algorithm can be constructed from such a randomized algorithm as follows. Design a special probability space that contains a polynomial number of n-tuples such that the random variables are pairwise independent. Because the benefit function is of the form described, the average benefit in the special probability space is the same as it is when the random variables are totally independent. The deterministic algorithm executes in parallel the randomized algorithm for each n-tuple in the space. Because there is always at least one n-tuple with benefit at least average, there is at least one execution that is guaranteed to run fast and produce a good solution: this is the execution selected.

One of the main drawbacks of this approach is that there is a rather large blowup in the number of processors needed to solve the problem when these techniques are applied. The number of processors used by the randomized algorithm is typically linear in the size of the input, whereas the number of processors used by the deterministic algorithm is this linear count multiplied by the number of n-tuples in the special probability space.

In this paper we introduce techniques that can be used to convert randomized algorithms into deterministic algorithms without a processor penalty. The approach differs from the previous approach in the following way. The deterministic algorithm performs a series of steps to "zero in" on an n-tuple that has at least average benefit. Initially every n-tuple in the probability space is a candidate. At each step of the algorithm the current candidate set is split into two equal size halves and the average benefit is computed on each half. The new candidate set is the half with the larger average benefit. The algorithm converges on a single n-tuple that has at least average benefit after a number of steps that is logarithmic in the size of the probability space. The requirement that there is a way to quickly compute the average benefit of a candidate set using a linear number of processors led to the design of a probability space that is different from the probability spaces described in [Karp, Wigderson 84], [Luby 85], [Alon, Babai, Itai 86]. For example, the new probability space contains O(n^{log n}) n-tuples, and thus the number of "zero in" steps is O(log² n). The idea of implementing a deterministic search based on the ability to compute the average benefit is similar to ideas developed previously for sequential algorithms by [Spencer 87] and [Raghavan 88].

All of the randomized algorithms for the applications developed in this paper fit into the framework described above. In Section 2, we develop an abstract problem and algorithm that formalizes this approach. In Section 3, we consider the application problems. Let G = (V, E) be an undirected graph with n vertices, m edges and maximum vertex degree Δ. The Δ + 1 vertex coloring problem is to color each vertex in V with one of Δ + 1 colors such that no two adjacent vertices receive the same color. We show that the Δ + 1 vertex coloring problem can be posed as an instance of the abstract problem. The resulting algorithm for the Δ + 1 vertex coloring problem has running time O(log³ n log log n) using O(n + m) processors. This is the first deterministic NC algorithm using a linear number of processors for the Δ + 1 vertex coloring problem. We apply these techniques to the following problems as well, in each case obtaining an NC algorithm that uses a linear number of processors: finding a vertex partition such that at least half the edges cross the partition, a vertex indexing problem, the maximal independent set problem and the maximal matching problem. The application of these techniques to the last two problems is mostly of academic interest because NC algorithms that use a linear number of processors with better running times have been previously found [Israeli, Shiloach 86], [Goldberg, Spencer 87]. In Section 4, we conclude with some observations and suggest some open problems. [Luby 88] is a preliminary version of this paper.

1.1 Notation

We use lower case Roman italicized letters for bit strings. We let xy denote the concatenation of x and y. If ~x = <x_1,...,x_n> and ~y = <y_1,...,y_n> are vectors of bit strings then we let ~x~y denote <x_1y_1,...,x_ny_n>. If x is a bit string, we let x_i be the i-th bit of x, and if x_1,...,x_n is a collection of bit strings, we let x_{ik} be the k-th bit of x_i. We use lower case Greek italicized letters for random bit strings. We say that β is a random string of length n if β is uniformly distributed in {0,1}^n. If B is a function with domain {0,1}^n and if β is used as an input argument to B (e.g. B(β)) then β is a random string of length n. Let k ≤ n and let y ∈ {0,1}^k. If yβ is the input argument to B then β is a random string of the appropriate length n − k.

2 Abstract Description

From a simplified perspective, the problems in this paper can be viewed as instances of the following general abstract problem. Let n be a positive integer (in our applications, the problem input size is polynomial in n). Let B be a function, called the benefit function,[2] from {0,1}^n to R. E[B(β)], called the average benefit, is the expected value of B(β). We say that x ∈ {0,1}^n is a good point if B(x) ≥ E[B(β)], i.e. B(x) is at least as large as the average benefit. The abstract problem is to find a good point. We require certain query properties of the function B to be able to solve this problem efficiently. For example, consider the case when the only type of queries allowed are sample point queries of the form

"For sample point x ∈ {0,1}^n, what is the value of B(x)?"

In this case, the best algorithm is the exhaustive search algorithm, which consists of finding a good point by querying at all possible 2^n sample points and choosing the sample point x for which B(x) is largest. The exhaustive search algorithm takes time Ω(2^n), which is totally unreasonable.[3] In the following subsections, we describe more powerful types of queries that can substantially speed up the search for a good point. In later sections, for each of our applications, we describe an appropriate function B and show how these more powerful types of queries can be efficiently implemented.

2.1 Conditional Expectation and Binary Search

In this subsection we introduce a natural generalization of sample point queries, called conditional expectation queries. When these types of queries can be efficiently answered, we can replace the exhaustive search algorithm with a much faster binary search algorithm.[4] B has the property that, for all strings x of length strictly less than n, E[B(xβ)] = (E[B(x0β)] + E[B(x1β)])/2. We call this the martingale property of B because it is analogous to the property of martingales in probability theory. Suppose that we can efficiently answer conditional expectation queries of the form

"For string x of length at most n, what is the value of E[B(xβ)]?"

[2] The name benefit function is taken from [Karp, Wigderson 84]. They use the benefit function to describe their deterministic parallel algorithm for the maximal independent set problem.
[3] The exhaustive search algorithm is optimal when this is the only allowable query, which can be seen as follows. Let the answer to all queries by the algorithm be 0. If the algorithm fails to query at some sample point x, then let B(x′) = 0 for all x′ ≠ x (this is consistent with all queries). If the algorithm outputs x as the good point, then let B(x) = −1, else if the algorithm outputs some x′ ≠ x as the good point, then let B(x) = 1. In either case, the algorithm is incorrect.
[4] [Spencer 87] and [Raghavan 88] are the first to develop and use the techniques described in this subsection for finding fast sequential algorithms. They define conditional expectation queries and use the binary search algorithm described here to find deterministic polynomial time algorithms for the "discrepancy" and "integer packing" problems, respectively.

We can use the following binary search algorithm to find a good point x: initialize x ← ε[5] and repeat the following until x is of length n: if E[B(x0β)] ≥ E[B(x1β)] then x ← x0, else x ← x1. Let all sample points with x as a prefix be the remaining sample space at some point in the search. The search for a good point narrows to the half of the remaining sample space for which the expected value of B is largest, i.e. all sample points with either x0 or x1 as a prefix. Because of the martingale property of B it follows that the expected value of B on the chosen half is at least as large as E[B(xβ)]. Thus, by an easy induction argument, the output x of the algorithm has the property that B(x) is at least the average benefit. The binary search algorithm makes 2n queries in total.

2.2 Approximate Conditional Expectation

In some of our applications, it is not possible to efficiently implement conditional expectation queries directly. Instead, we define approximate conditional expectation queries via a function B̃ with the following properties.

1. B̃ is a function from all strings of length at most n to R. For all such strings x, B̃(x) can be efficiently computed.

2. For all x ∈ {0,1}^n, B̃(x) ≤ B(x).

3. For all x of length strictly less than n, B̃(x) ≤ (B̃(x0) + B̃(x1))/2.

We call the third property of B̃ the submartingale property because it is analogous to the property of submartingales in probability theory. We use the following binary search algorithm: initialize x ← ε and repeat the following until x is of length n: if B̃(x0) ≥ B̃(x1) then x ← x0, else x ← x1. The output x of the algorithm satisfies B(x) ≥ B̃(ε). It is easy to see that B̃(ε) ≤ E[B(β)], and thus x is not necessarily a good point in the original sense. In our applications, we need to define B̃ so that it satisfies the above properties and so that B̃(ε) is "approximately" the same as E[B(β)], where "approximately" is in the sense that any sample point x that satisfies B(x) ≥ B̃(ε) (instead of B(x) ≥ E[B(β)]) is good enough to be a solution to the problem.

[5] ε is the empty string.
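The binary search above is the classical method of conditional expectations. The following is a minimal Python sketch (illustrative, not from the paper): cond_benefit answers a conditional expectation query by brute-force averaging over all completions, so the martingale property holds by construction, and the toy benefit function is an assumption chosen only for the demonstration.

    from itertools import product

    def cond_benefit(B, prefix, n):
        # E[B(prefix . beta)] by brute force over all completions
        # (exponential; only illustrates the query semantics)
        suffixes = list(product([0, 1], repeat=n - len(prefix)))
        return sum(B(prefix + s) for s in suffixes) / len(suffixes)

    def binary_search_good_point(B, n):
        x = ()
        while len(x) < n:
            if cond_benefit(B, x + (0,), n) >= cond_benefit(B, x + (1,), n):
                x = x + (0,)
            else:
                x = x + (1,)
        return x

    # Toy benefit: number of adjacent equal bits in a 6-bit string.
    B = lambda x: sum(x[i] == x[i + 1] for i in range(len(x) - 1))
    x = binary_search_good_point(B, 6)
    assert B(x) >= cond_benefit(B, (), 6)   # x is a good point

The final assertion checks exactly the defining property of a good point: the returned string has benefit at least the average benefit.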

2.3 Pairwise Independence and Exhaustive Search

For our applications, the problem input size is polynomial in n. A straightforward implementation of the binary search algorithm using either conditional expectation or approximate conditional expectation queries asks 2n queries in sequence, and thus does not yield a parallel algorithm with running time polylog in n, even if each query can be answered in constant time. To achieve running time polylog in n, we need more structure on the benefit function.[6] Whereas before it was easiest to think of the input to B as being a bit string of length n, from now on it is convenient to view the input to B as a vector of n bits, i.e.

~x = <x_i ∈ {0,1} : i = 1,...,n>.

We call x_1,...,x_n the basic variables of B. We consider the case when B(~x) can be written as the sum of auxiliary functions, where each auxiliary function in turn is a function of at most two of the basic variables. Let Γ_1 = {1,...,n} and let Γ_2 be a collection of pairs from {1,...,n}, where m = |Γ_2|. For each i ∈ Γ_1 we define the auxiliary function Y_i(x_i), and for each {i,j} ∈ Γ_2 we define the auxiliary function Y_ij(x_i, x_j). Let

B(~x) = Σ_{i∈Γ_1} Y_i(x_i) + Σ_{{i,j}∈Γ_2} Y_ij(x_i, x_j).

The first idea to substantially speed up the search for a good point is to replace the sample space {0,1}^n of size 2^n with another sample space of size O(n). Let l be such that 2n < 2^l ≤ 4n (and thus l = O(log n)) and let the new sample space be {0,1}^l. By choice of l, for each i = 1,...,n, we can express i as a binary string i_1···i_l of length l, where the last bit i_l = 1.[7] For all i = 1,...,n and for all z ∈ {0,1}^l, we define

X_i(z) = ⊕_{k=1}^{l} (i_k · z_k),

where i_k · z_k is the product of i_k and z_k, and ⊕ is addition modulo 2. X̃ = <X_1,...,X_n> plays the role in the new sample space of the basic variables ~x = <x_1,...,x_n> in the original sample space. Let ω be a random string of length l. It can be easily verified that X_1(ω),...,X_n(ω) are pairwise independent[8] and identically distributed uniformly in {0,1}. Because of this, the following two key properties are satisfied, where as before ~β is a vector of n totally independent random bits.

1. For all i ∈ Γ_1, E[Y_i(X_i(ω))] = E[Y_i(β_i)].

2. For all {i,j} ∈ Γ_2, E[Y_ij(X_i(ω), X_j(ω))] = E[Y_ij(β_i, β_j)].

From this and the fact that the expected value of a sum of random variables is equal to the sum of the expected values, it follows that E[B(X̃(ω))] = E[B(~β)]. Thus, X̃(z) is a good point as long as B(X̃(z)) ≥ E[B(X̃(ω))]. Assuming that sample point queries of the form

"For sample point z ∈ {0,1}^l, what is B(X̃(z))?"

can be answered efficiently, we can use the exhaustive search algorithm on {0,1}^l to find z such that X̃(z) is a good point. The algorithm makes O(n) queries, which can be either executed sequentially or all in parallel in one step.

[6] A special case of the techniques described in this subsection is first developed and used in [Karp, Wigderson 84]. These techniques are generalized and further developed using a different sample space in [Luby 85] to the case where each auxiliary function can take on a polynomial number of values.
[7] We use one more bit than necessary to write i in binary and always set the last bit to 1 because later on it makes the algorithms and their analysis easier to explain.
[8] When ~β is a vector of totally independent random bits, all 2^n settings of the bits are equally likely. Pairwise independent random variables only have to satisfy the much weaker property that for every pair of variables all combinations of values for the pair are equally likely; this need not be true even for triples of variables. Although pairwise independence is a much weaker property than total independence, it is sufficient for our applications, and the weakness of the property is what makes it possible to satisfy it with such a small sample space.
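The construction is small enough to test directly. Below is an illustrative Python sketch (the helper names are ours, not the paper's): it forms the indices with a forced trailing 1 bit, evaluates X_i(z) = ⊕_k (i_k · z_k) over the whole O(n)-size seed space, and exhaustively selects a seed whose benefit is at least the space's average, as the exhaustive search algorithm does.

    def make_index(i, l):
        # binary expansion of i on l-1 bits, with a forced trailing 1 bit
        bits = [(i >> k) & 1 for k in range(l - 1)]
        return bits + [1]

    def X(i_bits, z_bits):
        return sum(a & b for a, b in zip(i_bits, z_bits)) % 2

    def exhaustive_good_seed(n, benefit):
        l = 1
        while (1 << l) <= 2 * n:   # choose l with 2n < 2^l <= 4n
            l += 1
        idx = [make_index(i, l) for i in range(1, n + 1)]
        seeds = [[(z >> k) & 1 for k in range(l)] for z in range(1 << l)]
        samples = [[X(ib, z) for ib in idx] for z in seeds]
        avg = sum(benefit(s) for s in samples) / len(samples)
        return max(samples, key=benefit), avg

    # Benefit: pairs of distinct coordinates that disagree (a pairwise statistic).
    def benefit(x):
        return sum(x[i] != x[j] for i in range(len(x)) for j in range(i + 1, len(x)))

    best, avg = exhaustive_good_seed(8, benefit)
    assert benefit(best) >= avg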

2.4 Pairwise Independence and Binary Search

So far, we have shown how to use the exhaustive search algorithm to find a good point X̃(z) ∈ {0,1}^n using sample point queries in the small sample space. The next idea is to combine two of the previous ideas: use the binary search algorithm in the small sample space {0,1}^l to find a good point. B has the property that, for all strings z of length strictly less than l, E[B(X̃(zω))] = (E[B(X̃(z0ω))] + E[B(X̃(z1ω))])/2. As before, we call this the martingale property of B because it is analogous to the property of martingales in probability theory. Suppose that we can efficiently answer conditional expectation queries of the form

"For string z of length at most l, what is the value of E[B(X̃(zω))]?"

We can use the following binary search algorithm to find a good point z: initialize z ← ε and repeat the following until z is of length l: if E[B(X̃(z0ω))] ≥ E[B(X̃(z1ω))] then z ← z0, else z ← z1.

For our parallel algorithms, the input for this problem is in the form of the following tables.

1. For each i ∈ Γ_1, Y_i(0) and Y_i(1) are stored in a table that can be accessed in constant time by a processor assigned to i.

2. For each {i,j} ∈ Γ_2, Y_ij(0,0), Y_ij(0,1), Y_ij(1,0) and Y_ij(1,1) are stored in a table that can be accessed in constant time by a processor assigned to {i,j}.

Let z be a string of length at most l. The value of E[B(X̃(zω))] can be computed by the following three steps.

1. For each i ∈ Γ_1, compute E[Y_i(X_i(zω))].

2. For each {i,j} ∈ Γ_2, compute E[Y_ij(X_i(zω), X_j(zω))].

3. E[B(X̃(zω))] is computed by summing the results from the first two steps.

It takes the processors O(log n) time to compute the sum in the third step, which turns out to be the most time consuming of the three steps. We need the following definitions to describe the implementation of the first two steps. Let i ∈ Γ_1 be fixed. We use X_i^k(z) to express the inner product mod 2 of z and the binary expansion of i as follows. Let X_i^0(ε) = 0 and, for all k = 1,...,l, for all strings z of length k, let X_i^k(z) = ⊕_{r=1}^{k} (i_r · z_r). Note that X_i^k(z) = X_i(z) when k = l. Furthermore, for all strings z of length k strictly less than l and for a ∈ {0,1}, X_i^{k+1}(za) = X_i^k(z) ⊕ (i_{k+1} · a). Thus, when the string z is extended one bit in the binary search algorithm, a single processor can update the new value of X_i^{k+1}(za) in constant time.

We now describe how to implement the first two steps. Let z be a fixed binary string of length k. Step 1 is particularly easy to implement as follows. Consider a fixed i ∈ Γ_1. There are two cases to consider:

1. k < l. Because k < l, the l-th bit of zω is equally likely to be either 0 or 1. Because of this and because i_l = 1, the value of X_i(zω) is equally likely to be 0 or 1. Thus, E[Y_i(X_i(zω))] = (Y_i(0) + Y_i(1))/2.

2. k = l. In this case zω = z and thus E[Y_i(X_i(zω))] = Y_i(X_i^l(z)).

In both cases, the processor assigned to i can compute E[Y_i(X_i(zω))] in constant time, given k and X_i^k(z). Step 2 is a bit more complicated, but not much. Consider a fixed {i,j} ∈ Γ_2. Define

last_ij = max{k′ : i_{k′} ≠ j_{k′}},

i.e. last_ij is the last position where the binary expansions of i and j differ. The processor assigned to {i,j} can compute last_ij in O(l) = O(log n) time before execution of the binary search algorithm. There are three cases to consider:

1. k < last_ij. Because the binary expansions of i and j differ at position last_ij > k and because i_l = j_l = 1, X_i(zω) = a and X_j(zω) = b with probability 1/4 for all four possible values of a, b ∈ {0,1}. Thus,

E[Y_ij(X_i(zω), X_j(zω))] = (Y_ij(0,0) + Y_ij(0,1) + Y_ij(1,0) + Y_ij(1,1))/4.

2. last_ij ≤ k < l. Because the binary expansions of i and j are the same in positions k + 1 through l, if X_i^k(z) = X_j^k(z) then X_i(zω) = X_j(zω), and if X_i^k(z) ≠ X_j^k(z) then X_i(zω) ≠ X_j(zω). Because i_l = j_l = 1, both X_i(zω) and X_j(zω) are equally likely to be either 0 or 1 (but not independently). Thus,

• If X_i^k(z) = X_j^k(z) then E[Y_ij(X_i(zω), X_j(zω))] = (Y_ij(0,0) + Y_ij(1,1))/2.
• If X_i^k(z) ≠ X_j^k(z) then E[Y_ij(X_i(zω), X_j(zω))] = (Y_ij(0,1) + Y_ij(1,0))/2.

3. k = l. In this case, E[Y_ij(X_i(zω), X_j(zω))] = Y_ij(X_i^l(z), X_j^l(z)).

In all three cases, the processor assigned to {i,j} can compute E[Y_ij(X_i(zω), X_j(zω))] in constant time, given k, X_i^k(z) and X_j^k(z). We call the problem described and the algorithm developed in this subsection the bit pairs benefit problem and algorithm, respectively. This name is derived from the fact that each auxiliary function depends on at most a pair of bit valued variables. The running time of the bit pairs benefit algorithm is O(log² n) using O(n + m) processors.
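A sketch of the constant-time pair query, assuming the table Y_ij is given as a 2 x 2 array and the index bit strings carry the forced trailing 1; the three cases mirror the case analysis above, with positions converted to 0-based Python indexing. This is our illustration, not the paper's code.

    def last_diff(ib, jb):
        # 0-based analogue of last_ij: last position where the expansions differ
        return max(k for k in range(len(ib)) if ib[k] != jb[k])

    def pair_expectation(Yij, ib, jb, z):
        # E[Y_ij(X_i(z omega), X_j(z omega))] for a prefix z of the seed
        k, l = len(z), len(ib)
        pi = sum(a & b for a, b in zip(ib, z)) % 2     # X_i^k(z)
        pj = sum(a & b for a, b in zip(jb, z)) % 2     # X_j^k(z)
        if k == l:
            return Yij[pi][pj]                          # case 3
        if k < last_diff(ib, jb) + 1:                   # case 1 (1-based last_ij)
            return (Yij[0][0] + Yij[0][1] + Yij[1][0] + Yij[1][1]) / 4
        if pi == pj:                                    # case 2: coupled bits
            return (Yij[0][0] + Yij[1][1]) / 2
        return (Yij[0][1] + Yij[1][0]) / 2

    Yij = [[0.0, 1.0], [1.0, 0.0]]          # benefit 1 exactly when the bits differ
    ib, jb = [1, 0, 1], [0, 1, 1]           # expansions with forced last bit 1
    print(pair_expectation(Yij, ib, jb, []))          # 0.5: fully independent case
    print(pair_expectation(Yij, ib, jb, [1, 0, 1]))   # 1.0: fully determined case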

2.5 General Pairs Benefit Problem and Algorithm

In the previous two subsections, the benefit function B is expressed as a sum of auxiliary functions, where each auxiliary function is in turn a function of at most two of the basic variables, and where each basic variable is a bit. In this subsection we generalize the basic variables from a single bit to strings of length q, where q is a positive integer.[9] We call the problem and algorithm described in this subsection the general pairs benefit problem and algorithm, respectively.

For i = 1,...,n, let x_i ∈ {0,1}^q and let ~x = <x_1,...,x_n> be the basic variables for the general pairs benefit problem. Let Γ_1 and Γ_2 be defined as before. Define the auxiliary functions with respect to the basic variables analogously: for all i ∈ Γ_1, Y_i(x_i) is an auxiliary function, and for all {i,j} ∈ Γ_2, Y_ij(x_i, x_j) is an auxiliary function. Finally, the benefit function is defined as B(~x) = Σ_{i∈Γ_1} Y_i(x_i) + Σ_{{i,j}∈Γ_2} Y_ij(x_i, x_j).

Let l and X̃ be as defined in Subsection 2.3. Let ω_1,...,ω_q be totally independent random strings, each of length l. The collection of n random variables

X̃(ω_1)···X̃(ω_q) = <X_1(ω_1)···X_1(ω_q), ..., X_n(ω_1)···X_n(ω_q)>

are pairwise independent and identically distributed in {0,1}^q. This is because, for each p = 1,...,q, the vector of variables X̃(ω_p) = <X_1(ω_p),...,X_n(ω_p)> is pairwise independent and identically distributed uniformly in {0,1} and, since ω_1,...,ω_q are totally independent, the vectors X̃(ω_1),...,X̃(ω_q) are totally independent of each other. Our goal is to find ~x such that B(~x) ≥ E[B(X̃(ω_1)···X̃(ω_q))].[10]

Let p ≤ q and let ~x = <x_i ∈ {0,1}^p : i = 1,...,n>. Let ~ε be a vector of n empty strings. Our strategy is to start with ~x = ~ε and to simultaneously extend each string in ~x one bit at a time using the bit pairs benefit algorithm described in the previous subsection. Define TB(~x) = E[B(~x X̃(ω_{p+1}) X̃(ω_{p+2}) ··· X̃(ω_q))]. Let ~γ = <γ_1,...,γ_n> be a vector of single bits with any probability distribution on ~γ that is independent of ω_{p+2},...,ω_q, and let ~a = <a_1,...,a_n> be a vector of single bits (the vector considered as a possible extension of ~x). By definition, TB(~x~a) = E[B(~x~a X̃(ω_{p+2})···X̃(ω_q))] and thus

E[TB(~x~γ)] = E[E[B(~x~a X̃(ω_{p+2})···X̃(ω_q)) | ~a = ~γ]] = E[B(~x~γ X̃(ω_{p+2})···X̃(ω_q))].

Substituting X̃(ω_{p+1}) for ~γ yields E[TB(~x X̃(ω_{p+1}))] = TB(~x). Our strategy is to initialize ~x ← ~ε and then repeat the following step for p = 0,...,q − 1: find an ~a such that TB(~x~a) ≥ E[TB(~x X̃(ω_{p+1}))] = TB(~x) and then replace ~x with ~x~a. It is not hard to show by induction that the final ~x has the property that B(~x) ≥ E[B(X̃(ω_1)···X̃(ω_q))].

We use the bit pairs benefit algorithm to find the one bit extension ~a of ~x. First, we must compute the appropriate input tables for the bit pairs benefit algorithm. Let ~γ = <γ_i ∈ {0,1}^{q−p−1} : i = 1,...,n> be a vector of pairwise independent random strings. This requires efficient computation of the following tables given ~x = <x_i ∈ {0,1}^p : i = 1,...,n>:

1. For all i ∈ Γ_1, for all b ∈ {0,1}, compute TY_i(x_ib) = E[Y_i(x_ibγ_i)].

2. For all {i,j} ∈ Γ_2, for all b_1, b_2 ∈ {0,1}, compute TY_ij(x_ib_1, x_jb_2) = E[Y_ij(x_ib_1γ_i, x_jb_2γ_j)].

From these quantities, it can be seen that for all ~a,

TB(~x~a) = Σ_{i∈Γ_1} TY_i(x_ia_i) + Σ_{{i,j}∈Γ_2} TY_ij(x_ia_i, x_ja_j).

When these tables are used as the input tables for the bit pairs benefit problem, the output ~a of the bit pairs benefit algorithm satisfies TB(~x~a) ≥ E[TB(~x X̃(ω_{p+1}))] as required.

Unfortunately, in some cases it is difficult to efficiently compute these tables for all ~x. Thus, we are forced to guide our search for a good point using approximating functions that satisfy the following properties. These properties are analogous to the properties described in Subsection 2.2. For all p ≤ q and for all ~x = <x_i ∈ {0,1}^p : i = 1,...,n>:

1. For all i ∈ Γ_1, T̃Y_i(x_i) is efficiently computable, and for all {i,j} ∈ Γ_2, T̃Y_ij(x_i, x_j) is efficiently computable. Define T̃B(~x) = Σ_{i∈Γ_1} T̃Y_i(x_i) + Σ_{{i,j}∈Γ_2} T̃Y_ij(x_i, x_j).

2. When p = q, for all i ∈ Γ_1, T̃Y_i(x_i) ≤ Y_i(x_i), and for all {i,j} ∈ Γ_2, T̃Y_ij(x_i, x_j) ≤ Y_ij(x_i, x_j). Thus, T̃B(~x) ≤ B(~x).

3. When p < q, for all i ∈ Γ_1, T̃Y_i(x_i) ≤ (T̃Y_i(x_i0) + T̃Y_i(x_i1))/2, and for all {i,j} ∈ Γ_2, T̃Y_ij(x_i, x_j) ≤ (T̃Y_ij(x_i0, x_j0) + T̃Y_ij(x_i0, x_j1) + T̃Y_ij(x_i1, x_j0) + T̃Y_ij(x_i1, x_j1))/4. Thus, T̃B(~x) ≤ E[T̃B(~x X̃(ω_{p+1}))].

We call the third property of T̃B the submartingale property because it is analogous to the property of submartingales in probability theory.

General Pairs Benefit Algorithm

~x ← ~ε
Repeat q times:
  For all i ∈ Γ_1, compute T̃Y_i(x_i0) and T̃Y_i(x_i1).
  For all {i,j} ∈ Γ_2, compute T̃Y_ij(x_i0, x_j0), T̃Y_ij(x_i0, x_j1), T̃Y_ij(x_i1, x_j0) and T̃Y_ij(x_i1, x_j1).
  Call the bit pairs benefit algorithm using as input the tables just computed.
  Set ~x ← ~x~a, where ~a = <a_1,...,a_n> is the output of the bit pairs benefit algorithm.

Let ~x be a vector of n strings, each of length p. At each execution, the bit pairs algorithm outputs an ~a such that T̃B(~x~a) ≥ E[T̃B(~x X̃(ω_{p+1}))], and thus by Property 3, T̃B(~x~a) ≥ T̃B(~x). This, together with Property 2, implies that the final output of the entire algorithm satisfies B(~x) ≥ T̃B(~ε). It is easy to see that T̃B(~ε) ≤ E[B(X̃(ω_1)···X̃(ω_q))], and thus ~x is not necessarily a good point in the original sense. In our applications, we need to define T̃B so that the above properties are satisfied and so that T̃B(~ε) is "approximately" the same as E[B(X̃(ω_1)···X̃(ω_q))], where "approximately" is in the sense that any ~x that satisfies B(~x) ≥ T̃B(~ε) (instead of B(~x) ≥ E[B(X̃(ω_1)···X̃(ω_q))]) is good enough to be a solution to the problem.

Let T and P be the running time and number of processors, respectively, to compute the input tables for the bit pairs benefit problem. The total running time is O(qT + q log² n) and the total number of processors is O(n + m + P). In a later subsection, using the vertex indexing algorithm as a subroutine, we show how the running time can be improved by a multiplicative factor of log n / log log n for the Δ + 1 vertex coloring algorithm.

[9] In our applications, q = O(log n), and thus each basic variable can take on polynomially in n many different values.
[10] As before, we can restrict attention to the case when the n strings of length q are only pairwise independent as opposed to totally independent because the expected value is the same in both cases.
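As a structural sketch (interfaces assumed, not the paper's code): the outer loop of the general pairs benefit algorithm extends every basic variable by one bit per round, rebuilding the four-entry tables and delegating the choice of the extension to a bit pairs search. In the paper the inner search is the O(log² n) bit pairs benefit algorithm; here it is an abstract callable.

    def general_pairs(q, n, gamma1, gamma2, ty1, ty2, bit_pairs_search):
        # ty1(i, s): table value TY~_i for the extended prefix s of variable i
        # ty2(i, j, si, sj): table value TY~_ij for extended prefixes si, sj
        # bit_pairs_search(t1, t2): returns a vector a in {0,1}^n whose benefit
        # with respect to the given tables is at least the tables' average
        x = [() for _ in range(n)]                 # n empty strings
        for _ in range(q):
            t1 = {i: {b: ty1(i, x[i] + (b,)) for b in (0, 1)} for i in gamma1}
            t2 = {(i, j): {(b1, b2): ty2(i, j, x[i] + (b1,), x[j] + (b2,))
                           for b1 in (0, 1) for b2 in (0, 1)}
                  for (i, j) in gamma2}
            a = bit_pairs_search(t1, t2)
            x = [x[i] + (a[i],) for i in range(n)]
        return x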

3 Applications

In this section, we describe the problems to which we can apply the techniques developed in the abstract description. For each application problem, part of the input is an undirected graph G = (V, E) with |V| = n and |E| = m. We assume that the vertices are named with numbers, that initially there is a processor assigned to each vertex and each edge in the graph, and that the processor assigned to an edge knows the names of its endpoints. Furthermore, we assume that the indices of processors assigned to vertices are consecutive and that the indices of processors assigned to edges are consecutive. To make the presentation readable, we have suppressed many implementation details. Two typical examples of the types of algorithmic details omitted are captured by the following two tasks. Consider an array of length n′ with a processor assigned to each entry in the array.

(1) Let the array be filled with numbers that can be compared in constant time. Sort the numbers into increasing order. By the results in [Cole 88], this can be done in O(log n′) time on a CREW PRAM using the n′ processors.

(2) Let the array be filled with pairs of numbers of the form <i, j> such that all pairs with the same first number are consecutive in the array. For all i in parallel compute S(i), where S(i) is the sum of the second numbers among pairs whose first number is i. By the results in [Ladner, Fischer 80], this can be done in O(log n′) time on a CREW PRAM using the n′ processors.

Most of the omitted algorithmic details can be expressed as special cases of these or very similar tasks. For example, computing the maximum degree of any vertex in G can be solved as follows. First, for each {i,j} ∈ E, place two entries in an array, <i, 1> and <j, 1>. Sort the array by the first number in each pair using the algorithm for (1). Then, use the algorithm for (2) to compute for each vertex i ∈ V the value of deg_i.

i

i

ij

ij

ij

i;j

E

ij

i

ij

j

n
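An end-to-end illustration combining the Section 2 pieces for this problem (our code, with brute-force conditional expectation standing in for the O(log² n) parallel bit pairs search): labels come from the pairwise-independent space via a seed z ∈ {0,1}^l, and the seed is fixed one bit at a time so that the final cut is at least m/2.

    import itertools

    def cut_size(labels, edges):
        return sum(labels[u] != labels[v] for u, v in edges)

    def derandomized_partition(n, edges):
        l = 1
        while (1 << l) <= 2 * n:                   # 2n < 2^l <= 4n
            l += 1
        idx = [[(i >> k) & 1 for k in range(l - 1)] + [1] for i in range(1, n + 1)]
        def label(i, z):
            return sum(a & b for a, b in zip(idx[i], z)) % 2
        prefix = []                                # fix the seed bit by bit
        for _ in range(l):
            best = None
            for b in (0, 1):
                rest = l - len(prefix) - 1
                total = 0
                for tail in itertools.product((0, 1), repeat=rest):
                    z = prefix + [b] + list(tail)
                    total += cut_size({v: label(v, z) for v in range(n)}, edges)
                if best is None or total > best[0]:
                    best = (total, b)
            prefix.append(best[1])
        return {v: label(v, prefix) for v in range(n)}

    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    labels = derandomized_partition(4, edges)
    assert cut_size(labels, edges) >= len(edges) / 2

The assertion holds deterministically: the average cut over the seed space is m/2 by pairwise independence, and the greedy bit choices never fall below the conditional average.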

3.2 The Vertex Indexing Problem

In this section we define the vertex indexing problem and describe a parallel solution to this problem using the algorithm for the vertex partitioning problem. The input to the vertex indexing problem is Γ_2. Our primary interest in this problem is that after one execution of the vertex indexing algorithm on input Γ_2, each subsequent execution of the bit pairs benefit algorithm with respect to the same Γ_2 has running time O(log n log log n) using O(n + m) processors. The running time of the algorithm for the vertex indexing problem is O(log² n log log n) using O(n + m) processors. Thus, it may seem that any savings in the running time of the bit pairs benefit algorithm is more than offset by the extra running time for the vertex indexing algorithm. However, in our applications we run the bit pairs benefit algorithm on several inputs with auxiliary functions defined on the same Γ_2 (although the auxiliary functions may be different in these different inputs).

Let l′ = ⌈log m⌉. Note that l′ ≤ 2 log n. The output of the vertex indexing problem is

index_i ∈ {0,1}^{l′} : i ∈ {1,...,n}

with the following property. Let

LAST(a) = {{i,j} ∈ Γ_2 : last_{index_i index_j} ≤ l′ − a}.

(The last function is defined in Subsection 2.4.) For all a = 1,...,l′ the index function satisfies |LAST(a)| ≤ m/2^a.

We now describe a preliminary version of the parallel algorithm for the vertex indexing problem with running time O(log³ n) using O(n + m) processors. The value of index_i is determined one bit at a time, in parallel for all i ∈ {1,...,n}, starting with the last bit and working towards the first. For each bit string z of length at most l′, V(z) is a subset of {1,...,n} and E(z) is the set of {i,j} ∈ Γ_2 such that both i and j are in V(z). For each a = 0,...,l′, the set {V(z) : z ∈ {0,1}^a} is a partition of {1,...,n}. The idea is to do the following in parallel for each z of length a: split V(z) into two sets V(0z) and V(1z) such that for at least one-half of the {i,j} ∈ E(z), i and j are on opposite sides of the partition. The vertex partitioning algorithm is used to do the splitting.

Vertex Indexing Algorithm

Let V(ε) ← {1,...,n}, E(ε) ← Γ_2
For a = 0,...,l′ − 1 do
  In parallel, for all z ∈ {0,1}^a such that |E(z)| ≥ 1:
    Use the vertex partition algorithm to find a vertex partition V(0z), V(1z) of V(z) such that at least one-half of the {i,j} ∈ E(z) cross the partition
    E(0z) ← {{i,j} ∈ E(z) : i ∈ V(0z) and j ∈ V(0z)}
    E(1z) ← {{i,j} ∈ E(z) : i ∈ V(1z) and j ∈ V(1z)}
enddo
In parallel, for all z ∈ {0,1}^{l′}, for all i ∈ V(z), index_i ← z

It can be verified that this algorithm outputs a solution to the vertex indexing problem and that it can be implemented in O(log³ n) time using O(n + m) processors. The running time of the algorithm can be improved to O(log² n log log n) with O(n + m) processors using the following observation. For all {i,j} ∈ E, we say {i,j} is active during the a-th execution of the loop if {i,j} ∈ E(z) for some z ∈ {0,1}^a. By the properties of the vertex partition algorithm, at the beginning of the a-th execution of the loop there are at most m/2^a active elements of E. Thus, if there are two processors for each {i,j} ∈ E, then there are 2^{a+1} processors available for each active element of E. When we run the bit pairs benefit algorithm to solve the vertex partitioning subproblems at the a-th iteration of the loop, we can extend the sample point ω by a + 1 bits at a time, i.e. we can try all possible 2^{a+1} extensions of length a + 1 of ω in parallel and take the best. Thus the total number of extensions to specify all of ω at the a-th iteration is log n/(a + 1). The running time per extension is still O(log n) using O(n + m) processors, and thus the total running time of the bit pairs benefit algorithms at the a-th iteration of the loop is O(log² n/(a + 1)). With this improvement, the running time of the entire vertex indexing algorithm is

O(Σ_{a=0}^{l′−1} log² n/(a + 1)) = O(log² n log log n).

We now describe how the solution to the vertex indexing problem can be used to speed up the algorithm for the bit pairs benefit problem. We simply rename each i ∈ {1,...,n} as the string index_i1 (its index with a 1 appended) and let the probability space be defined on strings of length l′ + 1 (as opposed to l before). Without loss of generality, assume that l′ is a power of two. Let z be the current bit string prefix of the final sample point, where initially z = ε. Before, we extended z one bit at a time for a total of O(log n) times. The new idea is to extend z by one-half of the remaining bits at a time for a total of log log n times until z is of length l′, and then extend z by one more bit as before. Until z is extended by the last bit to length l′ + 1, the expected values of the auxiliary variables associated with Γ_1 do not change value and can be ignored.

We now explain how z can be extended to a string of length l′. Initially z ← ε. At a general step, let 2a be equal to l′ minus the length of z. To extend z by a bits, compute, in parallel for all extensions y ∈ {0,1}^a of z, the value of E[B(X̃(zyω))]. Find y′ such that E[B(X̃(zy′ω))] ≥ E[B(X̃(zyω))] for all y ∈ {0,1}^a. Then, set z ← zy′ to complete the extension.

Each extension of z can be implemented in O(log n) time using O(n + m) processors as follows. Before computing E[B(X̃(zyω))] for all y ∈ {0,1}^a, the algorithm assigns one processor to each pair (y, {i,j}) where {i,j} ∈ LAST(a). Because |LAST(a)| ≤ m/2^a, the total number of processors is O(m). The reason that each {i,j} ∈ Γ_2 − LAST(a) has no processors assigned to it is that, for all y ∈ {0,1}^a,

E[Y_ij(X_i(zyω), X_j(zyω))] = (Y_ij(0,0) + Y_ij(0,1) + Y_ij(1,0) + Y_ij(1,1))/4.

Thus, for all {i,j} ∈ Γ_2 − LAST(a), the contribution of Y_ij to the benefit is the same for all y and does not need to be considered when choosing the best extension. The final part of the algorithm, choosing the best extension, can be implemented in O(log n) time using O(m) processors. There are a total of O(log l) = O(log log n) extensions, and thus the running time of the entire algorithm is O(log n log log n) using O(n + m) processors.
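A compact sketch of the preliminary recursion (names ours): greedy_half_cut is a stand-in splitter that cuts at least half of the edges of each part, where the paper would invoke the derandomized vertex partition algorithm; the bits of each index are produced last-to-first, matching the 0z/1z refinement.

    def greedy_half_cut(vertices, edges):
        # place each vertex on the side that cuts more of its already-placed
        # edges; this cuts at least half of the edges overall
        side = {}
        for v in vertices:
            cut0 = sum(1 for (a, b) in edges
                       if (a == v and side.get(b) == 1) or
                          (b == v and side.get(a) == 1))
            cut1 = sum(1 for (a, b) in edges
                       if (a == v and side.get(b) == 0) or
                          (b == v and side.get(a) == 0))
            side[v] = 0 if cut0 >= cut1 else 1
        return side

    def vertex_index(n, pairs):
        l = max(1, (len(pairs) - 1).bit_length()) if pairs else 1   # ceil(log2 m)
        parts = {(): (list(range(1, n + 1)), list(pairs))}
        for _ in range(l):
            nxt = {}
            for z, (vs, es) in parts.items():
                side = greedy_half_cut(vs, es)
                for b in (0, 1):
                    vs_b = [v for v in vs if side[v] == b]
                    es_b = [(a, c) for (a, c) in es
                            if side[a] == b and side[c] == b]
                    nxt[(b,) + z] = (vs_b, es_b)     # new bit prepended: 0z / 1z
            parts = nxt
        return {v: z for z, (vs, _) in parts.items() for v in vs}

    print(vertex_index(4, [(1, 2), (2, 3), (3, 4)]))

Each round at least halves the number of pairs still sharing a part, which is exactly the |LAST(a)| ≤ m/2^a guarantee.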

3.3 The Δ + 1 Vertex Coloring Problem

The input to the Δ + 1 vertex coloring problem is an undirected graph G = (V, E), where |V| = n and |E| = m. For each i ∈ V, let deg_i be the number of edges incident to i in G and let Δ = max{deg_i : i ∈ V}. The output of the Δ + 1 vertex coloring problem is

color_i ∈ {1,...,Δ + 1} : i ∈ V,

such that for all {i,j} ∈ E, color_i ≠ color_j. There is an easy reduction from the Δ + 1 vertex coloring problem to the maximal independent set problem [Lovasz 79], and, as noted in [Luby 85], this reduction can be performed fast in parallel. The reduction together with any of the NC or RNC algorithms for the maximal independent set problem described in [Karp, Wigderson 84], [Luby 85], [Alon, Babai, Itai 86], [Goldberg, Spencer 87] implies an NC or RNC algorithm for the Δ + 1 vertex coloring problem. However, even if a maximal independent set algorithm (randomized or deterministic) that uses only a linear number of processors is used to solve the maximal independent set problem, this does not yield a linear processor algorithm (randomized or deterministic) for the Δ + 1 vertex coloring problem, because the reduction produces a graph for the maximal independent set problem that contains Ω(n·Δ) vertices and Ω(m·Δ) edges.

In the next subsection we present a randomized parallel algorithm for the Δ + 1 vertex coloring problem with expected running time O(log² n) using O(n + m) processors. In the following subsection we use this randomized algorithm to express the Δ + 1 vertex coloring problem as a general pairs benefit problem, and then show how to apply the general pairs benefit algorithm to yield a deterministic parallel algorithm for the Δ + 1 vertex coloring problem with running time O(log³ n log log n) using O(n + m) processors.

3.3.1 A Randomized Algorithm

The overall structure of all of the algorithms for the Δ + 1 vertex coloring problem fits into the following framework. Initially all vertices are uncolored. The algorithm runs in iterations, where at each iteration some of the remaining uncolored vertices are permanently colored by the COLOR procedure and the graph for the next iteration is the induced graph on the remaining uncolored vertices.

Δ + 1 Vertex Coloring Algorithm

H ← G
In parallel, for all i ∈ V, color_i ← ε, avail_i ← {1,...,deg_i + 1}
Repeat until H is empty:
  call COLOR(H)
  V′ ← {i ∈ V : color_i = ε}
  E′ ← {{i,j} ∈ E : i ∈ V′ and j ∈ V′}
  H ← (V′, E′)
  In parallel, for all i ∈ V′, avail_i ← avail_i − {color_j : {i,j} ∈ E}

The randomized algorithm for the Δ + 1 vertex coloring problem uses a randomized COLOR procedure to color on average a constant fraction of the remaining uncolored vertices in V′ at each execution. Colors 1,...,Δ + 1 are called the real colors and we use the empty string ε to designate a null color. For all i ∈ V, we initialize color_i ← ε and avail_i ← {1,...,deg_i + 1}. In general, avail_i is the set of available colors for i excluding colors already permanently assigned to neighboring vertices of i.

Randomized COLOR(H) Procedure

Let H = (V′, E′)
In parallel, for all i ∈ V′, using statistically independent random sources:
  with probability 1/2, temp_i ← ε, and with probability 1/2, randomly and uniformly choose temp_i ∈ avail_i
In parallel, for all i ∈ V′, color_i ← temp_i
In parallel, for all {i,j} ∈ E′, if temp_i = temp_j then color_i ← ε

Lemma 1: For all i ∈ V′, Pr[color_i ≠ ε] ≥ 1/4 at the end of the execution of COLOR.

Proof: Fix i ∈ V′ and let adj_i be the set of neighbors of i in H.

Pr[color_i ≠ ε] = Pr[color_i ≠ ε | temp_i ≠ ε] · Pr[temp_i ≠ ε].

But, Pr[temp_i ≠ ε] = 1/2 and

Pr[color_i ≠ ε | temp_i ≠ ε] = 1 − Pr[color_i = ε | temp_i ≠ ε].

But,

Pr[color_i = ε | temp_i ≠ ε] ≤ Σ_{j∈adj_i} Pr[temp_j = temp_i | temp_i ≠ ε].

Pr[temp_j = temp_i | temp_i ≠ ε] ≤ 1/(2|avail_i|).

This is because temp_j and temp_i are chosen independently, and with probability 1/2 the color chosen for temp_j ≠ ε, and if temp_j ≠ ε, then the color chosen for temp_j, given that temp_i ≠ ε, is equal to the color chosen for temp_i with probability at most 1/|avail_i|. (This probability is zero if the value chosen for temp_j is not in avail_i.) Thus,

Σ_{j∈adj_i} Pr[temp_j = temp_i | temp_i ≠ ε] ≤ deg_i/(2|avail_i|) ≤ 1/2.

This last inequality is because deg_i ≤ |avail_i|. Hence, Pr[color_i ≠ ε | temp_i ≠ ε] ≥ 1/2.

We point out that pairwise independence of the choices of the temporary colors is sufficient for the proof of Lemma 1. This is one of the key properties that allows us to express this randomized algorithm as an instance of the general pairs benefit problem.

Corollary 2: Each execution of COLOR on average colors at least 1/4 of the remaining uncolored vertices.

From this, it can be easily shown that the expected number of iterations until H is empty is O(log n). Each iteration takes time O(log n) using O(n + m) processors. The entire algorithm has expected running time O(log² n) using O(n + m) processors.
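A small Monte Carlo check of the lemma's constant (illustrative, not part of the paper): simulate the randomized COLOR step on a 5-cycle, where avail_i = {1,...,deg_i + 1} = {1, 2, 3}, and estimate the probability that a vertex keeps a real color.

    import random

    def color_round(adj, avail):
        temp = {}
        for v in adj:
            if random.random() < 0.5:
                temp[v] = None                      # null color, epsilon in the text
            else:
                temp[v] = random.choice(sorted(avail[v]))
        color = dict(temp)
        for v in adj:
            for w in adj[v]:
                if temp[v] is not None and temp[v] == temp[w]:
                    color[v] = None                 # conflict: drop the color
        return color

    adj = {v: [(v + 1) % 5, (v - 1) % 5] for v in range(5)}
    avail = {v: {1, 2, 3} for v in adj}
    trials, kept = 2000, 0
    random.seed(0)
    for _ in range(trials):
        c = color_round(adj, avail)
        kept += sum(1 for v in c if c[v] is not None)
    print(kept / (trials * 5))   # empirically above the 1/4 bound of Lemma 1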

3.3.2 A Deterministic Algorithm

In this subsection we show how the randomized COLOR procedure can be expressed as an instance of the general pairs benefit problem. This leads to a deterministic parallel algorithm for the Δ + 1 vertex coloring problem with running time O(log³ n log log n) using O(n + m) processors.

Consider an execution of the COLOR procedure with input H = (V′, E′). For each i ∈ V′ we make the following definitions. Let Navail_i = |avail_i|, let k_i be such that 2^{k_i − 1} < 2·Navail_i ≤ 2^{k_i}, and let Nlist_i = 2^{k_i}. With these definitions,

1/4 < Navail_i/Nlist_i ≤ 1/2.

Let list_i[0,...,Nlist_i − 1] be an array such that the first Navail_i entries in list_i are the elements of avail_i in sorted order and the remaining entries in list_i have value ε. Let q = max{k_i : i ∈ V′}. We note that q ≤ ⌈log(Δ + 1)⌉ + 1 = O(log n). Let ~x = <x_i ∈ {0,1}^q : i ∈ V′>. For all i ∈ V′, let list_i(x_i) be the entry in list_i indexed by the first k_i bits of x_i. Consider the execution of the randomized COLOR procedure, where for all i ∈ V′, temp_i is chosen to be list_i(x_i). If ~x is a vector of pairwise independent random strings then, for all i ∈ V′, temp_i is a random color in avail_i with probability Navail_i/Nlist_i and is equal to ε with probability 1 − Navail_i/Nlist_i. Furthermore, for all i ≠ j ∈ V′, temp_i is independent of temp_j. We derive a lower bound on the number of vertices permanently colored with respect to ~x in terms of a benefit function defined in terms of the following auxiliary functions.

(1) For all i ∈ V′, let Y_i(x_i) = 1 if list_i(x_i) ∈ avail_i and Y_i(x_i) = 0 if list_i(x_i) = ε.

(2) For all {i,j} ∈ E′, let Y_ij(x_i, x_j) = −1 if list_i(x_i) = list_j(x_j) ≠ ε and Y_ij(x_i, x_j) = 0 otherwise.

Define the benefit function as

B(~x) = Σ_{i∈V′} Y_i(x_i) + Σ_{{i,j}∈E′} Y_ij(x_i, x_j).

The first term on the right says that we add one to the benefit if vertex i is initially assigned a real color. The first term is an overcount on the number of vertices permanently colored because two adjacent vertices may receive the same real color; the second term compensates for this by subtracting one for each such pair of vertices.

We now show how to compute the input tables for the bit pairs benefit problem. Fix p such that 0 ≤ p ≤ q. For all i ∈ V′, for all x_i ∈ {0,1}^p, we extend the definition of list_i and define low_i and high_i as follows.

(a) If p ≥ k_i then let list_i(x_i) be the entry in list_i indexed by the first k_i bits of x_i as before. Let low_i(x_i) = high_i(x_i) be the index of this entry in list_i.

(b) If p < k_i then let list_i(x_i) be the portion of list_i between indices low_i(x_i) = x_i0···0 and high_i(x_i) = x_i1···1, where low_i(x_i) is x_i padded out with zeroes to a length k_i bit string and high_i(x_i) is x_i padded out with ones to a length k_i bit string.

Let Nlist_i(x_i) = |list_i(x_i)| = high_i(x_i) − low_i(x_i) + 1. Let ~γ = <γ_i ∈ {0,1}^{q−p} : i ∈ V′> be a collection of pairwise independent random strings. TY_i(x_i) = E[Y_i(x_iγ_i)] is the probability that a color in avail_i is chosen when a random entry is uniformly chosen from list_i(x_i). We define T̃Y_i(x_i) = TY_i(x_i). Given low_i(x_i), high_i(x_i) and Navail_i, and recalling that the real colors occupy entries 0,...,Navail_i − 1 of list_i, we can compute

T̃Y_i(x_i) = max{0, min{Navail_i − 1, high_i(x_i)} − low_i(x_i) + 1} / Nlist_i(x_i)

in constant time with one processor.

For all {i,j} ∈ E′, for each x_i, x_j ∈ {0,1}^p, −TY_ij(x_i, x_j) = E[−Y_ij(x_iγ_i, x_jγ_j)] is by definition the probability that the same real color is chosen from both list_i(x_i) and list_j(x_j) when a random entry is chosen uniformly from each list independently. It is not clear how this probability can be computed very quickly with just one processor, since a list could contain Ω(n) entries, and the straightforward method to compute this quantity involves counting the number of real colors common to both lists. Instead, we define below the functions inter_i(x_i, x_j) and inter_j(x_i, x_j), let inter(x_i, x_j) = min{inter_i(x_i, x_j), inter_j(x_i, x_j)}, and then define an approximation T̃Y_ij(x_i, x_j) of TY_ij(x_i, x_j) in terms of inter(x_i, x_j). It can be verified that inter(x_i, x_j) is an upper bound on the number of common colors on list_i(x_i) and list_j(x_j).

• If list_j(x_j) doesn't contain a real color, or if the smallest real color in list_j(x_j) is greater than the largest real color in list_i(x_i), or if the largest real color in list_j(x_j) is less than the smallest real color in list_i(x_i), then let inter_i(x_i, x_j) = 0.

• Otherwise, let min inter_i(x_i, x_j) be the smallest index of an entry in list_i(x_i) that is a real color greater than or equal to the smallest real color in list_j(x_j), let max inter_i(x_i, x_j) be the largest index of an entry in list_i(x_i) that is a real color less than or equal to the largest real color in list_j(x_j), and let

inter_i(x_i, x_j) = max inter_i(x_i, x_j) − min inter_i(x_i, x_j) + 1.

• Define inter_j(x_i, x_j) the same way with the roles of i and j reversed.

Because the colors in the lists are stored in sorted order in consecutive memory locations, all of these calculations can be made using one processor in O(log n) time using binary search. We set

T̃Y_ij(x_i, x_j) = − inter(x_i, x_j) / (Nlist_i(x_i) · Nlist_j(x_j)).

Given low_i(x_i), high_i(x_i), Navail_i, low_j(x_j), high_j(x_j) and Navail_j, we can compute T̃Y_ij(x_i, x_j) in O(log n) time with one processor. Thus, the input tables for the bit pairs benefit problem can be constructed in O(log n) time using O(n + m) processors.

It is not hard to verify that when p = q, Y_ij(x_i, x_j) = T̃Y_ij(x_i, x_j), i.e. if list_i(x_i) and list_j(x_j) are the same real color then T̃Y_ij(x_i, x_j) = −1, whereas if they aren't the same real color then T̃Y_ij(x_i, x_j) = 0. We now must verify that the submartingale property is satisfied, i.e.

Lemma 3: T̃Y_ij(x_i, x_j) ≤ (Σ_{a,b∈{0,1}} T̃Y_ij(x_ia, x_jb))/4.

Proof: We show below that

Σ_{a,b∈{0,1}} inter_i(x_ia, x_jb) ≤ inter_i(x_i, x_j),

and similarly

Σ_{a,b∈{0,1}} inter_j(x_ia, x_jb) ≤ inter_j(x_i, x_j).

We first show the conclusion of the proof, assuming that this is true. Because of the general inequality

min{Σ_{l′=1}^{L} a_{l′}, Σ_{l′=1}^{L} b_{l′}} ≥ Σ_{l′=1}^{L} min{a_{l′}, b_{l′}}

for any sequence of real numbers a_1,...,a_L, b_1,...,b_L, it follows that

inter(x_i, x_j) = min{inter_i(x_i, x_j), inter_j(x_i, x_j)} ≥ Σ_{a,b∈{0,1}} min{inter_i(x_ia, x_jb), inter_j(x_ia, x_jb)} = Σ_{a,b∈{0,1}} inter(x_ia, x_jb).

From this inequality we prove the lemma as follows. The proof is omitted for the case when either list_i(x_i) or list_j(x_j) contains a single entry, because the proof is analogous to but simpler than the proof for the case we consider now, when both lists contain more than one entry. For all a, b ∈ {0,1}, Nlist_i(x_i) = 2·Nlist_i(x_ia) and Nlist_j(x_j) = 2·Nlist_j(x_jb). Thus,

−T̃Y_ij(x_i, x_j) = inter(x_i, x_j)/(Nlist_i(x_i)·Nlist_j(x_j)) ≥ Σ_{a,b∈{0,1}} inter(x_ia, x_jb)/(4·Nlist_i(x_ia)·Nlist_j(x_jb)) = −(Σ_{a,b∈{0,1}} T̃Y_ij(x_ia, x_jb))/4,

and the submartingale property is satisfied.

We now show that Σ_{a,b∈{0,1}} inter_i(x_ia, x_jb) ≤ inter_i(x_i, x_j). The proof of the same claim with respect to j is analogous. There are three cases to consider.

(1) inter_i(x_i, x_j) = 0. In this case, it is easy to verify that, for all a, b ∈ {0,1}, inter_i(x_ia, x_jb) = 0.

(2) max inter_i(x_i, x_j) and min inter_i(x_i, x_j) are both defined and list_j(x_j1) contains no real colors. The proof is similar to but easier than the proof of Case (3).

(3) max inter_i(x_i, x_j) and min inter_i(x_i, x_j) are both defined and list_j(x_j1) contains real colors. Let max inter_i(x_i, x_j0) be the largest index of an entry in list_i(x_i) that is a color less than or equal to the real color at the last entry in list_j(x_j0). Let min inter_i(x_i, x_j1) be the smallest index of an entry in list_i(x_i) that is a color greater than or equal to the real color at the first entry in list_j(x_j1). Because the real colors are sorted on each list and because the last color in list_j(x_j0) is strictly less than the first color in list_j(x_j1),

min inter_i(x_i, x_j) ≤ max inter_i(x_i, x_j0) < min inter_i(x_i, x_j1) ≤ max inter_i(x_i, x_j).

There are five cases to consider.

(a) min inter_i(x_i, x_j), max inter_i(x_i, x_j0), min inter_i(x_i, x_j1), and max inter_i(x_i, x_j) are all in list_i(x_i0). Then

inter_i(x_i0, x_j0) = max inter_i(x_i, x_j0) − min inter_i(x_i, x_j) + 1
inter_i(x_i0, x_j1) = max inter_i(x_i, x_j) − min inter_i(x_i, x_j1) + 1
inter_i(x_i1, x_j0) = 0
inter_i(x_i1, x_j1) = 0

(b) min inter_i(x_i, x_j), max inter_i(x_i, x_j0) and min inter_i(x_i, x_j1) are in list_i(x_i0) and max inter_i(x_i, x_j) is in list_i(x_i1). Then

inter_i(x_i0, x_j0) = max inter_i(x_i, x_j0) − min inter_i(x_i, x_j) + 1
inter_i(x_i0, x_j1) = high_i(x_i0) − min inter_i(x_i, x_j1) + 1
inter_i(x_i1, x_j0) = 0
inter_i(x_i1, x_j1) = max inter_i(x_i, x_j) − low_i(x_i1) + 1

(c) min inter_i(x_i, x_j) and max inter_i(x_i, x_j0) are in list_i(x_i0) and min inter_i(x_i, x_j1) and max inter_i(x_i, x_j) are in list_i(x_i1). Then

inter_i(x_i0, x_j0) = max inter_i(x_i, x_j0) − min inter_i(x_i, x_j) + 1
inter_i(x_i0, x_j1) = 0
inter_i(x_i1, x_j0) = 0
inter_i(x_i1, x_j1) = max inter_i(x_i, x_j) − min inter_i(x_i, x_j1) + 1

(d) min inter_i(x_i, x_j) is in list_i(x_i0) and max inter_i(x_i, x_j0), min inter_i(x_i, x_j1) and max inter_i(x_i, x_j) are in list_i(x_i1). This case is similar to Case (b).

(e) min inter_i(x_i, x_j), max inter_i(x_i, x_j0), min inter_i(x_i, x_j1) and max inter_i(x_i, x_j) are all in list_i(x_i1). This case is similar to Case (a).

In each case, it is easily verified that Σ_{a,b∈{0,1}} inter_i(x_ia, x_jb) ≤ inter_i(x_i, x_j).

The following theorem shows that using the ~x output by the general pairs benefit algorithm, which satisfies B(~x) ≥ T̃B(~ε), to determine the list indices used in the COLOR procedure, instead of choosing the indices randomly, is guaranteed to permanently color at least 1/8 of the remaining uncolored vertices.

Theorem 4: T̃B(~ε) ≥ |V′|/8.

Proof: For all i ∈ V′, T̃Y_i(ε) = Navail_i/Nlist_i > 1/4. For all {i,j} ∈ E′,

T̃Y_ij(ε, ε) ≥ −min{Navail_i, Navail_j}/(Nlist_i · Nlist_j).

Thus,

T̃B(~ε) = Σ_{i∈V′} T̃Y_i(ε) + Σ_{{i,j}∈E′} T̃Y_ij(ε, ε) ≥ |V′|/4 + (Σ_{i∈V′} Σ_{j∈adj_i} T̃Y_ij(ε, ε))/2 ≥ |V′|/4 − (Σ_{i∈V′} Σ_{j∈adj_i} Navail_j/(Nlist_i · Nlist_j))/2.

For each i ∈ V′, because Navail_j/Nlist_j ≤ 1/2 and because deg_i < Navail_i ≤ Nlist_i/2 implies that deg_i/Nlist_i ≤ 1/2,

Σ_{j∈adj_i} Navail_j/(Nlist_i · Nlist_j) ≤ Σ_{j∈adj_i} 1/(2·Nlist_i) ≤ 1/4.

Thus,

T̃B(~ε) ≥ |V′|/4 − |V′|/8 = |V′|/8.

Deterministic COLOR(H) Procedure

We first execute the vertex indexing algorithm with input Γ_2 = E to speed up subsequent calls to the bit pairs benefit algorithm, which is called within the general pairs benefit algorithm. The running time of the vertex indexing algorithm is O(log² n log log n). As previously defined, q ≤ ⌈log(Δ + 1)⌉ + 1.

In parallel, for all i ∈ V′, compute k_i, Navail_i, Nlist_i and list_i
Run the general pairs benefit algorithm to generate ~x = <x_i ∈ {0,1}^q : i ∈ V′>
In parallel, for all i ∈ V′, temp_i ← list_i(x_i)
In parallel, for all i ∈ V′, color_i ← temp_i
In parallel, for all {i,j} ∈ E′, if temp_i = temp_j then color_i ← ε

Because each execution of the deterministic COLOR procedure colors a fraction of at least 1/8 of the remaining vertices, the COLOR procedure is executed a total of O(log n) times. For each execution the preprocessing and postprocessing of the lists for all vertices takes time O(log n) using O(n + m) processors. The most costly step is the call to the general pairs benefit algorithm, which takes time O(q log n log log n) = O(log² n log log n) using O(n + m) processors. Thus, the total running time of the Δ + 1 vertex coloring algorithm is O(log³ n log log n) using O(n + m) processors.
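The closed form for T̃Y_i is ordinary interval arithmetic on the block of list_i addressed by a prefix. The sketch below (our code, with the real colors in positions 0,...,Navail_i − 1) computes it and verifies it against the definition by brute force.

    def ty_i(prefix_bits, k_i, n_avail):
        # probability that a uniformly chosen entry of the block of list_i
        # addressed by this prefix is a real color; real colors sit at
        # slots 0 .. n_avail-1, the null color fills the rest of the 2**k_i slots
        p = len(prefix_bits)
        v = 0
        for b in prefix_bits:
            v = 2 * v + b
        low = v << (k_i - p)                  # prefix padded with zeroes
        high = ((v + 1) << (k_i - p)) - 1     # prefix padded with ones
        n_real = max(0, min(n_avail - 1, high) - low + 1)
        return n_real / (high - low + 1)

    from itertools import product
    k_i, n_avail = 3, 5                       # Nlist = 8, real colors in slots 0..4
    for p in range(k_i + 1):
        for prefix in product((0, 1), repeat=p):
            sufs = list(product((0, 1), repeat=k_i - p))
            direct = sum(int("".join(map(str, prefix + s)), 2) < n_avail
                         for s in sufs) / len(sufs)
            assert abs(ty_i(list(prefix), k_i, n_avail) - direct) < 1e-12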

3.4 Maximal Independent Set

Let G = (V, E) be an undirected graph with |V| = n and |E| = m. The maximal independent set problem is to find I ⊆ V such that no two vertices in I are adjacent and every vertex not in I is adjacent to at least one vertex in I. The set I is called a maximal independent set. In the next subsection we present a randomized algorithm for the maximal independent set problem with expected running time O(log³ n) using a linear number of processors, and in the following subsection we show how to apply the general pairs benefit algorithm to yield a deterministic parallel algorithm with running time O(log⁵ n) using a linear number of processors. Neither of these results is new: the randomized algorithms in both [Luby 85] and [Alon, Babai, Itai 86] have expected running time O(log² n) using a linear number of processors, and the deterministic algorithm in [Goldberg, Spencer 87] has running time O(log³ n) using a linear number of processors. The main point of this subsection is to show how the techniques developed in this paper can be applied to many problems.[11] Consequently, although it is possible to develop a more efficient but more complicated deterministic algorithm for the maximal independent set problem using these techniques, for clarity of exposition we present a relatively simple but not exceptionally efficient algorithm.

3.4.1 A Randomized Algorithm

The overall structure of the algorithms for the maximal independent set problem fits into the following framework. The algorithm runs in iterations. The current graph H is initialized to G and I is initialized to the empty set of vertices. At each iteration an independent set I' is found in H via a call to procedure FIND_INDEP and these vertices are added to I. The graph for the next iteration is obtained by removing from H the vertices in I' together with all neighboring vertices N(I'), and all edges with an endpoint at any of these vertices. It is easy to verify that I is a maximal independent set at the termination of the algorithm. (The randomized algorithm presented here is very similar to the randomized algorithm implicit in [Karp, Wigderson 84].) Let adj_i be the set of neighbors of i in H and let deg_i = |adj_i| be the degree of i in H.

Maximal Independent Set Algorithm

    H = (V', E') ← G = (V, E)
    I ← ∅
    Δ ← max{deg_i : i ∈ V'}
    bigdeg ← min{2^k : 2^k ≥ Δ}
    smalldeg ← ⌊bigdeg/2⌋
    Repeat until bigdeg = 0
        call FIND_INDEP(H, I', bigdeg)
        I ← I ∪ I'
        V' ← {i ∈ V' : i ∉ I' ∪ N(I')}
        E' ← {{i,j} ∈ E : i ∈ V' and j ∈ V'}
        H ← (V', E')
        Δ ← max{deg_i : i ∈ V'}
        If Δ ≤ smalldeg then bigdeg ← smalldeg and smalldeg ← ⌊bigdeg/2⌋
    I ← I ∪ V'
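For concreteness, here is a sequential Python sketch of this outer loop (the paper performs each step in parallel). The FIND_INDEP procedure is passed in as a parameter, so either the randomized version below or its deterministic counterpart can be plugged in; the maximal matching algorithm of Subsection 3.5 has the same skeleton.

    def maximal_independent_set(V, E, find_indep):
        # Outer loop of the maximal independent set algorithm.
        Vp = set(V)
        Ep = {frozenset(e) for e in E}
        I = set()

        def max_deg():
            deg = {i: 0 for i in Vp}
            for e in Ep:
                for v in e:
                    deg[v] += 1
            return max(deg.values(), default=0)

        Delta = max_deg()
        bigdeg = 1
        while bigdeg < Delta:          # bigdeg = min{2^k : 2^k >= Delta}
            bigdeg *= 2
        smalldeg = bigdeg // 2
        while bigdeg > 0:
            Ip = find_indep(Vp, Ep, bigdeg)
            I |= Ip
            # remove I' and N(I') together with all incident edges
            removed = Ip | {v for e in Ep if e & Ip for v in e}
            Vp -= removed
            Ep = {e for e in Ep if e <= Vp}
            Delta = max_deg()
            if Delta <= smalldeg:
                bigdeg, smalldeg = smalldeg, smalldeg // 2
        return I | Vp                  # remaining degree-0 vertices join I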

Let a phase of the algorithm consist of consecutive iterations where bigdeg does not change value. Note that there can be at most ⌈log n⌉ + 1 phases. Define i ∈ V' to be a high degree vertex in an iteration if at the beginning of the iteration deg_i ≥ smalldeg. The randomized algorithm for the maximal independent set problem uses a randomized FIND_INDEP procedure to find an independent set in the remaining graph such that, on average, a constant fraction of the high degree vertices are eliminated from the graph in an iteration. Because of this and because the degree of a vertex can only decrease in subsequent iterations, the expected number of iterations in a phase is O(log n), and the expected total number of iterations in the algorithm is O(log^2 n).

The idea behind the randomized FIND_INDEP procedure is as follows. In parallel, each vertex i ∈ V' randomly and independently tries to add itself to the independent set I' with probability 1/(4·bigdeg). The random variable temp_i is used to make this random choice; temp_i = 1 indicates i is trying to add itself to I', temp_i = 0 indicates not. The variable indep_i is used to indicate whether or not i is actually added to I', and is initialized to temp_i. The set of vertices i for which temp_i = 1 is not necessarily an independent set because there might be edges {i,j} ∈ E' such that both temp_i = 1 and temp_j = 1. In parallel for each such edge, indep_i is reset to 0. Finally, all of the vertices i such that indep_i = 1 form the independent set I'.

The lemma below shows that for each high degree vertex i' ∈ V', with probability at least 1/16 there is some neighbor i ∈ adj_{i'} such that i ∈ I', and thus i' is not part of the graph at the next iteration. The intuition for this is that if i' is a high degree vertex, then i' has so many neighbors that temp_i = 1 for at least one neighbor i with constant probability. Furthermore, since deg_i ≤ bigdeg, with constant probability temp_j = 0 for all neighbors j of i. Thus, with constant probability there is some neighbor i of i' such that i ∈ I' at the end of the iteration.

Randomized FIND_INDEP(H, I', bigdeg) Procedure

    Let H = (V', E')
    In parallel, for all i ∈ V',
        with probability 1/(4·bigdeg), temp_i ← 1 and
        with probability 1 − 1/(4·bigdeg), temp_i ← 0
    In parallel, for all i ∈ V', indep_i ← temp_i
    In parallel, for all {i,j} ∈ E', if temp_i = temp_j = 1 then indep_i ← 0
    I' ← {i ∈ V' : indep_i = 1}
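A direct sequential transcription of this procedure (full independence is used below for simplicity, although the analysis only needs pairwise independence):

    import random

    def find_indep(Vp, Ep, bigdeg):
        # Each vertex tries to join I' with probability 1/(4*bigdeg).
        # Edges are frozensets of two vertices.
        temp = {i: 1 if random.random() < 1.0 / (4 * bigdeg) else 0 for i in Vp}
        indep = dict(temp)
        for e in Ep:
            i, j = tuple(e)
            if temp[i] == 1 and temp[j] == 1:
                indep[i] = 0   # knocking out one endpoint per conflicting edge
                indep[j] = 0   # suffices; both are reset here for symmetry
        return {i for i in Vp if indep[i] == 1}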

Lemma 5: For all high degree i' ∈ V', Pr[∃ i ∈ adj_{i'} such that i ∈ I'] ≥ 1/16 at the end of the execution of FIND_INDEP.

Proof: Fix a high degree vertex i' ∈ V'. For each i ∈ adj_{i'}, let E_{i'i} be the event that temp_i = 1 and, for all j ∈ (adj_{i'} ∪ adj_i) − {i}, temp_j = 0. Note that if E_{i'i} occurs then i ∈ I'. Furthermore, for all i, j ∈ adj_{i'}, i ≠ j, E_{i'i} and E_{i'j} are mutually exclusive. By the first two terms of the inclusion-exclusion formula,

    $\Pr[E_{i'i}] \ge \Pr[temp_i = 1] - \sum_{j \in (adj_{i'} \cup adj_i) - \{i\}} \Pr[temp_i = 1 \text{ and } temp_j = 1].$

Thus, because deg_j ≤ bigdeg for all j ∈ V' and the sum has at most 2·bigdeg terms,

    $\Pr[E_{i'i}] \ge \frac{1}{4 \cdot bigdeg} - \frac{2 \cdot bigdeg}{16 \cdot bigdeg^2} = \frac{1}{8 \cdot bigdeg}.$

Consequently, since deg_{i'} ≥ smalldeg = bigdeg/2,

    $\Pr[\exists i \in adj_{i'} \text{ such that } i \in I'] \ge \sum_{i \in adj_{i'}} \Pr[E_{i'i}] \ge \frac{1}{16}.$

Corollary 6: Each execution of FIND_INDEP on average eliminates at least 1/16 of the high degree vertices. From this it can be easily shown that the expected number of iterations in a phase is O(log n).

Each iteration takes time O(log n) using O(n + m) processors. Thus, since there are a total of O(log n) phases, the entire algorithm has expected running time O(log^3 n) using O(n + m) processors.

We note that the analysis only requires pairwise independence between the random variables {temp_i : i ∈ V'}. As before, this is the key to developing a deterministic algorithm by expressing the problem as an instance of the general pairs benefit problem and then using the general pairs benefit algorithm.
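The paper's own pairwise independent space is constructed in Section 2; purely to illustrate what pairwise independence buys, the sketch below uses a standard inner-product construction (an assumption of this example, not the space used by the algorithm): bit t of x_i is ⟨a_t, i⟩ ⊕ b_t over GF(2), which makes the q-bit strings x_0, ..., x_{n−1} uniform and pairwise independent while drawing only O(q log n) truly random bits.

    import random

    def pairwise_independent_strings(n, q):
        # Returns x[0..n-1], each a q-bit integer; for i != j the pair
        # (x[i], x[j]) is uniform over pairs of q-bit strings.
        s = max(1, (n - 1).bit_length())        # bits needed to write an index
        a = [random.getrandbits(s) for _ in range(q)]
        b = [random.getrandbits(1) for _ in range(q)]

        def x(i):
            v = 0
            for t in range(q):
                parity = bin(a[t] & i).count("1") & 1   # <a_t, i> over GF(2)
                v = (v << 1) | (parity ^ b[t])
            return v

        return [x(i) for i in range(n)]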

3.4.2 A Deterministic Algorithm

In this subsection we show how the randomized FIND_INDEP procedure can be expressed as an instance of the general pairs benefit problem. This leads to a deterministic parallel algorithm for the maximal independent set problem with running time O(log^5 n) using O(n + m) processors.

Consider an execution of the FIND_INDEP procedure with input H = (V', E'). Let q = log(bigdeg) + 2. We note that q = O(log n). Let high be the set of high degree vertices in H and let Nhigh = |high| be the total number of high degree vertices in H. Let ~x = ⟨x_i ∈ {0,1}^q : i ∈ V'⟩. Consider the execution of the randomized FIND_INDEP procedure, where for all i ∈ V',

    temp_i = 1 if x_i = 1^q, and temp_i = 0 if x_i ≠ 1^q.

If ~x is a vector of pairwise independent random strings then, for all i ∈ V', Pr[temp_i = 1] = 2^{−q} = 1/(4·bigdeg), and for all i, j ∈ V', i ≠ j, temp_i and temp_j are independent. We derive a lower bound on the number of high degree vertices eliminated with respect to ~x by first defining auxiliary functions as follows. (The auxiliary functions described here and the corresponding benefit function B are almost identical to the profit, cost and benefit functions used in [Karp, Wigderson 84].) For all i, j ∈ V', i ≠ j,

    (1) Y_i(x_i) = 1 if x_i = 1^q, and Y_i(x_i) = 0 if x_i ≠ 1^q;

    (2) Y_ij(x_i, x_j) = −1 if x_i = x_j = 1^q, and Y_ij(x_i, x_j) = 0 otherwise.

Define

    $B(\vec{x}) = \sum_{i' \in high} \sum_{i \in adj_{i'}} \Bigl( Y_i(x_i) + \sum_{j \in adj_i} Y_{ij}(x_i, x_j) + \sum_{j \in adj_{i'} - \{i\}} Y_{ij}(x_i, x_j) \Bigr).$

We claim that B(~x) is a lower bound on the number of high degree vertices eliminated from the graph with respect to ~x. The proof of this claim can be easily derived from the proof of Lemma 5.
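On small graphs the claim can be tested by evaluating B(~x) directly from the definition; the quadratic work below (one term per auxiliary function) is exactly what the indirect computation later in this subsection avoids. Adjacency is given as a dict of sets, and x[i] is a q-bit integer.

    def benefit_direct(adj, high, x, q):
        # Direct evaluation of B(x): Y_i = 1 iff x_i = 1^q, and
        # Y_ij = -1 iff x_i = x_j = 1^q.
        ones = (1 << q) - 1                    # the all-ones string 1^q

        def Y(i):
            return 1 if x[i] == ones else 0

        def Yij(i, j):
            return -1 if x[i] == ones and x[j] == ones else 0

        total = 0
        for ip in high:                        # ip plays the role of i'
            for i in adj[ip]:
                total += Y(i)
                total += sum(Yij(i, j) for j in adj[i])
                total += sum(Yij(i, j) for j in adj[ip] if j != i)
        return total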

Fix p such that 0 ≤ p ≤ q. Let ~x = ⟨x_i ∈ {0,1}^p : i ∈ V'⟩ and let ~ρ = ⟨ρ_i ∈ {0,1}^{q−p} : i ∈ V'⟩ be a vector of pairwise independent random strings. For all i, j ∈ V', i ≠ j, let

    (1) $TY_i(x_i) = E_{\rho_i}[Y_i(x_i \rho_i)] = 2^{-(q-p)}$ if $x_i = 1^p$, and $TY_i(x_i) = 0$ if $x_i \ne 1^p$;

    (2) $TY_{ij}(x_i, x_j) = E_{\rho_i, \rho_j}[Y_{ij}(x_i \rho_i, x_j \rho_j)] = -2^{-2(q-p)}$ if $x_i = x_j = 1^p$, and $TY_{ij}(x_i, x_j) = 0$ otherwise;

and

    $TB(\vec{x}) = \sum_{i' \in high} \sum_{i \in adj_{i'}} \Bigl( TY_i(x_i) + \sum_{j \in adj_i} TY_{ij}(x_i, x_j) + \sum_{j \in adj_{i'} - \{i\}} TY_{ij}(x_i, x_j) \Bigr).$

We later show how to compute TB(~x) quickly for any ~x using only a linear number of processors. The following theorem shows that using the ~x output by the general pairs benefit algorithm, which satisfies B(~x) ≥ TB(~ρ), to fix the values of the temp_i variables as described above is guaranteed to eliminate at least 1/16 of the high degree vertices.

Theorem 7: TB(~ρ) ≥ Nhigh/16.

Proof: For all i, j ∈ V', i ≠ j, TY_i(ρ) = 1/(4·bigdeg) and TY_ij(ρ, ρ) = −1/(16·bigdeg²). Because deg_i ≤ bigdeg for all i ∈ V', the two inner pair sums together contribute at most 2·bigdeg terms for each i, and because deg_{i'} ≥ bigdeg/2 for each i' ∈ high,

    $TB(\vec{\rho}) \ge \sum_{i' \in high} \sum_{i \in adj_{i'}} \Bigl( \frac{1}{4 \cdot bigdeg} - 2 \cdot bigdeg \cdot \frac{1}{16 \cdot bigdeg^2} \Bigr) = \sum_{i' \in high} \sum_{i \in adj_{i'}} \frac{1}{8 \cdot bigdeg} \ge \frac{Nhigh}{16}.$

We now show how to calculate TB(~x). Computing these quantities directly by assigning a processor to each auxiliary function is problematic, because the number of processors needed is proportional to $\sum_{i \in V'} deg_i^2$, which could be much larger than O(n + m). Thus, we need to resort to indirect ways of calculating these quantities. (Because we are using an indirect method of keeping track of the auxiliary functions in calls to the bit pairs benefit algorithm, we cannot use the vertex indexing algorithm to speed up the bit pairs benefit algorithm as we did for the Δ + 1 coloring problem.) Suppose we have computed the first p bits of all the basic variables

    ~x = ⟨x_i ∈ {0,1}^p : i ∈ V'⟩.

Consider the calculations that need to be made when making the one bit extension using the bit pairs benefit algorithm. We retain the notation of Subsections 2.3 and 2.4. Let l be

the length of the final sample point and let z be a bit string of length k < l for which we are computing TB(~x X̃(zω)) in the bit pairs benefit algorithm. (The case when k = l can be thought of as the case when each x_i is one bit longer and k = 0.) Since k < l and because α_{i,l} = 1 for each i ∈ V',

    TY_i(x_i X_i(zω)) = TY_i(x_i).

For each i ∈ V', let

    $\bar{\alpha}_i = \alpha_{i,l}\,\alpha_{i,l-1} \cdots \alpha_{i,k+1}.$

For each pair i, j ∈ V', i ≠ j:

(1) If ᾱ_i ≠ ᾱ_j then all four possible simultaneous extensions of x_i and x_j by one bit are equally likely. Thus,

    TY_ij(x_i X_i(zω), x_j X_j(zω)) = TY_ij(x_i, x_j).

(2) If ᾱ_i = ᾱ_j then it is easy to verify that last_{ij} ≤ k.

    1. If X_i(z) ≠ X_j(z) then the next bit extension of x_i is guaranteed to be different than the next bit extension of x_j, and thus

        TY_ij(x_i X_i(zω), x_j X_j(zω)) = 0.

    2. If X_i(z) = X_j(z) then the next bit extension of both x_i and x_j is 1 with probability 1/2 and both are 0 with probability 1/2. Thus,

        TY_ij(x_i X_i(zω), x_j X_j(zω)) = 2·TY_ij(x_i, x_j).

We compute all the following lists and quantities in O(log n) time using O(n + m) processors. For all i ∈ V' we define the following. Let high_i be the set of high degree vertices adjacent to i and let Nhigh_i = |high_i|. Let

    active_i(~x) = {j ∈ adj_i : x_j = 1^p}

and let Nactive_i(~x) = |active_i(~x)|. Compute the following multi-set in sorted order:

    list_i(~x) = {ᾱ_j X_j(z) : j ∈ active_i(~x)}.

For each r ∈ {0,1}^{l−k} such that there is some j ∈ active_i(~x) with r = ᾱ_j, and for each b ∈ {0,1}, let N_i^{rb}(~x) be the number of copies of rb in list_i(~x).
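The per-vertex tabulation can be sketched as follows; here x_prefix[j] is the p-bit prefix of x_j computed so far, and label(j) is an assumed black box returning the pair (ᾱ_j, X_j(z)) in the notation of Subsections 2.3 and 2.4 (a Counter stands in for the sorted multi-set).

    from collections import Counter

    def neighborhood_counts(adj, x_prefix, p, label, i):
        # active_i, Nactive_i and the counts N_i^{rb} for one vertex i.
        ones = (1 << p) - 1                    # the all-ones prefix 1^p
        active = [j for j in adj[i] if x_prefix[j] == ones]
        counts = Counter(label(j) for j in active)   # counts[(r, b)] = N_i^{rb}
        return active, len(active), counts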

We compute the following from this in parallel in O(log n) time using O(n + m) processors.

(1) For each i' ∈ high, let

    $TY_{i'}(\vec{x}\tilde{X}(z\omega)) = \sum_{i \in adj_{i'}} TY_i(x_i X_i(z\omega)).$

We can calculate the following using one processor:

    $TY_{i'}(\vec{x}\tilde{X}(z\omega)) = \frac{Nactive_{i'}(\vec{x})}{2^{q-p}}.$

(2) For each i ∈ V', let

    $\dot{TY}_i(\vec{x}\tilde{X}(z\omega)) = \sum_{i' \in high_i} \sum_{j \in adj_i} TY_{ij}(x_i X_i(z\omega), x_j X_j(z\omega)).$

It can be verified that

    $\sum_{i \in V'} \dot{TY}_i(\vec{x}\tilde{X}(z\omega)) = \sum_{i' \in high} \sum_{i \in adj_{i'}} \sum_{j \in adj_i} TY_{ij}(x_i X_i(z\omega), x_j X_j(z\omega)).$

We can calculate the following using one processor:

    $\dot{TY}_i(\vec{x}\tilde{X}(z\omega)) = -Nhigh_i \cdot TY_i(x_i) \cdot \frac{Nactive_i(\vec{x}) + N_i^{\bar{\alpha}_i X_i(z)}(\vec{x}) - N_i^{\bar{\alpha}_i (1 - X_i(z))}(\vec{x})}{2^{q-p}}.$

(3) For each i' ∈ high, let

    $\ddot{TY}_{i'}(\vec{x}\tilde{X}(z\omega)) = \sum_{i \in adj_{i'}} \sum_{j \in adj_{i'} - \{i\}} TY_{ij}(x_i X_i(z\omega), x_j X_j(z\omega)).$

For all r ∈ {0,1}^{l−k} as above, let

    $Ntotal_{i'}^r(\vec{x}) = N_{i'}^{r0}(\vec{x})(N_{i'}^{r0}(\vec{x}) - 1) + N_{i'}^{r1}(\vec{x})(N_{i'}^{r1}(\vec{x}) - 1) - 2 N_{i'}^{r0}(\vec{x}) N_{i'}^{r1}(\vec{x}).$

We can calculate the following using O(deg_{i'}) processors:

    $\ddot{TY}_{i'}(\vec{x}\tilde{X}(z\omega)) = -\frac{Nactive_{i'}(\vec{x})(Nactive_{i'}(\vec{x}) - 1) + \sum_{r \in \{0,1\}^{l-k}} Ntotal_{i'}^r(\vec{x})}{2^{2(q-p)}}.$

Finally, once all of this information has been generated, calculate

    $TB(\vec{x}\tilde{X}(z\omega)) = \sum_{i' \in high} TY_{i'}(\vec{x}\tilde{X}(z\omega)) + \sum_{i \in V'} \dot{TY}_i(\vec{x}\tilde{X}(z\omega)) + \sum_{i' \in high} \ddot{TY}_{i'}(\vec{x}\tilde{X}(z\omega)).$
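The closed form in (3) is a purely combinatorial identity and can be spot-checked: in units of 2^{−2(q−p)}, an ordered pair of active neighbors contributes −1 when their ᾱ labels differ, −2 when labels and next bits both agree, and 0 otherwise. The labels below are arbitrary placeholders for the pairs (ᾱ_j, X_j(z)).

    from collections import Counter
    from itertools import permutations

    def ty_pairs_direct(labels):
        # Direct ordered-pair sum, in units of 2^(-2(q-p)).
        total = 0
        for (r1, b1), (r2, b2) in permutations(labels, 2):
            if r1 != r2:
                total -= 1
            elif b1 == b2:
                total -= 2
        return total

    def ty_pairs_closed(labels):
        # Closed form used in the text: -(N(N-1) + sum_r Ntotal^r).
        N = len(labels)
        c = Counter(labels)                    # c[(r, b)] = N^{rb}
        ntot = sum(c[(r, 0)] * (c[(r, 0)] - 1) + c[(r, 1)] * (c[(r, 1)] - 1)
                   - 2 * c[(r, 0)] * c[(r, 1)] for r in {r for (r, _) in c})
        return -(N * (N - 1) + ntot)

    labels = [("00", 0), ("00", 0), ("00", 1), ("01", 1), ("10", 0)]
    assert ty_pairs_direct(labels) == ty_pairs_closed(labels)   # both are -18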

Each call to the bit pairs benefit algorithm takes O(log^2 n) time, and thus each call to the general pairs benefit algorithm takes O(log^3 n) time. This is the dominating factor in the running time for each iteration. Since there are a total of O(log^2 n) iterations, the total running time of the maximal independent set algorithm is O(log^5 n) using O(n + m) processors.

3.5 Maximal Matching

Let G = (V, E) be an undirected graph with |V| = n and |E| = m. The maximal matching problem is to find J ⊆ E such that no two edges in J are adjacent and every edge not in J has an endpoint that is the endpoint of at least one edge in J. The set J is called a maximal matching. In the next subsection we present a randomized algorithm for the maximal matching problem with expected running time O(log^3 n) using a linear number of processors, and in the following subsection we show how to apply the general pairs benefit algorithm to yield a deterministic parallel algorithm with running time O(log^5 n) using a linear number of processors. Neither of these results is new, e.g. the randomized algorithm in [Israeli, Itai 84] has expected running time O(log^2 n) using a linear number of processors, and the deterministic algorithm in [Israeli, Shiloach 86] has running time O(log^3 n) using a linear number of processors. As with the maximal independent set subsection, the main point of this subsection is to show how the techniques developed in this paper can be applied to many problems, and thus we sacrifice efficiency for simplicity.
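The defining properties translate directly into a checker, which is convenient for testing the procedures below:

    def is_maximal_matching(V, E, J):
        # J is a matching iff its edges are pairwise disjoint; it is maximal
        # iff every edge of E shares an endpoint with some edge of J.
        E = {frozenset(e) for e in E}
        J = {frozenset(e) for e in J}
        matched = set().union(*J) if J else set()
        if sum(len(e) for e in J) != len(matched):
            return False                       # two edges of J share a vertex
        return J <= E and all(e & matched for e in E)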

3.5.1 A Randomized Algorithm

The algorithms for the maximal matching problem are very similar to the algorithms for the maximal independent set problem. The primary difference is that there is a basic variable for each edge in the graph instead of for each vertex. The algorithm runs in iterations. The current graph H is initialized to G and J is initialized to the empty set of edges. At each iteration a matching J' is found in H via a call to procedure FIND_MATCH and these edges are added to J. The graph for the next iteration is obtained by removing from H the edges in J' together with their endpoints and all edges touching these endpoints. It is easy to verify that J is a maximal matching at the termination of the algorithm.

Maximal Matching Algorithm

    H = (V', E') ← G = (V, E)
    J ← ∅
    Δ ← max{deg_i : i ∈ V'}
    bigdeg ← min{2^k : 2^k ≥ Δ}
    smalldeg ← ⌊bigdeg/2⌋
    Repeat until bigdeg = 0
        call FIND_MATCH(H, J', bigdeg)
        J ← J ∪ J'
        V' ← {i ∈ V' : i ∉ ∪J'}
        E' ← {{i,j} ∈ E : i ∈ V' and j ∈ V'}
        H ← (V', E')
        Δ ← max{deg_i : i ∈ V'}
        If Δ ≤ smalldeg then bigdeg ← smalldeg and smalldeg ← ⌊bigdeg/2⌋

Let a phase of the algorithm consist of consecutive iterations where bigdeg does not change value. Note that there can be at most ⌈log n⌉ + 1 phases. Define i ∈ V' to be a high degree vertex in an iteration if at the beginning of the iteration deg_i ≥ smalldeg. The randomized algorithm for the maximal matching problem uses a randomized FIND_MATCH procedure to find a matching in the remaining graph such that, on average, a constant fraction of the high degree vertices are eliminated from the graph in an iteration. Because of this and because the degree of a vertex can only decrease in subsequent iterations, the expected number of iterations in a phase is O(log n), and the expected total number of iterations in the algorithm is O(log^2 n).

The idea behind the randomized FIND_MATCH procedure is as follows. In parallel, each edge {i,j} ∈ E' randomly and independently tries to add itself to the matching J' with probability 1/(4·bigdeg). The random variable temp_ij is used to make this random choice; temp_ij = 1 indicates {i,j} is trying to add itself to J', temp_ij = 0 indicates not. The variable match_ij is used to indicate whether or not {i,j} is actually added to J'. The set of edges {i,j} for which temp_ij = 1 is not necessarily a matching because there might be edges {i,j}, {i,j'} ∈ E' such that both temp_ij = 1 and temp_{ij'} = 1. For each i ∈ V', the number of chosen edges adjacent to i is computed. All edges {i,j} such that temp_ij = 1 and such that no other chosen edge has an endpoint at either i or j are added to J'.

The lemma below shows that for each high degree vertex i' ∈ V', with probability at least 1/16 there is some neighbor i ∈ adj_{i'} such that {i',i} ∈ J', and thus i' is not part of the graph at the next iteration. The intuition for this is that if i' is a high degree vertex, then i' has so many edges that temp_{i'i} = 1 for at least one neighbor i of i' with constant probability. Furthermore, since deg_{i'} ≤ bigdeg and deg_i ≤ bigdeg, with constant probability temp_ij = 0 for all neighbors j ≠ i' of i and temp_{i'j} = 0 for all neighbors j ≠ i of i'. Thus, with constant probability there is some neighbor i of i' such that {i',i} ∈ J' at the end of the iteration.

Randomized FIND_MATCH(H, J', bigdeg) Procedure

    Let H = (V', E')
    In parallel, for all {i,j} ∈ E',
        with probability 1/(4·bigdeg), temp_ij ← 1 and
        with probability 1 − 1/(4·bigdeg), temp_ij ← 0
    In parallel, for each {i,j} ∈ E' such that temp_ij = 1, place ⟨i,j⟩ and ⟨j,i⟩ into a list and sort the list by first coordinate
    In parallel, for each i ∈ V', if there is more than one entry in the list with i as first coordinate then toomany_i ← 1 else toomany_i ← 0
    In parallel, for each {i,j} ∈ E', if temp_ij = 1 and toomany_i = 0 and toomany_j = 0 then match_ij ← 1 else match_ij ← 0
    J' ← {{i,j} ∈ E' : match_ij = 1}
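A sequential transcription of this procedure (the sorting step of the parallel version is replaced by a counting pass, and full independence is used for simplicity):

    import random
    from collections import Counter

    def find_match(Vp, Ep, bigdeg):
        # Each edge (a frozenset of two vertices) tries to join J' with
        # probability 1/(4*bigdeg); an edge survives only if no other
        # chosen edge touches either of its endpoints.
        chosen = [e for e in Ep if random.random() < 1.0 / (4 * bigdeg)]
        hits = Counter(v for e in chosen for v in e)   # chosen edges per vertex
        toomany = {v for v, c in hits.items() if c > 1}
        return {e for e in chosen if not (e & toomany)}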

Lemma 8: For all high degree i' ∈ V', Pr[∃ i ∈ adj_{i'} such that {i',i} ∈ J'] ≥ 1/16 at the end of the execution of FIND_MATCH.

Proof: Fix a high degree vertex i' ∈ V'. For each i ∈ adj_{i'}, let E_{i'i} be the event that temp_{i'i} = 1 and, for all j ∈ adj_{i'} − {i}, temp_{i'j} = 0 and, for all j ∈ adj_i − {i'}, temp_ij = 0. Note that if E_{i'i} occurs then {i',i} ∈ J'. Furthermore, for all i, j ∈ adj_{i'}, i ≠ j, E_{i'i} and E_{i'j} are mutually exclusive. By the first two terms of the inclusion-exclusion formula,

    $\Pr[E_{i'i}] \ge \Pr[temp_{i'i} = 1] - \sum_{j \in adj_{i'} - \{i\}} \Pr[temp_{i'i} = 1 \text{ and } temp_{i'j} = 1] - \sum_{j \in adj_i - \{i'\}} \Pr[temp_{i'i} = 1 \text{ and } temp_{ij} = 1].$

Thus, because deg_i ≤ bigdeg for all i ∈ V',

    $\Pr[E_{i'i}] \ge \frac{1}{4 \cdot bigdeg} - \frac{2 \cdot bigdeg}{16 \cdot bigdeg^2} = \frac{1}{8 \cdot bigdeg}.$

Consequently, since deg_{i'} ≥ smalldeg = bigdeg/2,

    $\Pr[\exists i \in adj_{i'} \text{ such that } \{i',i\} \in J'] \ge \sum_{i \in adj_{i'}} \Pr[E_{i'i}] \ge \frac{1}{16}.$

Corollary 9: Each execution of FIND_MATCH on average eliminates at least 1/16 of the high degree vertices. From this it can be easily shown that the expected number of iterations in a phase is O(log n).

Each iteration takes time O(log n) using O(n + m) processors. Thus, since there are a total of O(log n) phases, the entire algorithm has expected running time O(log^3 n) using O(n + m) processors.

We note that the analysis only requires pairwise independence between the random variables {temp_ij : {i,j} ∈ E'}. As before, this is the key to developing a deterministic algorithm by expressing the problem as an instance of the general pairs benefit problem and then using the general pairs benefit algorithm.

3.5.2 A Deterministic Algorithm

In this subsection we show how the randomized FIND_MATCH procedure can be expressed as an instance of the general pairs benefit problem. This leads to a deterministic parallel algorithm for the maximal matching problem with running time O(log^5 n) using O(n + m) processors.

Consider an execution of the FIND_MATCH procedure with input H = (V', E'). Let q = log(bigdeg) + 2. We note that q = O(log n). Let high be the set of high degree vertices in H and let Nhigh = |high| be the total number of high degree vertices in H. Let ~x = ⟨x_ij ∈ {0,1}^q : {i,j} ∈ E'⟩. Consider the execution of the randomized FIND_MATCH procedure, where for all {i,j} ∈ E',

    temp_ij = 1 if x_ij = 1^q, and temp_ij = 0 if x_ij ≠ 1^q.

If ~x is a vector of pairwise independent random strings of length q, then for all {i,j} ∈ E', Pr[temp_ij = 1] = 1/(4·bigdeg), and for all {i,j}, {i',j'} ∈ E', {i,j} ≠ {i',j'}, temp_ij is independent of temp_{i'j'}. We derive a lower bound on the number of high degree vertices eliminated with respect to ~x by first defining auxiliary functions as follows. For all {i,j}, {i',j'} ∈ E', {i,j} ≠ {i',j'},

    (1) Y_ij(x_ij) = 1 if x_ij = 1^q, and Y_ij(x_ij) = 0 if x_ij ≠ 1^q;

    (2) Y_{ij,i'j'}(x_ij, x_{i'j'}) = −1 if x_ij = x_{i'j'} = 1^q, and Y_{ij,i'j'}(x_ij, x_{i'j'}) = 0 otherwise.

Define

    $B(\vec{x}) = \sum_{i' \in high} \sum_{i \in adj_{i'}} \Bigl( Y_{i'i}(x_{i'i}) + \sum_{j \in adj_i - \{i'\}} Y_{i'i,ij}(x_{i'i}, x_{ij}) + \sum_{j \in adj_{i'} - \{i\}} Y_{i'i,i'j}(x_{i'i}, x_{i'j}) \Bigr).$

We claim that B(~x) is a lower bound on the number of high degree vertices eliminated from the graph with respect to ~x. The proof of this claim can be easily derived from the proof of Lemma 8.

Fix p such that 0 ≤ p ≤ q. Let ~x = ⟨x_ij ∈ {0,1}^p : {i,j} ∈ E'⟩ and let ~ρ = ⟨ρ_ij ∈ {0,1}^{q−p} : {i,j} ∈ E'⟩ be a vector of pairwise independent random strings. For all {i,j}, {i',j'} ∈ E', {i,j} ≠ {i',j'},

    (1) $TY_{ij}(x_{ij}) = E_{\rho_{ij}}[Y_{ij}(x_{ij}\rho_{ij})] = 2^{-(q-p)}$ if $x_{ij} = 1^p$, and $TY_{ij}(x_{ij}) = 0$ if $x_{ij} \ne 1^p$;

    (2) $TY_{ij,i'j'}(x_{ij}, x_{i'j'}) = E_{\rho_{ij},\rho_{i'j'}}[Y_{ij,i'j'}(x_{ij}\rho_{ij}, x_{i'j'}\rho_{i'j'})] = -2^{-2(q-p)}$ if $x_{ij} = x_{i'j'} = 1^p$, and $TY_{ij,i'j'}(x_{ij}, x_{i'j'}) = 0$ otherwise;

and

    $TB(\vec{x}) = \sum_{i' \in high} \sum_{i \in adj_{i'}} \Bigl( TY_{i'i}(x_{i'i}) + \sum_{j \in adj_i - \{i'\}} TY_{i'i,ij}(x_{i'i}, x_{ij}) + \sum_{j \in adj_{i'} - \{i\}} TY_{i'i,i'j}(x_{i'i}, x_{i'j}) \Bigr).$

~x =< x 2 f0; 1g : fi; j g 2 E 0 > : p

ij

Consider the calculations that need to be made when making the one bit extension using the bit pairs bene t algorithm. We retain the notation of Subsections 2.3 and 2.4. Let l be 36

the length of the final sample point and let z be a bit string of length k < l for which we are computing TB(~x X̃(zω)) in the bit pairs benefit algorithm. (Here, since the basic variables correspond to edges, we let α_{ij,k} denote the kth bit of the index for {i,j} ∈ E'; therefore l is such that 2m ≤ 2^l < 4m and the last bit α_{ij,l} is set to 1. As before, the case when k = l can be thought of as the case when each x_ij is one bit longer and k = 0.) Since k < l and for each {i,j} ∈ E' the last bit α_{ij,l} of the index of {i,j} is equal to 1,

    TY_ij(x_ij X_ij(zω)) = TY_ij(x_ij).

For each {i,j} ∈ E', let

    $\bar{\alpha}_{ij} = \alpha_{ij,l}\,\alpha_{ij,l-1} \cdots \alpha_{ij,k+1}.$

For each pair {i,j}, {i',j'} ∈ E', {i,j} ≠ {i',j'}:

(1) If ᾱ_ij ≠ ᾱ_{i'j'} then all four possible simultaneous extensions of x_ij and x_{i'j'} by one bit are equally likely. Thus,

    TY_{ij,i'j'}(x_ij X_ij(zω), x_{i'j'} X_{i'j'}(zω)) = TY_{ij,i'j'}(x_ij, x_{i'j'}).

(2) If ᾱ_ij = ᾱ_{i'j'} then it is easy to verify that last_{ij,i'j'} ≤ k.

    1. If X_ij(z) ≠ X_{i'j'}(z) then the next bit extension of x_ij is guaranteed to be different than the next bit extension of x_{i'j'}, and thus

        TY_{ij,i'j'}(x_ij X_ij(zω), x_{i'j'} X_{i'j'}(zω)) = 0.

    2. If X_ij(z) = X_{i'j'}(z) then the next bit extension of both x_ij and x_{i'j'} is 1 with probability 1/2 and both are 0 with probability 1/2. Thus,

        TY_{ij,i'j'}(x_ij X_ij(zω), x_{i'j'} X_{i'j'}(zω)) = 2·TY_{ij,i'j'}(x_ij, x_{i'j'}).

We compute all the following lists and quantities in O(log n) time using O(n + m) processors. For all i ∈ V' we define the following. Let high_i be the set of high degree vertices adjacent to i and let Nhigh_i = |high_i|. Let

    active_i(~x) = {j ∈ adj_i : x_ij = 1^p}

and let Nactive_i(~x) = |active_i(~x)|. Similarly, let

    high_active_i(~x) = {j ∈ high_i : x_ij = 1^p}

and let Nhigh_active_i(~x) = |high_active_i(~x)|. Compute the following multi-sets in sorted order:

    list_i(~x) = {ᾱ_ij X_ij(z) : j ∈ active_i(~x)}

and

    high_list_i(~x) = {ᾱ_ij X_ij(z) : j ∈ high_active_i(~x)}.

For each r ∈ {0,1}^{l−k} such that there is some j ∈ active_i(~x) with r = ᾱ_ij, and for each b ∈ {0,1}, let N_i^{rb}(~x) be the number of copies of rb in list_i(~x). Define Nhigh_i^{rb}(~x) similarly with respect to high_list_i(~x).

We compute the following from this in parallel in O(log n) time using O(n + m) processors.

X TY 0 (x 0 X 0 (z!)): TY 0 (~xX~ (z!)) = i

2

i

i i

0

i i

i i

adji

We can calculate the following using one processor. 0 (~x) TY 0 (~xX~ (z!)) = Nactive : ? 2 i

i

q

(2) For each i 2 V 0, let

X TY_ (~xX~ (z!)) =

X

i

i

0 2highi j 2adji ?fi0 g

p

TY 0 (x 0 X 0 (z!); x X (z!)): i i;ij

i i

i i

ij

ij

It can be veri ed that X X X X _ ~ TY 0 (x 0 X 0 (z!); x X (z!)): TY (~xX (z!)) = i

2 0

i

i

V

i i;ij

0 2high i2adji0 j 2adji

i i

i i

ij

ij

Let

N 0total (~x) = Nhigh 0(~x)(N 0(~x) ? 1) + Nhigh 1(~x)(N 1(~x) ? 1) ?Nhigh 0(~x)N 1(~x) ? N 0(~x)Nhigh 1(~x): r

r

r

r

r

i

i

i

i

i

r

r

r

r

i

i

i

i

We can calculate the following using O(deg ) processors. i

P 0 active ( ~ x )( Nactive ( ~ x ) ? 1) + x) ? Nhigh 2f 0 1g ? N total (~ _ ~ : TY (~xX (z!)) = ? 2( ? ) i

i

i

2

38

r

q

p

;

l

k

r

i

(3) For each i0 2 high, let

X

 0 (~xX~ (z!)) = TY i

0 2adji0 ?fig

2

i

TY 0 0 (x 0 ; x 0 ): i i;i j

i i

i j

adji ;j

For all r 2 f0; 1g ? as above, let l

k

Ntotal 0 (~x) = N 00(~x)(N 00(~x) ? 1)+ N 01(~x)(N 01(~x) ? 1) ? 2N 00(~x)N 01(~x): r

r

r

i

i

i

r

r

r

r

i

i

i

i

We can calculate the following using O(deg 0 ) processors. i

 0 (~xX~ (z!)) = ? Nactive 0(~x)(Nactive 0 (~x)2(??1)) + TY 2 i

i

i

q

P

r

p

2f0 1gl?k Ntotal 0 (~x) : r

;

i

Finally,

X X X  0 ~ TB (~xX~ (z!)) = TY 0 (~xX~ (z!)) + TY_ (~xX~ (z!)) + TY (~xX (z!)): i

i

i

i

0 2high

2 0

i

V

i

0 2high

Each call to the bit pairs bene t algorithm takes O(log2 n) time, and thus each call to the general pairs bene t algorithm takes O(log3 n). This is the dominating factor in the running time for each iteration. Since there are a total of O(log2 n) iterations, the total running time of the maximal matching algorithm is O(log5 n) using O(n + m) processors.

4 Extensions and Open Problems

The following papers build on some of the ideas developed in this paper: [Berger, Rompel 89], [Motwani, Naor, Naor 89] and [Awerbuch, Goldberg, Luby, Plotkin 89].

There are several generalizations of the problems discussed in this paper about which little is known with respect to parallel algorithms. One such problem is the maximal vertex coloring problem. The input to the maximal vertex coloring problem is an undirected graph G = (V, E) together with a set of real colors avail_i assigned to each i ∈ V. The output is an assignment of a color to each vertex i ∈ V, color_i ∈ avail_i ∪ {λ}, such that no two adjacent vertices are assigned the same real color and such that for every vertex i with color_i = λ, for each real color c ∈ avail_i there is some j ∈ adj_i such that color_j = c. The maximal independent set problem is the special case of the maximal vertex coloring problem when avail_i = {1} for each i ∈ V, and the Δ + 1 vertex coloring problem is the special case when avail_i = {1, ..., deg_i + 1} for each i ∈ V. There is a straightforward fast parallel reduction from the maximal vertex coloring problem to the maximal independent set problem (see e.g. [Lovasz 79] or [Luby 85]). This reduction does not preserve linearity in the size of the problem, but it is not hard to show that it does imply that there is a fast parallel randomized algorithm for the maximal vertex coloring problem using only a linear number of processors. To achieve a linear number of processors, the reduction is not literally performed; instead, the randomized algorithm for the maximal independent set problem is simulated (for the larger than linear size graph that results from the reduction) using only a linear number of processors. An open problem is to find a deterministic fast parallel algorithm for the maximal vertex coloring problem using only a linear number of processors.

Another interesting problem is the bipartite maximal independent set problem, which was introduced by [Gazit, Miller 88]. The input to the bipartite maximal independent set problem is a bipartite graph B = (A, B, F), where A and B are disjoint sets of vertices and F is the set of edges between A and B. The output is I ⊆ A such that no two vertices in I have a common neighbor and such that I is maximal with respect to this property. As pointed out by [Gazit, Miller 88], the maximal independent set problem on G = (V, E) is the special case of the bipartite maximal independent set problem when A = V, B = E and

    F = {{v, e} : v ∈ V and e ∈ E and v ∈ e},

and the maximal matching problem is the special case of the bipartite maximal independent set problem when A = E, B = V and F is as before. [Gazit, Miller 88] show that a modification of a randomized algorithm in [Luby 85] can be used to design a randomized NC algorithm for the bipartite maximal independent set problem that uses only a linear number of processors, but there is no known deterministic NC algorithm for this problem using only a linear number of processors.
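The two special cases can be generated mechanically; the sketch below builds the bipartite instance (A, B, F) corresponding to the maximal independent set problem on G (swapping the roles of A and B yields the maximal matching instance):

    def bipartite_from_graph(V, E):
        # A = V, B = E, and F joins each vertex to the edges containing it.
        A = set(V)
        B = {frozenset(e) for e in E}
        F = {(v, e) for e in B for v in e}
        return A, B, F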

5 Acknowledgements

Thanks to the group of high school students who visited the University of Toronto to hear my lecture; you inspired this work. Thanks to Ronitt Rubinfeld for pointing out the connections between this work and that of Joel Spencer, and to Prabhakar Raghavan for pointing out the relationship to his work.

References

[1] Awerbuch, B., Goldberg, A., Luby, M., Plotkin, S., "Network Decomposition and Locality in Distributed Computation," Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, pp. 364-369, October 1989.

[2] Alon, N., L. Babai, A. Itai, "A Fast and Simple Randomized Parallel Algorithm for the Maximal Independent Set Problem", Journal of Algorithms, 7, pp. 567-583, 1986.

[3] Berger, B., Rompel, J., "Simulating (log^c n)-wise Independence in NC", Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, pp. 1-7, October 1989.

[4] Cole, R., "Parallel merge sort", SIAM J. on Computing, vol. 17, 1988, pp. 770-785.

[5] Fortune, S., Wyllie, J., "Parallelism in Random Access Machines," Proceedings of the 10th ACM Symposium on Theory of Computing, pp. 114-118, 1978.

[6] Gazit, H., Miller, G., personal communication, 1988.

[7] Goldberg, M., T. Spencer, "A New Parallel Algorithm for the Maximal Independent Set Problem", a preliminary version appears in the Proceedings of the 28th Annual Symposium on Foundations of Computer Science, 1987, pp. 161-165, to appear in SIAM J. on Computing.

[8] Israeli, A., A. Itai, A Fast and Simple Randomized Parallel Algorithm for Maximal Matching, Computer Science Dept., Technion, Haifa, Israel, 1984.

[9] Israeli, A., Y. Shiloach, "An Improved Parallel Algorithm for Maximal Matching", Information Processing Letters, 22, 1986, pp. 57-60.

[10] Karp, R.M., A. Wigderson, A Fast Parallel Algorithm for the Maximal Independent Set Problem, Proceedings of the 16th ACM Symposium on Theory of Computing, 1984, pp. 266-272.

[11] Ladner, R., Fischer, M., "Parallel prefix computation", JACM, vol. 27, 1980, pp. 831-838.

[12] Lovasz, L., Combinatorial Problems and Exercises, Akademiai Kiado, Budapest and North-Holland, Amsterdam, 1979.

[13] Luby, M., "A Simple Parallel Algorithm for the Maximal Independent Set Problem", a preliminary version appears in the Proceedings of the 17th Annual Symposium on Theory of Computing, pp. 1-10, May 1985; final version appears in SIAM J. Comput., vol. 15, no. 4, November 1986, pp. 1036-1053.

[14] Luby, M., "Removing Randomness in Parallel Computation Without a Processor Penalty", Proceedings of the 29th Annual IEEE Symposium on Foundations of Computer Science, October 1988, pp. 162-173.

[15] Motwani, R., Naor, J., Naor, M., "The Probabilistic Method Yields Deterministic Parallel Algorithms", Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, pp. 8-13, October 1989.

[16] Raghavan, P., "Probabilistic construction of deterministic algorithms: approximating packing integer programs," JCSS, vol. 37, pp. 130-143, 1988.

[17] Spencer, J., Ten Lectures on the Probabilistic Method, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, 1987.