On the Hardness of Approximating Minimization Problems

2 downloads 0 Views 2MB Size Report
CARSTEN LUND AND MIHALIS YANNAKAKIS. AT&T Bell Laboratories, Murray .... problems: Hypergraph Transversal (node cover), minimum Hitting Set, mini-.
On the Hardness of Approximating

CARSTEN AT&T

LUND

AND

MIHALIS

Minimization

Problems

YANNAKAKIS

Bell Laboratories, Murray Hill, New Jersey

Abstract. We prove results indicating that it is hard to compute efficiently good approximate solutions to the Graph Coloring, Set Covering and other related minimization problems. Specifically, there is an E > 0 such that Graph Coloring cannot be approximated with ratio n’ unless P = NP. Set Covering cannot be approximated with ratio c log n for any c < l/4 unless NP is contained in DTIME(nP”Y’“~“). Similar results follow for related problems such as Clique Cover, Fractional Chromatic Number, Dominating Set, and others. Categories -reducibihy

and Subject Descriptors:

F.1.3 [Computation by Abstract Devices]: Complexity Classes

and completeness;

F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems-cornpururion on discrete srructures; G.2.1 [Discrete Mathematics]: Combinatorics-con&inatoriol algorithms; G.2.2 [Discrete Mathematics]: Graph Theorygraph olgorithnls

General Terms: Algorithms,

Theory

Additional Key Words and Phrases: Approximation algorithms, chromatic number, clique cover, dominating set, graph coloring, hitting set, independent set, NP-hard, probabilistically checkable proofs, set cover

1. Introduction Graph Coloring and Set Covering are two important minimization problems that have been extensively studied over the last 25 years. They are both conceptually simple and have served as paradigms for problems from many application areas. In the Graph Coloring problem, we are given a graph G and wish to color its nodes with as few colors as possible so that no two adjacent nodes receive the same color. In the Set Covering problem, we are given a finite collection 9 = {S,, . . . , S,,,} of subsets of a finite set U, and we wish to compute a subcollection that contains as few sets as possible and covers U. Both problems were shown to be NP-hard in Karp’s original paper 119721. Since it is unlikely that they can be solved optimally in polynomial time, there has been a lot of work in exploring the possibility of obtaining efficiently

Authors’ address: AT&T Bell Laboratories,

600 Mountain

Avenue, Murray Hill, NJ 07974.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1904 ACM 0004-5411 /Y4/0900-0960 $03.50

Hardness of Approximating Minimization Problems

961

near-optimal solutions. The usual metric for measuring the nearness to optimality of a solution is by the ratio of its cost to that of an optimal solution. The performance of an approximation algorithm is measured then by its worst-case ratio over all inputs of a given size. An algorithm achieves ratio r if for every instance it computes a solution whose cost is at most r times the cost of the optimal solution. (This is for minimization problems; in the case of maximization problems, the value of the computed solution must be at least l/r times the optimal value.) The approximability status of Graph Coloring and Set Covering have been long-standing open problems. A number of heuristics were developed for the Graph Coloring problem already from the 60’s even before the discovery of NP-completeness. Johnson analyzed in 1974 the performance of many such heuristics in Johnson [1974b] and proved that their worst-case ratio is very poor, n(n) where n is the number of nodes. He also proposed another heuristic with slightly better performance, @(n/log n). This has been improved since then, but still the currently best-known ratio achievable by a polynomial-time approximation algorithm is only O(n(loglog n)*/(log n)“) [Halldbrsson 19901, a ratio that is worse than n’ for any E < 1. Somewhat better ratios have been obtained in the special case of graphs that can be colored with a small number of colors (e.g., for 3-colorable graphs) [Blum 1990; Wigderson 19831. On the negative side, the best known result is still that of Garey and Johnson [1976] showing that if a polynomial-time approximation algorithm achieves a constant ratio smaller than 2 then P = NP. In the case of the Set Covering problem, Johnson and Lovasz proved in the mid 70’s that a simple greedy heuristic achieves a ratio of In(N) + 1 = 0.710gzN, where N = ((/I is the number of different elements [Johnson 1974a; Lovasz 19751. This result has served as an important paradigm in many contexts and has been extended several times. Chvatal [1979] generalized it to the weighted case, in which each set of the given collection has an associated weight and we wish to find a cover with the minimum total weight. The above logarithmic factor bounds the ratio of the cost of the solution obtained by the greedy heuristic not only to the cost of the optimal solution, but also to the cost of the best fractional cover, that is, the best solution to the Linear Programming relaxation of a standard formulation of the Set Covering problem as an Integer Program. The result was further generalized in different directions, for example, to the Integer Programming problem of minimization type with nonnegative coefficients [Dobson 19821, to the case of submodular constraints [Wolsey 19821, and to a continuous version [Fisher and Wolsey 19821. On the negative side, essentially nothing was known until very recently. The problem was shown MAX SNP-hard in Papadimitriou and Yannakakis [1988]. The results of Arora et al. [1992] imply then that it does not have a polynomial time approximation scheme unless P = NP; that is, there exists a constant E > 0 such that the problem cannot be approximated in polynomial time with ratio smaller than 1 + E unless P = NP. In this paper, we show strong negative results on the approximability of both these problems. For the Graph Coloring problem, we show that there is a constant E > 0 such that no polynomial time approximation algorithm can achieve ratio nE unless P = NP. Similar results follow for other related minimization problems. The Graph Coloring problem can be stated as the problem of covering the nodes of a graph by the minimum number of

962

C. LUND

AND

M. YANNAKAKIS

independent sets. Other graph covering problems of this type that are known to be (almost) equivalent to coloring with respect to approximability include: covering the nodes of a graph by the minimum number of cliques, covering the edges of a graph by cliques, and covering the edges of a bipartite graph by complete bipartite subgraphs [Kou et al. 1978; Orlin 1977; Simon 19901. Also a similar negative result holds for the approximation of the fractional coloring problem (we give the definition in the next section). For the Set Covering problem, we show that it cannot be approximated within ratio c log,N for any constant c < l/4 unless NP is contained in unless NP is contained in DTIME(nPO’Y’ogn ) (and for any c < l/2 ZTIME( n PO’Y log’ ), where ZTZME(t(n)) is the class of languages that can be recognized in probabilistic time t(n) with zero-sided error). Of course, the same result follows for the generalizations of Set Covering mentioned above. Furthermore, similar results follow again for closely related minimization problems: Hypergraph Transversal (node cover), minimum Hitting Set, minimum Dominating Set in a graph, and the variant of the Set Covering problem where we wish to minimize the sum of the cardinalities of the sets in the cover. Our proofs use the recent results from interactive proof systems and probabilistically checkable proofs and their surprising connection to approximation. This connection was first observed, by Feige et al. [1991]. They used it to prove a negative result on the approximation of the maximum clique and the maximum independent set problems, which was then tightened in Arora and Safra [1992] and Arora et al. [1992] to show that these problems cannot be approximated with ratio tt’ for some constant E > 0 unless P = NP. Arora et al. [1992] developed new interactive proof systems for NP that allowed them to prove that unless P = NP, Maximum 3-Satisfiability and a host of other problems that are hard for the class MAX SNP do not have a polynomial time approximation scheme. MAX SNP is a class of maximization problems defined syntactically in Papadimitriou and Yannakakis [1988]; minimization problems can be “placed” in the class or shown hard for it indirectly through approximation-preserving reductions from their complementary maximization problems. The interactive proof techniques can be most naturally applied to maximization problems. Interactive proof systems themselves can be viewed as maximization problems in which the goal is to find strategies for the provers that maximize the probability that the verifier accepts its input. To prove intractability for the approximation of minimization problems we must relate them to appropriate maximization problems. We prove the nonapproximability of Graph Coloring by a reduction from the maximum Independent Set problem. It has been known for a long time (see Johnson [1974a]) that one can use an algorithm for the Independent Set problem with approximation ratio r to obtain an algorithm for Graph Coloring with ratio r log n; our reduction shows a relationship between the two problems in the reverse direction. The proof draws in spirit from the recent techniques in that the construction has a (linear) algebraic flavor instead of the more usual graph-theoretic one. Our proof for the Set Covering problem is completely different. It uses a reduction from two-prover one-round interactive proof systems [Feige and Lo&z 19921. We give the definitions for these proof systems later when we present the reduction. In the rest of this paper, we prove the above results. In Section 2, we address the Graph Coloring problem, and in Section 3, the Set Covering problem,

Hardness

ofApproximating

Minimization Problems

963

2. The Difficulty of Coloring

We shall give first preliminary definitions and notation. Then, we shall describe in Section 2.1 the construction proving the main result stated in the Introduction on the nonapproximability of coloring. In Section 2.2, we prove a result for the case of graphs with bounded chromatic number: if there is a constant c such that for every k there is a polynomial time algorithm (possibly depending on k) that colors every k-colorable graph with c * k colors, then P = NP. Finally, in Section 2.3, we discuss some other minimization problems related to Coloring. For a graph G, we use n(G) to denote its number of nodes, Q(G) to denote its independence number (the size of its largest independent set), and x(G) to denote its chromatic number. Our reduction is from the maximum independent set problem. Arora et al. [1992] showed that it is NP-hard to approximate the maximum independent set problem within a factor n4 for some E > 0. It will be convenient to take advantage of the structure of the graphs that are produced in their reduction (which is the same as the reduction of Feige et al. [1991]).’ From a Boolean 3CNF formula cp with m variables, they construct a graph G, on n = rn’“’ nodes that are partitioned into r cliques C,, . . . , C, with the following properties: (1) If cp is satisfiable, then a(GV) = r, that is, G, has an independent set that contains exactly one node from each clique Cj (obviously an independent set cannot contain more than one node from a clique); (2) If cp is not satisfiable, then cw(G,) < r/g where the “gap” g is nt for some E > 0. We explain briefly the relationship to the probabilistically checkable proof for 3SAT: r is the number of random choices of the verifier; there is one clique Cj for each random choice, which contains one node for each possible combination of answers from the provers that causes the verifier to accept. (The graph contains more edges connecting nodes from different cliques if they correspond to contradictory answers to a common question.) Thus, for a probabilistically checkable proof with I random bits and 4 query bits, r = 2’ and the size of each C, is bounded by 2y. We refer the reader to the references for more information on probabilistically checkable proofs; they are not needed to understand the rest of this section. The basic idea of our reduction is as follows: Let G be a graph as above whose nodes are partitioned into cliques C,, . . . , C,. We consider the nodes of G as being placed on a grid with r rows, one for each clique; see Figure 1. The nodes of each clique are ordered arbitrarily in the corresponding row. For simplicity, we can assume without loss of generality that all rows have the same number of nodes, by adding if necessary new nodes that are adjacent to all other nodes of the graph; clearly, this does not change the size Q(G) of the maximum indcpendcnt set. We call the number of rows r the height of the graph, and the number of columns, say u’, the width of the graph. ’ Let us remark howcvcr that every graph can bc convcrkd to one that has this structure. In an Appendix, we explain how to do this, and show thal every graph G can be transformed in polynomial time to another graph If whose chromatic number is roughly inversely proportional to the indcpcndcncc number of (;.

964

FIG. 1. The node set of a clique-partitioned r =4 and w = 5.

C. LUND

AND

M.

YANNAKAKIS

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

graph. Here

Clearly, the chromatic number x(G) is at least n(G)/a(G) because every color class of any coloring is an independent set. Thus, if the maximum independent set has size cw(G) < r/g then x(G) > w *g. On the other hand, suppose that G has an independent set I with r nodes (one from each row), which has furthermore the following “shifting” property: for every s = 1,. . . , w, if we shift cyclically every node of I in its row s places to the right (with wraparound), the resulting set of nodes is also independent. Then, these w shifted sets form a coloring of G and hence x(G) = w. Thus, if the graphs G constructed in the reduction from 3SAT had this shifting property, then the gap g could be transferred from the independent set problem to the chromatic number problem. Unfortunately, there is, of course, no reason to expect that the graphs of the 3SAT reduction would have this property. However, we show how any partitioned graph G can be transformed to another graph that does have such a property. Given a partitioned graph G with r rows and w columns, we shall construct another partitioned graph H with b times more rows (where b depends polynomially on r and w) and more columns such that (8 the independence number increases proportionately: a!(H) = b. a(G), and (ii) if (y(G) = r, then H has the above shifting property: we can transform any independent set I of G with r nodes to an independent set I’ of H with b * r nodes (one from each row of H) such that we can color the nodes of H by appropriate shifts of I’. Thus, x(H) = n(H)/cr(H) is equal to the width of H. It follows that a gap can be transferred from the independent G to the chromatic number problem in H.

set problem

in

2.1. THE CONSTRUCIION. Let G be a graph whose nodes are partitioned into cliques C,, . . . , Cr. Let p be a prime that is at least as large as the size of the largest Ci. We will work in the field ZP of integers modulo p. As described above, we consider the nodes of G as being placed on a grid with r rows. Without loss of generality, we assume that all rows are padded to width p (by adding if necessary nodes that are adjacent to all other nodes). Each node u of G can be represented by a pair (i, i>, where i = 1,. . . , r is the row index and i E 2, is the column index of u. The new graph H has p2 - r rows and p2 -p = p3 columns. It is obtained from G by replacing every node u of G by a block of p4 nodes arranged in a p2 X p2 grid (see Figure 2). We index the rows and columns of each block by elements of 2;. Each node u of H will be represented by a 4-tuple (i, i, X, y),

L

Hurdness of Approximating Mrlimizution :8888

00000

00000

8x3

88%:

888%

8%%8

~0000

00000

00000

X%8

8S88 88%:

:8888

88%:

00000 Z3888

~0000

l oooo

8E%

88%:

:8888

888;:

8%%

:8888

l oooo

~0000

8E38 88%: 88%: 8%:8

00000 ~0000 ~0000

~0000

:E%

8%S

00000

00000 0000.

0000. 00.00

88% 0.000

~0000

00000 0000.

w%

8%8

0000. 00.00

88% 0.000

00000 8%8 0000.

00000 0000. 8%%

00.00 ~0000

%%8 ooooe oo*oo

0.000

Prohkms

96.5

FIG. 2. The node set in our construction. Because of space considerations, WC USC in this figure only p x p grids for every node in G, whcrc p = 5. The black nodes in the block corresponding to (‘ illustrate the set

F( 1-1.

l oooo 0000~

882% 0.000

where the first two components i = 1,. . . , r, and j E Z, are the row and column indices respectively of the block containing U, and the other two components X, y E Zd are the row and column indices of u within its block. We shall use this notation consistently in the following; that is, i, j (possibly with a subscript) will be used for the row and column indices of a block (and of a node of G), and X, y for the row and column indices within a block. We define a mapping F from the nodes (and sets of nodes) of G to sets of nodes of H. The edges of H will be defined so that for every independent set Z of G, its image F(l) as well as all shifts of F(I) are independent sets in H. We map every node c = (i, j) of G to the following set F(c) of p’ nodes from its corresponding block in H containing one node from each row: F(l?) = ((i, j, x, jx)lx E Z$. The mapping is extended in a straightforward way to a set Z of nodes: F(Z) = U,.. , F(u). Note that distinct nodes of G are mapped to disjoint sets in H (because they are contained in different blocks) and hence IF( = p2111for all I. A shift is specified by elements s E Z and .z E Zj; the shift maps each block to the block s positions to its right 4.with wraparound) and moves each node within its block by adding z to the column index. Formally, shift, ,(( i, j, x, y)> = (i, j + s, x, y + z), and for a set of nodes Q of H, we let shift,.,(Q) = UuEa shift,,,(u). We define indirectly the edge set of H by specifying which nodes are not adjacent. Two nodes u,, u2 of H are not adjacent if and only if there are nonadjacent nodes LI,, u2 of G (possibly, u, = u,> and shifts S, z such that u, E shift,,,(F(u,)) and u2 E shift,,,(F(u,)). Thus, every pair of nonadjacent nodes u,, u2 of G gives rise to several nonedges in H by mapping u, and c2 into H and then shifting by the same amount. This concludes the construction of the graph H. We shall prove that the graph Z-Z has the following properties: If I is any independent set of G, then its image F(I) and all the shifts of F(Z) are independent sets in H; in particular, if 111= r, then these sets yield a coloring of H. This means that a(H) 2p2cu(G), and if Q(G) = r, then X(H) = n(H)/p2r. On the other hand, if we pick p 2 r, then any “large enough” independent set I’ of H can be obtained from some independent set Z of G by mapping into H and shifting; that is, there are S, z such that I’ G shift,,,(F(Z)). This implies that a(H) I p2cr(G>. These two facts will imply our result. First, let us make some simple observations on the structure of H.

C. I.UND AND M. YANNAKAKIS

966 F~cr

2.1.

Consider two (distinct) irdependcnt nodes 11, ad

~1~of H.

( 1) If they belong to the same row of blocks, then they must in j&t be in the sume block. (2) If they are in the same block, therl they must be in diSftent rows. PROOF.

nodes The absence of an edge [u,, u,] is due to some nonadjacent of G and shifts s, z such that U, E shift ,*,(F(c,, )) and 11, E shift_(F(r>,)). If (I,, ~7~are distinct nodes, then they must be in different rows because every row of C is a clique, and thus II,, 1~~will belong to different rows of blocks. If l:, = l-Z are the same node, say I*, of G, then u ,, 1~~E shift,Y,z(F(c)). Recall from the definition of F that all nodes of F(V) are contained in the same block and are in different rows. This continues to hold •I after we shift F(c), which proves the claims. l’,,

1’2

Note that the fact implies in particular that every row of H is a clique. The graph H has p2r rows and p3 columns, so at best its independence number is p’r and its chromatic number is p3. We show now that this is the case if cr(G) = r. LEMMA 2.2 (1)

&HI

2 p’c4G). = r, then a(H) n(H)/a(H) = p3.

(2) Zf a(G)

= p’r

and the chromutic number of H is x(H)

=

PROOF

set of G and let I’ = shift,s.,(F(I)) for some (1) Let I be an independent shifts s E Zq, z E 2;. For any two nodes u,, 112of I’, there exist nodes L’,, u2 of I (possibly the same node, ~1, = ~‘~1such that II, E shift,.,(F(l,,)) and u2 E shift,Y,,(F(c2)). From the definition of H, the nodes u,, u1 are not adjacent. Hence, I’ is an independent set. Since IF(I)1 = p’)II, it follows that a(H) 2 p2dG). (2) Suppose that cr(G) = r and let I be a maximum independent set of G; the set I contains a node from every row of G. For every s E Z,, z E Zi we have a color class C(s, z) = shift,,,(F(I)) of H; algebraically, C(s,z)

= ((i,j

+ s,x,jx

+z)l(i,j)

E 1,x E Zp’).

As we just argued, every color class C(s, z> is an independent set. Furthermore, we claim that these p3 color classes cover all the nodes of H. The reason is that if we take any node u of G, say from the ith row (in particular, the node of that row that is in 0, map it to the set F(u) in H and then take all possible shifts, we cover all nodes of H in the ith row of blocks. Specifically, if the node of Z from the ith row of G is u = (i, j), then a node u = (i, k, x, y) of H from the ith row of blocks belongs to the color class C(s, .z) where s = k - j and z = y - jx. q We show a converse inequality bounding the independence number of H from above. We argue that every large enough independent set of H is essentially obtained from an independent set of G by mapping it into H via F and then shifting it. If p < r, this is not quite true (but almost). For the purposes of this subsection, we could pick p 2 r; however, in the next subsec-

Hardness of Appr(ximuting

Minimization Problems

967

tion, concerning the bounded chromatic number case, we need that p < r. For this reason, we will not make any assumption on the relation between p and r. This part will involve more the linear algebraic nature of the construction. Recall that the absence of an edge between two nodes 11, = (i,, j,, xl, y, > and 112= (i,, j,, x2, y2) of H is due to the existence of nonadjacent nodes I’,, L’? of G and shifts s, z, such that II, E shift C,z(F(r,, 1) and n2 E shift ,,Z(F(L~z)). A given pair of nodes u,, uz may have more than one such “witness” of nonadjacency. We observe first that a choice for the block shift s determines uniquely the rest of the witness, L’,, ~1~and z: From the definitions of F and shift, we have L’, = (i,, j, - s), L’? = (i?, j, - s) and y, = s = j,x, -y, -j2x2 +y2. If x, f x,, then the equation (b) has a unique solution in s (if any). Let us call a pair of nodes of H coupled if their row indices x in their respective blocks are different. Thus, a pair of coupled nodes has a unique witness of nonadjacency. For an independent set I’ of H, it is conceivable that different pairs of nodes of I’ have their nonadjacency witnessed by different block shifts s. We shall prove that if I’ is large enough then all coupled pairs of I’ use in fact the same s. This will allow us then to relate large independent sets in H back to independent sets in G, and thus bound a(H) from above in terms of (Y(G).

LEMMA 2.3. If I’ is an independent set of H with more than pr nodes, then all coupled pairs of I’ use the same block shift s. PROOF. Let u,, u2, u3 be three pair-wise independent, coupled nodes, where - (ik, jk, xk, yk) for k = 1, 2, 3. Let s be the block shift used by the pair u,,- u,), let s’ be the shift for (u,, uj) and s” for the pair (u*, u,). We have the Y” analogous equations to (b): (xj - x,)s’

= j,x,

- y, - j,x,

+ y,

and (x2 - x3)s” = j2x2 - y, - j3x3 + y3. Adding these together

with Eq. (b), we get:

(x, -x&

+

(x3

-x&s’

+

(x2

-x3)sW

=

0,

or equivalently (x, -x&s

- s’> = (x3 -x,)(?

Recall that the x’s are the row indices of blocks and are vectors of Zi. Since the nodes vectors are distinct. From Eq. 1, if two of the the third. That is, either all shifts s, s’, s” are

- $9.

(1)

the nodes within their respective are pairwise coupled, all three x shifts are equal, then so must be distinct or they are all equal. Let

968

C. LUND AND M. YANNAKAKIS

us say that x3 is bud for the pair (x,, x2) if Eq. 1 has a solution with unequal shifts; otherwise, it is good. We accordingly say also that the node u3 is bad or good for the pair of nodes (u,, uz). In the bad case, we have x3 =x2 + t(x, -x2), where t = (s - s’)/(s” - s’). Thus, there are p bad vectors x3 corresponding to the p possible values of t (for t = 0, 1 we get x2, x,1;

geometrically, they correspond to the points of the Za plane that lie on the note that if x3 is bad for the pair x,‘: line connecting x, and x2. Furthermore, x2 (i.e., is collinear with them), then another vector x4 is also bad for the pair x1, x2 if and only if it is bad for the pair x2, x3. Consider now an independent set I’ of H with more than pr nodes. Let X be the set of vectors x E Zp’ such that I’ contains a node with row index x within its block. From Fact 2.1, for every row of blocks, I’ contains nodes from at most one block, and all these nodes must be in different rows. Since there are r rows of blocks, (I’( I t-(X(, and since 11’1> rp, we have that 1x1 > p. Thus, X has at least p + 1 elements. This implies that for every coupled pair (u,, u,> there exists a node u3 of I’ that is good for the pair. To show that all coupled pairs of nodes of Z’ use the same (unique) block shift s, consider two coupled pairs (u,, u?), (uj, u4). Since u,, u? are coupled, u3 must differ in its x index from at least one of the two, so assume without loss of generality that it differs from uz; that is, u3 is coupled with u2. If u3 is good for the pair (u,, uz), then the shifts corresponding to the pairs (LI,, u2) and (u?, u,) are equal. If u:, is bad, then let lls be a node in I’ that is good for the pair (u,, u,); then us is also good for the pair (u?, uJ. It follows that the shift of the pair (u5, u?) is equal to both the shift of the pair (u,, u2) and to the shift of the pair (uz, u,); thus, the latter two shifts are again equal to each other. Similarly, we can argue that the pairs (u,, u,) and (u,, u,) have equal •I shifts, thus proving the claim. We can now bound LEMMA

2.4. a(H)

a4 H 1 from above. s n~u.x(p~a(G),p~(a(G)

- 1) + r,pr).

PROOF. Suppose that H has an independent set I’ with more than p/ nodes. Let s be the global block shift that is consistent with all coupled pairs of nodes in I’. From condition (a) of nonadjacency, if u,, u2 are a coupled pair of nodes of I’ from blocks (i,,j,> and (i2.jz), then the nodes (i,,j, -s> and ( iz, jl - s) of G are not adjacent. Note furthermore that if I’ has two or more nodes in a block (i,, j, >, then they must be in different rows (by Fact 2.1); thus, every other node from any block (i,, j,) is coupled with at least one of them, and therefore, nodes (i,, j, - s) and (2?, j2 - s) of G are not adjacent. Let I be the set of nodes (i, j> of G such that I’ contains at least two nodes from block (i, j + s), and let K be the set of nodes (i, j) such that I’ contains exactly one node from block (i, j + s>. By Fact 2.1, for each row of blocks, Z’ contains nodes only from at most one block, and can only contain at most p’ nodes from a block. Thus, 11’1< p’IZI + IKI and IKI I r. From the argument above, Z is an independent set, and furthermore no node of Z is adjacent to any node of K. Thus, 111I CT(G), and furthermore if K is nonempty, then 111_< o(G) - 1 because we could augment Z with any single node of K. Therefore,

11’1I max(p%(G),p’(~(G)

- 1) + r).

0

Hardness of Approximating

Minimization Problems

969

Note that, if p 2 r/a(G), 6 (e.g., if p 2 r), then the right-hand side in the inequality of Lemma 2.4 simplifies to p%(G). Combining Lemmas 2.2 and 2.4, we have: PROPOSITION2.5 (1) p2cy(G> I c4H) _< max(p20(G), p2(o(G) - 1) + r, pr). In particular, if we pickp 2 r, then a(H) = p2cr(G>. (2) If (Y(G) = r, th en Q(H) = p2r and the chromatic number of H is x(H) = n(H)/c-u(H) = p3. We are ready to prove the main theorem

now.

THEOREM 2.6. There is a 6 > 0 such that it is NP-hard to approximate the chromatic number problem within a factor n’. PROOF. By Arora et al. [1992], there is a E > 0 such that given a 3CNF formula cp we can construct in polynomial time a graph G partitioned into r cliques C; such that (1) if cp is satisfiable, then a(G) = r, and (2) if cp is not satisfiable, then (Y(G) < r/[n(G)l’. Apply the above transformation, and assume that we choose p 2 r (though in some cases a smaller p may yield a better 6). Let H be the resulting graph. If cp is satisfiable, then x(H) = p’. If cp is not satisfiable, then x(H) 2 n(H)/cr(H) = rp’/p’cu(G) 2 p”[n(G)]‘. The ratio between the two cases is at least [n(G)]’ 2 [n(H )I’ for some constant 6 > 0. The precise value of 8 (and the best choice for p) dcpcnds on the relationship between r, n(G) and E; that is, ultimately it depends on the relationship between the number of random bits, the number of query bits and the error probability that come out of the proof of Arora et al. [1992]. q 2.2. BOUNDED CIIROMATIC NUMBER. The gap of 11’ of Theorem 2.6 holds only for graphs with unbounded chromatic number. We can deduce from the reduction also some constant gaps for graphs with bounded chromatic number, whcrc the gap incrcascs (going to infinity) as the chromatic number increases (going to infinity). If L,, L, are two disjoint languages, we say that an algorithm distinguishes between L, and L2 if on input x, it outputs “Yes” if x E L, and it outputs “No” if x E L2; the output is irrevelant if x is neither in L, nor in Lz. Recall that for a graph G whose nodes are partitioned into cliques C,, . . . , C,, the height of G is r and the width is max IC;l. The results of Arora et al. I19921 imply the following lemma.

LEMMA 2.7. For et’ery constant g > 1 (the “gap”), there is a constant w such thut the following prothwz is NP-hard. Gitrn a purtitioned qaph G with width at most w, distinguish between the case that a(G) is equal to the height r of G and the cuse thut a(G) < r/g. PROOF. This can be shown by a standard amplification argument as in Berman and Schnitgcr [1989], for example. Let t > 1 be a constant such that given a 3CNF formula cp with m clauses, it is NP-hard to distinguish between the cast that cp is satisfiable and the case that WC can satisfy less than m/t of

C. LUND AND M. YANNAKAKIS

970

its clauses. Let 1 be a positive integer such that t’ > g. Let 9’ be the set of HZ’ possible tuples of 1 clauses of cp and let r = m’; each tuple of .Y is considered as a conjunction of its clauses. Clearly, the number of tuples of 9 that are satisfied by any truth assignment is the Ith power of the number of clauses of cp satisfied by the assignment. Construct a graph G that has a node for every tuple T; in Y and every assignment of truth values to the (at most 30 variables of the clauses of 7; that satisfies the tuple. Two nodes are adjacent iff the corresponding partial assignments give contradictory values to a common variable. For every tuple 71 of Y the corresponding nodes form a clique C;. Thus, the width w of the graph is at most 23’. The size a(G) of the maximum independent set of G is equal to the maximum number of tuples of 9 that can be satisfied by a truth assignment. If cp is satisfiable, then cr(G) = r; if we can satisfy less than m/t •I clauses of cp, then (Y(G) < (m/t>’ I r/g. THEOREM 2.8. For ecery constant g > 1, there is a constant c such thut the following problem is NP-hard. Girlen a graph H, distinguish between the case that H is colorable with c colors and the case that the chromatic number of H is nt least g . c. PROOF. Let g’ = g + 1 be a constant gap for the independent set problem and let w be the implied width for g’ from the preceding lemma. Let p be the smallest prime that is at least as large as g’ and w, and let c = p3. Clearly, p is a constant that depends on g, and the same is true of c. Let G be a partitioned graph with width w and height r. Let H be the graph obtained by the construction of the previous section using the above value of p. By Proposition 2.5 if (Y(C) = r, then x(H) = p3 = c. Suppose that a(G) < r/g’. We can easily bound the three quantities on the right-hand side of the inequality of the first part of Proposition 2.5. The first and the second quantity are smaller than p%(G) + r < p’(r/g’) + r < p’r/g. The third quantity is pr < p’r/g. Thus, Q(H) < p’r/g, and consequently X(H) 2 n(H)/a(H) > gp” =g*c.

0

2.3. RELATED PROBLEMS. We define some problems proximability properties to Graph Coloring. Clique Partition.

number of cliques; number of cliques.

that have similar ap-

Given a graph G partition its nodes into the minimum equivalently, cover the nodes of G with the minimum

Given a graph G, find the minimum number of cliques that Clique Couer. cover the edges of G. This problem was studied and shown NP-hard in Kou et al. [1978] and Orlin [1977]. It is equivalent to the problem of finding a set U of minimum cardinality such that G can be expressed as the intersection graph of a collection of subsets of U. Biclique Couer. Given a bipartite graph G, find the minimum number of complete bipartite subgraphs that cover the edges of G. This problem was shown NP-hard in Orlin [1977]. It is related to communication complexity. In particular, the nondeterministic communication complexity of a predicate is equal to the logarithm of the biclique cover number of a bipartite graph associated with the predicate.

Hardness of Appro.uimating Minimization Problems

971

Given a graph G, find a collection of indeFractional Chromatic Number. nonnegative (possibly fracpendent sets I,, . . . , I, of G and corresponding tional) values h,, . . . , A,, so that the sum of the Aj’s is minimized, and for every node c of G the sum of values assigned to the independent sets containing L’ is at least 1. It is well known that we only need to assign nonzero value to at most II independent sets, that is, there is an optimal solution with t I II. If we consider the Graph Coloring problem as a special case of the Set Covering problem where the collection of sets is the collection of all independent sets of the graph (it is not listed explicitly), then the fractional chromatic number is the optimal value of the corresponding Linear Programming relaxation. The fractional chromatic number is within a log it factor of the (ordinary integer) chromatic number. Grijtschel et al. [1981] used the Ellipsoid algorithm in an interesting way to show that it is NP-hard to compute a weighted version of the fractional chromatic number. The NP-hardness of the unweighted version appears to be new. THEOREM 2.9. The following holds for each of the problems Clique Partition, Clique Car-er, Biclique Cover, and Fractional Chromatic Number. There is a 6 > 0 such that there does not exist a polynomial time approximation algotithm that achierves ratio ns unless P = NP. PROOF. The equivalence between Clique Partition and Graph Coloring is obvious. The claim for Clique Cover and Biclique Cover is not trivial, but it follows from known reductions from the Graph Coloring problem [Simon 19901. The claim for the Fractional Chromatic Number follows from the above mentioned logarithmic relationship to the chromatic number, but also it follows directly from the construction of Section 2.1 and Propositions 2.5. Note that the fractional chromatic number of any graph H cannot be smaller than n( H)/a( H). It is easy to see then that the arguments of Theorem 2.3 for the chromatic number apply also to the fractional chromatic number establishing the same gap. 0 3. The Dificulry of Set Cocering A set system 9 = (U; S,, SZ,. . . , S,) is a collection of sets S,, SZ,. . . , S,, C U. We say that S,, SZ, . . . , S,,, cover U if and only if Ur! , S, = U. In this section, we study the set-covering problem: Given a collection of sets s,, c cJ= (1,2 )...) N} find the minimum subcollection that covers s,, s,,..., N). We show that this problem is hard to approximate within a factor (1,2,..., of c log N, for any 0 < c < l/4. We then list some equivalent problems. 3.1. THE CONSTRUCTION. Our construction uses a 2-prover l-round interactive proof system constructed by Feige and Lo&z [1992]. Our construction depends on some specific properties of their proof system and we need to change the proof system slightly. A 2-prover l-round interactive proof system for a language L consists of three players: a probabilistic (computationally limited) verifier V and two deterministic (all-powerful) provers P, and P,. Fix an input size n. The proof system has an associated finite set Q;, i = 1, 2 of possible queries that the verifier can ask the ith prover, a finite set Ai of possible answers that it can receive from the ith prover, a finite set R of possible random seeds for the verifier, a polynomial-time computable function f from Z” X R to Q, X Q,

972

C. LUND

AND

M. YANNAKAKIS

(where C is the input alphabet) and a polynomial-time computable Boolean predicate l7 on ZZ,”x R x A, x A,. A prover P, for i = 1, 2 is simply a function from Q, to A,. Given an input x of length n, the verifier chooses uniformly at random a seed r from R and computes a pair of queries (q,, qZ) = f(x, r) to the provers. After receiving answers a, = P,(q,) and a, = P&C/~)from the provers, the verifier evaluates the predicate II(x, r, a,, a,) and accepts or rejects accordingly. Feige and Lo&z [1992] constructed 2-prover l-round proof systems for any language in NEXP. Their construction can easily be scaled down to a 2-prover l-round proof for SAT such that -If cp E SAT, then there exists provers such that the verifier accepts always. --If cp E SAT, then for any pair of provers the verifier accepts with probability at most l/n, where n is the size of cp. Furthermore, the verifier uses poly log n random bits and all the messages have length polylog n; that is, the sets R, Q, and Ai, i = 1, 2, have cardinality bounded by 2P0’Y”‘g ‘. For a fixed prover P,, the verifier’s queries to that prover are distributed uniformly at random over the set Qi of all the possible queries, and given a query to one prover the number of queries that can be asked to the second prover is independent of the query to the first prover. Furthermore, for a fixed choice of the random seed and a fixed answer from the first prover there is at most one answer from the second prover that makes the verifier accept. We need that the verifier asks the provers the same number of queries. The Feige-Lo&z proof system does not have this property, but a simple variation of the proof system has the property. In this variation, the verifier generates two independent sets of queries ((q,, q2), ((I’,, $.I), following the procedure used by the Feige-Lo&z verifier. Then, the vertfier asks the first prover the query (q,, 4:) and the second prover the query (q2, 4;). The provers answer the queries by answering the first component of the query according to the Feige-Lovasz protocol, ignoring the second component. The verifier accepts if and only if the Feige-Lovasz verifier would accept the answers to the queries ((I,, qZ). It is now easy to see that the number of queries is the same to both provers and clearly the provers’ chances’of making the verifier accept are exactly the same as in the Feige-Lovasz proof system. Note that this proof system still has all the properties of the Feige-Lo&z proof system mentioned above. Our construction uses as a basic building block a set system z&‘~,,= (B; C,, Cz,..., C,,>, where m, I are positive integers. This set system has the property that if Ic{l, 2,..., m) is any set of at most 1 indices, then no collection (DiliE, covers B where Dj is C, or the complement of C;. We show in Lemma 3.2 that such set systems can be constructed for which IB) = 0(22’m2). The basic idea of the construction is as follows: Let cp be a CNF formula. We shall construct an instance Yq of SETCOVER such that -If cp E SAT, then SETCOVER ---If cp G SAT, then SETCOVER l/4.

= IQ,1 + lQ,l. 2 c log N(lQ,l + lQ21>,for any 0 < c
0 be a constant, and choose 1 = max(k logIRl, k loglA,I), and recall that m = lA21. We shall show in Lemma 3.2 below that lBl = 0(22’lAz12>.

Hardness of Approximuting

Minimization Problems

Thus, log N = log(lRIIB])

I 21(1 + 3/2k).

Hence, we get that

18’1

log (lRllBl>

lQ,l+ IQ,I r

4

i

Since k can be chosen to be an arbitrarily follows. q We give now a construction

975

large constant,

the lemma

for the set system .c@ ,,,,, = (B; C,, C,, . . . , C,,,).

LEMMA 3.2. GiL?en integers m, 1 there exists a set B ~md C,, C,, . . . , C,,, c B such that for any sequence of indices 1 5 i, < i, < ... < i, I m, no collection Di, cot’ers B where D,, is C,, or the complement of Ci,. Furthermore, D,, D,.,..., IB\ = 0(2”m’). of the problem. We need a collection PROOF. We look at a reformulation X of binary vectors each of length rrz such that for any 1 distinct coordinates i,, t2,..., i, and any 1 bit vector y there exists a x E X such that for all j = 1, 2 . f , 1, x = y,. From such a collection of vectors, we let B = X and define d,‘= (x ~“Xlx, = l}. In order to show that this set system has the required property, let 1 5 i, < i, < .** < i, I m be a sequence of indices and let D,,, D,,, . . . , Di be a set system such that D,, is equal to C, or the complement of C. Now define y such that y, = 0, if D = Cj , and ‘1, otherwise. From the pfioperty of X, we know that there exists 2 x E X such that x,, = y, for j = 1, 2 ..I 1. This imples that x @ D;, for all j = 1, 2,. . . ,l. Thus, D,,, D,_, . . . , D,, dk not cover B. First, let X be the set of all m bit vectors. Given an arbitrary requirement A (i.e., fix i,, i,, . . . , i, and y> the probability that a randomly chosen x E X satisfies the requirement is 2-‘. For any E > 0, Alon et al. [1990] constructed a set X, of binary vectors of length m such that for any A: I&X and IX,] = o(m*/e*>.

[x satisfies A] - PrXE ,Y,[ x satisfies All < E Hence,

if we let E = 2-(‘+I), then

p’, E *,I x satisfies A] 2 2~(‘+ ‘) > 0. Hence, for every requirement, there requirement and IX2-,,+~,] = 0(2”m*).

exists a x E X2-,,+~) that satisfies

the

•I

THEOREM 3.3. For any 0 < c < l/4, approximated within factor of c log N DTZME(n pOtytog“1.

the Set Covering problem cannot be in polynomial time unless NP C

PROOF. Follows since the construction DTIME(n”“‘Y’“g “>. 0

can be done in

COROLLARY 3.4. For any 0 < c < l/4, the Set Covering problem cannot be approximated within factor of c log N in polynomial time unless NT’ME(nP”‘Y’“~n)

We can obtain a better constant Recall the probabilistic complexity

= DTIME(nP”‘Y’ogn).

by using a somewhat classes R7”ME(t(n)),

stronger

assumption. and

coRTZME(t(n))

C. LUND AND M. YANNAKAKIS

976 ZTIME(r(n))

for a time bound r(n): A language L is in RTIME(t(n)) (and its complement is in coRTIME(t(n)) if there is a probabilistic Turing machine M with time bound t(n), such that, if an input x is in L, then M accepts x with probability at least 2/3, and if x is not in L, then M rejects x with probability 1. (The probability 2/3 is arbitrary; the class remains the same if we use any number between 0 and 1.) The class ZTIME(t(n)) = RTIME(t(n)) 0 coRTIME(t(n)) is the class of languages L that have a so-called Las-Vegas algorithm: A probabilistic algorithm that determines (correctly) with probability at least 2/3 whether the input x is in L, and outputs a ? otherwise (i.e., the algorithm never produces an erroneous answer). THEOREM 3.5. For any 0 < c < l/2, the Set Corvring problem approximated within factor of c log N in polynomial time unless

cunnot be

PROOF. The only difference compared to the proof above, is that we use a different construction for the set system &5’,,,.,.Instead of using a pseudorandom construction, as in Lemma 3.2, we use a truly random one. Let IBl = (1 + 1 In m + 2)2’ and let the subsets C,, C?,. . . , C,,, be randomly chosen. Then, it follows by simple calculations that (B; C,, C?,. . . , C,,,> has the required properties with probability at least 2/3. Note that the size of B is roughly the square root of the size of the construction in Lemma 3.2; this gives the improvement in the constant. The construction uses many copies of this one set system; thus, we get that

- If cp E SAT, then SETCOVER = IQ,1 + lQ?l. -If cp P SAT, then with probability at least 2/3, SETCOVER(3;) c log N(lQ,l + lQl>, where N = IRIIBI and 0 < c < l/2.

2

Thus, any better that c accepts when This shows

approximation algorithm for set cover with approximation ratio log N can be used to construct an algorithm for SAT that always cp E SAT and rejects with probability at least 2/3 when cp P SAT. that NP c coRTIME(n pO’y“?“). Hence, also NTIME( rzP”Y’“1:“) C coRTZME(nP”‘Y’“g “) and CONTIME(~~“‘~‘~‘~“) C RTIME(nP”‘Y’“g ‘I). Since CORTZME(~P”‘Y’~‘~“) c coNTIME(nP”‘Y“‘g “), it follows that NTIME(nPO’Y’og “> = ZTIME(nP”‘Y’“~ “).

q

This gives very tight bounds on the approximability of the Set Covering problem, since set cover can be approximated within a factor of 0.7 log N in polynomial time. Thus, the lower and upper bounds differ by only a multiplicative factor of less than 1.4. 3.2. RELATED PROBLEMS. We define some problems that are equivalent to Set Covering. Hitting Set. Given a (finite) collection 9 of subsets of a (finite) set U, find a minimum cardinality subset of U that intersects every set in _% Hypergraph Transversal. Given a hypergraph H = (V, E), find a minimum cardinality set of nodes S that covers all the edges of H, that is, every edge contains at least one member of S.

Hardness of Approximating

Minimization Problems

977

Dominating Set. Given a (directed or undirected) graph G, find a minimum cardinality set S of nodes that dominates all the nodes of the graph, where a node dominates itself and all its adjacent nodes. Minimum Exact CoL?er. Given a (finite) collection 9 of subsets of U find a subcollection 9’ that covers U and which minimizes the sum of the cardinalities of the sets in 5”‘. (This variant was called Set Coilering II in Johnson [ 1974al.)

All the above problems can be approximated within the same logarithmic factor as the Set Covering problem. Kolaitis and Thakur [1991] define syntactically a class of problems that exhibit this logarithmic behavior and that have the Set Covering problem as a complete representative. THEOREM 3.6. The following holds for each of the problems Hitting Set, Hypergraph Transversal, Dominating Set, Minimum Exact Corer. For any 0 < c < l/4, there is no polynomial time approximation algorithm with ratio clogn unless NP c DTIME(n P”‘y‘OfiI’>. PROOF. The first two problems are trivially equivalent to Set Covering. The equivalence of Dominating Set is not entirely trivial, but is well-known (see, e.g., Paz and Moran [1981]). The fourth problem can be shown by an easy reduction from Set Covering. Given an instance of Set Covering, introduce a sufficiently large number of new elements and include them in all the sets. 0 4. Conclusions We proved strong negative results on the possibility of finding efficiently good approximate solutions to the Graph Coloring, Set Covering, and other related minimization problems. For Graph Coloring, we showed that there is an E > 0 such that the problem cannot be approximated with ratio ,’ unless P = NP. The best result known previously was the fact that achieving ratio 2 is NP-hard [Garey and Johnson 19761. We also showed that, for every constant factor c > 1, there is a constant number of colors k (that depends on c) such that no polynomial time algorithm can color all k-colorable graphs with c * k colors unless P = NP. For the Set Covering problem, we showed that it cannot be approximated with ratio better than c log II with c < l/4 unless NP is contained in DTIME( n P"'Y c < unless NP is contained in n 'W These lower are surprisingly algorithm. We mention some remaining open problems now. We believe that stronger results are probably true in the case of bounded chromatic number. For instance, it should be possible at least to reverse the order of the two quantifiers k-colorable graphs one cannot achieve a constant factor approximation polynomial time unless P = NP. Perhaps this is even true for k = 3. The best algorithm currently known for 3-chromatic

3-chromatic graphs with 3 colors, colors by Khanna et al. [19931.

and this was improved

very recently

to 4

978

C.

LUND

AND

M. YANNAKAKIS

As we mentioned in the Introduction, in the case of arbitrary graphs, no algorithm is known that achieves ratio O(nl‘) for any constant c < 1. Is it NP-hard to achieve ratio O(n’> for any c < l? Other graph problems are known for which this is the case (see e.g., Lund and Yannakakis [1993] and Yannakakis 119791). Th e same question is open for the Independent Set and Clique problem. Recall that if these problems can be approximated within O(n’) for some c < 1, then the same is true of Graph Coloring. In the case of Set Covering, the lower bounds we proved are rather close to the known upper bound. However, the evidence here is somewhat weaker than P # NP. For this reduction, we used 2-prover l-round interactive proof systems. The restriction to a constant number of provers makes these proof systems very convenient for reductions. They have been used to show the nonapproximability of quadratic programming [Bellare and Rogoway 1993; Feige and Lo&z 19921, of a class of maximum subgraphs problems [Lund and Yannakakis 119931 and some others [Bellare 19931. However, currently, the results for these proof systems are not quite as strong as we would like, and as a consequence, the evidence of intractability is often weaker than P # NP (and in some cases, the bounds on the approximation ratios that are derived are not as tight as one would like). There is room for improvement by strengthening the results on proof systems that use two (or constant number of) provers [Feige and Lovasz 1992; Lapidot and Shamir 19911. The size of our construction for the Set Covering problem is polynomial in the size of the set R of random choices and the sets A, and AZ of possible answers of the provers. In the Feige-Lovasz proof system, these sets have size IZ~“‘~“~~” and hence so does our construction. In order to get a polynomial-time reduction showing nonapproximability of Set Covering within a factor of @(log NJ unless P = NP using our construction, we need new probabilistic proof systems for NP for which max (1RI, 1A, I, 1A2 I> is polynomial in n (i.e., the number of random bits and communication bits is logarithmic in n instead of polylogarithmic), and that these proof systems have similar properties as the Feige-Lovrisz proof system. The two most important of these properties are the constant number of provers and the very low error probability. For Set Covering, we need O(l/log”n) error probability, where k is the number of provers; other problems could use a smaller error probability, l/nc for some c > 0. A step in this direction was taken recently by [Bellare et al. 19931, where a 4-prover probabilistic proof system is constructed with parameters max(lR1, IA,!,. . ., IA,I) = n”‘g“‘g”, yielding the improved result that Set Covering cannot be approximated within c log II with c < l/8 unless NP is contained in fX’IME(n’“~‘“~’ ). They also showed that it suffices to use polynomial-size random set R and answer sets Ai to achieve any constant error probability, and thus Set Covering cannot be approximated within any constant factor unless P = NP. Ultimately, it would be useful to have polynomial-size two-prover proof systems for NP that achieve error probability l/nc for some c > 0. Appendix A Our transformation for the Graph Coloring problem utilizes the structure of the graphs that are constructed in the recent reductions of Arora et al. [1992] and the other papers for the nonapproximability of the independent set

Hurdness

ofApproximating

Minimization

Problems

979

problem. However, as we will point out, even if the graphs of a reduction do not have that structure, they can be easily put in that form. Thus, it would have been possible, for example, to prove earlier (say, since the middle 70s the time of the classical Garey-Johnson paper [1976]), that, if the independent set problem cannot be approximated within any constant factor, then neither can the chromatic number problem. (We note however that the construction draws in a sense on the recent techniques, in that the gadgets have a linear algebra character rather than a graph-theoretic one.) Suppose that we have a reduction from an NP-complete language L to the independent set problem such that, if the input x is in L, then the maximum independent set of the resulting graph G has size at least r, and if x is not in L, then it has size less than r/g for a “gap” g. Form the product G’ = K, x G of the complete graph on r nodes with G, as follows: Every node of G’ is a pair (i, Ll), where i = 1,. . . , r is a node of K, and L’ is a node of G. The nodes with the same first component i induce a clique C,. Two nodes (i, c), (j, w) from different cliques are adjacent iff their second components L’, w are equal or adjacent nodes of G. The graph G’ is in the desired form. It has r rows, each of which is a clique, and n(G) columns. Obviously, an independent set I’ of G’ can pick at most one node from each clique C,, and moreover the second components of the nodes of I’ form an independent set in G, and vice-versa. Thus, cr(G’) = min(r, a(G)). If x is in L, then cr(G’) = r, and if x is not in L, then a(G’) < r/g. In general, given an arbitrary graph G, suppose that we form the product G’ = K, x G of a complete graph on r nodes with G, and then apply our transformation of Section 2.1 (denote it 7,,) on G’ to obtain a graph H = T,,(K, x G). Assume that p 2 r 2 a(G). For any graph G, we have, a(G’) = a(G). Thus, CY(H) = p’cu(G’) = p%(G), by Proposition 2.5. Therefore, ,y(H)

n(H) 2 (Y(H)

p3r = _ a(G)



(2)

If r = (Y(G), then equality holds in Eq. 2 by Proposition 2.5. This is true also if r is an integer multiple k of cu(G>: The graph K, X G consists of k copies the copies, and of K,o, X G, with some additional edges interconnecting similarly, the graph H consists of k copies of T,(KOo, X G) along with some edges connecting nodes from different copies. We can color each copy separately using a distinct set of p3 colors, and thus H can be colored with p3k colors. If r is not an integer multiple of a(G), then the inequality in Eq. 2 may be strict because of rounding effects. In any case, if k = [r/cx(G)l, then H can certainly be colored with p3k colors. Thus, the chromatic number of H lies between p3r/cr(G> and p3[r/a(G)l. We cannot get rid completely of this rounding effect for an arbitrary graph G because it is NP-hard to compute Q(G), or even to narrow the range of a(G) enough so that we can pick an r that is guaranteed to be an integer multiple that has polynomial size. Of course, we can make the relative effect of rounding as small as we want by picking a sufficiently large r. Summarizing our discussion, we have: Given an arbitrary graph G and p 2 r 2 n(G), we can graph H = Tp( K, x G) whose chromatic number satisfies:

PROPOSITION 4.1. construct

p3r/a(G)

another

I x(H)

sp31r/(r(G)l.

080

C. LUND AND M. YANNAKAKIS

That is, every graph G can be transformed in polynomial time to another graph H whose chromatic number is essentially inversely proportional to the independence number of G (except for the rounding effect). REFERENCES ALON, N., GOmIIEIC~~,O., H&TAD, J., AND PERALTA,R. 1990. Simple constructions of almost k-wise independent random variables. In Proceedings of the 3/st lEEE Symposium on Foundations of Computer Science. IEEE, New York, pp. 543-553. ARORA, S., LUND, C., MOTWANI,R., SUD,\N, M., AND SZEGE~Y, M. l9Y2. Proof verification and intractability of approximation problems. In Proceedings of the 33rd IEEE Symposium on Foundations of Computer Science. IEEE, New York, pp. 14-23. ARORA,S., ANDSAFRA, S. 1992. Approximating clique is NP-complete. In Proceedings of the 33rc1 IEEE Symposium on Foundations of Computer Science. IEEE, New York, pp. 2: 13. BELLARE, M. 1993. Interactive proofs and approximation: Reductions from two provers in one round. In Proceedings of the 2nd Israel Symposium on Theory und Computing Systems. SpringerVerlag, Berlin, pp. 266-274. BELLARE,M., GOI.DWASSER,S., LUND, C., ANI) Russet., A. 1993. Efficient probabilistically checkable proofs and applications to approximation. In Proceedings of the 2.%/rACM Synzpo.siunl 011 Theory of Computing (San Diego, Calif., May 16- 18). ACM, New York, pp. 294-304. BELLARE,M., AND ROGOW,\Y,P. 1993. The complexity of approximating a nonlinear program. In Complexity in Numerical Optimization. P. Pardalos, Ed., World-Scientific. Singapore. pp. 16-32. BERMAN,P., AND SCHNITGER,G. 198’). On the complexity of approximating the indcpcndcnt set problem. In Proceedings of the 7th Symposium on Theoretical Aspects of Computer Science. Lecture Notes in Computer Science, vol. 340. Springer-Verlag. New York, pp. 256-267. BLUM, A. 19YO. Some tools for approximate 3-coloring. In Proceedings of the 31.~1 /E/X Symposium on Foundations of Computer Science. IEEE, New York, pp. 554-562. CHVATAL,V. 1979. A greedy heuristic for the set covering problem. Math. Op. Res. 4, 233-235. DOBSON, G. 1982. Worst-case analysis of greedy heuristics for intcgcr programming with nonnegative data. Murh. Op. Res. 7, 5 15-53 I. FEIGE, U., GOLDWASSER,S., Lovkz, L., SAFRA, S., AND SZEGEIIY,M. IYYI. Approximating clique is almost NP-complete. In Proceedings of the 32nd IEEE Symposium on Foundutions of Computer Science. IEEE, New York, pp. 2-12. FEIGE, U., AND LovAsz, L. 1992. Two-prover one-round proof systems: Their power and their problems. In Proceedings of the 24th ACM Symposium on the Theory of Computing (Victoria, B.C., Canada, May 4-6). ACM, New York, pp. 733-744. FISHER, M. L., AND WOLSEY, L. A. 1982. On the greedy heuristic for continuous covering and packing prob\ems. SL4M J. Aln. Disc. Meth. 3. 584-591. G~REY, k. R., AND JOHNSON,6. S. 1976. The complexity of near-optimal graph coloring. J. ACM 23, 1 (Jan.). 43-49. GROTSCHEL,M., LbvAsz, L., AND SCHRIJVER, A. 1981. The ellipsoid method and its consequences in combinatorial optimization. Combinatorics I, 169-197. HALLD~RSSON,M. M. 1990. A still better performance guarantee for approximate graph coloring. Tech. Rep. 90-44. DIMACS. JOHNSON,D. S. 1974a. Approximation algorithms for combinatorial problems. J. Comput. Syst. Sci., 9, 256-278. JOHNSON,D. S. 1974b. Worst case behavior of graph coloring algorithms. In Proceedings of the Sfh Southeastern Conf. on Combinatorics, Graph Theory and Computing. Utilitas Mathematics Publishers, Winnipeg, Ontario, Canada. KARP, R. M. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations. R. E. Miller and J. W. Thatcher, eds., Advances in Computing Research, Plenum Press, New York. I&ANNA, S., LINIAL, N., AND SAFRA, S. 1993. On the hardness of approximating the chromatic number. In Proceedings of the 2nd Israel Symposium on Theory and Computing Systems. Springer-Verlag, New York, pp. 250-260. KOLAITIS,P. G., ANDTHAKUR, M. N. 1991. Approximation properties of np minimization classes. In Proceedings of the 6th Conference on Structure in Complexity Theory, pp. 353-366. Kou, L. T., STOCKMEYER,L. J., ANDWONG, C. K. 1978. Covering edges by cliques with regard to keyword conflicts and intersection graphs. Commun. ACM 21, 2 (Feb.), 135-139.

ofApproximating

Hurdness LAPIDOT,

D., AND

Problems

SI~,%MIR.A. 1991.

Fully parallelized

f’roceedingx of the 32nd Anrz~~alIEEE New York, pp. 13-18.

LINIAL, N., AND VAZIKANI, U. 1989. fhe 30th fEEE

Symposium

multiprover

on Foundafions

protocols for NEXPTIME. In of ComputerSciencr. IEEE,

and chromatic numbers. In Proceedings of Science. IEEE, New York, pp. 124-133. integral and fractional covers. Disc. Muth. 13,

Graph products

Symposium on Foundations

LOVASZ, L. 1975.

9Sl

of Compter

On the ratio of optimal

383-390.

LUND, C., AND YANNAK~~KIS. M. 1993. The approximation of maximum subgraph problems. In Proceedirlgs of fhe 20th Intematiotd ColloquiLtm on Automata, Lanpages and Propmming. Springer-Vcrlag, Berlin, pp. 40-51. ORLIN, J. 1977. Contentment in graph theory: Covering graphs with cliques. In Proceedings of flze Koninklijke Nederlundse. Vol. 80 Akademie van Wetenschappen, Amsterdam, The Netherlands, pp. 406-424. PAZ, A., ANU MORAN, S. 1981. Nondeterministic polynomial optimization problems and their approximations. Theoret. Cornput. Sci. 15, 25 l-277. PAPADIMITRIOU,C., AND YANNAKAKIS,M. 1988. Optimization, approximation and complexity classes. In Proceedings of the 20th ACM Symposium on the Theory of Computing (Chicago, Ill., May 2-4). ACM, New York, pp. 229-234. SIMON, H. U. 1990. On approximate solutions for combinatorial optimization problems. SfAM J. Algebraic Discrete Methods 3, 294-310. WIGDERSON,

A. 1983.

Improving the performance

guarantee

for approximate

graph coloring. 1.

ACM 30, 4 (Oct.), 729-735.

WOI.SEY, L. A. 1982. An analysis of the greedy algorithm for the submodulat set covering problem. Comhinatorica 2, 385-393. YANNAKAKIS, M. 1979. The effect of a connectivity requirement on the complexity of maximum subgraph problems. J. ACM 26, 4 (Oct.), 618-630. RECEIVEI~

JANUARY

1992 : REVISED

JULY

1993 ;

ACCEPTED

Journal of the Association

JULY

for Computing

1993

Machinery.

Vol. 41. No. 5, September

1994.