JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 6, 331-357 (1989)

Combinatorial Optimization on a Boltzmann Machine

JAN H. M. KORST

Philips Research Laboratories, P.O. Box 80000, 5600 JA Eindhoven, The Netherlands

AND

EMILE H. L. AARTS

Philips Research Laboratories, P.O. Box 80000, 5600 JA Eindhoven, The Netherlands; and Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

Received January 12, 1988

We discuss the problem of solving (approximately) combinatorial optimization problems on a Boltzmann machine. It is shown for a number of combinatorial optimization problems how they can be mapped directly onto a Boltzmann machine by choosing appropriate connection patterns and connection strengths. In this way maximizing the consensus in the Boltzmann machine is equivalent to finding an optimal solution of the corresponding optimization problem. The approach is illustrated by numerical results obtained by applying the model of Boltzmann machines to randomly generated instances of the independent set, the max cut, and the graph coloring problem. For these instances the Boltzmann machine finds near-optimal solutions whose quality is comparable to that obtained with sequential simulated annealing algorithms. The advantage of the Boltzmann machine is the potential for carrying out operations in parallel. For the problems we have been investigating, this results in a considerable speedup over the sequential simulated annealing algorithms. © 1989 Academic Press, Inc.

1. INTRODUCTION

Ever since Kirkpatrick et al. [16] introduced the simulated annealing algorithm as a general technique for approximately solving large combinatorial optimization problems, the algorithm has attracted much attention. Both theoretical and practical aspects have been investigated and the algorithm has been applied to a wide variety of combinatorial optimization problems in such diverse areas as engineering, VLSI design, and operational research. For a review the reader is referred to [4, 18].

Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.


Interesting features of the simulated annealing algorithm are its general applicability and its ability to obtain solutions (arbitrarily) close to an optimum. Furthermore, the algorithm allows one to make a trade-off between the amount of computation time one is prepared to spend and the expected value of the final solution. However, as is common to most algorithms based on randomization techniques, the algorithm usually requires large amounts of computation time. Parallel implementations of the simulated annealing algorithm can reduce computation times significantly. In [2] two parallel simulated annealing algorithms have been introduced, viz., a systolic and a clustering algorithm. Both algorithms are generally applicable and have been shown to exhibit a linear speedup on small-scale multiprocessor systems (number of processors ≤ 30). However, the use of these algorithms on large-scale or massively parallel multiprocessor systems is impracticable, since the conditions for systolization or clustering can no longer be met. Moreover, communication bottlenecks will drastically reduce the efficiency (see also [2]). Recently, researchers have been attracted by the potential of neural networks for carrying out such complex computational tasks as combinatorial optimization and learning [14, 19]. The characteristic features of neural networks are distributed memory (reduction of communication bottlenecks) and massive parallelism (fast computing). The model of the Boltzmann machine [4], introduced by Hinton et al. [12, 13], belongs to the class of neural network models and is a typical representative of connectionist models [8, 9]. We argue that it provides a computational model that is suitable for massively parallel execution of the simulated annealing algorithm. In this paper we demonstrate the feasibility of solving (approximately) a number of combinatorial optimization problems with a Boltzmann machine.
Our approach is based on the observation that the structure of many combinatorial optimization problems can be mapped directly onto the structure of a Boltzmann machine by choosing the right connection pattern and connection strengths. In this way maximizing the consensus in a Boltzmann machine is equivalent to finding optimal solutions of the corresponding optimization problem. The organization of the paper is as follows. In Section 2 we briefly summarize the most important aspects underlying the model of Boltzmann machines. In Section 3 we show for a number of combinatorial optimization problems how they can be implemented on a Boltzmann machine. In Section 4 numerical results are presented, obtained by applying Boltzmann machine algorithms to instances of three different combinatorial optimization problems. Furthermore, a comparison is made with results obtained by sequential simulated annealing algorithms. The paper is completed with some concluding remarks (Section 5).


2. A MODEL OF THE BOLTZMANN MACHINE

A Boltzmann machine consists of a network of simple computing elements. The computing elements are considered as logic units having two discrete states, viz., "on" or "off." The units are connected in some way. With each connection a connection strength is associated, representing a local quantitative measure for the hypothesis that the two connected units are both "on." A consensus function assigns a real number to a configuration of the Boltzmann machine, which is completely determined by the states of all individual units. This number is a quantitative measure of the amount of consensus in the Boltzmann machine with respect to the set of underlying hypotheses. The state of an individual unit is determined by a stochastic function of the states of the neighboring units. Maximization of the consensus function corresponds to maximization of the amount of information contained within the Boltzmann machine.

2.1. Structural Description

A Boltzmann machine can be represented by an undirected graph B = (U, C), where U = {u_0, ..., u_{N−1}} denotes the set of N units and C is a set of unordered pairs from U denoting the connections between the units [4]. A connection {u_i, u_j} ∈ C connects the units u_i and u_j. C includes all loops, i.e., {{u_i, u_i} | u_i ∈ U} ⊆ C. The state of unit u_i in a configuration k is denoted by r_k(u_i), where r_k(u_i) equals 0 or 1, corresponding to "off" and "on," respectively. The configuration space R denotes the set of all possible configurations (|R| = 2^N). A connection {u_i, u_j} is called activated if r_k(u_i)·r_k(u_j) = 1. A connection strength s_{u_i,u_j} ∈ ℝ, assigned to connection {u_i, u_j}, is defined as a quantitative measure of the desirability that the connection be activated; if s_{u_i,u_j} >> 0 then it is considered very desirable that {u_i, u_j} be activated; if s_{u_i,u_j} << 0 it is considered very undesirable.
The consensus C_k denotes the overall desirability of all the activated connections in a given configuration k and is given by

C_k = Σ_{{u_i,u_j} ∈ C} s_{u_i,u_j} r_k(u_i) r_k(u_j).   (1)

A neighborhood R_k ⊆ R is defined as the set of configurations that can be obtained from configuration k by changing the state (0 → 1 or vice versa) of one of the units. Thus, given a configuration k, a neighboring configuration k^(i) ∈ R_k is obtained by changing the state of unit u_i and leaving the states of all other units unchanged. The corresponding difference in the consensus ΔC_{kk^(i)} is given by [3, 4]


ΔC_{kk^(i)} = C_{k^(i)} − C_k = (1 − 2 r_k(u_i)) ( Σ_{{u_i,u_j} ∈ C_{u_i}, j ≠ i} s_{u_i,u_j} r_k(u_j) + s_{u_i,u_i} ),   (2)

where C_{u_i} denotes the set of connections incident with unit u_i ({u_i, u_j} ∈ C_{u_i}). Given the above definition of a neighborhood, we can now define a local maximum of the consensus function as a configuration k ∈ R for which

ΔC_{kk^(i)} ≤ 0   for all u_i ∈ U.   (3)
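In code, the consensus of (1) and the locally computable difference of (2) might look as follows. This is a minimal sketch of our own (not from the paper), representing a machine as a dictionary that maps a connection (i, j), with i ≤ j, to its strength; loops (i, i) carry the bias terms.

```python
def consensus(strengths, state):
    # C_k of (1): sum the strengths of all activated connections, i.e.,
    # those {u_i, u_j} with r_k(u_i) * r_k(u_j) = 1.
    return sum(s for (i, j), s in strengths.items() if state[i] and state[j])

def consensus_diff(strengths, state, i):
    # Delta C_{kk^(i)} of (2): the change in consensus when unit u_i flips.
    # Only connections incident with u_i enter, so the value is local.
    incident = 0.0
    for (a, b), s in strengths.items():
        if a == i and b == i:
            incident += s                # loop term s_{u_i,u_i}
        elif a == i:
            incident += s * state[b]     # s_{u_i,u_j} * r_k(u_j)
        elif b == i:
            incident += s * state[a]
    return (1 - 2 * state[i]) * incident
```

Flipping unit i and recomputing the consensus from scratch gives the same number; this locality is what makes parallel evaluation possible.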

2.2. Consensus Maximization

Similarly to the mathematical formulation of the simulated annealing algorithm [4, 18], we use the theory of Markov chains to model the state transitions required to maximize the consensus function of (1) in the Boltzmann machine. In a sequential Boltzmann machine, units are allowed to change their states only one at a time. This procedure can be described as a sequence of Markov chains, each chain consisting of a sequence of trials, where the outcome of a given trial depends probabilistically on the outcome of the previous trial (the outcomes are configurations). A trial consists of two steps: given a configuration k, first a neighboring configuration k^(i) is generated, and second it is evaluated whether or not k^(i) is to be accepted. If it is accepted, the outcome of the trial is k^(i); otherwise it is k. The probability A_{kk^(i)}(c) of accepting a state transition of unit u_i, given the configuration k, is chosen as

A_{kk^(i)}(c) = 1 / (1 + exp(−ΔC_{kk^(i)} / c)),   (4)

where c denotes the value of the control parameter (c ∈ ℝ⁺), and ΔC_{kk^(i)} is given by (2). (The control parameter c plays a role similar to that of the temperature in the physical annealing process.) The acceptance probability A_{kk^(i)}(c) implements the sigmoid scalar response function which is typical for neural networks [4, 9, 12, 13]. As in the simulated annealing algorithm, maximization of the consensus function takes place by starting off at a high initial value of the control parameter c and a (randomly) chosen initial configuration. Subsequently, a sequence of Markov chains is generated by continuously trying to change the states of the individual units and applying the acceptance criterion of (4). In between subsequent Markov chains the value of c is lowered until it approaches 0. As c approaches 0, state transitions become more and more infrequent, until the Boltzmann machine stabilizes in a final configuration. For sufficiently long Markov chains and for c approaching 0 the Boltzmann machine stabilizes in a configuration corresponding to maximal consensus [4]. Due to this result the Boltzmann machine is asymptotically capable of solving combinatorial optimization problems to optimality. In practical implementations, however, the asymptotic conditions can never be attained and thus convergence to a configuration with maximal consensus is not guaranteed; i.e., the Boltzmann machine will stabilize in a configuration corresponding to a local maximum of the consensus function, which might be close (or even equal) to the maximal consensus.

The convergence of the Boltzmann machine is determined by a set of parameters known as the cooling schedule [4, 18]. The parameters are the initial value of c, a decrement rule to lower the value of c, the length of the individual Markov chains, and a stop criterion that justifies termination. From (2) it is apparent that the effect on the consensus of changing the state of a unit is determined completely by the states of its neighboring units and the corresponding connection strengths. Consequently, the differences in consensus ΔC_{kk^(i)} can be computed locally. This condition facilitates the use of parallel state transitions. In a parallel implementation of the simulated annealing algorithm units are allowed to change their states simultaneously. One might distinguish between synchronous and asynchronous parallelism. A discussion of the differences between these types of parallelism is given in [4]. Here, we restrict ourselves to the statement that we consider asynchronous parallelism the most interesting type of parallelism, since it allows for high efficiencies and renders additional synchronization utilities unnecessary. However, asymptotic convergence of the simulated annealing algorithm to globally optimal configurations while applying asynchronous parallel state transitions has not yet been proved (it is, however, supported by numerical results).
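The acceptance probability of (4) is the logistic function of the consensus difference scaled by the control parameter; a one-line sketch (our own illustration):

```python
import math

def acceptance_probability(delta_c, c):
    # A_{kk^(i)}(c) of (4): a sigmoid in Delta C / c.  Improving transitions
    # (delta_c > 0) are accepted with probability > 1/2; deteriorating ones
    # with a probability that vanishes as c approaches 0.
    return 1.0 / (1.0 + math.exp(-delta_c / c))
```

At delta_c = 0 the probability is exactly 1/2, so even neutral moves keep the machine exploring at any value of c.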
We consider a rigorous proof of the asymptotic convergence of an asynchronously parallel implementation of the simulated annealing algorithm an important open research topic. The implementations of Boltzmann machine algorithms presented in Section 4 are based on a parallel implementation of the simulated annealing algorithm using an asynchronous cooling schedule; i.e., each unit has its own cooling schedule, whose parameters are independent of the other units. For each unit a "conceptually simple" cooling schedule [18] is used with the following parameters:

- The initial value of the control parameter associated with unit u_i is chosen as

c_0^{(i)} = Σ_{{u_i,u_j} ∈ C_{u_i}} |s_{u_i,u_j}|.   (5)


This expression ensures a sufficiently high initial value of the control parameter, which is required for obtaining good results [4, 18].

- The decrement rule for calculating the next value, c_{j+1}^{(i)}, from the previous value, c_j^{(i)}, of the control parameter is chosen as

c_{j+1}^{(i)} = α c_j^{(i)},   (6)

where α is a positive number smaller than but close to 1. The decrement rule is applied each time the unit has completed L trials. To take into consideration the time needed to propagate changes through the network, L is chosen as a function of the number of units.

- A unit stops generating trials if for M consecutive trials no state transition is accepted. M is also taken to be size-dependent.

The values of the parameters α, L, and M used in the implementations are given in Section 4.
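A single unit's schedule can be sketched as below. This is our own illustration (the machine representation and the parameter defaults are ours): c_0 is taken as the sum of the absolute strengths incident with the unit, the decrement rule of (6) is applied after every L trials, and the unit stops after M consecutive rejections.

```python
import math
import random

def anneal_unit(strengths, state, i, alpha=0.95, L=20, M=100):
    # Local consensus difference of (2) for unit i.
    def diff():
        t = sum(s if a == b else s * state[b if a == i else a]
                for (a, b), s in strengths.items() if i in (a, b))
        return (1 - 2 * state[i]) * t

    # (5): initial value of the control parameter for this unit.
    c = sum(abs(s) for (a, b), s in strengths.items() if i in (a, b))
    trials = rejected = 0
    while rejected < M:
        d = diff()
        # Acceptance criterion of (4); the exponent is clamped to avoid overflow.
        if random.random() < 1.0 / (1.0 + math.exp(min(50.0, max(-50.0, -d / c)))):
            state[i] = 1 - state[i]   # accepted: flip the unit
            rejected = 0
        else:
            rejected += 1
        trials += 1
        if trials % L == 0:
            c *= alpha                # decrement rule of (6)
    return trials
```

In the asynchronous machine every unit runs such a loop concurrently against the shared state vector; here the other units are simply held fixed.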

3. SOLVING COMBINATORIAL OPTIMIZATION PROBLEMS

Combinatorial optimization problems can be characterized by a tuple (S, S', f), where S denotes the finite set of solutions; S' ⊆ S the set of feasible solutions, i.e., the set of solutions satisfying the constraints that go with the problem; and f : S → ℝ a cost function that assigns a real number to each solution. The problem now is to find a feasible solution i for which the cost function f(i) is optimal (either minimal or maximal). To use the model of the Boltzmann machine for solving a combinatorial optimization problem, a one-to-one correspondence is defined between the solutions of the combinatorial optimization problem and the configurations of a Boltzmann machine. For this reason the optimization problem must be formulated as a 0-1 programming problem; i.e., each solution is uniquely represented by a finite set of variables {x_0, ..., x_{N−1}}, with x_i ∈ {0, 1} for all i. Given such a formulation, a Boltzmann machine is defined with N units. The value of a variable x_i corresponds one-to-one to the state of unit u_i. In this way, each configuration of the Boltzmann machine uniquely defines a solution of the combinatorial optimization problem and vice versa. Furthermore, a connection pattern and connection strengths are chosen such that the following properties hold:

(P1) All local maxima of the consensus function correspond to feasible solutions; and
(P2) the better the cost of a feasible solution, the higher the consensus of the corresponding configuration.


As a Boltzmann machine always stabilizes in a local maximum of the consensus function, property P1 guarantees that the corresponding solution will be a feasible one. If both properties hold, then the consensus is maximal for configurations corresponding to an optimal feasible solution. Furthermore, under the same conditions, near-optimal local maxima of the consensus function correspond to near-optimal feasible solutions. This is very important, since only in the asymptotic case is a Boltzmann machine guaranteed to reach maximal consensus.

Many combinatorial optimization problems belong to the class of NP-complete problems (if formulated as decision problems). As all problems in the class of NP-complete problems are polynomially transformable into each other, it suffices, in theory, to show that it is possible to solve one NP-complete problem with a Boltzmann machine. We could then solve any NP-complete problem by transforming it to the problem for which a Boltzmann machine implementation is known. However, Boltzmann machine implementations tailored to a specific problem are usually more efficient. Therefore, we discuss Boltzmann machine implementations for a number of problems. We show how three combinatorial optimization problems can be implemented on a Boltzmann machine according to the general prescription described above. These problems are the independent set, the max cut, and the graph coloring problem. For others we indicate how they can be transformed efficiently into one of these problems, using well-known results from complexity theory [10], and we show the structure of the corresponding Boltzmann machine. How Boltzmann machines can be used for solving traveling salesman problems is discussed in [3].

3.1. The Max Cut Problem

DEFINITION (Max Cut Problem). Given a graph G = (V, E) with positive weights on the edges, find a partition of V = {1, ..., n} into disjoint sets V_1 and V_2 such that the sum of the (positive) weights of the edges from E that have one endpoint in V_1 and one endpoint in V_2 is maximal.

To formulate the max cut problem as a 0-1 programming problem we define the following variables. Let w_ij be the weight associated with the edge {i, j} (by definition, w_ij = w_ji for all i, j ∈ V, and w_ij = 0 for all {i, j} ∉ E) and let the 0-1 variable x_i be defined by

x_i = 1 if i ∈ V_1 and x_i = 0 if i ∈ V_2.   (7)

Then the max cut problem can be formulated as choosing the vector (x_1, ..., x_n) to maximize

f = Σ_{i=1}^{n} Σ_{j=i+1}^{n} w_ij ((1 − x_i) x_j + x_i (1 − x_j)).   (8)

Clearly, any solution corresponds uniquely to a partition of V into V_1 and V_2. Hence, the set of solutions is identical to the set of feasible solutions. To implement the max cut problem on a Boltzmann machine we introduce for each variable x_i a unit u_i. The state of unit u_i determines the value of the corresponding variable x_i, such that x_i = 1 if u_i is "on" and x_i = 0 if u_i is "off." Next, a set of connections is chosen such that the cost function of (8) is implemented. For this purpose the set of connections is taken as the union of two disjoint sets, viz.,

- Bias connections: C_b = {{u_i, u_i} | i ∈ V},   (9)
- Weight connections: C_w = {{u_i, u_j} | {i, j} ∈ E}.   (10)

The consensus function now takes the form

C_k = Σ_{{u_i,u_i} ∈ C_b} s_{u_i,u_i} r_k(u_i) + Σ_{{u_i,u_j} ∈ C_w} s_{u_i,u_j} r_k(u_i) r_k(u_j).   (11)

Next, we must choose the connection strengths such that maximizing the consensus function of (11) corresponds to maximizing the cost function of (8). Clearly, as all solutions are feasible, property P1 holds by definition.

THEOREM 1. Let b_i be the sum of the weights of all edges incident to vertex i, i.e., b_i = Σ_{j=1}^{n} w_ij, and let

s_{u_i,u_i} = b_i   for {u_i, u_i} ∈ C_b,   (12)

s_{u_i,u_j} = −2 w_ij   for {u_i, u_j} ∈ C_w.   (13)

Then property P2 holds.

Proof. To prove this theorem we show that the consensus function of (11) equals the cost function of (8). The consensus function can be written as

C_k = Σ_{{u_i,u_i} ∈ C_b} b_i r_k(u_i) + Σ_{{u_i,u_j} ∈ C_w} (−2 w_ij) r_k(u_i) r_k(u_j),   (14)

which is equal to

Σ_{i=1}^{n} Σ_{j=1}^{n} w_ij x_i + Σ_{i=1}^{n} Σ_{j=i+1}^{n} (−2 w_ij) x_i x_j.   (15)

Since w_ij = w_ji for all i, j ∈ V, and w_ii = 0 for all i ∈ V, we have

Σ_{i=1}^{n} Σ_{j=i+1}^{n} w_ij (x_i + x_j) + Σ_{i=1}^{n} Σ_{j=i+1}^{n} (−2 w_ij) x_i x_j,   (16)

which is identical to the cost function of (8). ∎
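Theorem 1 is easy to verify numerically. The sketch below (our own illustration, not the authors' code) builds the strengths of (12) and (13) for a small weighted graph and checks that the consensus of (11) equals the cut weight of (8) for every configuration:

```python
from itertools import product

def maxcut_strengths(n, weights):
    # weights maps an edge (i, j), i < j, to its weight w_ij.
    s = {}
    for i in range(n):
        # (12): bias strength b_i, the total weight incident with vertex i.
        s[(i, i)] = sum(w for e, w in weights.items() if i in e)
    for (i, j), w in weights.items():
        s[(i, j)] = -2 * w               # (13): weight-connection strength
    return s

def consensus(s, x):
    # (11): sum of strengths over activated connections.
    return sum(v for (i, j), v in s.items() if x[i] and x[j])

def cut_weight(weights, x):
    # (8): total weight of edges crossing the partition (V_1, V_2).
    return sum(w for (i, j), w in weights.items() if x[i] != x[j])

weights = {(0, 1): 3, (1, 2): 1, (0, 2): 2}
s = maxcut_strengths(3, weights)
assert all(consensus(s, x) == cut_weight(weights, x)
           for x in product((0, 1), repeat=3))
```

Every configuration scores exactly its cut weight, so P2 holds with equality and the global consensus maximum coincides with the maximum cut.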

An example of a max cut problem and the corresponding Boltzmann machine is given in Fig. 1.

3.2. The Independent Set Problem

DEFINITION (Independent Set Problem). Given a graph G = (V, E), find an independent set of maximal size; i.e., find a subset V' of V = {1, ..., n} such that for all i, j ∈ V' the edge {i, j} is not in E and such that |V'| is maximal.

In a 0-1 programming formulation this problem can be written as follows. Let x_i be a 0-1 variable indicating whether (1) or not (0) a vertex i is in V'; then the independent set problem can be written as: maximize

f = Σ_{i=1}^{n} x_i,   (17)

FIG. 1. The max cut problem. For a weighted graph G = (V, E) as shown in (a), a Boltzmann machine is constructed whose structure is isomorphic to G (b). The bias connections are not drawn (as is the case in the following figures) but their connection strengths are drawn near the corresponding unit. Consensus maximization gives the maximal cut indicated by the units that are "on" (encircled); i.e., these units correspond to the set V_1.


subject to

x_i x_j e_ij = 0,   i, j = 1, ..., n,   (18)

where n denotes the number of vertices, and e_ij ∈ {0, 1} denotes whether (1) or not (0) {i, j} is in E. This problem can be implemented on a Boltzmann machine in the following way. For each vertex i ∈ V a unit u_i is defined. The state of unit u_i corresponds to the value of variable x_i. The set of connections is taken as the union of two disjoint sets, viz.,

- Bias connections: C_b = {{u_i, u_i} | i ∈ V},   (19)
- Inhibitory connections: C_i = {{u_i, u_j} | {i, j} ∈ E}.   (20)

The consensus function now takes the form

C_k = Σ_{{u_i,u_i} ∈ C_b} s_{u_i,u_i} r_k(u_i) + Σ_{{u_i,u_j} ∈ C_i} s_{u_i,u_j} r_k(u_i) r_k(u_j).   (21)

Next, we must choose the connection strengths such that maximization of the consensus function of (21) corresponds to maximizing the cost function of (17) subject to the constraint of (18) (see also Fig. 2).

THEOREM 2. Let β be a positive real number and let

s_{u_i,u_i} = β   for {u_i, u_i} ∈ C_b,   (22)

s_{u_i,u_j} < −β   for {u_i, u_j} ∈ C_i.   (23)

Then the properties P1 and P2 hold.

FIG. 2. The independent set problem. For a graph G = (V, E) a Boltzmann machine is constructed whose structure is isomorphic to G. After maximizing the consensus the units that are "on" (encircled) make up the maximal independent set.


Proof. To prove this theorem we distinguish between two disjoint subsets of the configuration space of the Boltzmann machine, i.e., R = R_A ∪ R_B with R_A ∩ R_B = ∅, where R_A and R_B denote the sets of configurations corresponding to feasible and infeasible solutions, respectively. Then, it can be proved straightforwardly that property P1 holds by showing that

∀ k ∈ R_B  ∃ k^(i) ∈ R_k : ΔC_{kk^(i)} > 0.   (24)

If k ∈ R_B then an inhibitory connection {u_i, u_j} is activated. Changing the state of one of the units u_i and u_j, say u_i, will change the consensus with ΔC_{kk^(i)} = −s_{u_i,u_i} − s_{u_i,u_j} > 0. Property P2 follows immediately from the fact that for any configuration k ∈ R_A (i.e., for any feasible solution) the consensus function can be written as

C_k = Σ_{{u_i,u_i} ∈ C_b} s_{u_i,u_i} r_k(u_i)   (25)

    = Σ_{u_i ∈ U} β r_k(u_i) = Σ_{i=1}^{n} β x_i,   (26)

which scales linearly with the cost function of (17). If β is chosen equal to 1 the consensus function is identical to the cost function. ∎

Equivalent problems, such as the vertex cover problem and the clique problem, can be solved on a Boltzmann machine in a similar way. The construction is given by reduction of the problems from the independent set problem [10].

3.3. The Graph Coloring Problem

DEFINITION (Graph Coloring Problem). Given a graph G = (V, E), find a minimal coloring; i.e., find a mapping g : V → {1, 2, ..., k} such that g(i) ≠ g(j) for all {i, j} ∈ E, and k minimal.

It can be verified easily that the number of colors necessary to color a graph is bounded by Δ + 1, where Δ denotes the maximum degree of the graph [11]. To formulate the problem as a 0-1 programming problem, we first introduce a set of positive weights W = {w_1, ..., w_{Δ+1}}, such that for a given coloring of G each vertex contributes w_i to the cost function if it is given color i. So, with each color i a weight w_i is associated. A coloring is defined as a feasible solution of the graph coloring problem. To be able to discriminate between colorings that use a different number of colors, different colors must be given different weights. In this way, the cost function is large whenever few colors (each color with a large weight) are used to color G. However, if different colors are given different weights, then the cost function usually will differ for colorings that are essentially identical, in the sense that only a different permutation of colors is used. Let the maximal value of the cost function that can be obtained by permuting the complete set of colors for a given coloring i be denoted by K(i), and let the number of different colors in a given coloring i be denoted by n(i). Furthermore, let x_ij be a 0-1 variable indicating whether (1) or not (0) vertex i is given color j, and let e_ik denote whether (1) or not (0) there is an edge between the vertices i and k. The following theorem states that, given the above assumptions and the following condition on the set of weights W, the graph coloring problem can be formulated as a 0-1 programming problem, in the sense that a feasible solution of the 0-1 programming problem for which the cost function is maximal uniquely defines a minimal coloring.

THEOREM 3. Let W = {w_1, ..., w_{Δ+1}} be a set of positive weights that satisfy the recursive relation

w_{j+1} < ((|V| − j)/j) Σ_{i=1}^{j} w_i − (|V| − j − 1) w_1,   j = 1, ..., Δ.   (27)

Then the graph coloring problem can be formulated as the 0-1 programming problem: maximize

f = Σ_{i=1}^{n} Σ_{j=1}^{Δ+1} w_j x_ij,   (28)

subject to

Σ_{j=1}^{Δ+1} x_ij = 1,   i = 1, ..., n,   (29)

x_ij x_kj e_ik = 0,   i, k = 1, ..., n; j = 1, ..., Δ + 1,   (30)

x_ij ∈ {0, 1},   i = 1, ..., n; j = 1, ..., Δ + 1.   (31)

Proof. Without loss of generality we may assume that the weights are ordered as

w_1 > w_2 > ... > w_{Δ+1}.   (32)

It can be shown that (27) induces (32); i.e., (32) poses no further restrictions on constraint (27). Let n_j(i) denote the number of vertices that are given color j in coloring i. Since we may restrict ourselves to colorings for which (28) is maximal over the set of colorings that can be obtained from coloring i by permuting the complete set of colors, we have

n_1(i) ≥ n_2(i) ≥ ... ≥ n_{Δ+1}(i).   (33)

Consequently, K(i) can be written as Σ_{j=1}^{Δ+1} w_j n_j(i). Now, we must show that (27) provides a sufficient condition to ensure that K(i) > K(j) for any two colorings i and j with n(i) = k and n(j) = k + 1. For a coloring i, with n(i) = k, the lower bound on K(i) is given by

K(i) ≥ (|V|/k) Σ_{j=1}^{k} w_j.   (34)

This follows directly from Chebyshev's inequality, which states that

n Σ_{i=1}^{n} a_i b_i ≥ (Σ_{i=1}^{n} a_i)(Σ_{i=1}^{n} b_i),   (35)

whenever

a_1 ≥ a_2 ≥ ... ≥ a_n   (36)

and

b_1 ≥ b_2 ≥ ... ≥ b_n.   (37)

For a coloring j, with n(j) = k + 1, the upper bound on K(j) is given by

K(j) ≤ (|V| − k) w_1 + Σ_{l=2}^{k+1} w_l.   (38)

Combining (34) and (38) yields K(i) > K(j) if the weights w_i satisfy (27). Furthermore, if w_1 is chosen sufficiently large and w_2, ..., w_{Δ+1} are chosen close to the upper bounds given by (27), then all weights can be positive. ∎

If the constraints of (27) and (29)-(30) are satisfied, then the cost function of (28) is maximal for a solution corresponding to a minimal coloring. Provided that w_1 is chosen sufficiently large, it is possible to construct a set of positive weights W = {w_1, ..., w_{Δ+1}} for every graph such that the constraints of (27) are satisfied. However, for large graphs these constraints lead to a very large difference between w_1 − w_2 and w_1 − w_{Δ+1}. Due to this large difference, the Boltzmann machine converges only slowly to near-optimal results. Furthermore, the constraints of (27) are unnecessarily restrictive for most problem instances. Therefore, we relax these constraints by choosing the weights according to

w_{j+1} < 2 w_j − w_1,   j = 1, ..., Δ.   (39)
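Integer weights satisfying (39) can be generated mechanically, e.g., with the rule w_{j+1} = 2w_j − w_1 − 1 used in the text; a small sketch (our own illustration):

```python
def coloring_weights(num_colors, w1=100):
    # Generate positive integer weights with w_{j+1} = 2*w_j - w_1 - 1,
    # which satisfies the relaxed constraint (39): w_{j+1} < 2*w_j - w_1.
    ws = [w1]
    for _ in range(num_colors - 1):
        ws.append(2 * ws[-1] - w1 - 1)
    return ws

ws = coloring_weights(7)
assert ws == [100, 99, 97, 93, 85, 69, 37]
assert all(ws[j + 1] < 2 * ws[j] - ws[0] for j in range(6))
assert all(w > 0 for w in ws)
```

Note that w_1 must be chosen large enough for all weights to stay positive: w_1 = 100 supports seven colors, while more colors require a larger w_1.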

For any number of colors it is possible to find positive weights, even integer ones, that satisfy these constraints, e.g., by choosing w_1 sufficiently large and choosing the other weights according to w_{j+1} = 2w_j − w_1 − 1. If for instance seven colors are used, the weights w_1 through w_7 can be chosen as 100, 99, 97, 93, 85, 69, and 37, respectively. Using the constraints of (39) results in a much faster convergence when implemented on a Boltzmann machine. However, now a solution for which (28) is maximal does not necessarily correspond to a minimal coloring. In our experience, these situations hardly occur and, moreover, if they do occur, the constraints of (39) still yield near-optimal colorings.

The graph coloring problem is implemented on a Boltzmann machine in the following way (see Fig. 3). A structure is chosen consisting of Δ + 1 (horizontal) layers, each layer corresponding to a specific color. The number of units in each layer is equal to the number of vertices in the original graph. So, for each vertex i and each color j a unit u_ij is defined. The state of unit u_ij determines the value of variable x_ij. To ensure that the same color is not assigned to adjacent vertices, (horizontal) inhibitory connections are defined to connect units within the same layer. To ensure that exactly one color is assigned to each vertex, all units corresponding to the same vertex are mutually connected by (vertical) inhibitory connections.

FIG. 3. The graph coloring problem. For a graph G = (V, E) as shown in (a), a Boltzmann machine is constructed consisting of Δ + 1 layers. Each layer is isomorphic to G and corresponds to a specific color (b). All units corresponding to the same v ∈ V are mutually connected (c). For reasons of clarity the number of layers is here chosen smaller than Δ + 1. The encircled units correspond to a minimal coloring.

So, the set of connections is taken as the union of three disjoint sets, viz.,

- Bias connections: C_b = {{u_ij, u_ij} | (i ∈ V) ∧ (j ∈ {1, ..., Δ + 1})},   (40)
- Horizontal inhibitory connections: C_h = {{u_ij, u_kj} | {i, k} ∈ E},   (41)
- Vertical inhibitory connections: C_v = {{u_ij, u_il} | (i ∈ V) ∧ (j ≠ l)},   (42)

where j, l ∈ {1, ..., Δ + 1}. Next, we choose the connection strengths such that maximization of the consensus function corresponds to maximizing the cost function of (28) subject to the constraints of (29)-(30) and (27) or (39). This is formulated in the following theorem.

THEOREM 4. Let the weights w_j be positive real numbers chosen such that either (27) or (39) is satisfied, and let

s_{u_ij,u_ij} = w_j   for {u_ij, u_ij} ∈ C_b,   (43)

s_{u_ij,u_kj} < −w_j   for {u_ij, u_kj} ∈ C_h,   (44)

s_{u_ij,u_il} < −min{w_j, w_l}   for {u_ij, u_il} ∈ C_v.   (45)

Then properties P1 and P2 are satisfied for the corresponding problems.

Proof. To prove the theorem we distinguish between four subsets of the configuration space of the Boltzmann machine, i.e., R = R_A ∪ R_B ∪ R_C ∪ R_D, with R_A ∩ (R_B ∪ R_C ∪ R_D) = ∅, where the subsets are defined as follows:

(1) R_A denotes the set of configurations corresponding to feasible solutions,
(2) R_B = {k ∈ R | ∃ i, j, l : r_k(u_ij) = 1 ∧ r_k(u_lj) = 1 ∧ e_il = 1} denotes the set of configurations for which at least two adjacent vertices have been given the same color,
(3) R_C = {k ∈ R | ∃ i, j, l : r_k(u_ij) = 1 ∧ r_k(u_il) = 1 ∧ j ≠ l} denotes the set of configurations for which at least one vertex has been given two or more colors, and
(4) R_D = {k ∈ R | ∃ i : Σ_{j=1}^{Δ+1} r_k(u_ij) = 0} denotes the set of configurations for which at least one vertex has not been given a color.

It can be proved that property P1 is satisfied by showing that

∀ k ∈ (R_B ∪ R_C ∪ R_D)  ∃ k^(i) ∈ R_k with ΔC_{kk^(i)} > 0.   (46)

- If k ∈ R_B then ∃ i, j, l : r_k(u_ij) = 1 ∧ r_k(u_lj) = 1 ∧ e_il = 1. In such a configuration k the horizontal inhibitory connection {u_ij, u_lj} is activated. From the definition of the strength of a horizontal inhibitory connection it is apparent that changing the state of one of the units u_ij and u_lj will increase the consensus.
- If k ∈ R_C then ∃ i, j, l : r_k(u_ij) = 1 ∧ r_k(u_il) = 1 ∧ j ≠ l. In such a configuration k the vertical inhibitory connection {u_ij, u_il} is activated. The strength of a vertical inhibitory connection is chosen such that changing the state of unit u_ij (assuming that w_j ≤ w_l) will increase the consensus.
- If k ∈ R_D\R_C then ∃ i : Σ_{j=1}^{Δ+1} r_k(u_ij) = 0. In such a configuration k no color is assigned to vertex i. Given that the number of layers is chosen equal to Δ + 1 and given that k ∉ R_C, there must exist a layer j in which unit u_ij can be turned "on" without activating any inhibitory connections. Clearly, this will increase the consensus.

Property P2 follows immediately from the fact that for any configuration k ∈ R_A (i.e., for any feasible solution), the consensus function can be written as

C_k = Σ_{{u_ij,u_ij} ∈ C_b} s_{u_ij,u_ij} r_k(u_ij)   (47)

    = Σ_{j=1}^{Δ+1} Σ_{i=1}^{n} w_j x_ij,   (48)

which is identical to the cost function of (28). ∎
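The layered construction of Theorem 4 can be sketched as follows. This is our own illustration, with the inhibitory strengths placed just below the bounds of (44) and (45); layers are indexed from 0 and a configuration is given as the set of "on" units.

```python
def coloring_strengths(n, edges, weights):
    # Units are pairs (i, j): vertex i carries color j; one layer per weight.
    layers = range(len(weights))
    s = {}
    for i in range(n):
        for j in layers:
            s[((i, j), (i, j))] = weights[j]          # bias, (43)
    for (i, k) in edges:
        for j in layers:
            s[((i, j), (k, j))] = -weights[j] - 1     # horizontal inhibitory, (44)
    for i in range(n):
        for j in layers:
            for l in layers:
                if j < l:
                    # vertical inhibitory, (45)
                    s[((i, j), (i, l))] = -min(weights[j], weights[l]) - 1
    return s

def consensus(s, on):
    # Sum the strengths of activated connections; `on` is the set of "on" units.
    return sum(v for (u, w), v in s.items() if u in on and w in on)

# Path 0-1-2 has maximum degree 2, hence 3 layers; [100, 99, 97] satisfies (39).
s = coloring_strengths(3, [(0, 1), (1, 2)], [100, 99, 97])
proper = {(0, 0), (1, 1), (2, 0)}        # a proper 2-coloring
assert consensus(s, proper) == 100 + 99 + 100
improper = {(0, 0), (1, 0), (2, 0)}      # adjacent vertices share color 0
assert consensus(s, improper) < consensus(s, proper)
```

A feasible coloring scores Σ_j w_j n_j(i), so colorings that reuse the heavier colors score higher, exactly as P2 requires.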

3.4. The Clique Partitioning and the Clique Covering Problem The clique partitioning problem and clique covering problem are closely related to the graph coloring problem [lo]. Here, we restrict ourselves to presenting the problem formulations. The relation between the clique partitioning problem and the graph coloring problem and the relation between the clique covering problem and the clique partitioning problem [ 17 ] (and thus the graph coloring problem) are illustrated in Figs. 4 and 5, respectively. DEFINITION (Clique Partitioning Problem). Given a graph G = ( V, E) , find a minimal partition into cliques; i.e., find a minimal partition VI, V,, . . . ) vk of V such that each V, induces a complete subgraph of G . DEFINITION (Clique Covering Problem). Given a graph G = ( V, E) , find a minimal clique cover; i.e., find a minimal number of subsets V, , VZ, . . . , vk of V such that each V, induces a complete subgraph of G and such that for each { i,j} E E there is some V, that contains both i andj.

4. NUMERICAL RESULTS

To demonstrate the practical use of the model of the Boltzmann machine for approximating combinatorial optimization problems, we implemented


FIG. 4. The clique partitioning problem. For a graph G = (V, E) as shown in (a), a Boltzmann machine is chosen reflecting the structure of the complementary graph G' = (V, E') as shown in (b), where E' = {{i, j} | i, j ∈ V ∧ {i, j} ∉ E}. A coloring of G' with a minimal number of colors corresponds directly to a partitioning of G into a minimal number of cliques. Note that the graph in (b) equals the graph used in Fig. 3.


KORST AND AARTS


FIG. 5. The clique covering problem. For a graph G = (V, E) as shown in (a), a clique covering is equivalent to a clique partitioning of the graph G' shown in (b), which is defined as follows. Each vertex in G' corresponds to an edge in G and two vertices in G' are joined by an edge if the endpoints of the corresponding edges in G form a clique. A minimal clique partitioning of G' then yields a minimal clique covering of G. Note that G' is isomorphic to the graph used in Fig. 4.

Boltzmann machine algorithms for three different combinatorial optimization problems discussed above, viz., the independent set problem, the max cut problem, and the graph coloring problem. The results obtained with the Boltzmann machine algorithms were compared with results obtained with sequential implementations of the simulated annealing algorithm. Both the Boltzmann machine and the simulated annealing algorithms were implemented in PASCAL on a VAX 11/785 computer.

The problem instances were given by randomly generated graphs G = (V, E), with a fixed set of vertices V (|V| = 50, 100, 150, 200, 250). The edges were chosen independently and with a probability p to be in E [6]. The probability p was chosen equal to 10/(|V| - 1), such that the expected degree of the vertices equals 10. In this way, the average connectivity is the same for all problem instances. Thus, we concentrate on investigating the time complexities of both algorithms as a function of the number of vertices only. The dependence of the performance on the average connectivity is not considered here, since it depends strongly on the machine architecture of the parallel machine on which the model is eventually emulated. For each number of vertices five different problem instances were generated, resulting in a total of 25 problem instances. For the max cut problem edges were given a randomly chosen integer weight between 1 and 10.

In the Boltzmann machine algorithm parallelism is achieved as follows. During each trial of the optimization process a subset S of units is randomly chosen, containing a fixed fraction q of the total set of units. For each unit u_i ∈ S the corresponding ΔC_k(i) is calculated, based on the configuration k, the outcome of the previous trial. Next, the states of the units in S are adjusted according to the acceptance probability of (4), resulting in a configuration l as the outcome of the present trial. The fraction q is chosen equal to 1/3.
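The synchronous scheme just described can be sketched as follows. This is a minimal sketch, not the paper's implementation: the helper unit_delta stands in for the consensus difference ΔC_k(i), the acceptance rule follows the standard criterion of (4), and all parameter names are ours.

```python
import math
import random

def parallel_trial(state, unit_delta, c, q=1/3, rng=random):
    """One synchronously parallel trial: pick a random fraction q of the
    units, evaluate every chosen unit's consensus difference against the
    CURRENT configuration k, and flip the accepted units simultaneously.

    state      -- list of 0/1 unit states (configuration k)
    unit_delta -- unit_delta(state, i): consensus change if unit i flips
    c          -- control parameter (temperature)
    """
    n = len(state)
    subset = rng.sample(range(n), max(1, int(q * n)))
    deltas = {i: unit_delta(state, i) for i in subset}  # all w.r.t. k
    new_state = list(state)
    for i, d in deltas.items():
        # accept with probability 1 if d >= 0, else exp(d / c)
        if d >= 0 or rng.random() < math.exp(d / c):
            new_state[i] = 1 - new_state[i]
    return new_state
```

Because every ΔC is computed against the old configuration k before any unit flips, units accepted in the same trial may interact; this is precisely the approximation a synchronous scheme makes.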
The implementations of the simulated annealing algorithms are discussed briefly in the Appendix. The parameters of the cooling schedule of the sequential simulated annealing algorithms are chosen such that high-quality solutions are obtained. We mention that iterative improvement algorithms, using the same generation mechanisms (see the Appendix), find solutions for the problem instances at hand which are on the average 20% worse in cost than the solutions obtained by simulated annealing. The parameters of the cooling schedule (see Section 2) of the Boltzmann machine are chosen such that the final results obtained by the Boltzmann machines are comparable to the results obtained by simulated annealing. In this way the discussion on the performance can be concentrated solely on the computation times. For both implementations the computation times are proportional to the number of executed trials. The parameters of the cooling schedule of the Boltzmann machine are chosen as follows: α = 0.95, L = α|V|, and M = 10L.

For each problem instance both the Boltzmann machine algorithm and the simulated annealing algorithm are run 10 times, each time using a different initial configuration, to obtain reliable statistics. Average final results of the value of the consensus C̄ for the Boltzmann machine algorithm and the value of the cost function f̄ for the simulated annealing algorithm, as discussed in the Appendix, together with the corresponding standard deviations σ_C and σ_f, are given in Tables I, II, and III for the various problems. The values of the consensus C̄ and the cost function f̄ can be compared directly, as both algorithms always produced feasible solutions. In that case, the consensus C and the cost function f are identical. The tables furthermore give the average computation times t̄ and the corresponding standard deviations σ_t for both algorithms. The computation times of the Boltzmann machine algorithm are obtained by dividing the computation times of the sequential implementation by the number of units that operate in parallel. From the tables the following conclusions can be drawn.
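The geometric cooling schedule used here (the control parameter is multiplied by α after each chain of L trials) can be sketched as a generator. The parameter names are ours; the text fixes α = 0.95 and ties the chain length to |V|.

```python
def cooling_schedule(c0, alpha=0.95, chain_length=100, n_chains=50):
    """Geometric cooling: yield the control-parameter value for each
    trial of a run in which c is multiplied by alpha after every chain
    of `chain_length` trials."""
    c = c0
    for _ in range(n_chains):
        for _ in range(chain_length):
            yield c
        c *= alpha

# Example: 3 chains of 2 trials each, halving c after every chain.
print(list(cooling_schedule(10.0, 0.5, 2, 3)))
```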
First, we observe that the Boltzmann machine is able to obtain results comparable in quality to those obtained by simulated annealing. This in itself is not trivial. For the traveling salesman problem, for instance, it turned out to be impossible to obtain comparable results [3, 7] in reasonable amounts of computation time, for reasons which will be discussed below. To give an indication of the quality of the final results obtained by both algorithms, we mention that for the independent set problem the algorithms were able to find independent sets which are on average three times as large as the expected average independent set. The expected average size of an independent set for this type of random graph equals |V|/10 [6]. Furthermore, we observe that, for the problem instances we investigated, the Boltzmann machine yields a speedup over the sequential simulated annealing algorithm ranging from 20 up to 400. From the tables the average-case time complexities can be estimated. For the max cut problem, the average-case time complexities of the simulated annealing algorithm and the Boltzmann machine are estimated to be


TABLE I

RESULTS OF BOLTZMANN MACHINE AND SIMULATED ANNEALING IMPLEMENTATIONS FOR 25 PROBLEM INSTANCES OF THE MAX CUT PROBLEM

                  Simulated annealing              Boltzmann machine
Problem
instance        f̄      σ_f     t̄     σ_t        C̄      σ_C     t̄     σ_t

N50N1         985.6   13.0    3.1   0.5       990.4    7.3    0.2   0.06
N50N2         924.4   15.7    2.7   0.3       926.5    8.1    0.1   0.04
N50N3         932.4    6.6    2.7   0.2       930.1    8.0    0.2   0.05
N50N4        1022.8   10.1    3.3   0.7      1032.4    3.0    0.1   0.06
N50N5        1023.2    9.8    3.2   0.3      1028.3    2.5    0.1   0.06
N100N1       1901.1    9.2   11.0   1.4      1912.2    7.1    0.4   0.10
N100N2       1856.8   17.5   11.0   1.6      1865.0   13.0    0.4   0.05
N100N3       1861.7    8.6    9.7   1.0      1863.5    8.3    0.3   0.07
N100N4       1922.5   11.0   10.7   1.0      1918.8   13.0    0.4   0.11
N100N5       1799.5   13.9   10.4   1.9      1806.7   10.2    0.3   0.08
N150N1       2837.5    8.8   21.3   3.6      2844.7   12.3    0.6   0.18
N150N2       2996.6   15.0   24.5   4.8      2996.2   20.9    0.6   0.08
N150N3       3105.3   15.1   27.1   5.2      3113.1   18.7    0.6   0.15
N150N4       2918.0   18.0   23.3   4.7      2906.1   17.8    0.7   0.07
N150N5       2943.4   20.4   19.7   2.8      2957.5   26.9    0.7   0.14
N200N1       3730.0   23.4   36.4   4.7      3731.2   24.0    0.7   0.14
N200N2       3974.8   18.7   41.6   7.0      3964.8   26.9    1.0   0.15
N200N3       4118.0    4.2   35.1   3.4      4114.3   25.3    0.7   0.12
N200N4       3811.6   14.8   36.6   5.2      3817.7   18.3    0.9   0.11
N200N5       3925.1   16.9   41.8   6.4      3927.5   14.2    0.9   0.13
N250N1       5183.0   26.5   67.9  12.0      5195.0   14.6    1.3   0.22
N250N2       5343.6   13.6   63.4   6.5      5335.9   29.5    1.4   0.12
N250N3       5074.2   13.3   59.0  10.0      5068.4   21.6    1.3   0.12
N250N4       5338.4   22.6   64.4  17.6      5345.6   21.7    1.2   0.28
N250N5       5078.4   15.0   63.2  12.3      5096.0   11.2    1.2   0.15

Note. The instances are coded NxNy, where x denotes the number of vertices and y denotes an arbitrary sequence number. The remaining symbols are explained in the text.

O(|V|^2.0) and O(|V|^1.4), respectively. For the independent set problem, they are estimated to be O(|V|^1.7) and O(|V|^1.3), respectively. For the graph coloring problem, they are estimated to be O(|V|^1.6) and O(|V|^1.7/(Δ + 1)), respectively. As the time needed to carry out a trial is constant for both algorithms, the time complexities do not depend on the exact implementation of the algorithms. We mention that the Boltzmann machine algorithm uses a conceptually simple cooling schedule whereas the simulated annealing algorithm uses a more elaborate one [18]. By tuning the cooling schedule of the Boltzmann machine algorithm, one might even improve the estimated time complexities of the Boltzmann machine algorithm.
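Empirical exponents of this kind are typically obtained from the slope of log(time) against log(|V|). A sketch of that estimate; the function name and the data below are made up for illustration:

```python
import math

def fit_exponent(sizes, times):
    """Least-squares slope of log(time) vs log(size): the empirical
    exponent x in t ~ |V|^x."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Perfect t = |V|^2 data recovers an exponent of 2.
print(fit_exponent([50, 100, 200], [2500.0, 10000.0, 40000.0]))
```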


TABLE II
RESULTS OF BOLTZMANN MACHINE AND SIMULATED ANNEALING IMPLEMENTATIONS FOR 25 PROBLEM INSTANCES OF THE INDEPENDENT SET PROBLEM

Simulated

annealing

Boltzmann

0.0

3.4 3.2 3.3 3.4 3.6

0.3 0.2 0.2 0.2 0.2

12.9 13.9 14.9 13.8 13.0

0.3 0.3 0.3 0.6

0.9 0.5 0.8 0.5 0.3

31.4 31.3 31.7 28.7 29.9

machine

Problem N50Nl N50N2 N50N3 N50N4 N50N5 NlOONl NIOON2 NIOON3 NlOON4 NlOON.5 N150Nl N150N2 N150N3 N150N4 N150N5 N200N I N200N2 N200N3 N200N4 N200N5 N250N I N250N2 N250N3 N250N4 N250N5

13.0 13.9

0.3

0.1 0.1 0.1 0.1 0.1

0.02 0.02 0.02 0.02 0.02

0.5 0.8 0.5 0.6 0.3

0.2

0.2

0.03 0.03 0.03 0.04 0.03

0.7 0.5 0.4 0.6

0.03 0.03 0.02 0.03 0.03

15.0

0.0

13.6 12.9

0.7 0.3

31.3 30.8 31.3 29.2 29.7

0.4

11.2

0.7 1.0

10.7 10.5 10.6

0.5

10.1

44.6 45.1 41.3 45.0 45.2

0.5 0.6

21.1 22.2 22.1 21.5 20.1

1.4 1.5 1.2 1.6

44.8 45.5 41.8 45.2 45.0

0.8

0.3 0.3 0.3 0.2 0.3

64.3 60.1 61.3 66.1 62.8

0.6 0.7 0.6

33.3 36.2 35.8 33.1 35.0

2.2 2.6 2.4 3.3 1.5

64.5 60.5 61.2 66.0 62.5

0.8 0.5 0.7 1.4 0.9

0.3 0.4 0.4 0.3 0.3

0.03 0.04 0.03 0.06 0.04

76.3 76.2 17.6 73.4 17.3

0.8

52.8 53.5 51.8 54.8 52.3

4.6 3.0 3.4 3.9 4.2

76.9 16.2 11.5 73.9 77.3

0.7

0.4 0.5 0.5 0.4 0.5

0.03 0.04 0.02 0.05 0.06

0.5

0.8 0.8 0.7

1.1 0.4

1.0 0.1 0.7 0.6

I.1

0.0

1.1 0.7 0.5 0.6

0.1 0.2

0.1

Note. The instances are coded NxNy, where x denotes the number of vertices and y denotes an arbitrary sequence number. The remaining symbols are explained in the text.

5. DISCUSSION AND CONCLUSIONS

In this paper we have discussed the feasibility of solving (approximately) combinatorial optimization problems on a Boltzmann machine. The approach presented is based on a mapping of 0-1 programming problems onto the structure of a Boltzmann machine by carefully choosing the connection network and the corresponding connection strengths of a Boltzmann machine. The approach is illustrated for a number of well-known combinatorial optimization problems. The architectural complexity of the Boltzmann machine for a number of problems is indicated in Table IV, where |V| and |E| denote the number of vertices and edges, respectively, and Δ denotes the maximal degree of the graph associated with the problem.


TABLE III
RESULTS OF BOLTZMANN MACHINE AND SIMULATED ANNEALING IMPLEMENTATIONS FOR 25 PROBLEM INSTANCES OF THE GRAPH COLORING PROBLEM

                  Simulated annealing              Boltzmann machine
Problem
instance        f̄      σ_f      t̄      σ_t       C̄      σ_C     t̄      σ_t

N50N1           6.2    0.6     83.3    3.8       6.1    0.3    0.6    0.03
N50N2           5.3    0.5     82.7    2.9       5.9    0.7    0.5    0.07
N50N3           6.0    0.0     82.5    2.5       5.8    0.4    0.6    0.04
N50N4           6.4    0.5     84.1    3.5       6.1    0.3    0.5    0.07
N50N5           6.2    0.4     81.6    3.5       6.1    0.3    0.5    0.07
N100N1          6.0    0.0    243.3    8.5       6.2    0.4    1.2    0.06
N100N2          5.6    0.5    234.7    4.6       6.1    0.5    1.2    0.05
N100N3          5.7    0.5    241.6    4.9       5.6    0.5    1.2    0.06
N100N4          5.8    0.4    241.0    5.2       6.1    0.3    1.2    0.06
N100N5          5.9    0.3    239.0    4.4       5.4    0.5    1.2    0.04
N150N1          5.9    0.3    443.6    8.0       5.7    0.5    1.9    0.07
N150N2          6.0    0.0    446.1    8.7       5.8    0.4    1.9    0.09
N150N3          6.0    0.0    449.1   14.5       6.3    0.5    1.9    0.08
N150N4          5.9    0.3    452.6   11.6       5.7    0.5    1.8    0.08
N150N5          5.9    0.3    450.4   11.9       5.9    0.7    1.8    0.12
N200N1          5.9    0.3    686.7    9.9       5.8    0.4    2.5    0.07
N200N2          5.9    0.3    696.2   12.5       6.1    0.3    2.6    0.06
N200N3          5.9    0.3    684.0   11.6       5.9    0.7    2.6    0.12
N200N4          5.9    0.3    685.1   14.3       5.6    0.5    2.5    0.05
N200N5          5.8    0.4    701.9   14.5       5.6    0.5    2.5    0.05
N250N1          6.0    0.4    997.4   25.5       5.8    0.4    3.3    0.10
N250N2          6.0    0.0    983.2   15.5       6.2    0.4    3.3    0.12
N250N3          5.9    0.3    990.9   13.0       6.0    0.0    3.2    0.10
N250N4          6.0    0.0    990.0   22.3       6.1    0.3    3.2    0.08
N250N5          6.0    0.0    991.1   17.5       5.9    0.7    3.2    0.11

Note. The instances are coded NxNy, where x denotes the number of vertices and y denotes an arbitrary sequence number. The remaining symbols are explained in the text.

Results obtained with implementations of the Boltzmann machine algorithm, simulating synchronous parallelism, are presented for the max cut, the independent set, and the graph coloring problem. The results are based on randomly generated problem instances and include a comparison with a sequential implementation of the simulated annealing algorithm. From these results we reach the following conclusions. Final solutions can be obtained by the Boltzmann machine which are comparable, in quality, to the solutions obtained by simulated annealing. For all three problems the Boltzmann machine uses considerably less computation time, provided the model is emulated on a parallel computer.


TABLE IV
ARCHITECTURAL COMPLEXITY OF THE BOLTZMANN MACHINE FOR VARIOUS COMBINATORIAL OPTIMIZATION PROBLEMS

Problem                Number of units      Number of connections

Max cut                Θ(|V|)               Θ(|V| + |E|)
Independent set        Θ(|V|)               Θ(|V| + |E|)
Vertex cover           Θ(|V|)               Θ(|V| + |E|)
Clique                 Θ(|V|)               Θ(|V|² − |E|)
Graph coloring         Θ((Δ + 1)|V|)        Θ(Δ²|V| + Δ|E|)
Clique partitioning    Θ((Δ + 1)|V|)        Θ(Δ²|V| + Δ(|V|² − |E|))

Note. For each problem the required number of units and connections is given, where |V| denotes the number of vertices, |E| denotes the number of edges, and Δ denotes the maximal degree of the associated graph.

From a comparison with previous work, in which a Boltzmann machine was used for solving traveling salesman problems [3], we conclude that it is much harder to obtain near-optimal results for the traveling salesman problem than for the graph problems discussed in this paper. Two reasons can be given to explain this feature. First, choosing appropriate connection strengths for the traveling salesman problem is difficult (cf. [3]). If the strengths are chosen to meet property P1, the convergence of the Boltzmann machine is slow due to the fact that the difference in consensus between good and bad tours is relatively small compared to the differences in consensus between tours and nontours. Choosing the connection strengths such that these differences are large results in a situation where final results are often infeasible. Second, transforming a given tour into another one, using the given Boltzmann machine formulation, requires in many cases a number of steps in which configurations are visited corresponding to infeasible solutions, which often have low values of the consensus. This increases the probability of becoming trapped in configurations corresponding to locally optimal tours whose length deviates substantially from that of an optimal tour. Evidently, the deficiency mentioned above depends strongly on the construction that is used for the traveling salesman problem in the Boltzmann machine. It is, however, hard to think of other, more efficient constructions for this problem.

As an overall conclusion we state that the Boltzmann machine is able to obtain results that are comparable to the results obtained by the simulated annealing algorithm in less time. This result becomes more significant when Boltzmann machines are directly put on silicon, where each connection is hardwired. In this way the annealing process can be performed extremely fast using analog devices which add up the incoming charge and perform the


stochastic decision making by using noise. The design of these hardwired networks has been the subject of study for some time. For instance, Hopfield and Tank [15] recently introduced networks with linear analog neurons performing a gradient descent in a continuous configuration space (unlike Boltzmann machines, which perform a stochastic descent in a discrete configuration space). Their results of computer simulations for the traveling salesman problem are comparable to our results [3]. Recently, Alspector and Allen [5] presented a design of a VLSI chip with 5 × 10^5 gates, implementing a Boltzmann machine consisting of approximately 2000 units (this design is also suited for learning tasks). They estimate that their chip will run about a million times faster than simulations on a VAX. Optical implementations of the Boltzmann machine such as that proposed by Ticknor and Barrett [20] might even further increase this factor by some orders of magnitude.

Our final conclusion is that the model of Boltzmann machines is promising for solving (approximately) combinatorial optimization problems. However, more theoretical analysis and large-scale practical experience are needed to judge the real impact of the model.

APPENDIX

In this appendix we describe briefly the application of the simulated annealing algorithm to the max cut, the independent set, and the graph coloring problem. In applying the simulated annealing algorithm one commonly resorts to an implementation in which a sequence of Markov chains is generated at descending values of a control parameter [4, 18]. Individual Markov chains are generated by continuously trying to transform a current configuration into a next configuration by applying a generation mechanism and an acceptance criterion. Application of the simulated annealing algorithm requires specification of three distinct items: (i) a concise problem representation, (ii) a transition mechanism, and (iii) a cooling schedule. We shall now elaborate on these items in more detail.

(i) A concise description of the problem representation consists of a configuration representation and an expression for the cost function.

(ii) Transformation of the current configuration into a next one involves three steps: First, a new configuration must be generated from the current one, which is done by the generation mechanism. Second, the difference in cost between the two configurations must be calculated. Third, a decision must be made on whether or not the new configuration is to be accepted. Here we use the standard acceptance criterion for deciding upon


acceptance of the new configuration. According to this criterion the probability of accepting a new configuration is given by [4, 18]

    Pr{accept} = 1 if Δf ≥ 0, and Pr{accept} = exp(Δf/c) if Δf < 0,

where Δf denotes the difference in cost between the new configuration and the current one, and c denotes the control parameter.

The Max Cut Problem
-Configurations are represented by partitions of the set V = {1, ..., n} into the sets V1 and V2.
-The cost function is chosen as the sum of the weights of the edges in the cut

    E' = {{i, j} ∈ E | i ∈ V1 ∧ j ∈ V2}.    (51)

-New configurations are generated by randomly choosing a vertex i ∈ V and moving it from V1 to V2 if i ∈ V1, or vice versa otherwise.

The Independent Set Problem
-Configurations are represented by partitions of the set V = {1, ..., n} into the sets V' and V - V'.
-The cost function is chosen as


f = IV’( -X(E’(, where E’ denotes the set of edges ( i, j } E E with i, j E V’, and X denotes a weighting factor which must be larger than 1. The precise choice of this factor is not critical. In our implementations we use A = 1.1. Feasible solutions will contribute only to the first term of the cost function. The second term is used as a penalty function. -New configurations are generated by randomly choosing a vertex i E Vand moving it from V’ to V - V’ if q E I” or vice versa if otherwise.

The Graph Coloring Problem
-Configurations are represented by a partitioning of the set of vertices V into the sets V_1, V_2, ..., V_{Δ+1}, where Δ denotes the maximal degree of the graph G.
-The cost function is chosen as

    f = Σ_{i=1}^{Δ+1} w_i (|V_i| - λ|E_i|),    (53)

where w_i denotes the weight assigned to color i (given by (39)), and where E_i denotes the set of edges {j, k} ∈ E with j, k ∈ V_i. As in the independent set problem, the second term in the cost function is again a penalty term and feasible solutions will contribute only to the first term. Here too, the choice of the value of the weighting factor λ is not critical. We again use λ = 1.1.
-New configurations are generated by randomly choosing a vertex i ∈ V and moving it from the current subset to one of the others.

REFERENCES

1. Aarts, E. H. L., and van Laarhoven, P. J. M. Statistical cooling: A general approach to combinatorial optimization problems. Philips J. Res. 40 (1985), 193.
2. Aarts, E. H. L., de Bont, F. M. J., Habers, J. H. A., and van Laarhoven, P. J. M. Parallel implementations of the statistical cooling algorithm. Integration 4 (1987), 209.
3. Aarts, E. H. L., and Korst, J. H. M. Boltzmann machines for travelling salesman problems. European J. Oper. Res. 39 (1989), 79.
4. Aarts, E. H. L., and Korst, J. H. M. Simulated Annealing and Boltzmann Machines. Wiley, Chichester, 1988.
5. Alspector, J., and Allen, R. B. A neuromorphic VLSI learning system. In Losleben, P. (Ed.), Advanced Research in VLSI: Proc. 1987 Stanford Conference. MIT Press, Cambridge, MA, p. 313.
6. Bollobás, B. Random Graphs. Academic Press, London, 1985.
7. Cervantes, J. H., and Hildebrant, R. H. Comparison of three neuron-based computation schemes. Proc. International Conference on Neural Networks, San Diego, p. 651.


8. Feldman, J. A., and Ballard, D. H. Connectionist models and their properties. Cognitive Sci. 6 (1982), 205.
9. Fahlman, S. E., and Hinton, G. E. Connectionist architectures for artificial intelligence. Computer (Jan. 1987), 100.
10. Garey, M. R., and Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco, 1979.
11. Harary, F. Graph Theory. Addison-Wesley, Reading, MA, 1968.
12. Hinton, G. E., Sejnowski, T. J., and Ackley, D. H. Boltzmann machines: Constraint satisfaction machines that learn. Tech. Rep. CMU-CS-84-119, Carnegie-Mellon University, 1984.
13. Hinton, G. E., and Sejnowski, T. J. Learning and relearning in Boltzmann machines. In Rumelhart, D. E., McClelland, J. L., and the PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. Bradford Books, Cambridge, MA, 1986, p. 282.
14. Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. U.S.A. 79 (1982), 2554.
15. Hopfield, J. J., and Tank, D. W. Neural computation of decisions in optimization problems. Biol. Cybernet. 52 (1985), 141.
16. Kirkpatrick, S., Gelatt, C. D., Jr., and Vecchi, M. P. Optimization by simulated annealing. Science 220 (1983), 671.
17. Kou, L. T., Stockmeyer, L. J., and Wong, C. K. Covering edges by cliques with regard to keyword conflicts and intersection graphs. Comm. ACM 21 (1978), 135.
18. Laarhoven, P. J. M. van, and Aarts, E. H. L. Simulated Annealing: Theory and Applications. Reidel, Dordrecht, 1987.
19. Rumelhart, D. E., McClelland, J. L., and the PDP Research Group (Eds.). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vols. 1, 2. Bradford Books, Cambridge, MA, 1986.
20. Ticknor, A. J., and Barrett, H. H. Optical implementations in Boltzmann machines. Opt. Engrg. 26 (1987), 16.

JAN H. M. KORST studied mathematics at the Delft University of Technology, Delft, The Netherlands (M.Sc. in 1984). Since 1985 he has been with Philips Research Laboratories, Eindhoven, The Netherlands. His research interests include combinatorial optimization, artificial intelligence, and VLSI design.

EMILE H. L. AARTS studied mathematics and physics at the University of Nijmegen, The Netherlands (M.Sc. 1979). He earned a Ph.D. in physics (1983) from the University of Groningen, The Netherlands. Since 1983 he has been with Philips Research Laboratories, Eindhoven, The Netherlands, and as of 1987 he became a consultant at the Eindhoven University of Technology. His research interests include combinatorial optimization.