
An Overview of Evolutionary Computation

Xin Yao
Computational Intelligence Group, Department of Computer Science
University College, The University of New South Wales
Australian Defence Force Academy, Canberra, ACT, Australia 2600
Email: [email protected]

Abstract

This paper presents a brief overview of the field of evolutionary computation. Three major research areas of evolutionary computation will be discussed: evolutionary computation theory, evolutionary optimisation and evolutionary learning. The state of the art and open issues in each area will be addressed. It is indicated that while evolutionary computation techniques have enjoyed great success in many engineering applications, progress in theory has been rather slow. This paper also gives a brief introduction to parallel evolutionary algorithms. Three models of parallel evolutionary algorithms, the memory model, the island model and the cellular model, are described.

1 Introduction

The field of evolutionary computation has grown rapidly in recent years [1, 2, 3]. Engineers and scientists with quite different backgrounds have come together to tackle some of the most difficult problems using a very promising set of stochastic search algorithms: evolutionary algorithms (EAs). There are several different types of EAs: genetic algorithms (GAs) [4, 5], evolutionary programming (EP) [6, 7] and evolution strategies (ESs) [8, 9]. Each type has numerous variants due to different parameter settings and implementations. The answer to the question of which EA is best is problem dependent; there is no universally best algorithm which can achieve the best result for all problems. One of the challenges to researchers is to identify and characterise which EA is best suited to which problem.

EAs have two prominent features which distinguish them from other search algorithms. First, they are all population-based. Second, there are communication and information exchange among individuals in a population. Such communication and information exchange are the result of selection, competition and recombination. A simple EA can be summarised by Figure 1, where the search operators are often called genetic operators in GAs. Different representation (encoding) schemes, selection schemes, and search operators define different EAs. For example, GAs normally use crossover and mutation as search operators, while EP only uses mutation. GAs often emphasise genetic evolution, while EP puts more emphasis on the evolution of behaviours.

(This work is partially supported by a University College Special Research Grant. The paper is based on an invited tutorial given at ICYCS'95. To be published in Chinese Journal of Advanced Software Research (Allerton Press, Inc., New York, NY 10011), Vol. 3, No. 1, 1996.)


1. Generate the initial population P(0) at random, and set i = 0;
2. REPEAT
   (a) Evaluate the fitness of each individual in P(i);
   (b) Select parents from P(i) based on their fitness in P(i);
   (c) Apply search operators to the parents to obtain the next generation P(i + 1);
3. UNTIL the population converges or the time is up

Figure 1: A Simple Evolutionary Algorithm.
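To make Figure 1 concrete, here is a minimal Python sketch of such a loop, assuming a binary encoding, roulette wheel selection, one-point crossover and bit-flipping mutation. The function names and parameter values are illustrative rather than prescriptive.

```python
import random

def simple_ea(fitness, length=20, pop_size=50, p_c=0.6, p_m=0.01, max_gen=100):
    """Minimal EA in the style of Figure 1 (illustrative parameter values)."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(max_gen):
        fits = [fitness(x) for x in pop]          # step 2(a): evaluate fitness
        total = sum(fits)

        def select():                              # step 2(b): roulette wheel selection
            r, acc = random.uniform(0, total), 0.0
            for ind, f in zip(pop, fits):
                acc += f
                if acc >= r:
                    return ind
            return pop[-1]

        new_pop = []                               # step 2(c): apply search operators
        while len(new_pop) < pop_size:
            a, b = select()[:], select()[:]
            if random.random() < p_c:              # one-point crossover
                cut = random.randrange(1, length)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):                   # bit-flipping mutation
                new_pop.append([bit ^ (random.random() < p_m) for bit in child])
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

# Example: maximise the number of 1s in the string (the OneMax problem).
best = simple_ea(fitness=sum)
```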

1.1 Evolutionary Algorithms as Generate-and-Test Search

Although EAs are often introduced from the point of view of survival of the fittest and by analogy to natural evolution, they can also be understood through the framework of generate-and-test search. The advantage of introducing EAs as a type of generate-and-test search algorithm is that the relationships between EAs and other search algorithms, such as simulated annealing (SA), tabu search (TS) and hill-climbing, can be made clearer and thus easier to explore. Under the framework of generate-and-test search, different search algorithms investigated in artificial intelligence, operations research, computer science, and evolutionary computation can be unified. Cross-disciplinary studies are expected to generate more insights into the search problem in general. A general framework of generate-and-test search is shown in Figure 2.

1. Generate the initial solution at random and denote it as the current solution;
2. Generate the next solution from the current one by perturbation;
3. Test whether the newly generated solution is acceptable:
   (a) Accept it as the current solution if yes;
   (b) Keep the current solution unchanged otherwise.
4. Goto Step 2 if the current solution is not satisfactory; stop otherwise.

Figure 2: A General Framework of Generate-and-Test.

It is quite clear that various hill-climbing algorithms can be described by Figure 2 with different strategies for perturbation. They all require the new solution to be no worse than the current one in order to be acceptable. SA does not have such a requirement: it regards a worse solution as acceptable with a certain probability. The difference among classical SA [10], fast SA [11], very fast SA [12], and a new SA [13] is mainly due to the difference in their perturbations, i.e., their methods of generating the next solution. EAs can be regarded as a population-based version of generate-and-test search. They use search operators like crossover and mutation to perturb the current solutions, and use selection to decide whether a solution is acceptable. This view is made concrete in the sketch below.
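The following minimal Python sketch of Figure 2 shows that only the perturbation and the acceptance test change from one instance to another. The hill-climbing and SA acceptance rules below assume minimisation, and the geometric cooling schedule is an illustrative assumption, not a prescription.

```python
import math
import random

def generate_and_test(initial, perturb, acceptable, satisfied, max_steps=10000):
    """Skeleton of Figure 2: generate, test, repeat until satisfied."""
    current = initial()                              # step 1
    for step in range(max_steps):
        candidate = perturb(current)                 # step 2: perturbation
        if acceptable(candidate, current, step):     # step 3: acceptance test
            current = candidate
        if satisfied(current):                       # step 4
            break
    return current

def hill_climbing_accept(f):
    # Hill-climbing: the new solution must be no worse (minimisation).
    return lambda new, cur, step: f(new) <= f(cur)

def sa_accept(f, t0=1.0, alpha=0.999):
    # SA: a worse solution is acceptable with a temperature-dependent
    # probability; geometric cooling is just one possible schedule.
    def accept(new, cur, step):
        temp = max(t0 * alpha ** step, 1e-12)
        delta = f(new) - f(cur)
        return delta <= 0 or random.random() < math.exp(-delta / temp)
    return accept
```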

From this point of view, it is clear that we do not have to limit ourselves to crossover and mutation. In theory, we can use any search operators as long as they perform well on the given representation of the problem being dealt with. This is also true for selection. In practice, a good way to tailor generate-and-test search to the problem we are interested in is to incorporate problem-specific heuristic knowledge into the search operators and selection schemes [14].

The rest of this paper is organised as follows: Section 2 discusses some theoretical issues in evolutionary computation, including the classical schema theorem, convergence of EAs, neighbourhood search and landscape analysis, and computational complexity. Section 3 gives an overview of evolutionary optimisation; some useful crossover and mutation operators and selection schemes will be discussed, and some hybrid algorithms which combine EAs with other search algorithms will be examined in terms of global versus local search and search biases. Section 4 is on evolutionary learning; an introduction to classifier systems is given, and population-based machine learning is discussed with an emphasis on co-evolutionary learning. Section 5 describes three common models of parallel EAs. Finally, Section 6 concludes with a summary of the paper.

2 Some Theoretical Issues in Evolutionary Computation

One of the most important questions in evolutionary computation is: why does, and why doesn't, an evolutionary algorithm work? Many efforts have been made towards answering this question since the early days of EAs. One typical example is Holland's schema theorem for GAs [4]. Later, GA-hardness was investigated by many researchers in terms of GA deceptive problems [5]. In recent years, the analysis of neighbourhood structures and landscapes has attracted increasing attention from both the evolutionary computation field [15] and other fields such as SA [16, 17, 18, 19].

2.1 The Schema Theorem

The schema theorem was first proposed by Holland [4] to explain how GAs work by propagating similarity templates in populations. A schema is a similarity template describing a subset of strings with similarities at certain string positions. Without loss of generality, let us assume that we are working with binary strings, i.e., our alphabet is {0, 1}. To generate schemata, we introduce a don't care symbol '*' into the alphabet, giving the extended alphabet {0, 1, *}. Any string over this new alphabet is a schema. A schema matches a binary string if 1 matches 1, 0 matches 0, and * matches either 1 or 0 at all corresponding positions. Schemata offer us a more accurate way to describe and discuss similarities among strings.

In order to examine how schemata are propagated from one generation to another, two concepts need to be introduced. The order of a schema H, denoted as o(H), is the number of fixed (non-*) positions in the schema. The defining length of a schema H, denoted as $\delta(H)$, is the distance between the first and the last fixed positions in the schema. It facilitates our discussion to view one generation of a GA as having two steps: the first is selection and the second is crossover and mutation [20]:

Current Population --(selection)--> Intermediate Population --(crossover, mutation)--> New Population
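The following small Python helpers, written for this overview, compute these quantities for schemata over {0, 1, *}:

```python
def matches(schema, string):
    """A schema matches a string if every fixed position agrees;
    '*' matches either 0 or 1."""
    return all(s == '*' or s == c for s, c in zip(schema, string))

def order(schema):
    """o(H): the number of fixed (non-*) positions."""
    return sum(c != '*' for c in schema)

def defining_length(schema):
    """delta(H): the distance between the first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

# Example: H = '1**0*' has o(H) = 2 and delta(H) = 3;
# it matches '10000' and '11101' but not '00000'.
```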

2.1.1 Impact of Selection on Schema Propagation

Let $f(H)$ be the average fitness of the samples in the population which contain schema H, $\bar{f}$ be the average fitness of the population, and $m(H, t)$ be the number of samples in the population which contain schema H at generation t. Then the impact of roulette wheel selection on schema propagation is

$$m(H, t + \text{intermediate}) = m(H, t) \frac{f(H)}{\bar{f}}$$

2.1.2 Impact of Crossover on Schema Propagation

Let $p_c$ be the one-point crossover probability. Then the impact of crossover on schema propagation is

$$m(H, t+1) = (1 - p_c)\, m(H, t) \frac{f(H)}{\bar{f}} + p_c \left[ m(H, t) \frac{f(H)}{\bar{f}} (1 - \text{losses}) + \text{gains} \right]$$

The gains in the above formula represent the cases where a new sample containing schema H is generated through crossover from two samples containing no H. For simplicity, we ignore this term. We also make a conservative estimate of the losses and assume that crossover within the defining length of the schema is always disruptive, although this is not always true. Hence we get the following inequality:

$$m(H, t+1) \geq (1 - p_c)\, m(H, t) \frac{f(H)}{\bar{f}} + p_c\, m(H, t) \frac{f(H)}{\bar{f}} (1 - \text{disruptions})$$

The probability of disruption can be calculated easily as

$$\text{Prob}_{\text{disruptions}} = \frac{\delta(H)}{l - 1} \left( 1 - \frac{1}{n} m(H, t) \frac{f(H)}{\bar{f}} \right)$$

where l is the string length and n is the population size. After rearranging the formula we have

$$m(H, t+1) \geq m(H, t) \frac{f(H)}{\bar{f}} \left[ 1 - p_c \frac{\delta(H)}{l - 1} \left( 1 - \frac{1}{n} m(H, t) \frac{f(H)}{\bar{f}} \right) \right]$$

2.1.3 Impact of Mutation on Schema Propagation

Let $p_m$ be the bit-flipping mutation probability. We can calculate the probability of schema H's survival under mutation (i.e., of not being disrupted) as follows:

$$\text{Prob}_{\text{survival}} = (1 - p_m)^{o(H)}$$

2.1.4 Schema Theorem for the Simple Genetic Algorithm

Putting the previous three subsections together, we have the schema theorem for the simple GA with one-point crossover, bit-flipping mutation and roulette wheel selection [4, 5, 20]:

$$m(H, t+1) \geq m(H, t) \frac{f(H)}{\bar{f}} \left[ 1 - p_c \frac{\delta(H)}{l - 1} \left( 1 - \frac{1}{n} m(H, t) \frac{f(H)}{\bar{f}} \right) \right] (1 - p_m)^{o(H)}$$

It shows that short, low-order, above-average schemata receive exponentially increasing numbers of samples in subsequent generations. While this is a very nice property of the GA in theory, the practical use of the theorem is limited. It is worth noting that the above schema theorem relies on a large population and a sufficient number of generations to obtain a satisfactory statistical estimate of a schema's fitness. In other words, the calculation of $f(H)$ is very difficult and noisy in practice. It is unclear how this noise affects schema propagation.
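As a rough numerical illustration (the numbers here are invented for this example, not taken from any experiment), consider strings of length $l = 10$ and a schema $H$ with $o(H) = 2$ and $\delta(H) = 2$, together with $p_c = 0.6$, $p_m = 0.01$, $n = 100$, $m(H, t) = 20$ and $f(H)/\bar{f} = 1.2$. The bound gives

$$m(H, t+1) \geq 20 \cdot 1.2 \left[ 1 - 0.6 \cdot \frac{2}{9} \left( 1 - \frac{20 \cdot 1.2}{100} \right) \right] (1 - 0.01)^2 \approx 24 \cdot 0.899 \cdot 0.980 \approx 21.1$$

so this short, low-order, above-average schema is expected to grow from 20 to at least about 21 samples in one generation; with a longer defining length or a higher crossover rate, the disruption term could outweigh the fitness advantage.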

2.2 Convergence of Evolutionary Algorithms

Convergence discussed in this section refers to the global convergence of various EAs. It can be described as

$$\lim_{n \to \infty} \text{Prob}\{X_n \in S^*\} = 1, \qquad S^* = \{X \mid X \in S,\ f_X \geq f_Y\ \forall\, Y \in S\}$$

where $X_n$ is the solution at time n, $f_X$ is the fitness of X, S is the whole search (solution) space, and $S^*$ is the set of global optima.



It has been shown that such global convergence can be established under certain conditions [21, 22, 23, 24, 25, 26]. As indicated by Rudolph [26], EAs can be classified into elitist and non-elitist ones. Elitist EAs are those which always copy the best individual in the current generation to the next generation. Such elitist EAs include EP for numerical optimisation [7, 23], $(\mu + \lambda)$-ESs [9, 27], and elitist GAs [5, 28]. It is not difficult to analyse the global convergence of these elitist EAs using Markov chains [29]; the techniques used are very similar to those used to show the global convergence of SA [30, 31]. For non-elitist EAs, the analysis of global convergence is not as straightforward. It has been shown that the simple GA [5] (also called the canonical GA) without elitism cannot converge to the global optimum regardless of the objective function and crossover operators used [25]. In general, however, non-elitist EAs may still converge if certain conditions are satisfied [26].

While convergence is an important issue for EAs, it plays a rather limited role in practice and in guiding the design of new EAs. From the point of view of algorithm design, we are more interested in the computational complexity of EAs for a particular problem. A result similar to that for simulated annealing [32] would be very valuable.

2.3 Computational Complexity

Computational complexity is one of the most important issues in the analysis of algorithms. To the best of our knowledge, the only paper on this topic is by Hart and Belew [33]. The analysis of the computational complexity of EAs is difficult because it must be done for a particular problem; it does not make sense to discuss the computational complexity of an EA without indicating which problem the EA is applied to. Fogel [34, 35, 36] has carried out some empirical studies on the computational complexity of his EP in solving selected travelling salesman problems (TSPs). He found that the time used to arrive at a good solution for TSPs increased polynomially in the number of cities. This is an interesting experimental result which needs further investigation. It is well known that no polynomial-time approximation algorithm with a performance guarantee exists for the general TSP unless P = NP, a very unlikely event [37]. This does not contradict Fogel's experimental result, because the theoretical result concerns worst-case time complexity while the experimental result concerns estimated average-case time complexity, and no performance guarantee was given in Fogel's studies. It is still an open issue whether an EA offers any advantage over classical optimisation algorithms for TSPs in terms of average-case time complexity. This is a very difficult problem, as average-case analysis is already difficult to conduct for a deterministic algorithm, let alone for a population-based stochastic algorithm.

2.4 GA Deceptive Problems and GA-Hardness

Although few researchers in the evolutionary computation field have been working on computational complexity, many people are working on GA deceptive problems and GA-hardness [5, 38, 39].

The motivation for such studies is to identify hard problems for GAs and to develop new algorithms to combat them. Nearly all GA deceptive problems investigated so far are artificial, created by human beings. The research has provided some insights into how GAs work and what makes a problem hard to solve by GAs. However, the question of how to identify whether a given problem is hard or easy for GAs remains open. What we need to concentrate on is characterising problems and the relationships between problems and the genetic operators and fitness functions used by GAs, so that we have a clear picture of whether a problem is hard for a particular GA. Recent work in the analysis of neighbourhood structures and landscapes is an effort in this direction.

2.5 Analysis of Neighbourhood Structures and Landscapes

Analysis of neighbourhood structures and landscapes has been carried out for many algorithms, such as SA [16, 17, 18, 19, 40] and TS [41]. Jones [15] recently proposed a formal model of landscapes for EAs. The model makes it possible to analyse various EAs with different search operators, especially crossover operators, and provides a convenient way to compare some hill-climbing algorithms. A measure of search difficulty, fitness distance correlation, is given. Jones [15] claimed that the measure "is a remarkably reliable indicator of problem difficulty for a genetic algorithm on many problems taken from the genetic algorithms literature, even though the measure incorporates no knowledge of the operation of a genetic algorithm. This leads to one answer to the question 'What makes a problem hard (or easy) for a genetic algorithm?' The answer is perfectly in keeping with what has been well known in Artificial Intelligence for over thirty years." It will be interesting to see how well this model of landscapes and the measure work on problems other than those "taken from the genetic algorithms literature".
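As a sketch of how such a measure can be computed in practice (assuming, as Jones' definition requires, that the global optimum is known), fitness distance correlation is simply the sample correlation between fitness values and distances to the optimum:

```python
import math

def fitness_distance_correlation(samples, fitness, distance, optimum):
    """Pearson correlation between fitness and distance to a known
    global optimum, over a sample of points from the search space."""
    fs = [fitness(x) for x in samples]
    ds = [distance(x, optimum) for x in samples]
    n = len(samples)
    mf, md = sum(fs) / n, sum(ds) / n
    cov = sum((f - mf) * (d - md) for f, d in zip(fs, ds)) / n
    sf = math.sqrt(sum((f - mf) ** 2 for f in fs) / n)
    sd = math.sqrt(sum((d - md) ** 2 for d in ds) / n)
    return cov / (sf * sd)

# For a maximisation problem, values near -1 (fitness increases as the
# distance to the optimum decreases) suggest an easy landscape, while
# values near +1 suggest a misleading one.
```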

3 Evolutionary Optimisation

Evolutionary optimisation is by far the most active area in evolutionary computation; most published papers and technical reports are on optimisation. EAs have been applied to both combinatorial and numerical optimisation problems, but the results are mixed. Most papers have reported very positive results obtained from EAs (too many to cite here), although a few papers contain some negative results about EAs [42, 43, 44]. This is not surprising, because no algorithm can be efficient for all problems [45]. The answer to the question of whether an EA is efficient is problem dependent.

Most problems attacked by evolutionary optimisation are unconstrained optimisation problems. Michalewicz [46, 47] has achieved some very good results with his EAs on unconstrained optimisation problems. Constraint handling is still one of the unsolved problems in evolutionary optimisation: simply adding a fixed penalty term to the fitness function does not work as well as one would hope. Michalewicz [46, 47, 48] has proposed a number of methods for handling constraints in evolutionary optimisation.

Research in evolutionary optimisation mainly concentrates on the encoding scheme, the search operators (genetic operators), and the selection mechanism for an optimisation problem. Tuning the various parameters of an EA has been one of the major tasks in experimental studies.

It is now generally accepted that binary encoding is no longer the only or the best scheme for encoding either numerical or non-numerical values (e.g., permutations). The best encoding scheme differs from problem to problem. This is especially true for combinatorial problems, where a problem normally has an inherent structure which should be captured by the encoding scheme.

3.1 Mutation

Mutation has traditionally been regarded as a secondary operator for GAs and a primary operator for EP and ESs. For numerical optimisation, real-number encoding coupled with adequate mutation operators can perform at least as well as binary encoding. Some well-known mutation operators applicable to real numbers are Gaussian mutation and Cauchy mutation. These mutation operators all have the form

$$X_{\text{offspring}} = X_{\text{parent}} + \eta$$

where $\eta$ is a random number generated from a Gaussian, a Cauchy, or even a uniform distribution. Such mutation can be applied to each dimension of a real vector independently, by adding a one-dimensional random number with a certain distribution, or it can be applied to the real vector as a whole, by adding a random vector with a certain distribution. These two methods are different, although some people tend to mix them up. For example, adding an independent one-dimensional Gaussian random number to each dimension of an n-dimensional real vector is quite different from adding an n-dimensional Gaussian random vector to the same vector. Most implementations of EP and ESs use the former method.

The above mutation could also be implemented in binary-encoded EAs. All we need to do is to decode a binary string into a real number first, apply the mutation operator to the real number, and then encode the result back into a binary string. This is quite different from the traditional bit-flipping mutation for binary strings, which has a uniform probability of changing any bit in a binary string. Borrowing an idea proposed for SA [17, 18], we can use either a Gaussian or a Cauchy distribution to determine the number of bits to be flipped in a binary string.
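A minimal Python sketch of the per-dimension (former) method follows; the step-size parameters are illustrative assumptions:

```python
import math
import random

def gaussian_mutation(x, sigma=0.1):
    # Add an independent one-dimensional Gaussian deviate to each component.
    return [xi + random.gauss(0.0, sigma) for xi in x]

def cauchy_mutation(x, gamma=0.1):
    # Cauchy deviates have much heavier tails than Gaussian ones, so this
    # operator makes occasional long jumps; tan(pi * (u - 0.5)) transforms
    # a uniform deviate u into a standard Cauchy deviate.
    return [xi + gamma * math.tan(math.pi * (random.random() - 0.5)) for xi in x]
```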

3.2 Crossover

Crossover is also known as recombination. It is a primary operator in GAs. There have been many discussions of the usefulness of crossover in the literature; the answer to the question of whether crossover is useful is problem dependent. The encoding scheme has a crucial impact on the effectiveness of a crossover operator, so the usefulness of a crossover operator has to be addressed with respect to an encoding scheme. In general, if the encoding scheme encourages the propagation and formation of useful "building blocks" under a crossover operator, then the crossover operator is effective for this encoding scheme. Jones' work [15] has provided us with a good framework for investigating various crossover operators.

For the binary encoding scheme, two-point crossover and uniform crossover have been shown empirically to be effective for many problems [49], especially for numerical optimisation problems. Some combinatorial optimisation problems, such as the TSP and various scheduling problems, require certain kinds of order-based crossover operators [5, 50, 51]. The crossover operator for a combinatorial optimisation problem often needs to be designed specifically for the problem. Such design may not be easy, because combinatorial problems usually have constraints which must be satisfied. The crossover operator should handle these constraints and at the same time promote the formation of useful building blocks and reduce disruption

to the building blocks. The design of crossover operators is still an art rather than a science, and it is less mature than the design of mutation operators. The two widely used binary crossover operators mentioned above are sketched below.
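A minimal illustration in Python, assuming individuals are lists of bits:

```python
import random

def two_point_crossover(a, b):
    # Exchange the segment between two randomly chosen cut points.
    i, j = sorted(random.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform_crossover(a, b, p=0.5):
    # Swap each position independently with probability p.
    c, d = list(a), list(b)
    for k in range(len(a)):
        if random.random() < p:
            c[k], d[k] = d[k], c[k]
    return c, d
```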

3.3 Selection

The selection mechanisms used by most EAs can be classified into three major categories:

1. fitness-proportional (roulette wheel) selection,
2. rank-based selection, and
3. tournament selection.

Roulette wheel selection is probably the oldest of the three; it was widely used in the early days of GAs. Let $f_1, f_2, \ldots, f_n$ be the fitness values of the n individuals in a population. The roulette wheel selection scheme assigns a selection probability $P_i$ to individual i according to the following formula:

$$P_i = \frac{f_i}{\sum_{j=1}^{n} f_j}$$

Roulette wheel selection is very simple, but it has difficulty in dealing with super-individuals and tends to converge prematurely unless additional scaling methods are used. These deficiencies stem from its dependence on raw fitness values. Rank-based selection does not depend on the raw fitness of an individual; it depends on the rank of an individual in the population. There are several variants of rank-based selection, such as Baker's [52, 53] and Whitley's [54] linear ranking and Yao's [51] nonlinear ranking. Different rank-based selection mechanisms have different selection pressures. These selection pressures are the same across generations because they do not depend on the raw fitness of individuals.

Both roulette wheel and rank-based selection are global in the sense that all the individuals in a population have to participate in the calculation of selection probabilities. This is undesirable for parallel EAs because it increases the communication overheads among individuals on different processors. Tournament selection is a better candidate for parallel EAs. It is a local selection mechanism because the probability of selecting an individual depends only on a subset of the whole population. In fact, tournament selection implements an approximate form of ranking [20]. One typical example of tournament selection is Boltzmann tournament selection proposed by Goldberg [5]. Fogel's competition scheme [7] is also a type of tournament selection.
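The contrast between global and local selection can be seen in a minimal sketch (assuming non-negative fitness values for roulette wheel selection):

```python
import random

def roulette_wheel(pop, fits):
    # Global: P_i = f_i / sum_j f_j, so every individual's fitness is needed.
    total = sum(fits)
    r, acc = random.uniform(0, total), 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

def tournament(pop, fits, k=2):
    # Local: only k randomly sampled individuals compete, which is why
    # tournament selection parallelises well.
    contenders = random.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]
```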

3.4 Hybrid Algorithms

GAs are very good at coarse-grained global search, but rather poor at fine-grained local search. One way to improve a GA's performance is to combine it with another algorithm that has better local search ability. SA has been used as such a local searcher for GAs [55]; the proposed genetic annealing algorithm [55] compared favourably with the GA or SA alone. Mühlenbein et al. [56] used a simple hill-climbing algorithm as the local searcher for their parallel GA and also achieved very good experimental results.

Every search algorithm, except for uniform random search, introduces some kind of bias into its search. Different algorithms have different biases, and evolutionary algorithms with different operators also have different biases. There have been some discussions of the biases introduced by various crossover operators in the genetic algorithm literature [57]. It is such biases that make an algorithm very efficient for one class of problems (i.e., the biases lead to efficient search of one class of fitness landscapes) but not for others. Since we do not know the shape and characteristics of the fitness

landscape of a given new problem, it is very difficult to decide which search algorithm (i.e., which bias) to use to solve the problem. We probably do not want any bias in such a case. Ideally, we would like some kind of adaptive search operator which can change its search bias dynamically during search. Another way to reduce the possible detrimental effects of the bias introduced by a single algorithm is to use a hybrid algorithm which includes the different biases introduced by several different algorithms, as sketched below. Kido et al. [58] described a hybrid algorithm combining GAs, SA and TS and achieved very good results for a TSP. They showed experimentally that GA+SA+TS performed better than either GA+SA or GA+TS. This seems to indicate that the greater diversity of biases in GA+SA+TS enables the search to be conducted efficiently in different regions of the search space.
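One simple hybridisation pattern, sketched here under the assumption that individuals are bit strings and the local searcher is a single-bit hill-climber, is to refine each offspring with a few local search steps before selection:

```python
import random

def local_refine(ind, fitness, steps=10):
    # A simple local searcher: try single-bit flips, keep improvements.
    best, best_f = ind[:], fitness(ind)
    for _ in range(steps):
        cand = best[:]
        cand[random.randrange(len(cand))] ^= 1
        if fitness(cand) > best_f:
            best, best_f = cand, fitness(cand)
    return best

def hybrid_generation(pop, fitness):
    # The EA provides coarse-grained global search; the local searcher
    # (here hill-climbing, but SA or TS could be substituted) provides
    # fine-grained local tuning, contributing its own search bias.
    return [local_refine(ind, fitness) for ind in pop]
```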

4 Evolutionary Learning

4.1 An Introduction to Learning Classifier Systems

Learning Classifier Systems (LCS) are also known as Classifier Systems (CS). They are a particular class of message-passing, rule-based systems [59]. They can also be regarded as a type of adaptive expert system that uses a knowledge base of production rules in a low-level syntax that can be manipulated by a genetic algorithm [60]. In a CS, each low-level rule is called a classifier. A CS proposed by Holland [59] can be described by Figure 3. The general operational cycle of the classifier system is as follows:

1. Allow the detectors (input interface) to encode the current environment status and place the resulting messages on the message list.
2. Determine the set of classifiers that are matched by the current messages.
3. Resolve conflicts caused by limited message list size or contradictory actions.
4. Remove those messages which match the conditions of firing classifiers from the message list.
5. Add the messages suggested by the firing classifiers to the list.
6. Allow the effectors (output interface) that are matched by the current message list to take actions in the environment.
7. If a payoff signal is received from the environment, assign credit to classifiers.
8. Goto Step 1.

4.1.1 Rules and Messages

Each rule is a simple message processor: the conditions look for certain kinds of messages and, when the conditions are satisfied, the action specifies a message to be sent. Messages are normally coded as symbolic strings over the alphabet {0, 1, #}, where # is a "don't care" symbol which matches either 1 or 0. For example, 1### matches any message of length 4 starting with 1. An example classifier can be described as:

condition-1, condition-2 / action (message) [strength]
111#####, 10010010 / 00101101 [56]

[Figure 3: An overview of a classifier system [59]. The diagram shows an environment coupled to the system through an input interface (detectors) and an output interface (effectors). Detectors post input messages on the message list, which is matched against the rule (classifier) list; firing rules post new messages. Environmental payoff feeds a bucket brigade that adjusts rule strengths, while a genetic algorithm generates new rules.]

One classifier is more specific than another if its condition is more specific than the other's, and one condition is more specific than another if the set of messages that satisfy the former is smaller than the set of messages that satisfy the latter. The usefulness of a classifier is indicated by its strength, which is updated by the credit assignment scheme based on the classifier's average usefulness in the contexts in which it has been tried previously. All classifiers whose conditions are satisfied have to compete for the right to post their messages. Competition provides a simple, situation-dependent means of resolving conflicts between classifiers. The actual competition is based on a bidding process: the bid can be treated as some proportion of the classifier's strength, scaled by a bid ratio determined by the number of specific bits divided by the total number of bits in the condition. That is, the competition favours more specific classifiers, as the sketch below illustrates.
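A minimal sketch of matching and bidding follows; the proportionality constant k is an assumption for illustration, and actual CS implementations differ in detail:

```python
def matches(condition, message):
    # '#' is the don't-care symbol: it matches either '0' or '1'.
    return len(condition) == len(message) and all(
        c == '#' or c == m for c, m in zip(condition, message))

def bid(strength, condition, k=0.1):
    # The bid is a proportion of strength scaled by the bid ratio
    # (specific bits / total bits), so specific classifiers bid higher.
    bid_ratio = sum(c != '#' for c in condition) / len(condition)
    return k * strength * bid_ratio

# Example: the condition '111#####' matches the message '11101101',
# and a classifier with strength 56 bids 0.1 * 56 * 3/8 = 2.1.
```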

4.1.2 Default Hierarchies

An important feature of classifier systems is the possible adaptive formation of default hierarchies, i.e., layered sets of default and exception rules. A CS organises default hierarchies by favouring exception rules over defaults in its conflict resolution and credit assignment schemes. The simplest example of a classifier hierarchy consists of only two classifiers:

1. The first ("default") one has a relatively unspecific condition and provides an action that is correct in most cases but incorrect in some.
2. The second ("exception") one is satisfied only by a subset of the messages satisfying the first classifier, and its action generally corrects errors committed by the first classifier.

The specific classifier both provides the correct action and saves the general classifier from a mistake when it prevents the general classifier from winning. The exception classifier may in turn make mistakes that can be corrected by even more specific classifiers, and so on. Such default hierarchies can be learned by the classifier system.

4.1.3 Credit Assignment

Credit assignment is a very difficult task because credit must be assigned to early-acting classifiers that set the stage for a sequence of actions leading to a favourable situation. The most famous credit assignment algorithm is the bucket brigade algorithm, which uses metaphors from economics [59]. For a classifier viewed as a middleman, its suppliers are those classifiers that have sent messages satisfying its conditions, and its consumers are those classifiers that both have conditions satisfied by its message and have won their competitions in turn. When a classifier wins a competition, its bid is apportioned among its suppliers, increasing their strengths by the amounts apportioned to them. At the same time, because the bid is treated as a payment for the right to post a message, the strength of the winning classifier is reduced by the amount of its bid. Should a classifier bid but not win, its strength remains unchanged and its suppliers receive no payment. Winning classifiers can recoup their payments from either winning consumers or the environmental payoff.
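A hedged sketch of a single bucket brigade transaction follows; the equal division of the bid among suppliers is an illustrative simplification, not the only possible policy:

```python
def bucket_brigade_step(winner, suppliers, strengths, k=0.1):
    """One transaction: the winning classifier pays its bid for the right
    to post its message, and the bid is shared among the suppliers whose
    messages satisfied its conditions. `strengths` maps classifier ids
    to strength values."""
    payment = k * strengths[winner]   # the bid, a proportion of strength
    strengths[winner] -= payment      # pay for the right to post
    if suppliers:
        share = payment / len(suppliers)   # equal split (illustrative)
        for s in suppliers:
            strengths[s] += share          # suppliers recoup their payments
```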

4.1.4 Classifier Discovery by GAs

A GA is used in CSs to discover new classifiers by crossover and mutation, with the strength of a classifier used as its fitness. The GA is only applied to the classifiers after a certain number of operational cycles, so that the strengths are better estimated. There are two approaches to CSs: the Michigan approach and the Pitts approach. In the Michigan approach, each individual in a population is a single classifier and the whole population represents a complete CS. In the Pitts approach,

each individual in a population represents a complete CS. The whole population includes a number of competing CSs.

4.2 Population-Based Machine Learning

Population-based machine learning includes CSs, evolutionary artificial neural networks (EANNs) [61], and other evolutionary [62, 63] or non-evolutionary [64] systems. EANNs can be considered as a combination of artificial neural networks (ANNs) and EAs. EAs have been introduced into ANNs at three different levels: the evolution of connection weights, of architectures, and of learning rules. At present, most work on EANNs concentrates on the evolution of architectures, i.e., the connectivity of ANNs [65, 66, 67]. Very good results have been achieved on some artificial and real-world benchmark problems.

4.2.1 Co-evolutionary Learning

Co-evolutionary learning takes two different forms. The first refers to the situation where two populations are evolved at the same time [68]; the fitness of an individual in one population depends on the individuals in the other population, and there is no crossover or other information exchange between the two populations. This can be regarded as co-evolution at the population level. The second form of co-evolution is at the individual level. There is only one population, and the fitness of an individual depends on the other individuals in the same population [69, 62, 63]. For example, the same strategy for playing an iterated prisoner's dilemma game may receive quite different fitness values depending on what other strategies are present in the population. Both forms of co-evolution have a dynamic environment and a dynamic fitness function. Such learning problems are more interesting and challenging than learning problems in a fixed environment.

5 Parallel Evolutionary Algorithms

Existing parallel EAs belong roughly to three major models: the memory model, the island model, and the cellular model [20, 70].

The memory model is virtually the same as the simple GA except for the selection mechanism: tournament selection is used. A simple GA can be executed in parallel on N/2 processors, where N is the population size and is assumed to be even. Each processor is assigned two individuals. During each generation, each processor conducts two independent tournaments by sampling individuals in the population at random, and the two winners replace the two existing individuals on the processor. Crossover and mutation are then applied probabilistically to the two strings on each processor. In general, more than two individuals can be assigned to each processor, e.g., 4, 6 or 8.

The island model of parallel EAs is designed for coarse-grained parallel machines. It divides a large population into smaller sub-populations and assigns one sub-population to each processor. On each processor, the EA runs on its sub-population just like a sequential EA, except for the occasional migration of the best individuals from and to other processors; that is, every certain number of generations, the processors swap their best individuals. Parallel GAs based on this model have been implemented on transputer arrays and on PVM.

The cellular model of parallel EAs is designed for fine-grained or massively parallel machines. It assigns each individual to a processor. A neighbourhood is often defined for each processor so that communication between processors is always within the neighbourhood. Selection, crossover, and mutation are applied only within the neighbourhood

in order to minimise the communication overheads. There have been several implementations of parallel GAs on the CM5. A minimal sketch of the island model's migration step is given below.
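The sketch assumes each island is a list of (fitness, genome) pairs and migration follows a ring topology; the replace-worst policy is one common choice, not the only one. In a parallel implementation the islands live on different processors and migration becomes an exchange of messages; this sequential version only illustrates the policy.

```python
def migrate(islands, n_migrants=1):
    """Copy each island's best individuals to the next island in a ring,
    replacing that island's worst individuals."""
    # Snapshot migrants first, so newly arrived individuals do not migrate on.
    migrants = [sorted(island, reverse=True)[:n_migrants] for island in islands]
    for i, island in enumerate(islands):
        incoming = migrants[i - 1]     # ring topology: island i-1 sends to i
        island.sort()                  # ascending fitness: worst individuals first
        island[:n_migrants] = [m for m in incoming]
```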

6 Conclusions

This paper has provided an overview of the evolutionary computation field. Four major areas were reviewed: evolutionary computation theory, evolutionary optimisation, evolutionary learning, and parallel EAs. It was indicated that while EAs have achieved a lot of success in solving practical problems, the question of why they work so well for these problems remains open. More theoretical work, such as the analysis of the computational complexity of EAs, needs to be done.

Evolutionary optimisation is by far the most researched area in evolutionary computation. It was emphasised that hybrid algorithms which combine EAs with other local search algorithms can produce better results than either EAs or local search algorithms alone; an argument based on global versus local search and search biases was presented. This paper also gave an introduction to classifier systems and population-based machine learning. In particular, co-evolutionary learning was singled out due to its dynamic learning environment, which cannot be handled easily by traditional learning approaches. EAs are population-based algorithms and are well suited to parallel implementation. Three basic parallel models of EAs were described; tournament selection or neighbourhoods are used in these models to restrict global communication.

References

[1] L. J. Eshelman (ed.). Proc. of the Sixth Int'l Conf. on Genetic Algorithms. Morgan Kaufmann, San Mateo, CA, 1995.
[2] J. R. McDonnell, R. G. Reynolds, and D. B. Fogel (eds.). Evolutionary Programming IV: Proceedings of the Fourth Annual Conference on Evolutionary Programming. MIT Press, 1995.
[3] X. Yao (ed.). Progress in Evolutionary Computation, Lecture Notes in Artificial Intelligence, Vol. 956. Springer-Verlag, Heidelberg, Germany, 1995.
[4] J. H. Holland. Adaptation in Natural and Artificial Systems (1st MIT Press edn). The MIT Press, Cambridge, MA, 1992.
[5] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA, 1989.
[6] L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence Through Simulated Evolution. John Wiley & Sons, New York, NY, 1966.
[7] D. B. Fogel. System Identification Through Simulated Evolution: A Machine Learning Approach to Modeling. Ginn Press, Needham Heights, MA, 1991.
[8] H.-P. Schwefel. Numerical Optimization of Computer Models. John Wiley & Sons, Chichester, 1981.
[9] H.-P. Schwefel. Evolution and Optimum Seeking. John Wiley & Sons, New York, 1995.
[10] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671-680, 1983.
[11] H. H. Szu and R. L. Hartley. Fast simulated annealing. Physics Letters A, 122:157-162, 1987.
[12] L. Ingber. Very fast simulated re-annealing. Mathl. Comput. Modelling, 12(8):967-973, 1989.
[13] X. Yao. A new simulated annealing algorithm. Int. J. of Computer Math., 56:161-168, 1995.
[14] J. J. Grefenstette. Incorporating problem specific knowledge into genetic algorithms. In L. Davis, editor, Genetic Algorithms and Simulated Annealing, chapter 4, pages 42-60. Morgan Kaufmann, San Mateo, CA, 1987.
[15] T. Jones. Evolutionary Algorithms, Fitness Landscapes and Search. PhD thesis, The University of New Mexico, Albuquerque, NM, May 1995.
[16] G. B. Sorkin. Efficient simulated annealing on fractal energy landscapes. Algorithmica, 6:367-418, 1991.
[17] X. Yao. Simulated annealing with extended neighbourhood. Int. J. of Computer Math., 40:169-189, 1991.
[18] X. Yao. Dynamic neighbourhood size in simulated annealing. In Proc. of Int'l Joint Conf. on Neural Networks (IJCNN'92), Vol. I, pages 411-416, Beijing, November 1992. IEEE Press, Piscataway, NJ.
[19] X. Yao. Comparison of different neighbourhood sizes in simulated annealing. In P. Leong and M. Jabri, editors, Proc. of Fourth Australian Conf. on Neural Networks, pages 216-219, Sydney, Australia, 1993.
[20] D. Whitley. A genetic algorithm tutorial. Technical Report CS-93-103, Department of Computer Science, Colorado State University, Fort Collins, CO, March 1993.
[21] T. E. Davis and J. C. Principe. A simulated annealing like convergence theory for the simple genetic algorithm. In R. K. Belew and L. B. Booker, editors, Proc. of the Fourth Int'l Conf. on Genetic Algorithms, pages 174-181. Morgan Kaufmann, San Mateo, CA, 1991.
[22] A. E. Eiben, E. H. L. Aarts, and K. M. van Hee. Global convergence of genetic algorithms: a Markov chain analysis. In H.-P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature, pages 4-12. Springer-Verlag, Heidelberg, 1991.
[23] D. B. Fogel. Evolving Artificial Intelligence. PhD thesis, University of California, San Diego, CA, 1992.
[24] J. Suzuki. A Markov chain analysis on a genetic algorithm. In S. Forrest, editor, Proc. of the Fifth Int'l Conf. on Genetic Algorithms and Their Applications, pages 146-153. Morgan Kaufmann, San Mateo, CA, 1993.
[25] G. Rudolph. Convergence properties of canonical genetic algorithms. IEEE Trans. on Neural Networks, 5(1), 1994.
[26] G. Rudolph. Convergence of non-elitist strategies. In Z. Michalewicz et al., editors, Proc. of the 1994 IEEE Int'l Conf. on Evolutionary Computation (ICEC'94), pages 63-66. IEEE Press, Piscataway, NJ, 1994.
[27] I. Rechenberg. Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der Biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart, Germany, 1973.
[28] K. A. De Jong. An analysis of the behavior of a class of genetic adaptive systems. PhD thesis, University of Michigan, Ann Arbor, 1975.
[29] D. L. Isaacson and R. W. Madsen. Markov Chains: Theory and Applications. John Wiley & Sons, 1976.
[30] S. Geman and D. Geman. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-6:721-741, 1984.
[31] X. Yao and G.-J. Li. General simulated annealing. J. of Computer Sci. and Tech., 6:329-338, 1991.
[32] G. H. Sasaki and B. Hajek. The time complexity of maximum matching by simulated annealing. Journal of the ACM, 35:387-403, 1988.
[33] W. E. Hart and R. K. Belew. Optimizing an arbitrary function is hard for the genetic algorithm. In R. K. Belew and L. B. Booker, editors, Proc. of the Fourth Int'l Conf. on Genetic Algorithms, pages 190-195. Morgan Kaufmann, San Mateo, CA, 1991.
[34] D. B. Fogel. An evolutionary approach to the traveling salesman problem. Biological Cybernetics, 60:139-144, 1988.
[35] D. B. Fogel. Applying evolutionary programming to selected traveling salesman problems. Cybernetics and Systems, 24:27-36, 1993.
[36] D. B. Fogel. Empirical estimation of the computation required to discover approximate solutions to the traveling salesman problem using evolutionary programming. In D. B. Fogel and W. Atmar, editors, Proc. of the Second Ann. Conf. on Evolutionary Programming. Evolutionary Programming Society, La Jolla, CA, 1993.
[37] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman Co., San Francisco, 1979.
[38] R. Das and D. Whitley. The only challenging problems are deceptive: global search by solving order-1 hyperplanes. In R. K. Belew and L. B. Booker, editors, Proc. of the Fourth Int'l Conf. on Genetic Algorithms, pages 166-173. Morgan Kaufmann, San Mateo, CA, 1991.
[39] A. J. Mason. Partition coefficients, static deception and deceptive problems. In R. K. Belew and L. B. Booker, editors, Proc. of the Fourth Int'l Conf. on Genetic Algorithms, pages 210-214. Morgan Kaufmann, San Mateo, CA, 1991.
[40] R. Lister. Annealing networks and fractal landscapes. In Proc. of the IEEE Int'l Conf. on Neural Networks (San Francisco), Vol. 1, pages 257-262. IEEE Press, Piscataway, NJ, March 1993.
[41] F. Glover and M. Laguna. Tabu search. In C. R. Reeves, editor, Modern Heuristic Techniques for Combinatorial Problems, chapter 3, pages 70-150. Blackwell Scientific, Oxford, 1993.
[42] L. Ingber and B. Rosen. Genetic algorithms and very fast simulated reannealing: a comparison. Mathl. Comput. Modelling, 16(11):87-100, 1992.
[43] S. Baluja. An empirical comparison of seven iterative and evolutionary function optimization heuristics. Technical Report CMU-CS-95-193, School of Computer Science, Carnegie Mellon University, September 1995.
[44] G. McMahon and D. Hadinoto. Comparison of heuristic search algorithms for single machine scheduling problems. In X. Yao, editor, Progress in Evolutionary Computation, Lecture Notes in Artificial Intelligence, Vol. 956, pages 293-304, Heidelberg, Germany, 1995. Springer-Verlag.
[45] D. H. Wolpert and W. G. Macready. No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute, Santa Fe, NM, July 1995.
[46] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin, Germany, 1992.
[47] Z. Michalewicz. A perspective on evolutionary computation. In X. Yao, editor, Progress in Evolutionary Computation, Lecture Notes in Artificial Intelligence, Vol. 956, pages 73-89, Heidelberg, Germany, 1995. Springer-Verlag.
[48] Z. Michalewicz. Genetic algorithms, numerical optimisation, and constraints. In L. J. Eshelman, editor, Proc. of the Sixth Int'l Conf. on Genetic Algorithms, pages 151-158. Morgan Kaufmann, San Mateo, CA, 1995.
[49] L. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York, NY, 1991.
[50] D. Whitley, T. Starkweather, and D. Shaner. The traveling salesman and sequence scheduling: quality solutions using genetic edge recombination. In L. Davis, editor, Handbook of Genetic Algorithms, chapter 22, pages 350-372. Van Nostrand Reinhold, New York, NY, 1991.
[51] X. Yao. An empirical study of genetic operators in genetic algorithms. Microprocessing and Microprogramming, 38:707-714, 1993.
[52] J. E. Baker. Analysis of the effects of selection in genetic algorithms. PhD thesis, Vanderbilt University, Nashville, 1987.
[53] J. E. Baker. Adaptive selection methods for genetic algorithms. In J. J. Grefenstette, editor, Proc. of an Int'l Conf. on Genetic Algorithms and Their Applications, pages 101-111, 1985.
[54] D. Whitley. The GENITOR algorithm and selective pressure: why rank-based allocation of reproductive trials is best. In J. D. Schaffer, editor, Proc. of the Third Int'l Conf. on Genetic Algorithms and Their Applications, pages 116-121. Morgan Kaufmann, San Mateo, CA, 1989.
[55] X. Yao. Optimization by genetic annealing. In M. Jabri, editor, Proc. of Second Australian Conf. on Neural Networks, pages 94-97, Sydney, Australia, 1991.
[56] H. Mühlenbein, M. Schomisch, and J. Born. The parallel genetic algorithm as function optimizer. Parallel Computing, 17:619-632, 1991.
[57] L. Eshelman, R. Caruana, and J. D. Schaffer. Biases in the crossover landscape. In J. D. Schaffer, editor, Proc. of the Third Int'l Conf. on Genetic Algorithms and Their Applications, pages 10-19. Morgan Kaufmann, San Mateo, CA, 1989.
[58] T. Kido, K. Takagi, and M. Nakanishi. Analysis and comparisons of genetic algorithm, simulated annealing, tabu search, and evolutionary combination algorithm. Informatica, 18:399-410, 1994.
[59] J. H. Holland. Using classifier systems to study adaptive nonlinear networks. In D. L. Stein, editor, Lectures in the Sciences of Complexity, pages 463-499. Addison-Wesley, Redwood City, CA, 1988.
[60] R. E. Smith and D. E. Goldberg. Reinforcement learning with classifier systems: adaptive default hierarchy formation. Applied Artificial Intelligence, 6:79-102, 1992.
[61] X. Yao. Evolutionary artificial neural networks. International Journal of Neural Systems, 4(3):203-222, 1993.
[62] X. Yao and P. Darwen. An experimental study of N-person iterated prisoner's dilemma games. Informatica, 18:435-450, 1994.
[63] P. J. Darwen and X. Yao. On evolving robust strategies for iterated prisoner's dilemma. In X. Yao, editor, Progress in Evolutionary Computation, Lecture Notes in Artificial Intelligence, Vol. 956, pages 276-292, Heidelberg, Germany, 1995. Springer-Verlag.
[64] B. W. Wah. Population-based learning: a method for learning from examples under resource constraints. IEEE Trans. on Knowledge and Data Engineering, 4:454-474, 1992.
[65] X. Yao and Y. Shi. A preliminary study on designing artificial neural networks using co-evolution. In Proc. of the IEEE Singapore Int'l Conf. on Intelligent Control and Instrumentation, pages 149-154, Singapore, June 1995. IEEE Singapore Section.
[66] X. Yao and Y. Liu. Evolving artificial neural networks for medical applications. In Proc. of 1995 Australia-Korea Joint Workshop on Evolutionary Computation. KAIST, Taejon, Korea, September 1995.
[67] Y. Liu and X. Yao. A population-based learning algorithm which learns both architectures and weights of neural networks. Chinese Journal of Advanced Software Research (Allerton Press, Inc., New York, NY 10011), 3(1), 1996. To appear.
[68] W. D. Hillis. Co-evolving parasites improve simulated evolution as an optimization procedure. In Santa Fe Institute Studies in the Sciences of Complexity, Volume 10, pages 313-323. Addison-Wesley, 1991.
[69] R. Axelrod. The evolution of strategies in the iterated prisoner's dilemma. In L. Davis, editor, Genetic Algorithms and Simulated Annealing, chapter 3, pages 32-41. Morgan Kaufmann, San Mateo, CA, 1987.
[70] J. Stender (ed.). Parallel Genetic Algorithms: Theory and Applications. IOS Press, Amsterdam, 1993.
