Choosing selection pressure for wide-gap problems

Theoretical Computer Science 411 (2010) 926–934


Choosing selection pressure for wide-gap problems

Tianshi Chen a, Jun He b, Guoliang Chen a, Xin Yao a,c,∗

a Nature Inspired Computation and Applications Laboratory (NICAL), School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
b Department of Computer Science, University of Wales, Aberystwyth, Ceredigion SY23 3DB, Wales, UK
c Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK

Article info

Article history: Received 21 June 2008; Received in revised form 19 August 2009; Accepted 16 December 2009; Communicated by T. Baeck

Keywords: Evolutionary algorithm; Selection pressure; First hitting time; Markov chain; Evolutionary computation theory

Abstract

To exploit an evolutionary algorithm's performance to the fullest, the selection scheme should be chosen carefully. Empirically, it is commonly acknowledged that low selection pressure can prevent an evolutionary algorithm from premature convergence, and is thereby more suitable for wide-gap problems. However, there are few theoretical time complexity studies that actually give the conditions under which a high or a low selection pressure is better. In this paper, we provide a rigorous time complexity analysis showing that low selection pressure is better for wide-gap problems with two optima.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Evolutionary Algorithms (EAs) are powerful tools for combinatorial optimization problems [13,18], many of which are multimodal. When applying an EA to a multimodal problem, the selection pressure may affect the EA's performance greatly, and must be chosen carefully [2,4]. Theoretical investigations of this issue would be helpful to the design and application of EAs. Therefore, time complexity analysis, which might be the most direct way to show the impact of selection pressure on the performance of an EA, is of great interest. However, from the time complexity point of view, little work has been done to theoretically investigate the conditions under which high or low selection pressure is better.

We know that quite a few optimization problems, such as the NP-complete problems, are very hard for EAs. An EA may take a very long time, e.g., an exponential number of generations, to find the global optima of those hard problems. Nevertheless, when dealing with some concrete instances of these hard problems, it is still possible for an EA to obtain acceptable performance if its operators (e.g., the selection operators) are chosen carefully. In this paper, we focus on choosing suitable selection pressure for EAs on a specific kind of hard-to-solve problems, the wide-gap problems, defined by He et al. [14]. Concretely, He et al. utilized the mean first hitting time of the (1 + 1) EA [5] on a problem as a measure of hardness. Given a problem and its feasible search space, if there exists a pair of solutions with similar fitness such that the (1 + 1) EA starting from the two solutions spends intrinsically different mean first hitting times (e.g., an exponential one versus a polynomial one) to find the global optimum, then the problem is regarded as a wide-gap problem. Many multimodal instances of the NP-complete problems can be categorized as wide-gap problems.

∗ Corresponding author at: Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK. Tel.: +44 121 414 3747; fax: +44 121 414 2799. E-mail address: [email protected] (X. Yao). 0304-3975/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2009.12.014


Former theoretical studies related to selection pressure mainly focused on the characteristics of selection operators alone [2,4,7,16,21,22]. Some results also concerned the performance of the population-based EAs as a whole [16,22], but few theoretical time complexity results have been reported. Chen et al. theoretically studied the computation times of the (µ + µ) EA with truncation and 2-tournament selection on unimodal problems [3]. However, the obtained results are not enough to demonstrate the impact of selection pressure on the performance of an EA. In this paper, we are concerned with the possibility of reducing the time complexity of an EA by choosing appropriate selection pressure. We conduct a theoretical investigation of population-based EAs (with both mutation and selection) on a wide-gap problem with two optima, and provide the first time complexity evidence showing theoretically that low selection pressure is better than high selection pressure on some wide-gap problems.

The rest of the paper is organized as follows: Section 2 gives the preliminaries; Section 3 analyzes the problem theoretically; Section 4 provides the conclusion and some discussion.

2. Preliminaries

In this section, we introduce the preliminaries for the later theoretical analysis, including the algorithms, the problem, and the analytical tool considered in our analysis.

2.1. Algorithms

Given the initial population ξ0, the general framework of EAs at the tth generation is as follows:

(i) Recombination: Parent individuals in population ξt are recombined, and the intermediate population ξt^(c) is obtained.
(ii) Mutation: Intermediate individuals in population ξt^(c) are mutated, and the offspring population ξt^(m) is obtained.
(iii) Selection: Individuals in the populations ξt and ξt^(m) are assigned a survival probability. Then some individuals are selected into the next generation ξt+1 based on that probability.

The above procedure is repeated until the stopping criterion is met. In this paper, the EAs use mutation and selection only, and their parent population size equals their offspring population size. Thus the EAs are also regarded as (µ + µ) EAs [10], where µ is called the population size in this paper. Moreover, the EAs all use binary encoding, and the mutation operator is the one-bit mutation that flips exactly one bit of each individual in a generation. The flipped bit is chosen uniformly at random from the n bits of an individual, where n is the problem size.

Let the union of the parent population ξt and the offspring population ξt^(m) (at the tth generation) be ξt ∪ ξt^(m). The selection operators considered in this paper are described as follows:

– Selection I: Retain directly the best and worst individuals in the union population ξt ∪ ξt^(m), and select the other µ − 2 individuals from the remaining 2µ − 2 individuals (in ξt ∪ ξt^(m)) by 2-tournament selection.
– Selection II: With probability p or q = 1 − p, select the best or the worst individual (in the union population ξt ∪ ξt^(m)) as the ‘‘seed’’ individual, where q < p < 1 are positive constants. Copy this seed individual µ times to fill the population of the next generation (ξt+1).

Selection II selects a population containing µ identical individuals; it thus derives from the (1 + λ) EA [17]. We can see that Selection II has a very strong bias towards either the best or the worst individual within one generation: it is an extreme selection scheme that maintains a high selection pressure and an extremely low diversity. In contrast, Selection I retains both the best and worst individuals, which represents a low selection pressure. At first glance, the two selection schemes appear not to be commonly used in practice. However, we are not trying to propose any specific practical algorithm; instead, our aim is to study the difference between two strategies, namely assigning both ‘‘promising’’ and ‘‘unpromising’’ individuals some survival probability, and strongly biasing towards some particular individuals. The aforementioned selection schemes clearly demonstrate these two tendencies, and are thus sufficient for our analysis. Furthermore, the performance of an algorithm can be affected by both the selection scheme and the mutation operator. To analyze the effect of selection pressure precisely, it is necessary to reduce the impact of mutation as much as possible. Hence, we adopt the one-bit mutation (local mutation) rather than the widely used bitwise mutation (global mutation) [5].

2.2. Wide-gap problem

In this subsection, we introduce the general definition of wide-gap problem, and a concrete wide-gap problem for later investigations.
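As an illustration, the two selection schemes can be sketched in Python as follows. This is a minimal sketch of our own; the function names and the tuple representation of individuals are illustrative, not from the paper.

```python
import random

def selection_one(union_pop, fitness, mu, rng):
    """Selection I (low pressure): keep the best and worst individuals of
    the union population, then fill the remaining mu - 2 slots by
    2-tournament selection over the other 2*mu - 2 individuals."""
    ranked = sorted(union_pop, key=fitness)
    best, worst, rest = ranked[-1], ranked[0], ranked[1:-1]
    chosen = [best, worst]
    for _ in range(mu - 2):
        a, b = rng.choice(rest), rng.choice(rest)
        chosen.append(a if fitness(a) >= fitness(b) else b)
    return chosen

def selection_two(union_pop, fitness, mu, p, rng):
    """Selection II (high pressure): with probability p take the best
    (otherwise the worst) individual as a seed and copy it mu times."""
    ranked = sorted(union_pop, key=fitness)
    seed = ranked[-1] if rng.random() < p else ranked[0]
    return [seed] * mu
```

Note how Selection II always produces a population of µ identical individuals, while Selection I is guaranteed to keep both extremes of the union in the next generation.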
Intuitively, the wide-gap property is a characteristic describing the fitness landscape of a problem. However, unlike the previous notion of elementary landscapes [23], which characterizes the landscape of a problem by directly utilizing search space, neighborhood, and objective function information, the wide-gap property indirectly characterizes the landscape of a wide-gap problem via the performance of the so-called (1 + 1) EA starting from different initial solutions. Hence, before presenting the definition of wide-gap problem, we must first describe the flow of the (1 + 1) EA.


Given the initial individual x0, the procedure of the tth generation (t ≥ 0) of the (1 + 1) EA is as follows:

– Flip each bit of the individual xt with probability px. An offspring individual xt^(M) is obtained.
– Evaluate the fitness of xt^(M). If f(xt^(M)) ≥ f(xt), then set xt+1 = xt^(M); else set xt+1 = xt.
– Set t = t + 1.

The above procedure is repeated until some stopping criterion is met. The performance of an EA on a problem can be measured by the first hitting time of the Markov chain [3,6,8,9,11,20] modeling the EA. Let t denote the index of generations; the first hitting time is defined formally as below:

Definition 1 (First Hitting Time). For a given Markov chain L : {Lt, t = 0, 1, . . .} (Lt ∈ M) and a subspace H (of M) we are interested in, the first hitting time to H is defined by

τ = min{t ≥ 0; Lt ∈ H}.   (1)

For an EA, H can be regarded as the subset of populations whose elements all contain the global optimum x∗. Hence, the first hitting time of an EA is given by:

τ = min{t ≥ 0; x∗ ∈ ξt},   (2)

where x∗ is the global optimum and ξt is the population of the EA at the tth generation.

On the basis of the (1 + 1) EA and the first hitting time, we now introduce the definition of wide-gap problem. Concretely, let P be a problem with a finite search space S, and an objective function f taking a limited number of values. Then, we can sort the values of the objective function f in descending order: fmax = f0 > f1 > · · · > fl = fmin. By the values of f, the whole search space S can be divided into l + 1 subspaces:

∀i ∈ {0, . . . , l} : Si = {x ∈ S; f(x) = fi}.   (3)

Based on the above decomposition of the search space and the definition of first hitting time, He et al. proposed the so-called wide-gap problems:

Definition 2 (Wide-gap Problem [14]). If for a problem P, there exist two subspaces Sk and Sk+1 such that the mean first hitting times of the (1 + 1) EA (with bitwise mutation and elitist selection) starting from any a ∈ Sk and b ∈ Sk+1 satisfy that | E[τ | x0 = a] − E[τ | x0 = b] | is an exponential function of the problem size n, then P is a wide-gap problem.

Following the above definition, validating that a problem P is a wide-gap problem is straightforward. The first step is to select two subspaces Sk and Sk+1 from the total l + 1 subspaces decomposed from the whole search space of P, and to estimate the mean first hitting times of the (1 + 1) EA starting from any pair of solutions belonging to Sk and Sk+1 respectively. If the difference of the mean first hitting times between the subspaces Sk and Sk+1 is exponentially large, then P is a wide-gap problem.

Next we present an example which will then be validated to be a wide-gap problem. Let us consider the following problem:

Maximize f(x) = Σ_{i=1}^{n} wi si,  x = (s1, . . . , sn) ∈ {0, 1}^n   (4)

Subject to Σ_{i=1}^{n} wi si ≤ C = Σ_{i=1}^{n−1} wi + 1,

w1 = w2 = · · · = wn−1 > 1,  wn = C.

The above problem has a unique global optimum x∗ = (0, 0, . . . , 0, 1) and a local optimum x′ = (1, 1, . . . , 1, 0). It is a special instance of the Subset Sum problem, which is known to be NP-complete [9,18,24]. When applying EAs to the problem, we do not adopt any special strategies to utilize or repair infeasible solutions: once an individual is judged to be infeasible, we assign it a zero fitness and simply replace it with its feasible parent.

According to Definition 2, we can validate that the above problem is a wide-gap problem. By considering the search space decomposition mentioned in (3), we know that S0 is the subspace which only contains the global optimum x∗, and S1 is the subspace which only contains the local optimum x′. For the (1 + 1) EA, the mean first hitting time starting from x∗ is obviously 0. Next we show that the mean first hitting time of the (1 + 1) EA starting from x′ is an exponential function of the problem size n. Since x′ is the local optimum whose fitness is the second largest among all solutions in the search space, any non-optimal offspring generated by the bitwise mutation of x′ has lower fitness. Under the elitist selection of the (1 + 1) EA, which retains the fitter of parent and offspring, no non-optimal offspring can be accepted. The only way to find the global optimum x∗ is via an extremely large jump resulting from the bitwise mutation. However, the probability that x′ mutates to x∗ by the bitwise mutation (all n bits must flip simultaneously) is (1/n)^n, which is exponentially close to 0. Hence, the mean first hitting time of the (1 + 1) EA starting from x′ is n^n. Consequently, according to Definition 2, the problem defined in (4) is a wide-gap problem.
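The structure of the instance can be checked concretely. Below is a small sketch of our own: the paper only requires w1 = · · · = wn−1 > 1 and wn = C, so the common weight 2 and the small n are illustrative choices.

```python
# Illustrative subset sum instance (4): weights w_1..w_{n-1} = 2, w_n = C.
n = 6
w = [2] * (n - 1) + [2 * (n - 1) + 1]
C = sum(w[:-1]) + 1                      # capacity C = sum_{i<n} w_i + 1

def fitness(x):
    """Objective of (4); infeasible solutions are assigned zero fitness."""
    v = sum(wi * si for wi, si in zip(w, x))
    return v if v <= C else 0

x_star = tuple([0] * (n - 1) + [1])      # global optimum (0,...,0,1)
x_loc  = tuple([1] * (n - 1) + [0])      # local optimum  (1,...,1,0)

def one_bit_neighbours(x):
    """All Hamming-distance-1 neighbours of x."""
    for i in range(len(x)):
        y = list(x)
        y[i] = 1 - y[i]
        yield tuple(y)
```

Every single-bit change of x′ either removes a weight (lower fitness) or sets sn = 1 (infeasible, hence zero fitness), which is exactly why the local optimum is so hard to leave.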


2.3. Mathematical tools

In this subsection, we introduce the mathematical tools utilized in our theoretical analysis. To facilitate the introduction, we first present some necessary notations and definitions. As mentioned in Section 2.2, any infeasible solution is replaced by a feasible solution immediately. Hence, throughout the evolution process, a population contains only feasible solutions. The whole set of these feasible populations is denoted by E, and it can be further divided into subsets by the criteria described below. For a population X without the global optimum (x∗ ∉ X), we define d(X) = min{d(x); x ∈ X}, where for a non-optimal feasible individual x = (s1, . . . , sn−1, sn) ∈ X,

d(x) = Σ_{i=1}^{n−1} si.

Then we get n non-optimal population subsets based on d(·):

Ek = {X | d(X) = k, x∗ ∉ X},  k = 0, . . . , n − 1.

Concerning the populations that contain the global optimum (called the optimal populations), we further define the set Eopt = {X | x∗ ∈ X}. For each Ek (Eopt), we define mEk (mEopt) as the mean first hitting time of the populations in Ek (Eopt). Formally, mEk = E[τ | ξ0 ∈ Ek], where ξ0 is the initial population. According to the definition of the first hitting time, mEopt = 0.

Let {Lt, t = 0, 1, . . .} be a homogeneous Markov chain with discrete-time parameter on M. Denote its transition matrix by P, where pij is the transition probability from state i to state j. Let Q = (qij) = P − I, where I is the identity matrix. The first hitting time to a subspace H, denoted by τ, is given by Definition 1. Further, the mean finite first hitting time conditional on the initial state i, denoted by mi, is defined as follows:

mi = E[τ, τ < ∞, Lτ ∈ H | L0 = i].

The first hitting probability (the probability that the first hitting time is not infinite) to H conditional on the initial state i, denoted by Di, is defined by

Di = P(τ < ∞, Lτ ∈ H | L0 = i) = Σ_{j∈H} P(τ < ∞, Lτ = j | L0 = i).

Obviously, if Di = 0, then mi = 0, and further E[τ | L0 = i] = ∞.

Given the above notations, we now introduce two lemmas on the first hitting probability and the first hitting time, which were originally proposed for the passage times of Markov chains [19]:

Lemma 1 ([19]). The first hitting probability Di satisfies

Σ_k qik Dk = 0, i ∈ H^c;  Di = 1, i ∈ H,

where H^c = M − H.

Lemma 2 ([19]). The conditional mean first hitting times mi (i ∈ H^c) satisfy

qi mi = Di + Σ_{j∈H^c, j≠i} qij mj,

where qi = −qii = Σ_{k≠i} qik.

The above analytical tools enable us to study the mean first hitting times mi (i ∈ H^c) of a Markov chain starting from different subsets of states. As a consequence, our later results concerning the Markov chain models of EAs do not rely on any specific initialization strategy, which may lead to initial populations belonging to different subsets of E.
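To make the use of Lemma 2 concrete, the following sketch (our own construction, not code from the paper) solves the resulting linear system for a finite chain. When the hitting probability is 1 (Di = 1 for all i), the lemma amounts to the system (I − P restricted to the transient states) m = 1, solved here by plain Gaussian elimination.

```python
def mean_hitting_times(P, H):
    """Mean first hitting times to the state set H for a transition matrix
    P (list of rows), assuming the chain reaches H with probability 1."""
    states = [s for s in range(len(P)) if s not in H]
    idx = {s: r for r, s in enumerate(states)}
    # Build A m = b with A = I - P restricted to the transient states.
    A = [[(1.0 if s == t else 0.0) - P[s][t] for t in states] for s in states]
    b = [1.0] * len(states)
    nst = len(states)
    for c in range(nst):                       # Gauss-Jordan elimination
        piv = max(range(c, nst), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(nst):
            if r != c and A[r][c]:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
                b[r] -= f * b[c]
    m = {s: b[idx[s]] / A[idx[s]][idx[s]] for s in states}
    m.update({h: 0.0 for h in H})
    return m
```

For a simple chain that moves one step towards the absorbing state 0 with probability a (and otherwise stays put), this recovers the geometric waiting times m1 = 1/a and m2 = 2/a.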


3. Comparative analysis of EAs on the subset sum problem

In this section, we analyze the (µ + µ) EAs on the instance of the subset sum problem. For the mean first hitting time of the (µ + µ) EA with the one-bit mutation and Selection I, we can obtain the following result:

Proposition 1. For the (µ + µ) EA with the one-bit mutation and Selection I on the instance of the subset sum problem,

mEopt = 0,
mEk ≤ n + Σ_{i=1}^{k} n/i = O(n log n),  k = 0, . . . , n − 1,
m = O(n log n),

where m = E[τ] is the mean first hitting time of the EA. Furthermore, the corresponding numbers of function evaluations, µm, µmE0, . . . , µmEn−1, are all polynomial functions of n given a polynomial population size µ.

Proof. Assume that the initial population ξ0 ∈ Ek (k = 0, 1, . . . , n − 1). Since Selection I always retains the worst individual, the probability of reaching ∪_{i=0}^{k−1} Ei in one generation is not smaller than k/n. Hence, the mean first hitting time from Ek to E0, denoted by mk0, can be estimated as follows:

mk0 ≤ Σ_{i=1}^{k} n/i.

Denote the mean first hitting time starting from E0 to Eopt by m0. Since Selection I also preserves the best individual, m0 ≤ n. Hence, the following bound on the mean first hitting time holds for k = 0, . . . , n − 1:

mEk ≤ Σ_{i=1}^{k} n/i + n = O(n log n).

The above result implies that the expectation m = E[τ] = O(n log n). □
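The O(n log n) behaviour can be observed directly in simulation. The sketch below is our own seeded reimplementation of the (µ + µ) EA with one-bit mutation and Selection I on instance (4); the common weight 2, the population size, and the seed are illustrative choices, not from the paper.

```python
import random

def run_selection_one(n, mu, max_gens=50000, seed=1):
    """Run the (mu + mu) EA with one-bit mutation and Selection I on the
    subset sum instance (4); return the first generation whose population
    contains the global optimum (or max_gens if it is never reached)."""
    rng = random.Random(seed)
    w = [2] * (n - 1) + [2 * (n - 1) + 1]     # w_1..w_{n-1} = 2, w_n = C
    C = sum(w[:-1]) + 1

    def fit(x):
        v = sum(wi * si for wi, si in zip(w, x))
        return v if v <= C else -1            # -1 marks infeasibility

    def rand_feasible():
        while True:                           # uniform over feasible space
            x = tuple(rng.randint(0, 1) for _ in range(n))
            if fit(x) >= 0:
                return x

    x_star = tuple([0] * (n - 1) + [1])       # global optimum
    pop = [rand_feasible() for _ in range(mu)]
    for gen in range(max_gens):
        if x_star in pop:
            return gen                        # first hitting time
        children = []
        for x in pop:                         # one-bit mutation
            i = rng.randrange(n)
            y = list(x)
            y[i] = 1 - y[i]
            y = tuple(y)
            children.append(y if fit(y) >= 0 else x)  # keep feasible parent
        union = sorted(pop + children, key=fit)
        best, worst, rest = union[-1], union[0], union[1:-1]
        pop = [best, worst] + [               # Selection I
            max(rng.choice(rest), rng.choice(rest), key=fit)
            for _ in range(mu - 2)
        ]
    return max_gens
```

In typical runs the optimum is found within a few dozen generations for small n, in line with the n ln n + n bound of the proposition.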



Now we analyze the (µ + µ) EA with Selection II on the problem. We can obtain the following proposition by using the lemmas introduced in the former section:

Proposition 2. For the (µ + µ) EA with the one-bit mutation and Selection II on the instance of the subset sum problem, the mean first hitting times satisfy (5). When the population size µ = o(n²), m, mE0, . . . , mEn−1 are not polynomial functions of n, where m = E[τ].

mEopt = 0,
mEk − mEk−1 = [1 / ((1 − ((n−k)/n)^µ) q)] Σ_{j=0}^{n−1−k} (p/q)^j Π_{i=1}^{j} [1 − ((k+i)/n)^µ] / [1 − ((n−k−i)/n)^µ],  k = 1, . . . , n − 1,
(1 − ((n−1)/n)^µ) p mE0 = 1 + ((n−1)/n)^µ p (mE1 − mE0).   (5)

(In particular, the case k = n − 1 reduces to mEn−1 = mEn−2 + 1/((1 − (1/n)^µ) q).)

Dopt = 1,
(1 − ((n−1)/n)^µ) p Dopt − p D0 + ((n−1)/n)^µ p D1 = 0,
(1 − ((n−k)/n)^µ) q Dk−1 − [(1 − ((k+1)/n)^µ) p + (1 − ((n−k)/n)^µ) q] Dk + (1 − ((k+1)/n)^µ) p Dk+1 = 0,  k = 1, . . . , n − 2,
(1 − (1/n)^µ) q Dn−2 − (1 − (1/n)^µ) q Dn−1 = 0.   (6)

mopt = 0,
p m0 = 1 + (1 − ((n−1)/n)^µ) p mopt + ((n−1)/n)^µ p m1,
[(1 − ((k+1)/n)^µ) p + (1 − ((n−k)/n)^µ) q] mk = 1 + (1 − ((n−k)/n)^µ) q mk−1 + (1 − ((k+1)/n)^µ) p mk+1,  k = 1, . . . , n − 2,
(1 − (1/n)^µ) q mn−1 = 1 + (1 − (1/n)^µ) q mn−2.   (7)


mopt = 0,
mk − mk−1 = [1 / ((1 − ((n−k)/n)^µ) q)] Σ_{j=0}^{n−1−k} (p/q)^j Π_{i=1}^{j} [1 − ((k+i)/n)^µ] / [1 − ((n−k−i)/n)^µ],  k = 1, . . . , n − 1,
(1 − ((n−1)/n)^µ) p m0 = 1 + ((n−1)/n)^µ p (m1 − m0).   (8)

(1 − ((n−1)/n)^µ) p G0 = 1 + ((n−1)/n)^µ p G1,
(1 − ((n−k)/n)^µ) q Gk = 1 + (1 − ((k+1)/n)^µ) p Gk+1,  k = 1, . . . , n − 2,
(1 − (1/n)^µ) q Gn−1 = 1.   (9)
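The recurrence for Gk = mk − mk−1 in (9) can be evaluated numerically by back-substitution, starting from Gn−1 and working down to G0 = m0. The sketch below is our own illustration (parameter values p = 0.6 and µ = 10 are illustrative); it makes the exponential growth of mE0 visible for small n.

```python
def m0_selection_two(n, mu, p):
    """Evaluate m_0 = m_{E_0} for Selection II by back-substituting the
    recurrence (9): G_{n-1} first, then G_k for k = n-2,...,1, then G_0."""
    q = 1.0 - p
    G = [0.0] * n                        # G[k] = m_k - m_{k-1}; G[0] = m_0
    G[n - 1] = 1.0 / ((1.0 - (1.0 / n) ** mu) * q)
    for k in range(n - 2, 0, -1):        # requires n >= 3
        G[k] = (1.0 + (1.0 - ((k + 1) / n) ** mu) * p * G[k + 1]) \
               / ((1.0 - ((n - k) / n) ** mu) * q)
    G[0] = (1.0 + ((n - 1) / n) ** mu * p * G[1]) \
           / ((1.0 - ((n - 1) / n) ** mu) * p)
    return G[0]
```

Since p/q > 1, each back-substitution step multiplies the accumulated term by roughly p/q, so m0 grows roughly like (p/q)^n, i.e., exponentially in n.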

Proof. First we provide eight probability conditions for populations that belong to different subsets Ek. When k = 0, for the population ξt in the tth generation, ξt = X, X ∈ E0:

P(ξt+1 ∈ Eopt | ξt = X) = (1 − ((n−1)/n)^µ) p   (10)
%The global optimum has been generated and preserved (by the selection operator).

P(ξt+1 ∈ E0 | ξt = X) = q   (11)
%One of the worst individuals, belonging to E0, has been preserved.

P(ξt+1 ∈ E1 | ξt = X) = ((n−1)/n)^µ p.   (12)
%Some individual belonging to E1 has been generated and preserved.

When 2 ≤ k ≤ n − 2, for the population ξt in the tth generation, ξt = X, X ∈ Ek:

P(ξt+1 ∈ Ek−1 | ξt = X) = (1 − ((n−k)/n)^µ) q   (13)
%Some individual belonging to Ek−1 has been generated and preserved.

P(ξt+1 ∈ Ek | ξt = X) = ((k+1)/n)^µ p + ((n−k)/n)^µ q   (14)
%As either the best or the worst individual of the parent and intermediate populations,
%some individual belonging to Ek has been preserved.

P(ξt+1 ∈ Ek+1 | ξt = X) = (1 − ((k+1)/n)^µ) p.   (15)
%Some individual belonging to Ek+1 has been generated and preserved.

When k = n − 1, for the population ξt in the tth generation, ξt = X, X ∈ Ek:

P(ξt+1 ∈ En−2 | ξt = X) = (1 − (1/n)^µ) q   (16)
%Some individual belonging to En−2 has been generated and preserved.

P(ξt+1 ∈ En−1 | ξt = X) = p + (1/n)^µ q.   (17)
%As the best individual of the parent and intermediate populations, the local optimum belonging to En−1
%has been preserved (the first item);


%As the worst individual of the parent and intermediate populations (filled with the local optimum),
%the local optimum belonging to En−1, which has been utilized to replace its infeasible offspring,
%is preserved (the second item).

Let p̄ij denote the transition probability from the state Ei to Ej. The formal definition is given by

p̄opt,opt = 1,   (18)
p̄ij = P(ξt+1 ∈ Ej | ξt ∈ Ei),  i, j = 0, . . . , n − 1 or ‘‘opt’’.   (19)

Consider a Markov chain L′t with the states opt, 0, . . . , n − 1 and the transition matrix P̄ = (p̄ij); its mean first hitting time conditional on the initial state L′0 = k, denoted by mk, is given by mk = E[τ, τ < ∞, L′τ ∈ H | L′0 = k]. Obviously, mk = mEk holds for k = 0, . . . , n − 1 and ‘‘opt’’. Now denote the first hitting probability starting from the state k by Dk; we have Dk = P(τ < ∞, L′τ ∈ H | L′0 = k). By applying (10)–(17) to Lemma 1, we get the equations on the first hitting probabilities in (6). Since p < 1, it can be derived from (6) that

Dopt = D0 = D1 = · · · = Dn−1 = 1.   (20)

By applying (10)–(17) and (20) to Lemma 2, we obtain directly the equations in (7). Let Gk = mk − mk−1 (1 ≤ k ≤ n − 1) and G0 = m0; the equations in (7) yield the non-homogeneous recurrence relation shown in (9). By solving (9), we can obtain (8). Since p/q > 1, we have mn−1 > · · · > m0 > mopt (i.e., mEn−1 > · · · > mE0 > mEopt), and

mE0 = m0 = Ω( [ (1 − 1/n)^µ / ((1 − (1/n)^µ) q) ]² (p/q)^{O(n)} ).

When n → ∞,

mE0 = Ω( e^{−µ/n} (p/q)^{O(n)} ).

Hence, if µ = o(n²), then the conditional mean first hitting times mE0, . . . , mEn−1 are not polynomial functions of n, and consequently the mean first hitting time m is not a polynomial function of n. □

To validate the theoretical results presented in Propositions 1 and 2, we also carried out empirical studies on the computation times of the EAs with Selections I and II. Figs. 1 and 2 illustrate the average first hitting times of the two EAs (over 100 runs) on the instance of the subset sum problem (4) with different problem sizes, where both EAs employ an initial population generated uniformly at random in the feasible search space, and the population size of both EAs is set to 2n ln n. Moreover, for the EA with Selection II, the parameter p is set to 0.6, and thus q = 1 − p = 0.4. According to Fig. 1, the curve representing the average first hitting time of the EA with Selection I is bounded from above by the curve 1.2n ln n and approximates the curve 1.1n ln n; thus it approximately exhibits an O(n log n) behavior. Meanwhile, according to Fig. 2, the curve of the EA with Selection II clearly exhibits an exponential behavior. Hence, the experimental results agree well with our theoretical results.

The propositions presented in this section show the disadvantage of high selection pressure in solving wide-gap problems with only a few local optima. In this case, the Hamming distance between two local optima can be very large. If the EA adopts high selection pressure, then the individuals are very likely to be trapped in some local optimum, and it is very hard for them to jump further to the basins of attraction of other optima. Hence, to solve the wide-gap problem presented in this paper, and more generally wide-gap problems with only a few optima, Selection I may be superior to Selection II for the (µ + µ) EA.

4. Conclusion and discussion

This paper studies the relation between a problem characteristic and the choice of selection pressure. Through a case study, we theoretically showed that, for wide-gap problems with only a few optima, it is better for population-based EAs to use low selection pressure. The empirical results also verified our theoretical results. However, for wide-gap problems that contain many local optima, whether low selection pressure is still better remains an open question. Take the SufSamp problem [17] as an example: it is likely that only EAs with sufficiently high selection pressure and a large population can solve it efficiently. This problem contains a path leading to the global optimum,


[Plot: average number of generations (0–300) versus problem size n (0–60) for the EA with Selection I.]

Fig. 1. The average first hitting time of (µ + µ) EAs with Selection I (over 100 runs), where the population size of the EA is 2n ln n.

[Plot: average number of generations (10¹–10⁶, log scale) versus problem size n (4–24) for the EA with Selection II.]

Fig. 2. The average first hitting time of (µ + µ) EAs with Selection II (over 100 runs), where the population size of the EA is 2n ln n.

but near the path there are many local optima with relatively higher fitness than the points on the path, and EAs may easily be trapped in those local optima. The EA should sample enough points near the path and employ high selection pressure to force its individuals to search along the path (rather than be trapped in the local optima). In this case, it is likely that low selection pressure does not perform as well as high selection pressure, given the same population size. Moreover, it is possible that for some wide-gap problems, adaptive selection pressure (e.g., [1]) is better. For example, consider a wide-gap problem with many local optima: if some of the local optima are located near a path leading to the global optimum, as in SufSamp, and they are far away from some other optima, then it is probably better to use low selection pressure at the beginning (to find the path quickly) and high selection pressure later (to search along the path).

Acknowledgements

The authors would like to thank Dr. Ke Tang for his valuable comments and revision of the paper. This work was partially supported by National Natural Science Foundation of China grants (No. 60533020 and U0835002), the Fund for Foreign Scholars in University Research and Teaching Programs (Grant No. B07033), and an Engineering and Physical Science Research Council grant in the UK (No. EP/C520696/1).

References

[1] E. Alba, B. Dorronsoro, The exploration/exploitation tradeoff in dynamic cellular genetic algorithms, IEEE Trans. Evol. Comput. 9 (2) (2005) 126–142.
[2] T. Bäck, Selective pressure in evolutionary algorithms: A characterization of selection mechanisms, in: D. Fogel (Ed.), Proc. 1st IEEE Conf. Evol. Comput., Orlando, FL, 1994, pp. 57–62.
[3] T. Chen, J. He, G. Sun, G. Chen, X. Yao, A new approach to analyzing average time complexity of population-based evolutionary algorithms on unimodal problems, IEEE Trans. Syst., Man, Cybern., Part B 39 (5) (2009) 1092–1106.


[4] T. Blickle, L. Thiele, A comparison of selection schemes used in genetic algorithms, Technical Report No. 11, Computer Engineering and Communications Networks Lab (TIK), Swiss Federal Institute of Technology (ETH) Zurich, Gloriastrasse 35, CH-8092 Zurich, 1995.
[5] S. Droste, T. Jansen, I. Wegener, On the analysis of the (1+1) evolutionary algorithm, Theor. Comput. Sci. 276 (1–2) (2002) 51–81.
[6] J. Garnier, L. Kallel, M. Schoenauer, Rigorous hitting times for binary mutations, Evol. Comput. 7 (2) (1999) 173–203.
[7] D.E. Goldberg, K. Deb, A comparative analysis of selection schemes used in genetic algorithms, in: G.J.E. Rawlings (Ed.), Foundations of Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, 1991, pp. 69–93.
[8] J. He, L.S. Kang, On the convergence rate of genetic algorithms, Theor. Comput. Sci. 229 (1–2) (1999) 23–39.
[9] J. He, X. Yao, Drift analysis and average time complexity of evolutionary algorithms, Artif. Intell. 127 (1) (2001) 57–85.
[10] J. He, X. Yao, From an individual to a population: An analysis of the first hitting time of population-based evolutionary algorithms, IEEE Trans. Evol. Comput. 6 (5) (2002) 495–511.
[11] J. He, X. Yao, Towards an analytic framework for analysing the computation time of evolutionary algorithms, Artif. Intell. 145 (1–2) (2003) 59–97.
[12] J. He, X. Yao, A study of drift analysis for estimating computation time of evolutionary algorithms, Nat. Comput. 1 (3) (2004) 21–35.
[13] J. He, X. Yao, J. Li, A comparative study of three evolutionary algorithms incorporating different amounts of domain knowledge for node covering problems, IEEE Trans. Syst., Man, Cybern., Part C 35 (2) (2005) 266–271.
[14] J. He, C. Reeves, X. Yao, A discussion on posterior and prior measures of problem difficulties, in: Proc. PPSN IX Workshop on Evolutionary Algorithms — Bridging Theory and Practice, 2006.
[15] J. He, C. Reeves, C. Witt, X. Yao, A note on problem difficulty measures in black-box optimization: Classification, realizations and predictability, Evol. Comput. 15 (4) (2007) 435–444.
[16] M. Hutter, S. Legg, Fitness uniform optimization, IEEE Trans. Evol. Comput. 10 (5) (2006) 568–589.
[17] T. Jansen, K.A.D. Jong, I. Wegener, On the choice of the offspring population size in evolutionary algorithms, Evol. Comput. 13 (4) (2005) 413–440.
[18] S. Khuri, T. Bäck, J. Heitkötter, An evolutionary approach to combinatorial optimization problems, in: D. Cizmar (Ed.), Proc. 22nd Ann. ACM Comput. Sci. Conf., ACM Press, New York, 1994, pp. 66–73.
[19] R. Syski, Passage Times for Markov Chains, IOS Press, Amsterdam, 1992.
[20] P.S. Oliveto, J. He, X. Yao, Time complexity of evolutionary algorithms for combinatorial optimization: A decade of results, Int. J. Autom. Comput. 4 (3) (2007) 281–293.
[21] A. Rogers, A. Prügel-Bennett, Genetic drift in genetic algorithm selection schemes, IEEE Trans. Evol. Comput. 3 (4) (1999) 298–303.
[22] D. Whitley, The GENITOR algorithm and selection pressure: Why rank-based allocation of reproductive trials is best, in: Proc. 3rd Int. Conf. Genetic Algorithms, San Mateo, CA, 1989, pp. 116–123.
[23] D. Whitley, A.M. Sutton, A.E. Howe, Understanding elementary landscapes, in: Proc. GECCO'08, 2008, pp. 585–592.
[24] Y. Yu, Z.-H. Zhou, A new approach to estimating the expected first hitting time of evolutionary algorithms, Artif. Intell. 172 (15) (2008) 1809–1832.