Hierarchical Building-Block Problems for GA Evaluation

Richard A. Watson, Gregory S. Hornby, Jordan B. Pollack
Dynamical and Evolutionary Machine Organisation group
Volen National Centre for Complex Systems
Brandeis University, Waltham, MA 02254-9110, USA
[email protected]
http://www.demo.cs.brandeis.edu

Abstract

Much research effort in the GA community has been directed at identifying what, if anything, GA's are good for ([Forrest & Mitchell, 1993], [Goldberg, 1994], [Mitchell et al., 1995]). The building-block hypothesis ([Holland, 1975], [Goldberg, 1989]) directs our attention to problems which exhibit a decomposable structure, and the Royal-Road problems [Mitchell et al., 1992] were a first attempt at formalizing a problem with building-block components. Other models have been proposed since, but it remains unclear how to model (or solve) problems that exhibit interdependency between building-blocks - sometimes called 'linkage' or 'cross-talk'. If, for example, the contribution of a building-block is non-trivially dependent on other building-blocks, how can it be identified independently? And, if it is not independent, how can it reasonably be called a building-block? This paper offers clarification for the concepts of inter-dependent blocks, or partially-decomposable problems, and introduces the Hierarchical-IFF (H-IFF) function, which provides a canonical example of a class of hierarchical test problems. H-IFF is then used in some preliminary experiments to investigate the GA's ability to manipulate interdependent building-blocks.

1 Introduction

This paper introduces the Hierarchical-IFF problem as a canonical example for a class of GA test problems. A problem that consists of separable sub-problems may be efficiently solved by decomposition into localized searches for the corresponding sub-solutions. A problem that does not exhibit sub-problems must be tackled by much more expensive global optimisation. It is conjectured that most real-world problems (at least, those of interest) are somewhere between these two extremes.

'Partially decomposable' problems (or 'nearly-decomposable' in the terminology of Simon [1969]) exhibit sub-problems to some degree, but these sub-problems are not fully independent. This paper clarifies the concept of partially-decomposable problems, discussing the inadequacies of the test-problems in the GA literature: notably, the Royal-Road (RR) functions and trap functions ([Mitchell et al., 1992], [Deb & Goldberg, 1992]), which have sub-problems that are completely separable, and the NK landscapes [Kauffman, 1993], which exhibit degrees of separability but no gross-scale structure. Whitley [1995] argues for the use of test-problems that have non-separable components, and suggests some methods for building scaleable test-functions from a given basis-function. In this paper we identify the properties of a basis-function that can be applied hierarchically, or recursively, creating a more principled form of interdependency. A canonical example is provided by the hierarchical application of the logical IF-and-only-IF (IFF) function. This enables the construction of a problem, reminiscent of the R2 problem from Forrest and Mitchell [1993], that exhibits a gross-scale hierarchical building-block structure quite clearly. But unlike R2, the blocks in this function are not separable.

In the Hierarchical-IFF (H-IFF) problem, each building-block has two solutions, and these are rewarded equally. But each alternative solution has different interactions at the next level in the hierarchy, i.e. with neighboring blocks. This interdependency must be resolved to confer additional fitness. H-IFF provides an interesting test problem for GA's. The fractal landscape it defines has many sub-optima, but these are not arbitrary. Each sub-optimum corresponds to a combination of building-blocks that are incompatible - but these same blocks could be optimal in a different context. A successful search algorithm needs to maintain alternative solutions to building-blocks for at least as long as it takes to resolve their interdependencies at the next level in the hierarchy. Thus, a population-based search method like a GA would seem to be well matched with the H-IFF problem. However, the maintenance of diversity is paramount - a highly converged search process will not be successful.

H-IFF is then used in various experiments to explore the abilities of GA's. In particular, we are interested in identifying a GA's ability to identify building-blocks, resolve their inter-dependencies, and (re)combine them, as described by the Building-Block Hypothesis ([Holland, 1975], [Goldberg, 1989]). We first verify that H-IFF does not fall prey to trivial algorithms such as Random Mutation Hill Climbing (RMHC), because you can never be too careful [Forrest & Mitchell, 1993]. As expected, RMHC performs poorly, becoming trapped in local optima. We then move on to compare GA variants that attempt to exploit the building-block structure. A simple GA converges quickly on this problem (despite using a steady-state algorithm intended to avoid most fitness-scaling issues) and, having converged, it then, like hill-climbing, becomes trapped in sub-optima. Alternate block-value schemes, introduced in an attempt to reduce this, are ineffective. A possible explanation for this is the phenomenon of 'hitch-hikers' [Forrest & Mitchell, 1993]. This term refers to genes which do not contribute to the genotype fitness but are, nonetheless, promoted in the population. This occurs because the rogue genes are resident in individuals of high fitness (fitness gained from other sections of the genotype). As these useless bits are promoted they overwrite possibly useful bits in other individuals. But, as we will see, if diversity is enforced then the standard GA can overcome this problem and find the global optima successfully. In fact, when diversity is maintained, cross-over without mutation is sufficient.

However, it is clear that even when diversity is maintained in the standard GA, cross-over can only combine building-blocks from different individuals effectively if the bits corresponding to a building-block are adjacent on the genome. When the bits of a building-block are non-adjacent, recombination tends to disrupt the blocks at least as often as it combines them (termed "linkage disequilibrium" by Altenberg [1995] in his discussion of the inadequacies of the Schema Theorem [Holland, 1975]). Thus, when the bit-ordering of the H-IFF problem is re-arranged, the standard GA performs poorly (no better than mutation only). This illustrates a serious deficiency in the GA's ability to manipulate building-blocks.

The next section details the properties we might expect of real-world problems, the properties of example problems in the literature, and the definition of the Hierarchical-IF-and-only-IF (H-IFF) problem. Subsequent sections detail the implementation of the standard GA and describe the results of experiments investigating the GA's ability to manipulate building-blocks in H-IFF.

2 Existing Test-Problems

Simon [1969] argues that natural problems exhibit a partially-decomposable structure. Partial-decomposability seems to be an intuitively plausible structure for real-world problems since the natural world is neither an undifferentiated 'soup', nor a collection of wholly independent entities. Rather, it consists of differentiated objects which interact with various degrees of interdependence. Surviving in the real world - navigating, finding food, avoiding predators, finding mates, building nests, etc. - requires coping with (recognising, predicting, affecting, controlling) the various objects relevant to an agent's existence. To the extent that these objects are independent, the problem of coping with the world is decomposable. And, to the extent that they are not, it is not.

If there is a (partially-)decomposable structure to a problem then there is considerable algorithmic advantage to be gained: problem-decomposition and dynamic programming offer exponential speed-up over linear programming methods when applicable. However, the decomposition of a problem is not always known a priori, as required by the former, and it is not always practicable to store all intermediate results, as required by the latter. The GA is a heuristically-guided, population-based search method that seeks to exploit the advantage of problem decomposition via selection and recombination. Such is the appeal of the Building-Block Hypothesis [Goldberg, 1989]. Yet, thus far, there is much controversy over whether the GA really delivers on this promise ([Forrest & Mitchell, 1993], [Goldberg, 1994], [Mitchell et al., 1995], [Altenberg, 1995]). Much of this controversy stems from the inadequacy of test functions in the GA literature.

Whitley et al. [1995] provide a review of test-problems from the GA literature, and summarize that "most common test problems used in the past to evaluate genetic algorithms lack ... strong nonlinear dependencies between state variables" [Whitley et al., 1995b]. That is, although "the contribution of a single variable to the overall evaluation" may be non-linear, there may be no non-linear interactions between variables, and therefore "the optimal value for each parameter can be determined independently of all other parameters" [Whitley et al., 1995]. The Royal Road (RR) functions [Forrest & Mitchell, 1993], trap functions [Deb & Goldberg, 1992], and others clearly emphasize a gross-scale building-block structure. But, like the real-valued functions that Whitley investigated, these too consist of concatenated blocks, each of which may be optimised independently in a cumulative fashion. The R1 problem [Forrest & Mitchell, 1993] can be imagined as a 'staircase' leading search in a stepwise fashion in the correct direction.

In the so-called 'fully-deceptive' trap functions, the fitness gradient leads search away from the solution to each block but, since there is no interdependence between blocks, sub-solutions again accumulate linearly. To continue the analogy, although each tread on the stair is inclined unfavorably, it is still a staircase leading to the global optimum. 'Linkage', or cross-talk between building-blocks, is not included in these models, although it is discussed and acknowledged as an issue for future research (e.g. [Goldberg & Horn, 1994]).

Whitley et al. [1995] propose an "expansion" method for constructing scaleable non-separable functions from a base function of two variables. By selecting pairs of variables in an overlapping fashion and summing the results, they create a non-separable function over an arbitrarily large set of variables. They also propose that using all possible pairs, rather than a minimal set of overlapping pairs, provides a method to vary the amount of inter-dependency. This notion is not dissimilar to the NK landscapes of Kauffman [1993]. The NK landscapes certainly do model the interdependency of variables: in the NK model the contribution of each of N variables is defined as a random look-up function of itself and k other variables. The expanded functions of Whitley are analogous to NK landscapes with k=2 and with a repeated base-function instead of random look-up tables. Using all possible pairs is not wholly different from increasing k.

This seems like a step in the right direction from the point of view of defining a partially-decomposable problem, especially since the value of k, in some sense, defines the degree of interdependence. But this still is not wholly satisfying - for we still cannot see the building-blocks. The concept of building-blocks implies some modularity in the scope of dependencies, whereas in the NK model (and the expanded-function method) the dependencies form a uniform network. We explored varieties of the NK model that cluster the variables into groups with many intra-group dependencies and few inter-group dependencies - an idea inspired by the dependency matrices used by Simon [1969] that approach Jordan Normal Form matrices. But this does not supply a hierarchical structure either, merely a clustered, single-level one.
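As a brief illustration of the NK model just described (a minimal sketch added in this rewrite, not code from the paper; the function and parameter names are illustrative assumptions), each variable's contribution is read from a random look-up table indexed by its own value and the values of k other variables:

    import random

    def make_nk(n, k, seed=0):
        """Build an NK-landscape fitness function: each of n variables contributes
        a value from a random look-up table indexed by its own bit and the bits
        of k randomly chosen other variables."""
        rng = random.Random(seed)
        neighbours = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
        tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

        def fitness(bits):
            total = 0.0
            for i in range(n):
                idx = bits[i]                      # index: the variable's own bit ...
                for j in neighbours[i]:
                    idx = (idx << 1) | bits[j]     # ... followed by its k neighbours
                total += tables[i][idx]
            return total / n                       # mean contribution over the n variables
        return fitness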

3 Hierarchically applied functions

The building-block hypothesis appeals to the notion of a hierarchical problem structure - that is, bits assemble into blocks, blocks assemble into bigger blocks, and so on, up to the global optima. The R2 function is an attempt to make this explicit. However, we surely have no good reason to believe that the interdependencies between bits (that form a block) are any different in nature from the interdependencies between blocks (that form a meta-block).

Yet in R2, and in the trap functions, the interactions between bits are very different from the interactions between blocks: in the 'fully deceptive' functions, for example, the bit-wise landscape is fully deceptive whereas the 'block-wise landscape' is fully non-deceptive. In order to exhibit a truly hierarchical structure the overall fitness function should have the following form: f(A,B,C,D) = g(g(A,B),g(C,D)). This is in contrast to the RR and trap functions, which have the form f(A,B,C,D) = g(A,B)+g(C,D), with separable terms; and also distinct from Whitley et al.'s suggested form, f(A,B,C,D) = g(A,B)+g(B,C)+g(C,D)+g(D,A), which is not separable but also does not have a hierarchical structure.

But this only sounds easy if you say it quickly. Unless there is some kind of correlation between the values of variables that produce good values for the blocks and the values of variables that produce good values for the overall system, the blocks do not mean anything. Now it even seems like a contradiction to want strong non-linear dependencies between building-blocks and yet still have building-blocks - and indeed we have come close to concluding that 'partial-decomposability' is a meaningless concept. After all, a block is a cluster of dependent variables, but if those variables are strongly dependent on other variables, then the cluster is not really a meaningful entity. What follows is a different way of thinking about it.

Consider the block g(A,B) to follow the above form, and suppose, for simplicity, that A and B are integer variables in the range [0-9] and that g(A,B) produces an integer value, A', also in the range [0-9]. We do not know what value we want for A' - it might be high, it might be low; it depends on the function applied to it at the next level and on the result of g(C,D). So we cannot determine what the best values are for A and B. But what we do know is that it does not matter what the values of A and B are, so long as they produce the desired value for A'. That is, although there may be many values of A and B that produce the same value for A', we only need to know one pair. Assuming each value of A' is attainable, we need to know a maximum of 10 pairs of values for A and B in order to explore the entire space of A', whereas the original space of A and B contains 100 pairs. The search advantage comes, then, not from knowing the correct settings for A and B, but from knowing the set of pairs for A and B that are required to produce all values of A'.

By saying that the range of A' is the same as that of each variable in the domain we are describing a system exhibiting 'dimensional reduction'. That is, we believe that the 'true' dimensionality of the problem space is much less than that of the set of base-variables. This must be the case, one way or another, or else we can never expect to do better than random search if our gradient is deceptive.

In a real-world problem this corresponds to finding that the 'degrees of freedom' of a system have constraints between them and so were not really 'free', and do not provide a true measure of the dimensionality of the system. A successful search algorithm will incrementally reduce the dimensionality of the search space (to combinations of hyperplanes) without actually committing to any bit values until the end of the process. It is crucial that a search algorithm be able to maintain alternative solutions to a building-block for at least as long as it takes to resolve their interdependencies with neighboring blocks.[1] This is in marked contrast to a convergent evolutionary process, which only supports a singular most-fit species.

Now, how can a fitness function lead us to identify which pairs of A and B are useful without knowing what value of A' is required? To understand this, it is first necessary to distinguish between the value that a block will be transformed into, e.g. A' (which will be passed up to the next level to determine its interaction with other blocks), and the concept of 'block-fitness'. Block-fitness is an indication of how useful a block is, or may be, in contributing to an overall solution given that you do not know what context it will be used in (i.e. its average, or independent, utility). In general we suppose that some values of A' are more rare than others. For example, the number of pairs of A and B that map to A'=3 may be smaller than the number that map to A'=6.[2] The fitness functions play the role of re-distributing the search evenly over the range of A' - e.g. fit(A'=3) should be greater than fit(A'=6).

Finally, let us put this all together. We need a base-function that has a range with the same dimensionality as each of its arguments. This will be the 'transform function' and will be applied hierarchically to define the interaction of bits with each other in a block, blocks with each other in a meta-block, and so on. To make an interesting problem we require that some of the values in the range of this transform function are more rare than others. Then we need a fitness function to compensate for this - i.e. to increase the likelihood of guessing rare values in the range. Fortunately, it is not hard to define a function that has these properties and, furthermore, it is based on a function everybody is familiar with - logical IF-and-only-IF (IFF).[3]

4 Hierarchical-IFF

H-IFF uses the logical IF-and-only-IF function (or equality) to define the acceptable solutions to building-blocks (pairs of bits), as shown in the fitness column of Table 1. Then a transform function is defined to determine the interaction of blocks at the next level. The transform function arbitrarily assigns 0 and 1 to the two solutions to IFF, and a 'null-value' (shown as "-") to the remaining combinations.

    A  B  |  fit  trans
    0  0  |   1     0
    0  1  |   0     -
    1  0  |   0     -
    1  1  |   1     1

Table 1: Fitness and transform functions based on logical IFF (A IFF B).

To apply this in a hierarchical fashion, we feed the transform resulting from one pair of variables, and the transform of a neighboring pair, into the same function. Only the non-null values will be useful in producing non-null values at the next level. But, for completeness, we define Hierarchical-IFF, H-IFF, as a function of two base-three variables (0, 1 and the null-value), as shown in Table 2.

    A  B  |  fit  trans
    0  0  |   1     0
    0  1  |   0     -
    0  -  |   0     -
    1  0  |   0     -
    1  1  |   1     1
    1  -  |   0     -
    -  0  |   0     -
    -  1  |   0     -
    -  -  |   0     -

Table 2: Fitness and transform functions of Hierarchical-IFF (as per Table 1 but 'filled out' to handle null-values of the arguments when applied recursively).

We can use a different fitness value at each level of re-application, as we'll discuss later, but for now let the fitness of blocks start at 1 and double at each level. Thus, the fitness of four variables, A, B, C, D, using H-IFF is defined as follows:

    HIFFfit(A,B,C,D) = HIFFfit(A,B) + HIFFfit(C,D) + 2 * HIFFfit(HIFFtrans(A,B), HIFFtrans(C,D))
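A minimal sketch of this recursion (this rewrite's illustration, not code from the paper), with the null-value represented as None. Block values start at 1 for pairs and double at each level; single bits contribute no fitness in this sketch, matching the worked example in Table 3 below. Names are illustrative.

    NULL = None   # the null-value ('-' in Tables 1 and 2)

    def hiff(bits):
        """Return (fitness, transform) for a list of 0/1 values whose length is a
        power of two, applying the fitness and transform of Tables 1 and 2
        recursively. Pairs are worth 1 and block values double at each level."""
        if len(bits) == 1:
            return 0, bits[0]                         # a single bit carries no fitness here
        half = len(bits) // 2
        f_left, t_left = hiff(bits[:half])
        f_right, t_right = hiff(bits[half:])
        solved = (t_left == t_right and t_left is not NULL)
        transform = t_left if solved else NULL        # equal non-null values pass up a level
        block_value = half                            # 1 for pairs, 2 for quads, 4 for eights, ...
        return f_left + f_right + (block_value if solved else 0), transform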

An example string is given in Table 3 with its hierarchical transforms and fitnesses.

    string:                0 0   1 0   1 1   1 1
    level-1 transforms:     0     -     1     1      (3 blocks * 1 = 3)
    level-2 transforms:        -            1        (1 block  * 2 = 2)
    level-3 transform:               -                (0 blocks * 4 = 0)
                                                      total fit = 5

Table 3: H-IFFtrans recursively applied to a random 8-bit example string. H-IFFfit is accumulated for the blocks at each level (#blocks * value = fit).

Footnotes:

[1] This might be viewed as a search process not unlike interval-constraint systems.

[2] If this were not so, we need not bother identifying particular pairs for A and B, since whichever value of A' we are trying to find, 10 or so random guesses will deliver it. This is not unlike the situation in NK landscapes with k=N-1, where 50% of guesses will give maximum fitness.

[3] Logical exclusive-or (being the negation of IFF) will suffice equally well in formulating a Hierarchical-XOR. Other boolean functions of two variables are not suitable since they either have only one solution, or solutions dependent on only one variable.

The result of all this is a problem with some quite interesting properties. For example, there are two solutions to every block and, similarly, two global optima. Also, these solution pairs are always maximally distinct, i.e. all 1s or all 0s. In Table 3, the string shown is one bit-mutation away from "00001111", which would yield a higher total fitness of 8. However, this is then 4 bit-mutations away from the next highest point, either "11111111" or "00000000", which are both worth 12.
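These values can be checked against the hiff() sketch given in Section 4 (a hypothetical usage example, assuming that sketch's block-value scheme):

    # fitness of the Table 3 string and of the strings quoted above
    for s in ("00101111", "00001111", "11111111", "00000000"):
        fitness, _ = hiff([int(c) for c in s])
        print(s, fitness)      # expected: 5, 8, 12, 12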

[Figure: fitness plotted along three 65-point paths through the 64-bit H-IFF landscape, labeled 'all null to all 1s', 'all null to all 0s', and 'all 0s to all 1s'.]

Table 4: A specific fraction of the H-IFF landscape. See text for details.

If, somehow, an algorithm knew it should never use zeros, it would find a 'staircase' in the fitness landscape that would lead to the global optimum at 64 ones (Table 4, the "all null to all 1s" curve, max. fitness = 448). Notice the small steps in fitness at each block of 8, and the larger jumps at each block of 16. However, a corresponding curve exists for the all-zeros solutions. To illustrate that it is mutually exclusive with the all-ones solution, the all-zeros global optimum is shown at the opposite extreme from the all-ones maximum. But, in general, a candidate string may include blocks of ones and blocks of zeros, and "all 0s to all 1s" shows the combination. This clearly shows the fractal structure of the local optima in the landscape. Bear in mind that we are only viewing 65 points out of a total of 2^64, most of which are very much lower in fitness than the points shown.

Thus H-IFF provides a GA test function that exhibits a gross-scale hierarchical building-block structure with strong non-linear interdependencies between blocks, yet it also provides a fitness gradient for identifying the building-blocks.

It therefore provides the ideal test function to determine whether GA's really can identify and combine building-blocks, as suggested by the building-block hypothesis, and to test their ability to handle building-block linkage.

5 Experiments using H-IFF

The following experiments use H-IFF for some preliminary investigations into the ability of the GA to process building-blocks. To reduce premature convergence, and to alleviate many complications of fitness-scaling, a steady-state GA is used. Search progresses by the repeated application of the following steps (see the sketch below):

1. Pick three individuals at random from the population.
2. Sort them by their fitnesses. Let A be the highest fitness, then B, and C the least fit.
3. Use A and B, and whatever genetic operators are employed (e.g. cross-over and/or mutation), to create a new offspring.
4. Evaluate the offspring.
5. Let the new offspring take the place of C in the population.

There are many other variants of GAs (steady-state or otherwise) that could have been used. However, preliminary trials of a canonical generational GA performed significantly worse than this steady-state version, and this variant suffices for the purposes of making the comparisons that follow.

Figure 1 shows several runs of a GA with various parameters and added features, as will be discussed. All GA runs use a population of 1000 individuals and are run for 1000 generations ('generation' meaning 1000 evaluations of the steady-state algorithm). Hill-climbing algorithms are also run for 10^6 evaluations. Each point is the best fitness from previous evaluations, averaged over 5 runs. A 64-bit H-IFF is used, which has a maximum-fitness solution of 448 (actually, there are two maxima, each worth 448). Where cross-over is used, it is 2-point cross-over with a probability of 0.7. Where mutation is used, it is with a 6/64 per-bit probability of assigning a new random value. Different cross-over and mutation rates were found to decrease performance in preliminary trials. Unless otherwise stated, a GA uses both cross-over and mutation. All strings are initialized to 64-bit random strings. The initial strings, and subsequent mutations, are binary (i.e. no null-values).
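The following is a minimal sketch of this loop and the operators just described (this rewrite's illustration, not the authors' code; the helper names and the exact mutation and cross-over details are assumptions consistent with the parameters above):

    import random

    def two_point_crossover(p1, p2, rate=0.7):
        """With the given probability, copy a middle segment of p2 into p1."""
        if random.random() >= rate:
            return list(p1)
        x, y = sorted(random.sample(range(len(p1) + 1), 2))
        return list(p1[:x]) + list(p2[x:y]) + list(p1[y:])

    def mutate(bits, rate=6 / 64):
        """Assign a new random value (0 or 1) to each bit with the given rate."""
        return [random.randrange(2) if random.random() < rate else b for b in bits]

    def steady_state_step(population, fitnesses, evaluate):
        """One application of steps 1-5 above."""
        picks = random.sample(range(len(population)), 3)                   # 1. pick three at random
        a, b, c = sorted(picks, key=lambda i: fitnesses[i], reverse=True)  # 2. rank by fitness
        child = mutate(two_point_crossover(population[a], population[b]))  # 3. create the offspring
        fitnesses[c] = evaluate(child)                                     # 4. evaluate it
        population[c] = child                                              # 5. replace the least fit

    # e.g. population = [[random.randrange(2) for _ in range(64)] for _ in range(1000)]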

5.1 Convergence and block-value scaling

Despite using this steady-state algorithm, the GA quickly converges and thereafter usually becomes trapped in local optima, as shown in Figure 1.a. Figure 1.a also shows Random Mutation Hill Climbing (RMHC) [Forrest & Mitchell, 1993] for comparison.

[Figure 1: a. GA and RMHC on Hierarchical-IFF; b. different block-value scalings (geometric versus constant); c. as a. with resource-based fitness-sharing (curves: GA, RMHC, GA with mutation only, GA with cross-over only); d. as c. with bit reordering (curves: GA, RMHC, GA with mutation only). Each panel plots best fitness (0 to 448) against generations (0 to 1000).]

Fitness-scaling is not an issue in a steady-state GA - only the rank order of fitness values is relevant. However, the relative values of the building-blocks, before they are summed, may still have an effect. For example, in one block-value scaling, 3 blocks of 2 bits may be worth more than 1 block of 4 bits, and in another the relationship may be reversed. Mitchell and Forrest [1993] suggest that a geometric increase in block-values, as used in their original R2 problem, may encourage premature convergence and hitch-hiking. Figure 1.b shows that even a constant block-value scheme (i.e. all blocks at all levels confer a fitness contribution of 1) gives no significant improvement in performance compared to the geometric increase in block-values (where each block is worth twice the value of blocks at the previous level).
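For reference, the two scalings compared here can be expressed as a small variant of the hiff() sketch from Section 4 (again this rewrite's illustration; the level_value parameter is an assumption, not the paper's formulation):

    import math

    def hiff_scaled(bits, level_value):
        """H-IFF fitness in which a solved block at a given level is worth
        level_value(level); level 1 = pairs, level 2 = quads, and so on."""
        def rec(segment, level):
            if len(segment) == 1:
                return 0, segment[0]
            half = len(segment) // 2
            f_l, t_l = rec(segment[:half], level - 1)
            f_r, t_r = rec(segment[half:], level - 1)
            solved = (t_l == t_r and t_l is not None)
            value = level_value(level) if solved else 0
            return f_l + f_r + value, (t_l if solved else None)
        return rec(bits, int(math.log2(len(bits))))[0]

    geometric = lambda level: 2 ** (level - 1)   # each level worth twice the previous
    constant = lambda level: 1                   # every block, at every level, worth 1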

5.2 Enforced diversity and building-block assembly

The H-IFF problem has two possible solutions for each building-block at each level - specifically, the all-1s and all-0s solutions. Any algorithm that is to explore the space of solutions properly will need to maintain both sub-solutions at least as long as it takes to find the solution at the next level. So far, all search methods have converged onto one or the other solution for a given block, and if these are not compatible (to be compatible, neighboring blocks must be both all-1s or both all-0s), the search is trapped in this sub-optimum. To counter the strong convergence effects we introduce a resource-based fitness-sharing method.

The principal idea behind fitness sharing is to decrease the value of (sub-)solutions that are being exhibited by many members of the population so as to avoid over-convergence. Fitness sharing [Deb & Goldberg, 1989] uses genotype similarity to identify which members of the population are likely to be receiving their credit for solving the same (sub-)problem that other individuals solve. Implicit fitness sharing [Smith et al., 1993] uses an inherent partitioning of the problem, which corresponds directly to sub-solutions in the problem space. Both of these methods could be applied to H-IFF. However, the method used here, though similar to implicit fitness sharing, is more transparent: each sub-problem (2 for each block at all levels = 2(2N-1) in all) is treated as an independent resource which is depleted when an individual solves it and re-grows whenever an individual does not solve it. This method of resource-depletion uses explicit knowledge of the block-structure, and is therefore not presented here as a general-purpose fitness-sharing method. Rather, it simply provides a tool for enforcing diversity, so that we can investigate the GA's ability to combine building-blocks when diversity is maintained.

All resources are bounded by a minimum of 0 and a maximum of 1, and the fitness an individual receives for a block is simply the block-value multiplied by the resource level. So, when a large number of individuals in the population are exhibiting the genes necessary to solve a sub-problem, the contribution of that sub-solution to an individual's fitness evaluation is near 0. Each individual is evaluated only once (at the time it is created); this means that an individual which first finds a new block will receive high fitness, and will keep this high fitness no matter how many times it is used for reproduction, without the danger of becoming low-fitness due to the resources it itself depletes. The consumption and re-growth of resources are applied via a consumption rate and a re-growth rate (set to 0.8 and 0.003 in these experiments). The re-growth rate is considerably lower than the consumption rate since re-growth is applied at every evaluation for all members of the population, whereas consumption is only applied for the few members that solve that sub-problem.

Figure 1.c shows that the GA succeeds in finding the global optimum for the 64-bit H-IFF very quickly now that it is prevented from converging. In fact, the GA usually finds both global solutions (i.e. 64 1s and 64 0s) in the same run. Notice that the fitness-sharing is sufficient to enable a cross-over-only GA (also shown in Figure 1.c) to succeed; in fact, it succeeds more quickly than the GA with mutation. A mutation-only curve and an RMHC curve are also shown. Notice that the hill-climber is really lost now: when it finds one block the resources are depleted and it has to move to another. This confirms that the fitness-sharing method did not trivialize the problem, and that the population of the GA is necessary to enable recombination of blocks as they are found.
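A minimal sketch of this resource-based scheme (this rewrite's illustration; exactly how consumption is applied, and the omission of single-bit blocks, are assumptions the text does not pin down):

    CONSUME, REGROW = 0.8, 0.003   # consumption and re-growth rates from the text

    def solved_blocks(bits):
        """Yield (level, block_index, value) for every aligned all-equal block of
        size 2, 4, 8, ...; each such block is one sub-problem."""
        level, size = 1, 2
        while size <= len(bits):
            for i in range(0, len(bits), size):
                block = bits[i:i + size]
                if all(b == block[0] for b in block):
                    yield (level, i // size, block[0])
            level, size = level + 1, size * 2

    class SharedHIFF:
        """Each sub-problem (block/solution pair) is a depletable resource; the
        credit for a solved block is its block-value times the resource level."""
        def __init__(self):
            self.resource = {}   # (level, index, value) -> amount in [0, 1]

        def evaluate(self, bits):
            for key in self.resource:                         # every resource re-grows a little
                self.resource[key] = min(1.0, self.resource[key] + REGROW)
            fitness = 0.0
            for key in solved_blocks(bits):
                block_value = 2 ** (key[0] - 1)               # pairs worth 1, doubling per level
                r = self.resource.get(key, 1.0)
                fitness += block_value * r                    # shared credit for this block
                self.resource[key] = r * (1.0 - CONSUME)      # deplete the consumed resource
            return fitness                                    # stored once, at creation time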

5.3 Linkage disequilibrium

So, it seems that the GA has performed the recombination of building-blocks that we wanted. But this success relies entirely on the heuristic of bit-adjacency. It should be clear that if the bits of a building-block are scattered along the genome, rather than lying in a contiguous block, the recombination operator of the standard GA cannot manipulate them effectively: most cross-over events will disrupt blocks at least as often as they combine them. When the bit-ordering of the H-IFF problem is re-arranged, the GA fails, as shown in Figure 1.d. This is a much more serious problem than regular hitch-hikers which, as Figure 1.c shows, are not a problem so long as diversity can be maintained in the population. Here diversity is being maintained, and an examination of the blocks exhibited by individuals in the population reveals an even distribution of 8-bit, and some 16-bit, blocks in both 1s and 0s at each block position.

The problem is that the cross-over operator cannot recombine them effectively when the garbage bits are interspersed with the correct bits. Performance is reduced to that of a mutation-only population; in fact, it performs slightly worse. The use of H-IFF has enabled us to investigate several of the central themes of the building-block hypothesis and to confirm the problems of linkage disequilibrium. The results demonstrate that the GA exhibits some successful properties and some inadequacies.
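The bit-reordered H-IFF used in this experiment can be realized, for example, as a fixed random permutation of bit positions applied before the ordinary evaluation (an assumed construction using the hiff() sketch from Section 4; the paper does not specify the exact reordering):

    import random

    def make_shuffled_hiff(num_bits=64, seed=0):
        """Return an evaluation function for a bit-reordered H-IFF: a fixed random
        permutation maps genome positions to H-IFF positions, so the bits of a
        block are scattered along the genome rather than adjacent."""
        order = list(range(num_bits))
        random.Random(seed).shuffle(order)
        def evaluate(bits):
            return hiff([bits[order[i]] for i in range(num_bits)])[0]
        return evaluate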

6 Future Work

There are several easily identifiable variants of the H-IFF class of problems. For example, blocks could be allowed to overlap, i.e. have shared bits; the branching factor of the block-structure could be higher than 2 (e.g. blocks made of eight sub-blocks); all-1s could be of slightly higher value than all-0s; or blocks could be allowed to start at any position (which becomes an evaluation based on 'run-length'). In a broader framework, there is potential for defining real-valued transform and fitness functions that can be applied recursively. Each of these variants warrants further investigation.

The H-IFF problem offers an ideal test-bed for GA methods. The experiments detailed in this paper cover only a narrow class of GA algorithms, and a broader range of types and parameter settings may be investigated. In particular, more general methods for promoting diversity, to take the place of the resource-based fitness-sharing, can be tested thoroughly. With respect to the final experiment, preliminary investigations suggest that the Messy GA [Goldberg et al., 1989] has promise for overcoming the problems of recombining building-blocks in the bit-reordered H-IFF problem.

7 Conclusions

This paper has reviewed the inadequacies of GA test-problems from the literature. In particular, the emphasis is on modeling problems with a partially-decomposable structure, where sub-problems are identifiable but are not separable. The concepts of a transform function and a fitness function that can be applied recursively are identified, and it is illustrated how these can be used to guide a search process to find useful building-blocks without pre-empting higher-level interdependencies. The Hierarchical-IF-and-only-IF function is introduced as a canonical example based on logical IFF. H-IFF enables us to explore some of the central issues of the Schema Theorem and the building-block hypothesis, such as the ability to manipulate building-blocks and issues of linkage disequilibrium.

We have seen that the GA does not perform well on this type of problem (even when allowed to use bit-adjacency) unless a mechanism for promoting diversity is introduced. With a fitness-sharing mechanism to enforce diversity, the GA does successfully recombine the building-blocks as they are found. However, even though the fitness-sharing method prevents premature convergence and appropriate building-blocks are exhibited in the population, these cannot be combined effectively unless the heuristic of bit-adjacency is applicable. The H-IFF problem thus provides some useful clarity on the GA's ability to manipulate building-blocks and on the Building-Block Hypothesis.

Acknowledgments

The first author thanks Inman Harvey for providing the impetus to pursue these concepts and for the challenge to ground them in a concrete model. Thanks also to the members of the DEMO group, especially Sevan Ficici, for assisting in the research process.

References

[Altenberg, L, 1995] "The Schema Theorem and Price's Theorem", FOGA 3, editors Whitley & Vose, pp. 23-49, Morgan Kaufmann, San Francisco.
[Bull, L, & Fogarty, TC, 1996] "Horizontal Gene Transfer in Endosymbiosis", ALife 5, editors Langton & Shimohara, pp. 77-84, MIT Press.
[Deb, K, & Goldberg, DE, 1989] "An Investigation of Niche and Species Formation in Genetic Function Optimization", ICGA 3, Morgan Kaufmann, San Mateo, CA.
[Deb, K, & Goldberg, DE, 1992] "Sufficient Conditions for Deceptive and Easy Binary Functions", IlliGAL Report No. 91009, University of Illinois, IL.
[Dumeur, R, 1996] "Evolution Through Cooperation: The Symbiotic Algorithm", in Artificial Evolution, editors Alliot, Lutton, Ronald, Snyers, Vol. 1063, pp. 145-158, Springer Verlag.
[Forrest, S, & Mitchell, M, 1993] "Relative Building-block Fitness and the Building-block Hypothesis", in Foundations of Genetic Algorithms 2, Morgan Kaufmann, San Mateo, CA.
[Forrest, S, & Mitchell, M, 1993] "What Makes a Problem Hard for a Genetic Algorithm? Some Anomalous Results and Their Explanation", Machine Learning 13, pp. 285-319.
[Goldberg, DE, 1989] "Genetic Algorithms in Search, Optimisation and Machine Learning", Addison-Wesley, Reading, Massachusetts.
[Goldberg, DE, & Horn, J, 1994] "Genetic Algorithm Difficulty and the Modality of Fitness Landscapes", IlliGAL Report No. 94006, University of Illinois, IL.
[Goldberg, DE, Deb, K, Kargupta, H, & Harik, G, 1993] "Rapid, Accurate Optimization of Difficult Problems Using Fast Messy Genetic Algorithms", IlliGAL Report No. 93004, University of Illinois, IL.
[Goldberg, DE, Deb, K, & Korb, B, 1989] "Messy Genetic Algorithms: Motivation, Analysis and First Results", Complex Systems, 3, pp. 493-530.
[Holland, JH, 1975] "Adaptation in Natural and Artificial Systems", The University of Michigan Press, Ann Arbor, MI.
[Jones, T, & Forrest, S, 1995] "Fitness Distance Correlation as a Measure of Problem Difficulty for Genetic Algorithms", ICGA 6, Morgan Kaufmann.
[Kauffman, SA, 1993] "The Origins of Order", Oxford University Press.
[Mitchell, M, Holland, JH, & Forrest, S, 1995] "When Will a Genetic Algorithm Outperform Hill-climbing?", in Advances in Neural Information Processing Systems 6, Morgan Kaufmann, San Mateo, CA.
[Maynard-Smith, JM, & Szathmary, E, 1995] "The Major Transitions in Evolution", WH Freeman and Co.
[Margulis, L, 1993] "Symbiosis in Cell Evolution", 2nd ed., WH Freeman and Co.
[Mitchell, M, Forrest, S, & Holland, JH, 1992] "The Royal Road for Genetic Algorithms: Fitness Landscapes and GA Performance", Proceedings of the First European Conference on Artificial Life, MIT Press, Cambridge, MA.
[Simon, HA, 1969] "The Sciences of the Artificial", MIT Press, Cambridge, MA.
[Smith, RE, Forrest, S, & Perelson, A, 1993] "Searching for Diverse, Cooperative Populations with Genetic Algorithms", Evolutionary Computation 1(2), pp. 127-149.
[Whitley, D, Mathias, K, Rana, S, & Dzubera, J, 1995] "Building Better Test Functions", ICGA 6, editor Eshelman, pp. 239-246, Morgan Kaufmann, San Francisco.
[Whitley, D, Beveridge, R, Graves, C, & Mathias, K, 1995b] "Test Driving Three 1995 Genetic Algorithms: New Test Functions and Geometric Matching", Journal of Heuristics, 1:77-104.