Perturbing the Representation, Decoding, and Evaluation of Chromosomes

Thomas Haynes

Department of Computer Science Wichita State University Wichita, KS 67260 E-mail: [email protected] Phone: (316) 978-3925

Abstract

We investigate different genetic algorithm (GA) and genetic programming (GP) variants of the representation, decoding, and evaluation of chromosomes for clique detection in graphs. Small changes can drastically impact the evolutionary process, making fair comparisons difficult.

1 Introduction

While research into the interactions of function and terminal set size is sparse to non-existent (for examples, see [Montana, 1995] and [Haynes et al., 1995]), a rule of thumb for GP researchers is to keep both as small as possible. A key consideration is sufficiency [Koza, 1992]: the domain must be solvable with the given function and terminal sets. The researcher must balance the size of the space of all possible parse trees against the benefit of adding, perhaps, a composite function. For example, any boolean logic problem can be solved using the function set F1 = {AND, OR, NOT}, but expanding it to F2 = {AND, OR, NOT, NAND, NOR, XOR} might turn an intractable problem into a manageable one. If we consider parse trees with a maximum depth of 4, the terminal set T = {A, B}, and the function sets F1 and F2, the number of unique parse trees expands from 1,075,406 to 99,799,514. However, not having to evolve XOR from first principles might be worth the increase: how many of the new parse trees built from F2 and T capture more functionality than those built from F1 and T?

To test this rule of thumb in action, we can either add extraneous functions and/or terminals or restrict the possible combinations of an existing set, i.e., strong typing [Montana, 1995]. We will discuss the impact that changing the function and terminal sets has on evolution, and we will also investigate more insidious complications which hinder both evolution and the verification, repeatability, and meaningful comparisons of experiments [Daida et al., 1997]. Are there ways we can retain the same function and terminal sets while either increasing or decreasing the complexity of the search? Can we unintentionally devise an experiment such that it supports our conjectures and allows us to be published, yet it also defies efforts by others to duplicate our results? The answer is yes. In the rest of this paper, we highlight our experiences with clique detection in a graph and show how good intentions can harm the confidence we have in our results.

2 Definitions

The canonical GA chromosome, or string, representation utilizes a binary alphabet: {0, 1}. If a don't care symbol, *, which matches either 0 or 1, is added, we have the schema alphabet {0, 1, *}. A schema is a template describing a subset of the strings in the search space. For example, the schema s = **0**1** describes all strings of length 8 that have a 0 in the third position and a 1 in the sixth. The order of a schema is the number of 0's and 1's present in the template (in s, the order is 2). The defining length of a schema is the distance between the outermost positions defined on the binary alphabet (in s, the defining length is 3). Building blocks are highly fit schemata with a small defining length. They are integral to the schema theorem, which defines how the implicit parallel search of a GA "builds" better solutions over time [Holland, 1975, Goldberg, 1989]. With a string of length $l$ and a building block of defining length $\delta$, a single-point crossover operation has a probability of at most

$$P_l = \frac{\delta}{l - 1}$$

of destroying the building block [Goldberg, 1989]. Consider the string of length $l = 15$ and a building block, b1, of defining length $\delta = 6$ in Figure 1. The probability of crossover destroying b1 is at most $P_l = 6/14 \approx 0.43$.

Figure 1: Probability of crossover disrupting a building block. [Figure: the string ***1*0***0*1*1*, with positions numbered 0 to 14 and building blocks b1 and b2 marked.]
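The schema quantities defined above are simple enough to compute directly. The following is a minimal Python sketch, assuming schemata are written as strings over {0, 1, *} and that crossover is single-point with the cut chosen uniformly among the l − 1 gaps; the function names are illustrative, not taken from any GA package.

```python
def schema_order(schema: str) -> int:
    """Number of fixed positions (0 or 1) in a schema over {0, 1, *}."""
    return sum(1 for c in schema if c in "01")


def defining_length(schema: str) -> int:
    """Distance between the outermost fixed positions (0 if fewer than two)."""
    fixed = [i for i, c in enumerate(schema) if c in "01"]
    return fixed[-1] - fixed[0] if fixed else 0


def disruption_bound(delta: int, length: int) -> float:
    """Chance that a uniformly chosen single cut point (l - 1 choices)
    falls inside a building block of defining length delta."""
    return delta / (length - 1)


s = "**0**1**"
print(schema_order(s), defining_length(s))  # 2 3
print(round(disruption_bound(6, 15), 2))    # 0.43
```

The bound only counts cut points that land inside the block; whether the block is actually destroyed also depends on the other parent, which is why it is an upper bound.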

3 Clique Cover

3.1 Domain Characteristics

Given a graph G = (V, E), a clique of G is a complete subgraph of G. We denote a clique by the set of vertices in the complete subgraph. As the subgraph of G induced by any subset of the vertices of a complete subgraph of G is also complete, it is sufficient to investigate the maximal complete subgraphs of G, i.e., the maximal cliques. Furthermore, we define a candidate clique to be a complete subgraph which may or may not be maximal. Without defining a search heuristic or the evaluation function, we already have a constraint on our representation: we must be able to represent each vertex of the graph. For a GP, this requirement effectively means that the size of the terminal set equals the number of vertices in the graph. Thus, each different problem we consider will have a different terminal set size.¹ We consider two NP-complete problems, finding either the maximum clique, max clique (MC), of G or the set of all cliques, clique cover (CC), of G [Garey and Johnson, 1979] (page 194):

Clique: Given a graph G = (V, E) and a positive integer K ≤ |V|, does G contain a clique of size K or more, i.e., a subset V' ⊆ V with |V'| ≥ K such that every two vertices in V' are joined by an edge in E? Or, can we find the maximum cardinality clique of G?

Covering by cliques: Given a graph G = (V, E) and a positive integer K ≤ |E|, are there k ≤ K subsets V_1, V_2, ..., V_k of V such that each V_i induces a complete subgraph of G and such that for each edge {u, v} ∈ E there is some V_i that contains both u and v? Or, can we determine all cliques of G?

Figure 2 is an eight node graph which illustrates both max clique and clique cover. The max clique size is 4 and the clique cover is CC = {{0, 1, 2, 3}, {4, 5, 6, 7}}.

Figure 2: Example graph, consisting of 2 fully connected cliques of cardinality 4.

We can illustrate that the combination of low cardinality candidate cliques into higher cardinality candidate cliques meets the characteristics of a Royal Road function [Mitchell et al., 1992]: 1) all of the desired building blocks are known in advance; 2) the landscape can be varied systematically; and 3) the global optimum, and all local optima, can be enumerated. With the example graph shown in Figure 2, we can list all of the building blocks:

C = {{0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}, {4, 5}, {4, 6}, {4, 7}, {5, 6}, {5, 7}, {6, 7}, {0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}, {4, 5, 6}, {4, 5, 7}, {4, 6, 7}, {5, 6, 7}, {0, 1, 2, 3}, {4, 5, 6, 7}}.

Since we know all of the candidate cliques, we can calculate the fitness for all interesting combinations of building blocks. We vary the fitness landscape by adding or deleting edges. So, given that we construct a graph such that we know all of the possible candidate cliques, i.e., building blocks, we are forming Royal Road functions.

We want to construct graphs for which we can characterize properties. With the FC family of graphs, we vary the number of cliques present and their cardinality. For any given graph, we restrict all cliques to have the same cardinality. With the FC family of graphs, we have set the following properties: 1) the number of cliques is known beforehand; 2) the cardinality of each clique is known beforehand, so the maximal clique size is also known; and 3) the cliques are disjoint. We label a graph first by the number of cliques and then by their cardinality. Thus fc2-64.clq refers to a graph with 2 cliques, each of cardinality 64.

Again, without even specifying the search heuristic, the representation, and the evaluation function, we have already created scope for others not being able to reproduce our results. Are the vertex labels assigned such that all vertices within a clique are contiguous? If so, does this labeling system impact the search? The answer is yes to both questions. Are we negligent in not mentioning this property? We could answer in self-defense that the observant reader would notice this property in Figure 2 and could use this base case to infer that the property holds for all of the graphs in the FC family. While we will hold off proving that this property impacts the search, we introduce a second family of graphs, FCR, which simply shuffles the vertex labels. The graph corresponding to fc2-64.clq is fcr2-64.clq.

1 This growth of the terminal set is nothing new. Two examples are the Ephemeral Random Constant, which encompasses the infinite set of real numbers in the range [−1.0, 1.0], and the different parity problems, where as you increase k, you increase the size of the terminal set [Koza, 1992].
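The building-block list C for Figure 2 can be checked mechanically. Below is a brute-force Python sketch, assuming a graph is stored as a vertex count plus a set of two-element frozensets; fc_graph, fcr_graph, candidate_cliques, and maximal_cliques are hypothetical names, and the FCR relabeling here is simply a seeded random permutation, not necessarily the procedure used to produce the published fcr*.clq files. Enumerating every subset is only practical for tiny graphs such as fc2-4.

```python
import itertools
import random


def fc_graph(num_cliques, cardinality):
    """Disjoint cliques whose vertices carry contiguous labels (FC-style).
    Returns (number of vertices, set of edges as 2-element frozensets)."""
    edges = set()
    for c in range(num_cliques):
        block = range(c * cardinality, (c + 1) * cardinality)
        edges.update(frozenset(p) for p in itertools.combinations(block, 2))
    return num_cliques * cardinality, edges


def fcr_graph(num_cliques, cardinality, seed=None):
    """The same graph after a random relabeling of its vertices (FCR-style)."""
    n, edges = fc_graph(num_cliques, cardinality)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return n, {frozenset(perm[v] for v in e) for e in edges}


def candidate_cliques(n, edges):
    """Every complete subgraph with at least two vertices (brute force)."""
    found = []
    for size in range(2, n + 1):
        for subset in itertools.combinations(range(n), size):
            if all(frozenset(p) in edges
                   for p in itertools.combinations(subset, 2)):
                found.append(frozenset(subset))
    return found


def maximal_cliques(cliques):
    """Candidate cliques that are not a proper subset of another one."""
    return [c for c in cliques if not any(c < d for d in cliques)]


n, edges = fc_graph(2, 4)  # the graph of Figure 2
C = candidate_cliques(n, edges)
print(len(C))                                         # 22
print(sorted(sorted(c) for c in maximal_cliques(C)))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Relabeling does not change the set of cliques up to renaming, so fcr_graph(2, 4) also yields 22 candidate cliques; what changes is where each clique's vertices sit in a label-indexed chromosome.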

3.2 Prior Encodings

Various approaches have been taken within the GA community for clique detection [Bui and Eppley, 1995, Haynes, 1996, Soule et al., 1996, Soule and Foster, 1997, Haynes, 1998]. In this section, we present how the different representations, decodings, and evaluation functions impact the evolutionary process.

Figure 3: Example graph. Vertex labels inside the nodes refer to fc4-4.clq; vertex labels to the upper right of each node, in italics, refer to fcr4-4.clq.
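The effect of the two labelings in Figure 3 on a binary, one-bit-per-vertex encoding can be quantified with the two example chromosomes given for the binary GA of Bui and Eppley. This Python sketch assumes bit i stands for vertex label i and applies the single-point crossover bound from Section 2; it uses exact fractions so the ratio between the labelings is easy to read.

```python
from fractions import Fraction


def defining_length(loci):
    """Distance between the outermost loci that encode a building block."""
    return max(loci) - min(loci)


def disruption_bound(delta, length):
    """delta / (l - 1), kept exact so the two labelings compare cleanly."""
    return Fraction(delta, length - 1)


# Bit i is 1 when vertex i belongs to the induced subgraph (one bit per label).
s_fc = "1101 0001 1001 0101".replace(" ", "")   # s_fc4-4
s_fcr = "0011 0100 1101 1100".replace(" ", "")  # s_fcr4-4
clique_fc = {0, 1, 3}                           # C_fc4-4
clique_fcr = {13, 5, 2}                         # C_fcr4-4

# Every vertex of each clique must be switched on in its chromosome.
assert all(s_fc[v] == "1" for v in clique_fc)
assert all(s_fcr[v] == "1" for v in clique_fcr)

p_fc = disruption_bound(defining_length(clique_fc), len(s_fc))
p_fcr = disruption_bound(defining_length(clique_fcr), len(s_fcr))
print(p_fc, p_fcr, p_fcr / p_fc)  # 1/5 11/15 11/3
```

The ratio 11/3, i.e., 3 2/3, is the factor by which crossover is more likely to cut through the same clique under the shuffled FCR labeling.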

3.2.1 Binary GA

Bui and Eppley used a binary encoding to solve for the max clique. Each position represented a vertex label from the graph, and a '1' indicated that the vertex was present in the max clique. Soule et al. point out that such an approach fails to exploit building blocks: if two labels are connected, but not close together on the chromosome, then the defining length will not be short, resulting in a greater chance of disruption of the building block [Soule et al., 1996]. We can illustrate this failure to exploit building blocks by considering the chromosomes which represent the same induced subgraph in the context of both of the graphs shown in Figure 3: fc4-4.clq and fcr4-4.clq. For fc4-4.clq, a chromosome is s_fc4-4 = 1101 0001 1001 0101, and the defining length of the max clique, C_fc4-4 = {0, 1, 3}, is 3. For fcr4-4.clq, the equivalent chromosome is s_fcr4-4 = 0011 0100 1101 1100, and the defining length of the max clique, C_fcr4-4 = {13, 5, 2}, is 11. For this example, crossover is 3 2/3 times more likely to disrupt the clique in the FCR family than in the FC family of graphs. While the FC family of graphs is a special case,² Theorem 1 can be proven for any graph. Bui and Eppley use a preprocessing step which relabels vertices in order of their degree of incidence.

Theorem 1 If the vertex labels are assigned such that all vertices within a clique are contiguous, the disruption of building blocks is reduced.

2 Because of the disjointness property.

3.2.2 Max Clique GP

Soule et al. investigate using GP to find the max clique. They utilize one function, Union, and their terminal set is the set of vertex labels. In their evaluation of a chromosome, duplicate vertices are thrown out. For the chromosome in Figure 4, the subgraph is {4, 5, 6}. The fitness of a chromosome is then based on the number of edges in this subgraph, on whether it is a clique, and on the size of the clique. The first encourages the evolution of subgraphs with high incidence vertices, and the last encourages the evolution of larger cliques.

Figure 4: Example chromosome for fc4-4.clq and the GP system presented in [Soule et al., 1996]. [Figure: a parse tree of four Union nodes over the vertex terminals 4, 5, and 6, with 4 and 5 each appearing twice; one node is labeled A.]

While the ordering of vertex labels does not influence their system, depending on the graph being considered, their evaluation function can be very fragile, i.e., very susceptible to destructive crossover or mutation. For example, assuming subtree mutation generates a new subtree of depth 0 or 1, and with the chromosome in Figure 4, there is a 93% probability that subtree mutation at node A will remove the property that the induced subgraph is a clique and a 3% probability that it will express the remaining vertex to find the max clique. The larger the height difference between the level to mutate and the maximum depth level, the more likely mutation will disrupt a clique and the less likely it will expand the candidate clique's cardinality. Even if the subgraph is already not a clique, subtree mutation is not likely to result in a clique.

For crossover between two chromosomes, we have the following cases:
1. Both of the induced subgraphs are not complete. The result depends specifically on the material contained within the chromosomes. One or both of the subgraphs could be almost complete. There is no destructive crossover at the clique level, but the degree of connectivity might be disrupted. A key factor is whether the search is converging to a local optimum.
2. One of the induced subgraphs is complete. The results follow from above.
3. Both of the induced subgraphs are complete, but the subgraph which results from the union of the two is not complete. Crossover is disruptive for both chromosomes.³
4. Both of the induced subgraphs are complete, and so is the subgraph which results from the union of the two. Crossover is not disruptive for either chromosome and may lead to a larger clique.

The case for crossover is more complex, but Theorem 2 will still hold. Finally, if a vertex is represented two or more times in a chromosome, regardless of whether the subgraph is complete or not, one copy of the vertex can be changed without decreasing the connectivity of the original induced subgraph. While the change may remove the completeness of the subgraph, this backup property allows a candidate clique to expand into a larger candidate clique.

Theorem 2 The probability of disruption of a clique is inversely proportional to the ratio of the size of the clique to the size of the graph.

3 This is especially true for the FC family of graphs, but less so for graphs in which the intersection of two cliques is not empty: the union of the subgraphs may not be complete, but there may be a subset of the vertices which can safely be transferred as they form the "core" of both subgraphs.

3.2.3 Clique Cover GP

We have used GP to find both the clique cover [Haynes, 1996] and the max clique of a graph [Haynes, 1998]. With the clique cover, the goal is to discover all maximal cliques in the graph,⁴ and a natural encoding is to represent multiple candidate cliques in the chromosome. Instead of a single Union operator, we employed one function (IntCon) to join vertices into a subgraph and another (ExtCon) to connect subgraphs together into a list.⁵ Each chromosome represents a set of candidate maximal cliques. The fitness evaluation rewards both the clique size and the number of cliques in the tree. To gather the maximal complete subgraphs, the reward for size is greater than that for numbers. The evaluation also does not reward a clique either being in the tree twice or being subsumed by another clique: the first falsely inflates the fitness of the individual, while the second invalidates the goals of the problem. The algorithm for decoding the genotype into the phenotype is:

1. Parse the chromosome into a sequence of candidate subgraphs, each represented by an ordered list of vertex labels.
2. Throw away any candidate subgraphs which duplicate any of the vertex labels.
3. Throw away any candidate subgraphs that are not complete, which leaves only candidate cliques.
4. Throw away any duplicate candidate cliques and any candidate cliques that are subsumed by other candidate cliques.

An example chromosome for the fc4-4.clq graph in Figure 3 is shown in Figure 5. It has four induced subgraphs, and the only clique is B: C = {{4, 5, 6}}. The others are eliminated because they violate at least one of the rules: C contains duplicate vertices, i.e., 3 is repeated; A is not completely connected; and D is subsumed by B. For a given graph, the representation in [Haynes, 1996] can generate more unique chromosomes than that used in [Soule et al., 1996]. This increase in the search space is offset because the representation makes the chromosomes in [Haynes, 1996] less susceptible to destructive crossover and mutation than those in [Soule et al., 1996]. Consider the chromosome in Figure 5 and assume that subtree mutation has disrupted subtree B, i.e., the candidate clique {4, 5, 6}. While the subgraph induced by this subtree is no longer complete, the ExtCon node acts as a barrier and isolates the damage to this subtree (see [Andre and Teller, 1996] for an overview of adverse subtree interaction). Any other candidate cliques would still be complete.

4 And for max clique, the goal is to discover the largest maximal clique.
5 Strong typing [Montana, 1995] and type inheritance [Haynes et al., 1996] are used to ensure that the parent of an ExtCon node is either the root or another ExtCon node.

Figure 5: Example chromosome for fc4-4.clq and the GP system presented in [Haynes, 1996]. [Figure: a parse tree in which three ExtCon nodes join four IntCon subtrees, labeled A to D, containing the vertex labels (14, 9), (5, 4, 6), (1, 0, 3, 2, 3), and (5, 4).]

A semantic interpretation is that ExtCon filters out subgraphs which are not complete. It also removes duplicate or subsumed subgraphs. As a result, ExtCon facilitates the growth of back-up material in segments which are non-coding⁶ [Haynes, 1996]. However, the evaluation function introduces some fragility back into the system. Step #2 forces all subtrees to have unique vertices present, e.g., in Figure 5, subtree C is not a candidate clique. If we select one of the terminal nodes in subtree B and perform subtree mutation such that a subtree of depth 0 or 1 is created, then the probability of disruption is 99.6% and the probability of enlarging the candidate clique is 0.3%. The intent is for Step #2 to force the evolution of exactly the correct solution in the chromosome.

If we remove Step #2 from the decoding algorithm, then IntCon is no longer selective. In effect, ExtCon now serves as a group separator for the representation used in [Soule et al., 1996], i.e., each chromosome can represent several cliques in parallel. Notice that we have changed neither the function set size nor the terminal set size, yet this seemingly innocuous change has altered the dynamics of evolution: the probability of destructive crossover or mutation has decreased (99.6% versus 93% for the fc4-4.clq graph) and the probability of constructive mutation has increased (0.3% versus 3%). This new decoding algorithm has a further benefit in that destructive mutation is still isolated, since it need not affect the whole chromosome as in [Soule et al., 1996]. Finally, since it enforces parsimony, Step #2 must reduce code growth inside each group expressed in the chromosome.

Another concern about the verification, repeatability, and meaningful comparisons of our experiments is that while Step #2 has been used in all of our research into clique detection, it has not been reported in any of the prior publications (see, for example, [Haynes, 1996, Haynes, 1997]). Any researcher trying to duplicate our results would have seen larger chromosomes and a steeper learning curve than our published findings.
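The four decoding steps, and the variant without Step #2 discussed above, can be compared on the Figure 5 chromosome. This is a minimal Python sketch under stated assumptions: Step 1 (flattening the parse tree) is taken as already done, so each IntCon subtree arrives as a list of labels; the group contents are read from the equivalent GA chromosome 14 9 -1 5 4 6 -1 1 0 3 2 3 -1 5 4 -1 given for the clique cover GA; and decode and enforce_unique are hypothetical names. When Step #2 is switched off, repeated labels are merged, as in the Union decoding of Soule et al.

```python
from itertools import combinations


def is_complete(vertices, edges):
    """True if every pair of distinct vertices is joined by an edge."""
    return all(frozenset(p) in edges for p in combinations(vertices, 2))


def decode(groups, edges, enforce_unique=True):
    """Decode candidate subgraphs into a list of candidate cliques.

    groups: one list of vertex labels per IntCon subtree (Step 1 assumed done).
    enforce_unique: apply Step 2; if False, repeated labels are merged instead.
    """
    kept = []
    for group in groups:
        if enforce_unique and len(set(group)) != len(group):
            continue                      # Step 2: a vertex label is repeated
        vertices = frozenset(group)
        if is_complete(vertices, edges):  # Step 3: keep complete subgraphs only
            kept.append(vertices)
    unique = set(kept)                    # Step 4a: drop duplicate cliques
    # Step 4b: drop cliques that are proper subsets of another kept clique.
    return [c for c in unique if not any(c < other for other in unique)]


# fc4-4.clq: four disjoint cliques of four contiguous labels each.
edges = {frozenset(p) for c in range(4)
         for p in combinations(range(4 * c, 4 * c + 4), 2)}

# Subtrees A to D of Figure 5.
groups = [[14, 9], [5, 4, 6], [1, 0, 3, 2, 3], [5, 4]]
print(sorted(sorted(c) for c in decode(groups, edges)))                        # [[4, 5, 6]]
print(sorted(sorted(c) for c in decode(groups, edges, enforce_unique=False)))  # [[0, 1, 2, 3], [4, 5, 6]]
```

With Step #2, subtree C is rejected outright; without it, C's repeated 3 is merged away and the chromosome already expresses the full clique {0, 1, 2, 3}, which illustrates why the undocumented step changes the observed learning curve.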

6 These are segments which do not contribute either positively or negatively to the evaluation of the chromosome.

7 We also used this encoding for random search, hill climbing, and simulated annealing heuristics.

3.2.4 Max Clique GA

Soule and Foster have also used a GA encoding to investigate the relation between graph characteristics and GA hardness [Soule and Foster, 1997], i.e., how hard a particular problem is for a GA to solve. As mentioned in Section 3.2.3, neither the GA encoding of Bui and Eppley nor the earlier GP encoding of Soule et al. could maintain several candidate cliques in a single chromosome. Soule and Foster therefore utilize a grouping GA [Falkenauer, 1995] to maintain multiple candidate cliques in the chromosome. They fix the number of groups in a chromosome at 30 initially and slowly reduce it to 4. Since they are interested in max clique and not clique cover, only the largest group contributes to the fitness of the chromosome.

3.2.5 Clique Cover GA

We have used a GA encoding⁷ to investigate both max clique and clique cover in a subset of the FC family of graphs [Haynes, 1998]. In this encoding, we fix the chromosome length, use a vertex encoding, and adopt a grouping GA to allow for a variable number of candidate cliques in a chromosome. Unlike Soule and Foster, we do not fix the number of groups in the chromosome. Instead, we extend the alphabet with a grouping marker, -1, which is never a valid vertex label, to indicate the end of a group. We allow a grouping marker to appear with some probability p_m (set to 0.1 for all experiments) during both the initial random generation of chromosomes and mutation. The number of groups needed for a given graph is thus evolved along with the solution. The chromosome

14 9 -1 5 4 6 -1 1 0 3 2 3 -1 5 4 -1

corresponds to the GP chromosome given in Figure 5.

The change of search heuristic from GP to GA drastically changes the evolutionary curve: the GA is able to detect cliques faster than the GP. This increase could be a result of transforming to a more natural representation, but we believe it is due to the difference in the effectiveness of the two crossover operators. As reported in [Rosca and Ballard, 1995], even with the traditional choice of selecting the crossover point as a function node 90% of the time and as a terminal node 10% of the time, uniform selection causes the majority of subtree crossovers to occur at the bottommost levels of the parse tree, i.e., the terminals and the level just above them. As a result, we do not see the exchange of large amounts of genetic material. If the clique cardinalities are large, then this crossover operator will only slowly evolve large candidate cliques. In contrast, with the uniform selection used in the GA crossover operation, larger amounts of genetic material are likely to be exchanged. With larger clique cardinalities, this crossover will quickly evolve large candidate cliques. A consequence of this integration of candidate cliques is that a chromosome with candidate cliques larger than the average is more likely to have more children in the next generation (sound familiar?). Thus, once we have a large candidate clique, it can serve as a springboard for discovering even larger candidate cliques: if a candidate clique of cardinality n is not maximal, then it serves as the core subgraph of n candidate cliques of cardinality n + 1 and, recursively, it also serves as a core subgraph for n(n + 1) candidate cliques of cardinality n + 2. Finally, with the GP representation, half of the positions will always be either intragroup or intergroup separators. With the GA representation, the initial distribution of group markers is controlled by the p_m parameter and eventually evolves to encompass the optimal number of groups in each chromosome.
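The role of the grouping marker during initialization and mutation can be sketched in a few lines of Python. Only the marker value -1 and p_m = 0.1 come from the text; the chromosome length of 32, the 16-vertex graph, the per-gene mutation rate p_mut, and all function names are hypothetical choices for illustration, and treating trailing genes after the last marker as a group is an assumption.

```python
import random

GROUP_MARKER = -1  # never a valid vertex label


def random_gene(num_vertices, p_m, rng):
    """A group marker with probability p_m, otherwise a random vertex label."""
    return GROUP_MARKER if rng.random() < p_m else rng.randrange(num_vertices)


def random_chromosome(length, num_vertices, p_m, rng):
    """Fixed-length chromosome for the initial random population."""
    return [random_gene(num_vertices, p_m, rng) for _ in range(length)]


def mutate(chromosome, num_vertices, p_m, p_mut, rng):
    """Redraw each gene with probability p_mut (hypothetical per-gene rate);
    a redrawn gene becomes a group marker with probability p_m."""
    child = []
    for gene in chromosome:
        if rng.random() < p_mut:
            gene = random_gene(num_vertices, p_m, rng)
        child.append(gene)
    return child


def split_groups(chromosome):
    """Candidate subgraphs between markers; empty groups are skipped."""
    groups, current = [], []
    for gene in chromosome:
        if gene == GROUP_MARKER:
            if current:
                groups.append(current)
            current = []
        else:
            current.append(gene)
    if current:  # trailing genes without a closing marker (assumed to count)
        groups.append(current)
    return groups


print(split_groups([14, 9, -1, 5, 4, 6, -1, 1, 0, 3, 2, 3, -1, 5, 4, -1]))
# [[14, 9], [5, 4, 6], [1, 0, 3, 2, 3], [5, 4]]

rng = random.Random(0)
parent = random_chromosome(length=32, num_vertices=16, p_m=0.1, rng=rng)
child = mutate(parent, num_vertices=16, p_m=0.1, p_mut=0.05, rng=rng)
```

Because markers are drawn gene by gene, the number of groups is not fixed in advance; with p_m = 0.1 a chromosome of length 32 starts with about three markers on average, and selection then adjusts that count.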

4 Conclusions

Researchers often key in on various control parameters, e.g., population size, crossover rate, mutation rate, maximum depth, and maximum number of nodes, to explain differences in results between independent implementations of experiments in both GA and GP. We have shown how the selection of the representation of candidate solutions in the chromosome, the decoding of genotype to phenotype, and the evaluation function can all influence the results of evolution. Small changes in each of these selections can increase the probability of destructive crossover and mutation while not changing the search space.

References

[Andre and Teller, 1996] David Andre and Astro Teller. A study in program response and the negative effects of introns in genetic programming. In John R. Koza, David E. Goldberg, David B. Fogel, and Rick L. Riolo, editors, Genetic Programming 1996: Proceedings of the First Annual Conference, pages 12–20, Stanford University, CA, USA, 28–31 July 1996. MIT Press.
[Bui and Eppley, 1995] Thang Nguyen Bui and Paul H. Eppley. A hybrid genetic algorithm for the maximum clique problem. In Larry Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 478–484, San Francisco, CA, 1995. Morgan Kaufmann.
[Daida et al., 1997] Jason Daida, Steven Ross, Jeffrey McClain, Derrick Ampy, and Michael Holczer. Challenges with verification, repeatability, and meaningful comparisons in genetic programming. In John R. Koza, Kalyanmoy Deb, Marco Dorigo, David B. Fogel, Max Garzon, Hitoshi Iba, and Rick L. Riolo, editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, pages 64–69, Stanford University, CA, USA, 13–16 July 1997. Morgan Kaufmann.
[Falkenauer, 1995] Emanuel Falkenauer. Solving equal piles with the grouping genetic algorithm. In Larry Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 492–497, San Francisco, CA, 1995. Morgan Kaufmann.
[Garey and Johnson, 1979] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., San Francisco, CA, 1979.
[Goldberg, 1989] David E. Goldberg. Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley, Reading, MA, 1989.
[Haynes et al., 1995] Thomas Haynes, Roger L. Wainwright, Sandip Sen, and Dale A. Schoenefeld. Strongly typed genetic programming in evolving cooperation strategies. In Larry Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 271–278, San Francisco, CA, 1995. Morgan Kaufmann.
[Haynes et al., 1996] Thomas Haynes, Dale Schoenefeld, and Roger Wainwright. Type inheritance in strongly typed genetic programming. In Kenneth E. Kinnear, Jr. and Peter J. Angeline, editors, Advances in Genetic Programming 2, chapter 18. MIT Press, 1996.
[Haynes, 1996] Thomas Haynes. Duplication of coding segments in genetic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, OR, August 1996.
[Haynes, 1997] Thomas Haynes. Phenotypical building blocks for genetic programming. In Thomas Bäck, editor, Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA97), San Francisco, CA, 1997. Morgan Kaufmann.
[Haynes, 1998] Thomas D. Haynes. Collective Adaptation: The Sharing of Building Blocks. PhD thesis, The University of Tulsa, 1998.
[Holland, 1975] John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, 1975.
[Koza, 1992] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, 1992.
[Mitchell et al., 1992] Melanie Mitchell, Stephanie Forrest, and John H. Holland. The royal road for genetic algorithms: Fitness landscapes and GA performance. In Toward a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life, pages 245–254, Cambridge, MA, 1992. MIT Press.
[Montana, 1995] David J. Montana. Strongly typed genetic programming. Evolutionary Computation, 3(2):199–230, 1995.
[Rosca and Ballard, 1995] Justinian Rosca and Dana H. Ballard. Causality in genetic programming. In L. Eshelman, editor, Genetic Algorithms: Proceedings of the Sixth International Conference (ICGA95), pages 256–263, Pittsburgh, PA, USA, 15–19 July 1995. Morgan Kaufmann.
[Soule and Foster, 1997] Terence Soule and James A. Foster. Genetic algorithm hardness measures applied to the maximum clique problem. In Thomas Bäck, editor, Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA97), San Francisco, CA, 1997. Morgan Kaufmann.
[Soule et al., 1996] Terence Soule, James A. Foster, and John Dickinson. Using genetic programming to approximate maximum clique. In John R. Koza, David E. Goldberg, David B. Fogel, and Rick L. Riolo, editors, Genetic Programming 1996: Proceedings of the First Annual Conference, Stanford University, CA, USA, 28–31 July 1996. MIT Press.