Genetic Algorithms and Relational Landscapes

Philippe Collard, Cathy Escazut, Alessio Gaspar

Laboratory I3S | CNRS-UNSA, 250 Av. A. Einstein, Sophia Antipolis, 06560 Valbonne, FRANCE
email: {pc,escazut,[email protected]

Abstract. A DGA is a genetic algorithm with novel features: relational schemata. These structures allow a more natural expression of the relations existing between loci. Indeed, schemata in standard genetic algorithms can only specify values for each locus. Relational schemata are based on the notion of duality: a schema can be represented by two strings. The intent of this paper is to show the superiority of DGAs over conventional genetic algorithms in two general areas: efficiency and reliability. Thus, we show, with theoretical and experimental results, that our algorithm is faster and performs consistently. The application chosen to test DGAs is the optimization of an extension of Royal Road functions that we call relational landscapes.

1 Introduction

Standard genetic algorithms (SGAs) are adaptive methods used to solve search and optimization problems. They are based on the genetic processes of biological organisms. They encode a potential solution to a specific problem on a simple chromosome-like data structure to which recombination operators are applied.

1.1 Basics of genetic algorithms

A GA works on a population of individuals, each being a possible solution to a given problem. Members of this population are generally represented by a binary string of length ℓ which corresponds to the problem encoding. Each individual is assigned a fitness score according to how good a solution it is. The highly fit individuals are given opportunities to "reproduce": they are first randomly selected following a scheme which favors the fitter members, and then recombined, typically using the mechanisms of crossover and mutation. This produces new "offspring", which share features taken from each "parent". A new population is thus created, containing a higher proportion of the characteristics possessed by the good members of the previous generation. In this way, over many generations, good characteristics are spread throughout the population, being mixed and exchanged with other good characteristics as they go. GAs manipulate individuals. However, most of the theoretical work focuses on the implicit processing of schemata.
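To make this generational cycle concrete, the following minimal sketch (illustrative Python, not the authors' implementation; the tournament selection, toy fitness function and rate values are assumptions) strings together selection, 1-point crossover and bit-flip mutation:

```python
import random

def evolve(fitness, length=16, pop_size=50, generations=100,
           crossover_rate=0.7, mutation_rate=0.01):
    # Minimal generational GA: tournament selection, 1-point crossover, bit-flip mutation.
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def tournament(k=2):
        # Pick k random members and keep the fittest: a simple fitness-biased selection.
        return max(random.sample(population, k), key=fitness)

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            if random.random() < crossover_rate:
                cut = random.randint(1, length - 1)           # 1-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                offspring.append([1 - b if random.random() < mutation_rate else b
                                  for b in child])            # bit-flip mutation
        population = offspring[:pop_size]
    return max(population, key=fitness)

best = evolve(fitness=sum)   # toy objective: maximize the number of 1s
```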

1.2 Schemata and their properties

Schemata, also called hyperplanes, are implicit similarity templates: they identify subsets of individuals sharing certain characteristics. They are usually defined over the ternary alphabet {0,1,?} where `?' is a "don't care" symbol. The schema theorem proves that some schemata, called building blocks, get an exponentially increasing number of representatives in the population [8, 6]. Radcliffe [10] defines four properties he considers necessary for a useful representation for GAs. These are:

- the closure: the intersection of any pair of compatible schemata¹ should itself be a schema,
- the respect: crossing two instances of any schema should produce another instance of this same schema,
- the proper assortment: given instances of two compatible schemata, it should be possible to cross them to produce a child which is an instance of both schemata,
- the ergodicity: it should be possible, through a finite sequence of applications of the genetic operators, to access any point in the search space given any population.

Unfortunately, schemata are not sufficient to represent all the hyperplanes of the space. Indeed, let us recall that schemata describe hyperplanes of varying dimension in an ℓ-dimensional space. For instance, in a two-dimensional space (ℓ=2), points of the space are schemata of order 2, lines are schemata of order 1, and the whole space is covered by the schema ?? of order 0. One can notice that each schema is represented by only one hyperplane; but the converse is false. For instance, the hyperplane {00,11} is not associated with any schema.
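To make the limitation concrete, a short brute-force check (an illustrative sketch, not from the paper) enumerates every ternary schema of length 2 and confirms that none has exactly {00, 11} as its instance set:

```python
from itertools import product

def instances(schema):
    # All binary strings matched by a ternary P-schema over {0, 1, ?}.
    options = [['0', '1'] if c == '?' else [c] for c in schema]
    return {''.join(bits) for bits in product(*options)}

target = {'00', '11'}
hits = [''.join(s) for s in product('01?', repeat=2) if instances(''.join(s)) == target]
print(hits)   # [] -- no single P-schema has exactly {00, 11} as its instance set
```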

2 An implementation of relational schemata

Our objective is to increase the expressiveness of schemata in order to allow a more natural expression of solutions, while keeping the properties above true. The approach we propose is based on an implicit implementation of richer structures. The implicit character is due to the fact that the alphabet remains unchanged. The richness of the representation comes from taking into account not only the value at each locus, but also the relations between loci. Indeed, in a standard schema, we cannot enforce the equality or inequality of the values at different loci. Therefore we cannot adequately represent solutions which require expressing links between different bits, as for the hyperplane {00,11}. This problem seems to disappear if we use variables [11], but this vastly increases the size of the search space. In order to express relations in schemata we have defined new structures, called relational schemata.

¹ Two schemata, s1 and s2, are said to be compatible if there exists a chromosome which is an instance of both s1 and s2.

2.1 Definition of relational schemata

A relational schema, or R-schema, is defined as a string built over the alphabet {X,X',?}. The symbol ? is a "don't care" character; the two symbols X and X' represent complementary variables: if X is bound to 0 then X' stands for 1, and vice versa. Thus, an R-schema makes it possible to express a relation between the values at different loci. In order to represent at least one relation, an R-schema must have at least two variables. As standard schemata, defined over the alphabet {0,1,?}, only express values at loci, we call them positional schemata or P-schemata. An R-schema can be identified with the set of its instances. For instance, the R-schema XX?X' represents the set {0001,0011,1100,1110}, which we call an R-similarity set. In order to guarantee the uniqueness of the representation, we fix the first variable to X². We can extend the notion of order, noted O, to R-schemata: the order of an R-schema is the number of its variables decremented by 1. For example, the order of X?X' is 1. This definition is consistent with the one for P-schemata. Indeed, in both cases, the number of instances of an O-order schema is 2^(ℓ−O).
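For illustration, the following sketch (hypothetical helper names, not the authors' code) expands an R-schema over {X, X', ?} into its R-similarity set by trying both bindings of X:

```python
from itertools import product

def split_symbols(r_schema):
    # Tokenize "XX?X'" into ['X', 'X', '?', "X'"]: a prime attaches to the preceding X.
    tokens = []
    for c in r_schema:
        if c == "'":
            tokens[-1] += c
        else:
            tokens.append(c)
    return tokens

def r_instances(r_schema):
    # Instances of an R-schema: try both bindings of X (X' always takes the complement),
    # then expand the remaining '?' positions freely.
    result = set()
    for x in '01':
        x_comp = '1' if x == '0' else '0'
        fixed = [x if t == 'X' else x_comp if t == "X'" else '?'
                 for t in split_symbols(r_schema)]
        options = [['0', '1'] if c == '?' else [c] for c in fixed]
        result |= {''.join(bits) for bits in product(*options)}
    return result

print(sorted(r_instances("XX?X'")))   # ['0001', '0011', '1100', '1110']
```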

2.2 Implementation of R-schemata

Taking advantage of an implicit implementation of R-schemata, we expect a double benefit: on the one hand, we want the alphabet to remain unchanged (we keep on using {0,1,?}); on the other hand, we want R-schemata to possess the four properties previously stated. We propose a quite simple implementation of this idea through a new encoding of the binary string. A head-bit is added to the string and governs the interpretation of the rest of the string. When the head-bit is set to `0', the rest of the string remains unchanged. If it is set to `1', the string is interpreted as its binary complement. For example, the string 0101 can be expressed as 0 0101 or 1 1010 (the head-bit is written first, separated by a space). It is worth noticing that different chromosomes can be interpreted in the same way: their phenotypes are identical (0101 in the example) while their genotypes are different (0 0101 and 1 1010). We propose to call the strings of such a pair dual chromosomes. Let us present the solution in a more formal way. Considering ℓ-bit strings, the search space is Ω = {0,1}^ℓ. We define the dual space as ⟨Ω⟩ = {0,1} × Ω. A GA only operates on the dual space. We thus have to define a mapping T, the so-called transcription function, from the dual space ⟨Ω⟩ to the basic space Ω as:

∀ω ∈ Ω, T(0ω) = ω and T(1ω) = ω̄, where ω̄ denotes the bitwise complement of ω.

In other words, if the first bit is equal to `1', the rest of the chromosome is complemented. Thus, a string and its bitwise complement encode a single phenotype. A fundamental point we wish to stress is that the GA is applied on the dual space ⟨Ω⟩ but, via the transcription function, it is implicitly activated on the basic space Ω. Thus, the dual space is larger than the basic space, but we can notice that the size of the phenotypical space remains unchanged.
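A minimal sketch of the transcription function, assuming chromosomes are written as plain 0/1 strings with the head-bit first, could look as follows (illustrative code, not the authors'):

```python
def transcribe(dual):
    # Transcription T from the dual space to the basic space: a head-bit of '0' leaves
    # the rest of the string unchanged, a head-bit of '1' complements it bitwise.
    head, body = dual[0], dual[1:]
    if head == '0':
        return body
    return ''.join('1' if b == '0' else '0' for b in body)

# The dual chromosomes 0 0101 and 1 1010 share the phenotype 0101.
assert transcribe('00101') == transcribe('11010') == '0101'
```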

² This restriction does not decrease the generality of the approach since the R-schemata X?X' and X'?X represent the same R-similarity set.

2.3 P-schemata versus R-schemata

We are going to establish a mapping between P-schemata of ⟨Ω⟩, i.e., strings of ℓ+1 characters over {0,1,?}, and R-schemata of Ω, i.e., strings of ℓ characters over {X,X',?}. This mapping is the basis of our implicit implementation of R-schemata. According to the head-bit value of a P-schema of ⟨Ω⟩, there are two cases:

1. The head-bit is specified (`0' or `1'): the transcription is obvious. For instance, the P-schemata 0 1?1 and 1 0?0 of ⟨Ω⟩ are both associated with 1?1. More generally, each P-schema of Ω corresponds to two P-schemata of ⟨Ω⟩.
2. The head-bit is undetermined (`?'): variables allow us to describe an R-schema. For instance, the P-schema ? 01?0 becomes the R-schema XX'?X. Indeed, if the head-bit stands for a `0', the transcription gives 01?0; if the head-bit stands for a `1', the result is 10?1. These two schemata express that the first and last bits are identical, but different from the second one. More generally, the transcription of such a schema is an R-schema of Ω obtained as follows: (i) substitute X for the first specified value and its occurrences, (ii) substitute X' for all the complementary values, (iii) keep all the undetermined loci.

One can note that the transcription of a P-schema of ⟨Ω⟩ whose order is smaller than 2 is the P-schema ?...? of Ω. More generally, the R-schemata of Ω whose order is O are the transcriptions of the P-schemata of ⟨Ω⟩ whose order is equal to O+1 and whose head-bit is undetermined.
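The three-step transcription rule can be sketched as follows (illustrative code; the helper name and the string-based encoding are assumptions):

```python
def dual_schema_to_r_schema(p_schema):
    # p_schema is a P-schema of the dual space: head character followed by the body.
    head, body = p_schema[0], p_schema[1:]
    if head == '0':
        return body                                   # plain P-schema of the basic space
    if head == '1':                                   # complement the specified values
        return ''.join('?' if c == '?' else ('1' if c == '0' else '0') for c in body)
    specified = [c for c in body if c != '?']
    if len(specified) < 2:                            # order < 2: fully undetermined schema
        return '?' * len(body)
    first = specified[0]                              # (i) first specified value -> X
    return ''.join('?' if c == '?'                    # (iii) keep undetermined loci
                   else ('X' if c == first else "X'")  # (ii) complementary values -> X'
                   for c in body)

print(dual_schema_to_r_schema('?01?0'))   # XX'?X, as in the example above
print(dual_schema_to_r_schema('01?1'))    # 1?1
print(dual_schema_to_r_schema('10?0'))    # 1?1
```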

2.4 Properties of schemata in a DGA

Let us study R-schemata with respect to the properties previously stated. Our aim is not to show that R-schemata are better than P-schemata; the purpose is to show that the two models are complementary.

The closure: Before checking whether R-schemata possess the closure property, let us extend the notion of compatibility: two R-schemata, r1 and r2, are compatible if there exists a chromosome which is an instance of both r1 and r2. For instance, ?X?X and XX'?? are compatible. We refine the notion of compatibility by considering corroborating schemata: two compatible R-schemata are corroborating if they share at least one variable locus. For example, the two previous R-schemata are corroborating, while ?XX'? and X??X' are not. The intersection of any pair of corroborating R-schemata is itself an R-schema. For instance, the intersection of ?X?X and XX'?? is the R-schema XX'?X'. We can note that two R-schemata sharing no variable locus are necessarily compatible. Nevertheless, their intersection cannot be expressed as a single R-schema; for instance, the intersection of ?XX'? and X??X' can only be expressed by XYY'X', which is not an R-schema since it contains two distinct variables. However, this expression can be represented by a disjunction of R-schemata: XYY'X' is equivalent to the disjunction (XXX'X' or XX'XX'). More generally, we can say that the intersection of any pair of non-corroborating R-schemata is a disjunction of R-schemata. In this sense, we say that R-schemata are semi-closed under intersection.

The respect: We have shown in [3] that explicit R-schemata do not possess the respect property. Thus, one could think that they are not relevant for GAs. First, let us notice that the respect property is related to the crossover of schemata, which are strings explicitly handled by GAs; in our context, these strings live in the dual space. Moreover, for each R-schema, a choice between dual strings exists [2] which allows the crossover to respect R-schemata. Let ω be an instance of an R-schema of Ω; if the variable X corresponds to a `0' (respectively `1') in ω, we choose to represent ω in 0Ω (respectively 1Ω), i.e., with head-bit `0' (respectively `1'). We can show that this choice creates an R-similarity set in ⟨Ω⟩ that is closed under crossover. For example, consider the R-schema X?X': the R-similarity set in Ω, {001,011,100,110}, is not closed, but the corresponding R-similarity set in ⟨Ω⟩, {0 001, 0 011, 1 011, 1 001}, is the P-schema ? 0?1. We can conclude that a DGA, through its choices between dual strings, handles R-schemata of Ω which possess the respect property.

The proper assortment: Traditional 1-point and 2-point crossovers properly assort neither P-schemata nor R-schemata [3]. A uniform crossover properly assorts P-schemata but does not assort R-schemata. We are going to show that our implicit implementation allows crossover operators to properly assort R-schemata. Let us consider the corroborating R-schemata ?X?X and XX'??, and let 0000 and 0100 be two respective instances. A 1-point and a uniform crossover breed the children {0000,0100}: no offspring is an instance of the intersection XX'?X'. Let us instead choose to represent the string 0000 by the chromosome 1 1111 and 0100 by 0 0100. A uniform crossover can then produce 0 0101, which is transcribed as 0101 and belongs to the intersection XX'?X'. More generally, let a and b be two instances of two corroborating R-schemata of Ω. Two equivalent strings a' and b' exist in the dual space such that a uniform crossover between them generates an instance of the intersection. We can conclude that, through judicious choices between dual strings, a DGA allows uniform crossover to properly assort R-schemata.

The ergodicity: The property of ergodicity makes it possible to access any point in the search space, given any population, through a finite sequence of applications of the genetic operators. The mutation operator usually ensures this property. Obviously, by applying a sequence of mutations, it is possible to reach any point in the space. This property is qualitatively independent of the space (Ω or ⟨Ω⟩) on which we apply the mutation operator. The only improvement we can expect from using the dual space is quantitative. In the basic space Ω, the minimum number of mutations between two strings is their Hamming distance. For two strings of Ω, this number is at most the length ℓ. We can show that, in the dual space ⟨Ω⟩, the minimum number of mutations between two points is at most (ℓ/2) + 1. Let us consider an ℓ-dimensional space. Each point in this space can be considered as a vertex of the ℓ-dimensional unit hypercube. The greatest Hamming distance between two points is equal to ℓ (this is the case when the two strings are complementary). So, with the standard approach, ℓ mutations are needed to go from one chromosome to the other. When we apply the mutation operator on the dual space, we are able to cross from a string in the basic space to its complement by only one mutation³. In this way, two complementary strings are at a Hamming distance of 1 from each other.
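The proper-assortment example above can be replayed directly. The sketch below (illustrative code; the fixed crossover mask is chosen by hand rather than drawn at random) applies the representation choice from the text and shows that a uniform crossover of the dual parents yields an instance of the intersection XX'?X':

```python
def transcribe(dual):
    # Head-bit '1' means: read the rest of the string as its bitwise complement.
    head, body = dual[0], dual[1:]
    return body if head == '0' else ''.join('1' if b == '0' else '0' for b in body)

def uniform_crossover(a, b, mask):
    # Take the i-th gene from a where mask[i] == '0', and from b where mask[i] == '1'.
    return ''.join(x if m == '0' else y for x, y, m in zip(a, b, mask))

# Dual representatives of the phenotypes 0000 and 0100, as chosen in the text.
parent_a, parent_b = '11111', '00100'
child = uniform_crossover(parent_a, parent_b, mask='11010')
print(child, '->', transcribe(child))   # 00101 -> 0101, an instance of XX'?X'
```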

3 Royal Road functions

We now briefly introduce Royal Road functions and study the DGA's behavior in such an environment in comparison with an SGA. As DGAs are supposed to implicitly process R-schemata, these experiments are a study of the relevance of R-schemata versus standard P-schemata.

3.1 A historical overview

Royal Road functions (RRs) were originally designed by M. Mitchell, S. Forrest and J. Holland [9, 5] as a set of easy problems for GAs. In practice, they turn out to be difficult despite the fact that they reward the presence of user-specified building blocks in chromosomes, which should normally make their implicit processing by GAs easier. Their specification is supposed to offer a "royal road" for GA convergence, but even with an increasing number of such blocks GAs encounter problems. Hence, as DGAs are particularly efficient at solving GA-hard problems [4], we confront them with RRs.

3.2 Structural definitions of Royal Road functions

There are two kinds of RRs. The Royal Road function 1 (R1) is composed of eight building blocks (each 8 bits long) rewarding the presence of consecutive `1's within a 64-bit chromosome. Thus, the global optimum 111...111 achieves a fitness of 64 by combining all the building blocks and their associated rewards. The Royal Road function 2 (R2) introduces additional building blocks rewarding the presence of two consecutive R1 blocks (Figure 1). The optimal fitness is thus increased to 256.

Fig. 1. Royal Road 1 and 2 decomposition (building blocks rewarded by +8, +16, +32 and +64).

³ A mutation on the head-bit.
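For reference, a plausible reading of R1 and R2 as fitness functions can be sketched as follows (illustrative code, not the authors' implementation; chromosomes are assumed to be sequences of 0/1 integers):

```python
def r1_fitness(chromosome, block_size=8):
    # Each fully-set block of block_size consecutive 1s contributes block_size points.
    score = 0
    for i in range(0, len(chromosome), block_size):
        block = chromosome[i:i + block_size]
        if all(bit == 1 for bit in block):
            score += block_size
    return score

def r2_fitness(chromosome):
    # Hierarchical reading of R2: reward complete blocks of 8, 16, 32 and 64 ones,
    # so the all-ones string scores 64 + 64 + 64 + 64 = 256.
    size, score = 8, 0
    while size <= len(chromosome):
        score += r1_fitness(chromosome, block_size=size)
        size *= 2
    return score

print(r1_fitness([1] * 64))   # 64
print(r2_fitness([1] * 64))   # 256
```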

3.3 Experiments

The experiments were carried out according to the specifications published in Mitchell et al.'s papers. We used populations of 128 chromosomes, each 64 genes long, and measured at which generation an occurrence of the optimum appeared. Each result has been averaged over 500 experiments. A DGA doubles its search space by introducing a head-bit, but the populations manipulated by both GAs always contain 128 individuals. The difference between the results obtained with the two systems remains constant, whether they face R1 or R2. This suggests that it stems from the size of the population rather than from a hypothetical weakness of DGAs. Consequently, in order to obtain comparable results, we have chosen to measure the number of evaluations computed before reaching the optimum⁴ rather than simply counting elapsed generations. DGAs have thus been allowed to work on a larger population (200) without altering the consistency of the measured criterion.

Results on Royal Road 1: The difference between both GAs remains constant: it takes 89472 evaluations for an SGA versus 95744 for a DGA to find the optimal string with populations of 128 chromosomes. If we adjust the size to 200, we get 78400 versus 84000 evaluations to reach the same goal. Given the difference, their capabilities may be viewed as equivalent (with a slight advantage to SGAs).

Results on Royal Road 2: When applying an SGA and a DGA to R2, we get more interesting results: it takes them respectively 202112 and 207872 evaluations to find the solution with a population sized to 128 individuals. The difference has decreased, so they may really be considered as becoming equivalent as the difficulty of the function increases. When sized to 200 chromosomes, the population allows the DGA to find a solution within 248600 evaluations while it takes 251400 evaluations for the SGA. At this stage, DGAs have proved to be as effective as SGAs when the environment gets more complex. Thus, in our search for a universal, adaptive, and robust optimization method, we must acknowledge that a DGA offers several interesting features despite the need for an increased population size. In order to study the reasons for this effectiveness in more depth, we define new functions based on the relational concept and compare both GAs on them.

4 Relational landscapes

This section is devoted to a description of our "extended Royal Road functions", which use R-schemata, or a combination of R-schemata and P-schemata, instead of traditional P-schemata alone.

⁴ This is simply accomplished by multiplying the average value of the halting generation by the population size.

4.1 Foundations

In a machine learning context, R-schemata have already yielded interesting results [3]. This section introduces them in the field of optimization by adding to RRs the concept of relational building blocks. While RRs only rely on P-schemata and thus only deal with positional information about loci, we use R-schemata to build functions featuring more complex relations within binary chromosomes. Indeed, just as SGAs were supposed to implicitly process P-schemata, we now expect DGAs to naturally handle R-schemata. The implicit handling of such schemata is sufficient to justify the expected effectiveness of DGAs on a landscape explicitly featuring such building blocks. So, we define R-Landscapes as relational functions using R-schemata to describe their fitness landscapes. Similarly, using both P-schemata and R-schemata, we define RP-Landscapes, which combine relational and positional information in a single structure and thus feature the maximal expressiveness for a binary chromosome. Furthermore, RP-Landscapes may become a generic definition of higher-level RR functions, allowing empirical studies and comparisons between different "enhanced GAs" able to handle more complex structures than P-schemata.

4.2 Structural definitions of R-Landscapes

An R-Landscape is a set of R-schemata rewarded as detailed in Figure 2. By rewarding the alternation of 8-bit sequences of consecutive `1's and `0's, we introduce two global optima, one starting with `1' and the other with `0', both with a fitness of 64. Similarly, an RP-Landscape is a set of R-schemata taken directly from the definition of the R-Landscape, plus an additional P-schema that breaks its symmetry by rewarding the presence of a sequence of `1's at the beginning of the optimal string. This combination removes one of the prior optima and keeps only the one beginning with `1', for a fitness of 72.

Fig. 2. Relational Landscapes decomposition (R-schema blocks rewarded by +8; the RP-Landscape adds a P-schema fixing the leading block to `1').
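Since the exact block decomposition of Figure 2 is not reproduced here, the sketch below is only one plausible reading of these definitions, assuming each correctly alternating 8-bit block contributes +8 and the RP variant adds +8 for a leading block of `1's (which matches the stated optima of 64 and 72):

```python
def r_landscape_fitness(chromosome, block_size=8):
    # Assumed reading: the first block must be uniform; every later block that correctly
    # alternates with it (equal on even offsets, complementary on odd ones) adds +8.
    blocks = [chromosome[i:i + block_size] for i in range(0, len(chromosome), block_size)]
    head = blocks[0]
    if len(set(head)) != 1:
        return 0
    score = block_size                      # the anchor block itself counts
    for k, block in enumerate(blocks[1:], start=1):
        expected = head[0] if k % 2 == 0 else 1 - head[0]
        if all(bit == expected for bit in block):
            score += block_size
    return score

def rp_landscape_fitness(chromosome, block_size=8):
    # RP variant: an extra positional reward (+8) for a leading block of 1s breaks the
    # symmetry and keeps only the optimum beginning with 1s, for a fitness of 72.
    bonus = block_size if all(bit == 1 for bit in chromosome[:block_size]) else 0
    return r_landscape_fitness(chromosome, block_size) + bonus

optimum_a = ([1] * 8 + [0] * 8) * 4         # 1^8 0^8 1^8 ... : R fitness 64, RP fitness 72
optimum_b = ([0] * 8 + [1] * 8) * 4         # 0^8 1^8 0^8 ... : R fitness 64, RP fitness 64
print(r_landscape_fitness(optimum_a), rp_landscape_fitness(optimum_a))   # 64 72
print(r_landscape_fitness(optimum_b), rp_landscape_fitness(optimum_b))   # 64 64
```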

4.3 Experiments

In this second set of experiments we compare both GAs on functions that take dependencies between loci into account. The experimental conditions remain unchanged.

Results on R-Landscapes: The DGA finds the global optimum more quickly than the SGA (142848 evaluations against 210048 for the SGA). We can note that the difference is more significant than on classical RRs.

Results on RP-Landscapes: Results are even more interesting with RP-Landscapes, since the DGA needs only 125312 evaluations versus 212352 for the SGA.

As expected, DGAs revealed an unchallenged superiority on R-Landscapes, which provide an example of the intrinsic limitations of P-schemata. This superiority is even more significant than that of the SGA on RRs. Furthermore, as we have not increased the population size during the DGA's experiments, we should view these results as a lower bound. This supports the hypothesis that DGAs are able to deal with higher-level building blocks (R-building blocks). More generally, R-Landscapes feature symmetry; they represent a new way to describe multimodal deceptive functions such as the ones introduced by Goldberg in [7]. These dual functions represent a new perspective in optimization. After deceptiveness, multimodality and royal road structures, RP-Landscapes, and more generally dual functions, appear as a promising new direction.

5 Conclusion and future work

A DGA is an effective improvement of the GA that remains close enough to the original to be formally studied. We have introduced two new kinds of schemata, both expected to be implicitly processed by DGAs. The empirical study of DGAs over a positional and relational testbed leads to encouraging results: it has confirmed the DGA's superiority on the new testbed and its equivalent effectiveness on the classical one. A brief overview of related work shows that many attempts to increase the expressiveness of chromosomes are based on an extension of their allelic alphabet. Unlike these, DGAs keep using a low-level approach: they remain based on binary chromosomes that are easy to study. GAs using larger alphabets, Evolutionary Algorithms, or even Genetic Programming feature more expressive chromosomes, but none of these paradigms provides formal, theoretical foundations comparable to the GA's schema theorem. For this reason, the meta approach appears to be very promising. DGAs are not just another "good device": they rest on strong theoretical foundations. As was done for RRs, and with similar methods, this paper has tried to empirically understand and quantify their behavior. A natural extension to both DGAs and R-Landscapes would consist in adapting the influence of the meta gene. Rather than applying it to all genes, it may be relevant to restrict its effects to a few selected loci. This is a possible direct implementation of RP-schemata, allowing the DGA to decide whether it should use relational variables. The next step will consist in defining a formal theory for meta-GAs, inspired for example by traditional biology, for which genome expression remains an open question, and by work concerning self-adaptive GAs.

Among all these perspectives, on the one hand, DGAs appear as a privileged tool for applying R-schema processing; on the other hand, RP-Landscapes provide a set of test functions for quantifying their effectiveness. Thus, studying these R-Landscapes as multimodal deceptive functions should be another relevant extension of this work, continuing the one begun by Goldberg while providing a new, easy method for defining such functions.

References

1. P. Collard and J. P. Aurand. DGA: An efficient genetic algorithm. In A. G. Cohn, editor, ECAI'94: European Conference on Artificial Intelligence, pages 487-491. John Wiley & Sons, 1994.
2. P. Collard and C. Escazut. Genetic operators in a dual genetic algorithm. In ICTAI'95: Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, pages 12-19. IEEE Computer Society Press, 1995.
3. P. Collard and C. Escazut. Relational schemata: A way to improve the expressiveness of classifiers. In L. Eshelman, editor, ICGA'95: Genetic Algorithms and their Applications: Proceedings of the Sixth International Conference on Genetic Algorithms, pages 397-404, San Francisco, CA, 1995. Morgan Kaufmann.
4. P. Collard and C. Escazut. Fitness distance correlation in a dual genetic algorithm. In ECAI'96: 12th European Conference on Artificial Intelligence, 1996. To appear.
5. S. Forrest and M. Mitchell. Towards a stronger building-blocks hypothesis: Effects of relative building-block fitness on GA performance. In L. D. Whitley, editor, Foundations of Genetic Algorithms 2, pages 109-126. Morgan Kaufmann, San Mateo, CA, 1993.
6. D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA, 1989.
7. D. E. Goldberg, K. Deb, and J. Horn. Massive multimodality, deception and genetic algorithms. Technical Report 92005, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, 1992.
8. J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, 1975.
9. M. Mitchell, S. Forrest, and J. H. Holland. The royal road for genetic algorithms: Fitness landscapes and GA performance. In F. J. Varela and P. Bourgine, editors, Proceedings of the First European Conference on Artificial Life, pages 245-254, Cambridge, MA, 1992. MIT Press/Bradford Books.
10. N. J. Radcliffe. Forma analysis and random respectful recombination. In R. K. Belew and L. B. Booker, editors, ICGA'91: Genetic Algorithms and their Applications: Proceedings of the Fourth International Conference on Genetic Algorithms, pages 222-229, San Mateo, CA, 1991. Morgan Kaufmann.
11. L. Shu and J. Schaeffer. VCS: Variable classifier systems. In J. D. Schaffer, editor, ICGA'89: Genetic Algorithms and their Applications: Proceedings of the Third International Conference on Genetic Algorithms, pages 334-339, San Mateo, CA, 1989. Morgan Kaufmann.

This article was processed using the LaTeX macro package with the LLNCS style.