Optimizing Worst-case Scenario In Evolutionary Solutions To The MasterMind Puzzle

Juan-J. Merelo and Antonio M. Mora
Departamento de Arquitectura y Tecnología de Computadores
University of Granada
Email: {jmerelo,amorag}@geneura.ugr.es

Carlos Cotta
Departamento de Lenguajes y Ciencias de la Computación
Universidad de Málaga
Email: [email protected]

Abstract—The MasterMind puzzle is an interesting problem to be approached via evolutionary algorithms, since it is at the same time a constrained and a dynamic problem, and eventually has a single solution. In previous papers we have presented and evaluated different evolutionary algorithms for this game and shown how their behavior scales with size, looking mainly at the game-playing performance. In this paper we fine-tune the parameters of the evolutionary algorithms so that the worst-case number of evaluations, and thus the average and the median, are improved, resulting in a better solution in a more reliably predictable time.

I. INTRODUCTION

MasterMind [1] is a board game that has enjoyed worldwide popularity in the last decades. Although its current version follows a design created in the 70s, the antecedents of the game can be traced back to traditional puzzles such as bulls and cows or the so-called AB [2]. Briefly, MasterMind is a two-player code-breaking game, or in some sense a single-player puzzle, since one of the players, the codemaker (CM), has no other role in the game than setting a hidden combination and automatically providing hints on how close the other player, the codebreaker (CB), has come to correctly guessing this combination. More precisely, the flow of the game is as follows:
• The CM sets a length-ℓ combination of κ symbols. The CB is therefore faced with κ^ℓ candidates for the hidden combination, which is typically represented by an array of pegs of different colors (but can also be represented by any string of digits or letters) and hidden from the CB.
• The CB tries to guess this secret code by producing a combination of the same length, using the same set of symbols.
• As a response to this move, the CM, acting as an oracle (which explains the inclusion of this game in the category called oracle games), provides information on the number of symbols guessed in the right position (black pegs in the physical board game) and the number of symbols with the correct color but in an incorrect position (white pegs); this is illustrated in Table I.
• The CB uses (or not, depending on the strategy he is following) this information to produce a new combination, which is assessed in the same way. If he correctly


guesses the hidden combination in at most N attempts, the CB wins. Otherwise, the CM takes the game. N usually corresponds to the physical number of rows in the game board, which is equal to fifteen in the first commercial version.
• CM and CB are then interchanged, and several rounds of the game are played. The player that needs the minimal number of attempts wins.

The solution to this puzzle is a quite interesting combinatorial problem, as it relates to other so-called oracle problems such as the hacking of the PIN codes used in bank ATMs [3] or uniquely identifying a person from queries to a genetic database [4]. It is also a complex problem, which has been shown to be NP-complete under different formulations [5], [6], and for which several issues remain open, e.g., what is the lowest average number of guesses needed to solve the problem for any given κ and ℓ. Associated with this, there arises the issue of coming up with an efficient mechanism for finding the hidden combination independently of the problem size, or at least a method that scales gracefully when the problem size increases.

This paper is mainly concerned with using evolutionary algorithms to solve it. Most such evolutionary approaches are based on providing the CB with a set of potential combinations among which she has to select her next move. This decision-making process is very important, since although all the potential candidates may be compatible with the information available, the outcome of the move can be very different, ranging from minimal reductions in the set of potential solutions to a major pruning of the search space. In the past the biggest effort has been devoted to designing the overall structure of the algorithm: finding the best population structure [7], [8] or the best fitness function (in the proposed EvoRank algorithm [9], [10]). However, as already noted in [11], it is a needle-in-a-haystack problem with constraints changing dynamically along the way, so a good analysis and tuning of the evolutionary algorithm is needed to find the best solutions.

In this paper we will concentrate on finding the parametrization and general structure of the evolutionary algorithm that yields the best worst-case results, that is, the one whose highest number of evaluations for finding the solution in a single game is the lowest, improving at the same time the performance


of the solution-finding algorithm. This implies designing the evolutionary algorithm with the best average running time (in terms of evaluations needed to find the solution, but also of absolute running time), and, in turn, will allow us to extend the algorithm to sizes not approachable before. We will redesign the algorithms used so far, and analyze differences with respect to EvoRank [9], [10], EGA [7] and other algorithms tested. At the same time, we will also try to improve the quality of the solutions to the MasterMind puzzle.

The rest of the paper is organized as follows: the next section presents the state of the resolution of the MasterMind puzzle and its evolution, with special emphasis on its evolutionary solutions; we will then explain the general characteristics of the evolutionary algorithm used to solve MasterMind in this work. Results obtained with this setup will be presented in Section IV, and we will finish the paper with the conclusions that derive from them (commented in Section V).

II. BACKGROUND

As mentioned in Section I, a MasterMind problem instance is characterized by two parameters: the number of colors κ and the number of pegs ℓ. Let N_κ = {1, 2, ..., κ} be the set of symbols used to denote the colors. Subsequently, any combination, either the hidden one or one played by the CB, is a string c ∈ N_κ^ℓ. Whenever the CB plays a combination c_p, a response h(c_p, c_h) ∈ N² is obtained from the CM, where c_h is the hidden combination. A response ⟨b, w⟩ indicates that c_p matches c_h in b positions, and that there exist other w symbols in c_p present in c_h but in different positions.

A central notion in the context of the game is that of consistency. A combination c is consistent with a played combination c_p if, and only if, h(c, c_p) = h(c_p, c_h), i.e., if c has as many black and white pegs with respect to c_p as c_p has with respect to the hidden combination. Intuitively, this captures the fact that c might be a potential candidate to be the hidden combination in light of the outcome of playing c_p. We can easily extend this notion and denote a combination c as consistent (or feasible) if, and only if, it is consistent with all the combinations played so far, i.e., h(c, c_p^i) = h(c_p^i, c_h) for 1 ≤ i ≤ n, where n is the number of combinations played so far, and c_p^i is the i-th combination played. Any consistent combination is a candidate solution.

It is straightforward to see that the number of feasible solutions decreases with each guess made by the CB (as long as she always plays feasible solutions; otherwise no reduction at all might happen). For the same reason, feasibility is a dynamic property that all the solutions initially have, and eventually lose at some point (depending on the feedback from the CM), except for the hidden combination, which always remains feasible. This transient nature of feasibility (or, in other words, the decreasing size of the space of potential solutions) turns out to be a central feature in the strategies devised to play MasterMind. Thus, a very naïve approach is to play a combination as soon as it is found, in which case the objective is to find a consistent guess as fast as possible. For example, in [12] an evolutionary algorithm is described for this purpose.
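To make the definitions above concrete, here is a minimal Python sketch (our own illustration; the paper's actual implementation is the Perl module discussed in Section III) of the response function h and the consistency check:

    from collections import Counter

    def response(guess, hidden):
        """Return the response h(guess, hidden) as (black, white) pegs."""
        black = sum(g == h for g, h in zip(guess, hidden))
        # Colors matched irrespective of position, minus the exact matches.
        matched = sum((Counter(guess) & Counter(hidden)).values())
        return black, matched - black

    def is_consistent(candidate, played):
        """A candidate is feasible iff it reproduces every recorded response."""
        return all(response(candidate, guess) == resp for guess, resp in played)

    # First row of Table I: guess AABB against the secret ABBC.
    assert response("AABB", "ABBC") == (2, 1)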

TABLE I
Progress in a MasterMind game that tries to guess the secret combination ABBC. The 2nd and 4th combinations are not consistent with the first one, not coinciding in two positions and one color with it.

Combination   Response
AABB          2 black, 1 white
ABFE          2 black
ABBD          3 black
BBBE          2 black
ABBC          4 black

These strategies are fast and do not need to examine a big part of the space. Playing consistent combinations eventually produces a number of guesses that uniquely determines the secret code. However, both the maximum and the mean number of combinations that need to be examined are usually high. Hence, some bias must be introduced in the way combinations are searched; if not, the guesses will be no better than a purely random approach, as the solutions found (and played) are a random sample of the space of consistent guesses.

The main problem of the previous strategy is the fact that, after a combination is played, every consistent combination in the set will yield a different result, and a different reduction in search-space size, once played; not discriminating among these possible outcomes when selecting the combination to be played impairs performance. For this reason a sensible algorithm should try to find out which combination within the consistent set is expected to maximally reduce the set of remaining combinations. This leads to a generic framework for defining MasterMind strategies endowed with 1) a procedure for finding a large set (even a complete one) Φ of feasible combinations, and 2) a decision-making procedure to select which combination c ∈ Φ will be played.

Regarding the first item, Φ need not be the full set of feasible solutions at a certain step: a fraction of around 1/6 is enough to find solutions that are statistically indistinguishable from the best solutions found. This was statistically established, and then implemented by the authors in an EA termed EvoRank [10]. As to the second item, this procedure should minimize the losses of the CB, i.e., reduce the number of feasible solutions in the next step as much as possible. A popular option to achieve this is to rely on the idea of Hash Collision Groups, HCG [2], or partitions. These are maximal sets of solutions that will remain feasible given a certain feedback provided by the CM. Since the goal of the CB is to find the hidden combination in as few steps as possible, she is interested in obtaining as small a partition as possible. Therefore, it makes sense to focus on the size of these partitions (which one will be the remaining feasible partition is obviously not known in advance, so some heuristic reasoning is required here). As an example, let us consider the first combination in Table I: if the combination considered is AABB, there will be 256 combinations whose response will be 0b, 0w (those with other colors), 256 with 0b, 1w (those with either an A or a B), and so on. Some partitions may also


TABLE II
Table of partitions after two combinations have been played; this table is the result of comparing each combination against all the rest of the set, which is the set of consistent combinations in a game after two combinations have already been played. Marked with an asterisk are the combinations which have the minimal worst set size (which happens to fall in the 1b-1w column, but it could be any one); in this case, equal to ten. A strategy that tries to minimize the worst case would play one of those combinations. The column 0b-1w, with all values equal to 0, has been suppressed; the column for response 3b-1w, being impossible, is not shown either. Some rows have also been eliminated for lack of space.

              Number of combinations in the partition with response
Combination   0b-2w  0b-3w  0b-4w  1b-1w  1b-2w  1b-3w  2b-0w  2b-1w  2b-2w  3b-0w
AABA            0      0      0     14      8      0     13      1      0      3
AACC *          8      0      0     10      5      0      8      4      1      3
AACD            6      2      0     11      6      1      4      5      1      3
AACE            6      2      0     11      6      1      4      5      1      3
AACF            6      2      0     11      6      1      4      5      1      3
ABAB *          8      0      0     10      5      0      8      4      1      3
ABAD            6      2      0     11      6      1      4      5      1      3
ABAE            6      2      0     11      6      1      4      5      1      3
ABAF            6      2      0     11      6      1      4      5      1      3
ABBC *          4      4      0     10      8      0      8      1      1      3
ABDC            3      4      1     11      9      1      4      2      1      3
ABEC            3      4      1     11      9      1      4      2      1      3
ABFC            3      4      1     11      9      1      4      2      1      3
ACAA            0      0      0     14      8      0     13      1      0      3
ACCB *          4      4      0     10      8      0      8      1      1      3
ACDA            0      0      0     16     10      2      5      3      0      3
BBAA *          8      0      0     10      5      0      8      4      1      3
BCCA *          4      4      0     10      8      0      8      1      1      3
BDCA            3      4      1     11      9      1      4      2      1      3
BECA            3      4      1     11      9      1      4      2      1      3
BFCA            3      4      1     11      9      1      4      2      1      3
CACA *          8      0      0     10      5      0      8      4      1      3
CBBA *          4      4      0     10      8      0      8      1      1      3
CBDA            3      4      1     11      9      1      4      2      1      3
CBEA            3      4      1     11      9      1      4      2      1      3
CBFA            3      4      1     11      9      1      4      2      1      3
DACA            6      2      0     11      6      1      4      5      1      3
EBAA            6      2      0     11      6      1      4      5      1      3
FACA            6      2      0     11      6      1      4      5      1      3
FBAA            6      2      0     11      6      1      4      5      1      3

be empty, or contain a single element (4b, 0w will contain just AABB, obviously). For a more exhaustive explanation see [13]; the whole partition set for an advanced stage of the game is shown in Table II. Each combination is thus characterized by the features of these partitions: the number of non-empty ones, the average number of combinations in them, the maximum, and other characteristics related to the reduction of the size of the search space that might distinguish them. For instance, in Table II, the combination ABDC (and others) would have the maximum number of non-zero partitions (none of them has zero elements), while AACC (and others, marked with an asterisk) have a minimal worst-case partition size (equal to ten).

To formalize these ideas, let Ξ = {Ξ_ibw} be a three-dimensional matrix that estimates the number Ξ_ibw of combinations that will remain feasible after combination c_i is played and response ⟨b, w⟩ is obtained from the CM. Then, the potential strategies for the CB are:

1) Minimizing the worst-case partition [14]: pick c_i = arg min_i {max_{b,w} Ξ_ibw}. For instance, in the set in Table II this algorithm would play one of the combinations marked with an asterisk (the first one in lexicographical order, since it is a deterministic algorithm).


2) Minimizing the average-case partition [15], [16]: pick c_i = arg min_i {Σ_{b,w} p_bw Ξ_ibw}, where p_bw is the prior probability of obtaining a particular outcome. If, for instance, we compute p_bw = Σ_i Ξ_ibw / Σ_{i,b,w} Ξ_ibw, then AACD would be the combination played among those in Table II.

3) Maximizing the number of potential partitions [13]: pick c_i = arg max_i {|{Ξ_ibw > 0}|}, where |C| is the cardinality of set C. For example, a combination such as ABDC in Table II would result in no empty partition. This strategy is also called most parts.

4) Maximizing the information gained [7], [17], [18]: pick c_i = arg max_i {H_{b,w}(Ξ_ibw)}, where H_{b,w}(Ξ_i[·][·]) is the entropy of the corresponding sub-matrix.
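As an illustration, the partition counts Ξ_i and two of the strategies above can be written compactly; the following Python sketch (our naming, reusing response() from the earlier snippet) implements strategies 1 and 3, breaking ties lexicographically as the deterministic algorithms do:

    from collections import Counter

    def partitions(candidate, feasible):
        """Xi_i for one candidate: feasible codes grouped by their response."""
        return Counter(response(candidate, code) for code in feasible)

    def play_worst_case(feasible):
        """Strategy 1 (Knuth): minimize the largest surviving partition."""
        return min(sorted(feasible),
                   key=lambda c: max(partitions(c, feasible).values()))

    def play_most_parts(feasible):
        """Strategy 3 (Kooi): maximize the number of non-empty partitions."""
        return max(sorted(feasible),
                   key=lambda c: len(partitions(c, feasible)))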


Algorithm 1: Evolutionary MasterMind approach with endgames (EvoRank-EG)

typedef Combination: vector[1..ℓ] of N_κ;

procedure MASTERMIND(in: c_h: Combination, out: guesses, evals: N);
  var c: Combination;
  var b, w, e: N;
  var P: List[⟨Combination, N²⟩];      // game history
  var F: List[Combination];            // known feasible solutions
  var χ: Set[N_κ];                     // potential colors in the hidden combination

  evals ← 0; guesses ← 1; P ← []; χ ← N_κ; F ← [];   // initialize game
  c ← INITIALGUESS(ℓ, κ);
  ⟨b, w⟩ ← h(c, c_h);                  // initial guess is evaluated
  while b ≠ ℓ do
    P.ADD(⟨c, ⟨b, w⟩⟩);                // game history is updated
    if b + w = 0 then                  // endgame #1
      REMOVE(χ, c);                    // remove colors in c
    end if
    if b + w = ℓ then                  // endgame #2
      REMOVE(χ, c̄);                    // remove colors not in c
    end if
    guesses ← guesses + 1;
    RUNEA(↓χ, ↓P, ↕F, ↑c, ↑e);         // run the EA; arrows indicate in, out and in/out parameters
    evals ← evals + e;                 // update cumulative number of evaluations
    ⟨b, w⟩ ← h(c, c_h);                // current guess is evaluated
  end while

On the other hand, the strategy defined by Kooi [13] is based on the assumption that the size of the partitions is irrelevant, and that it is rather the number of non-empty partitions created that matters. This is an advantageous assumption, since computing the number of partitions is faster than determining their expected size or entropy. For this reason the most-parts strategy has a computational advantage, and it has been used by us in our previous work [19].

Algorithm 1 shows a sketch of our evolutionary approach to MasterMind. A major component of the algorithm are endgames [20], which were added to EvoRank to create the EvoRank-EG method; in this context, endgames refer to exhaustive search procedures triggered when, due to a special combination of the constraints (the answers given by the CM), the search space has become quite small. These endgames are used in two particular cases: when the answer is all whites, which means that the solution is a permutation of the played combination, and when there are no whites or blacks, which effectively reduces the colors available for future combinations to those not present in the played combination. The general framework, already presented in Algorithm 1, will be explained in detail in the next section.
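The two endgame reductions admit a compact illustration; this Python sketch (our formulation, not the paper's Perl code) returns the reduced color set and, for endgame #2, the exhaustive candidate list:

    from itertools import permutations

    def endgame_reductions(colors, guess, b, w):
        """Shrink the search space for the two special responses."""
        if b + w == 0:
            # Endgame #1: no color of the guess occurs in the secret.
            return colors - set(guess), None
        if b + w == len(guess):
            # Endgame #2: every peg is accounted for, so the secret is a
            # permutation of the guess and can be searched exhaustively.
            return set(guess), {"".join(p) for p in permutations(guess)}
        return colors, None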

III. ALGORITHM::MASTERMIND::EVO, SOLVING MASTERMIND USING EVOLUTIONARY ALGORITHMS

Previous instances of the MasterMind-solving EA used particular evolutionary algorithms: Estimation of Distribution Algorithms [8] or a canonical GA [10]. In this paper we have introduced a more general evolutionary algorithm, so that we can explore all the possibilities to obtain the required exploitation/exploration balance. We will explain the different parts of this algorithm in the next paragraphs.

The fitness function was introduced in previous papers, and has two different factors. One of them is the number of black and white peg changes needed to make the current combination c_guess consistent [19]:

    f(c_guess) = − Σ_{i=1}^{n} |h(c_i, c_guess) − h(c_i, c_secret)|        (1)

This number is computed via the absolute difference between the number of black and white pegs h that the combination c_i has obtained with respect to the secret code c_secret (which we know, since we have already played it), and what c_guess obtains when matched with c_i; since h is a vectorial function, |·| is here equivalent to the taxicab or L1 distance. For instance, if the played combination ABBB has obtained as result 2w, 1b and our c_guess CCBA gets 1w, 1b with respect to it, this difference will be |2 − 1|w + |1 − 1|b = 1. This operation is repeated over all the combinations c_i that have been played.

There was a problem with this fitness: if a combination is consistent, its fitness was equal to zero, and also equal to that of all the other consistent combinations, creating a neutral evolution landscape which impeded the progress of evolution among them. This distance is not the only factor; as we have seen in the previous section, not all combinations have the same ability to solve the puzzle, so it is sensible to also include within the fitness function whatever score we are using to rank them. That is what we introduced, starting with EvoRank, and carrying it over to this algorithm. Initially, the fitness of non-consistent solutions is computed as f(c_guess). Let us call this score g_raw,


which is then defined as:

    g_raw(c_guess) = f(c_guess)   if f(c_guess) < 0
    g_raw(c_guess) = P(c_guess)   if f(c_guess) = 0        (2)

That is, for a consistent solution the fitness is the number of non-empty partitions (denoted by P), resulting in negative fitness for non-consistent solutions and positive fitness for consistent ones. This fitness g is then linearly transformed (in case it is used in fitness-proportional selection methods) by making

    g(c_guess) = g_raw(c_guess) + 1 − min_c {f(c)}

so that the worst non-consistent solution will have fitness 1, and the best consistent solution its initial number of partitions plus one plus the minimum negative distance to consistency. This ensures that consistent solutions are always better than non-consistent ones, and also that they differ depending on their score, which can then be used by evolution to improve the population.
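As an illustrative transcription of Eqs. (1) and (2) plus the linear shift (Python; partition_count is a hypothetical callable standing for the partition score P, and response() is the sketch given in Section II):

    def fitness_raw(guess, played, partition_count):
        """g_raw: Eq. (1) for infeasible guesses, Eq. (2) for feasible ones."""
        f = 0
        for combo, (b, w) in played:
            gb, gw = response(guess, combo)
            f -= abs(gb - b) + abs(gw - w)     # taxicab distance, negated
        return f if f < 0 else partition_count(guess)

    def fitness(guess, played, partition_count, f_min):
        """Shifted fitness g; f_min is the minimum f over the population,
        so the worst non-consistent individual scores exactly 1."""
        return fitness_raw(guess, played, partition_count) + 1 - f_min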

The first combination is fixed to include half the colors present in the alphabet; for instance, for κ = 6 it would be ABCA. This option was initially proposed by Knuth [14] and has been adopted ever since (it remains to be proved, however, that it is really optimal; so far most algorithms have adopted it, and most experiments show it yields the best results).

The evolutionary algorithm runs as shown above, but it includes an anti-stagnation mechanism, sketched in code below, that works as follows:

• Continue evolutionary search until at least a certain number of consistent solutions are found.
• If not enough solutions are found, continue until the number of consistent solutions does not change for three generations. This low number was chosen to avoid stagnation of the population.
• If not even a single consistent solution has been found after 100 generations, reset the population, substituting it by a random one. Again, this number was considered high enough to imply stagnation of the search, while giving the algorithm a chance to find solutions.

This means that, when the evolutionary loop exits, we always have a non-empty set of consistent solutions; if it contains a single individual, it is played; if it contains several, one is randomly chosen among those with the highest number of non-zero partitions.

These algorithms are valid across a wide range of situations, but it is difficult to adapt them to different situations by altering the selective pressure or adding different strategies for selection or replacement. In any case, they were all programmed using the ALGORITHM::EVOLUTIONARY Perl EA library [21], which has many other modules to create any kind of evolutionary algorithm (or hooks so that you can create your own and insert them), so in this paper we decided to go for the most general skeleton algorithm with a pluggable architecture that allows us to select whatever method we want, and to parametrize it in any way we want too; that is the structure used in the Algorithm::MasterMind::Evo module, which we have shortened to just EVO.
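The anti-stagnation loop just described might look as follows (a Python sketch under our own naming: new_population and evolve stand for the pluggable pieces of the skeleton, and is_consistent is as sketched in Section II):

    def consistent_set(new_population, evolve, played, enough=20):
        """Evolve until `enough` consistent guesses exist, stopping early
        after three generations without progress and resetting after 100
        generations with no consistent guess at all."""
        population = new_population()
        consistent, stalled, generation = [], 0, 0
        while len(consistent) < enough:
            population = evolve(population, played)
            generation += 1
            found = [c for c in set(population) if is_consistent(c, played)]
            stalled = stalled + 1 if len(found) == len(consistent) else 0
            consistent = found
            if consistent and stalled >= 3:
                break                    # the count has frozen; play what we have
            if not consistent and generation >= 100:
                population, generation = new_population(), 0   # full reset
        return consistent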

Since we had more flexibility with this general evolutionary algorithm skeleton, we first changed the operators used. While previously we had used single-character mutation and single-point crossover, in this paper we kept the mutation, but added permutation, an operator that, without changing the characters present in the combination, alters their order. Besides, after realizing that at a particular point in the search the strings composing the population were very similar, so that crossover did not have any effect on them, we have introduced a diff uniform crossover: a uniform crossover operator that swaps a randomly selected portion of those characters in the string that are different; for instance, when applied to ACCDE and BCEEF it could interchange all but the second character (a C in both cases), leaving, for instance, ACCEF and BCEDE or ACEEE and ACCDE. This operator finds which characters in the string are different, and swaps a predefined portion of them, 50% in this case, which in practice means interchanging one or at most two characters for strings with length up to five. So, if the parents differ only in one character it does nothing, but if they differ in two, it will swap precisely one of the two that are different; thus, the distance between parents and offspring will be at least one and at most half as many different characters as the parents; in that sense, it is similar to the DPX crossover operator proposed by Freisleben [22], which produces offspring that is as distant from its parents as they are from each other. This new crossover operator is intended to create diversity by making as many operations as possible create new individuals. This operator, besides, has been complemented by a selection that ensures that both parents in a crossover operation are different; this was not done in the previous version.

Instead of using roulette-wheel selection, which is slow and does not offer much in the way of tuning, we have used tournament selection in this paper. Tournament selection introduces stronger selective pressure than roulette wheel [23], but in any case allows for a stronger exploitation of the search space, which can balance the bigger exploration brought by the new crossover and permutation operators and by the selector that makes sure that the two individuals chosen for crossover are different. The modifications we have introduced in Evo have the effect of decreasing the probability of takeover of the whole population by a single individual.

All these changes make the new algorithm almost 70% faster, at least for the problem sizes contemplated in the past. The improvements are in part algorithmic (the average and median number of evaluations decrease, as will be seen later on) and in part implementation improvements, which we consider also essential in EA research. As usual, the implementation of this algorithm is available at the time of writing at http://opeal.cvs.sourceforge.net/opeal/ and will later be part of the Algorithm::MasterMind Perl module published in CPAN (http://search.cpan.org/dist/Algorithm-MasterMind/).
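For concreteness, the behavior of the permutation operator and of the diff uniform crossover can be sketched in Python (an illustration of the description above; the actual operators live in the Perl module):

    import random

    def permutation(combination):
        """Reorder the symbols without changing which symbols appear."""
        pegs = list(combination)
        random.shuffle(pegs)
        return "".join(pegs)

    def diff_uniform_crossover(mother, father):
        """Swap half (rounded down) of the positions where the parents
        differ: nothing happens with one differing position, exactly one
        swap happens with two, as described in the text."""
        a, b = list(mother), list(father)
        diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        for i in random.sample(diff, len(diff) // 2):
            a[i], b[i] = b[i], a[i]
        return "".join(a), "".join(b)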


TABLE V
Values for the Evo parameters that obtain the best result. Mutation, permutation and crossover rates are all 33% in all cases, as explained in the text.

Size        Consistent Set  Population  Replacement Rate
ℓ=4, κ=6         20             400           90%
ℓ=4, κ=8         30             400           75%
ℓ=5, κ=8         40             800           75%
ℓ=5, κ=9        100            1000           90%

IV. EXPERIMENTAL RESULTS AND DISCUSSION

The experiments have been performed on four instances of the MasterMind problem, namely the versions defined by ℓ = 4 pegs with κ ∈ {6, 8} colors, and ℓ = 5 pegs with κ ∈ {8, 9}. In all cases a problem-generator approach has been considered: 5,000 runs of each algorithm have been carried out, each one on a randomly generated instance (i.e., hidden combination); the file with these instances is available from the above-mentioned URL for comparison. To maximize the breadth of the benchmark, the instances have been generated so that any combination in the test set is used at most once more than any other (for the problem and space sizes handled); this means that no instance is used twice unless all have been used at least once for combinations of length 4, and that for combinations of length 5 no combination will appear twice. This is the same approach taken in previous papers [20], so that we are able to compare results with them.

By default, the new individuals created each generation replace 25% of the population, and are generated in equal proportion by mutation, permutation or crossover, that is, one third of the new individuals are generated using each method. The tournament selection size is two, and the new individuals always replace the worst. In order to find the best combination of values, the best option would be to make an experimental design using ANOVA. However, since every run takes several hours and there are many parameters to evaluate, we have proceeded by fine-tuning the parameter values until we found the best combination. Results for the average number of guesses needed to play are shown in Table III, along with results for previous algorithms published in [20] and the previous state-of-the-art results published by Berghman et al. [15]; the average number of evaluations needed is shown in Table IV. Finally, the parameters used to obtain those results with this new Evo algorithm are shown in Table V.
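The balanced drawing of hidden combinations can be reproduced by sampling whole shuffles of the search space, as in this sketch (our formulation of the problem-generator approach, not code from the paper):

    import itertools
    import random

    def balanced_instances(length, colors, runs):
        """Yield `runs` hidden codes whose usage counts differ by at most one."""
        space = ["".join(c) for c in itertools.product(colors, repeat=length)]
        instances = []
        while len(instances) < runs:
            random.shuffle(space)
            instances.extend(space[:runs - len(instances)])
        return instances

    # For length 4 and 6 colors (1296 codes), 5,000 runs use each code at most
    # four times; for the length-5 instances no code is repeated.
    hidden = balanced_instances(4, "ABCDEF", 5000)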

Fig. 1. Distribution of the number of evaluations for the former EvoRank-EG algorithm [20] (shown in red and dashed lines) and the one presented in this paper, Evo (shown in solid blue), solving the ℓ = 5, κ = 9 MasterMind instance.

The first thing that can be observed in these results is that it is possible to improve on previous results on both fronts: there is no need to sacrifice the average number of turns played to get results in less time. In fact, these are the best results obtained in the series of papers published so far, and close to the results published by Berghman et al., although they are not exactly comparable, due to the fact that the number of instances they used was different and no standard deviation was published. The number of evaluations needed is also much better than the one we obtained for EvoRank-EG, although still worse than the much better results obtained by EGA-CM (for a worse number of guesses, though). The improvement varies from around 20% for the easiest problem to almost 70% for the most difficult one. Not only is the average better; the standard deviation is also much lower, showing that the method is much more robust in finding the solution.

From a statistical point of view, the results of Evo are significantly better (in number of guesses) than those of EvoRank-EG for ℓ = 4, κ = 6 (at the 10% level) and not significantly different in the remaining instances. The number of evaluations of Evo is in all instances significantly better (at the 5% level) than that of EvoRank-EG. Thus, from a multi-objective point of view it can be said that Evo dominates EvoRank-EG in a Pareto sense. As to EGA-CM, differences with Evo are always statistically significant at the 5% level. Hence, they are mutually non-dominated (better number of guesses in Evo, lower number of evaluations in EGA-CM). If we use a multi-objective performance indicator such as hypervolume (picking the worst values of number of guesses and number of evaluations as reference points), we obtain that EGA-CM dominates a larger portion of the fitness space for ℓ = 4, κ = 6, and Evo does so in the remaining instances, thus providing a good trade-off between computational cost and quality of solutions.

As to the reasons why Evo outperforms EvoRank-EG, Figure 1 sheds some light. It plots the percentiles of the number of evaluations needed to find the solution in the 5,000 instances of the ℓ = 5, κ = 9 MasterMind. This particular instance has been chosen as typical, and also due to the fact that the differences are quite big and can be easily visualized; the other cases would be more or less similar. Both distributions of the number of evaluations per game are centered between


TABLE III
Comparison of the evolutionary approaches, along with previous results published by Berghman [15]. Cell values indicate the mean number of guesses and the standard error of the mean.

                       ℓ=4                               ℓ=5
                 κ=6              κ=8              κ=8              κ=9
EGA-CM       4.425 ± 0.011    5.207 ± 0.013    5.733 ± 0.013    6.057 ± 0.013
EvoRank-EG   4.438 ± 0.011    5.158 ± 0.013    5.627 ± 0.012    5.956 ± 0.012
Evo          4.414 ± 0.011    5.148 ± 0.012    5.619 ± 0.011    5.95  ± 0.012
Berghman et al.  4.39             n/a              5.618            n/a

TABLE IV
Comparison of the evolutionary approaches. EGA-CM is a coevolutionary algorithm, introduced by us together with EvoRank-EG in [20]; Evo is the algorithm introduced in this paper. Cell values indicate the mean number of evaluations and the standard error of the mean.

                       ℓ=4                               ℓ=5
                 κ=6              κ=8              κ=8              κ=9
EGA-CM        1902 ± 15        2698 ± 36        9540 ± 431      14741 ± 840
EvoRank-EG    5020 ± 27        8318 ± 114      28556 ± 6359    113551 ± 15574
Evo           3899 ± 28        6949 ± 48       19758 ± 556      36485 ± 413

20,000 and 50,000 evaluations; in fact, their medians are quite similar: 33,000 for EvoRank-EG and 31,250 for Evo. However, the distribution for the former EvoRank-EG is much more skewed, with very heavy runs (games in the 5·10^4 to 5·10^5 range) above the 75th percentile, and even a few games (150) with more evaluations than that, accounting for the big difference in the average number of evaluations. In general, thus, the worst case is optimized: while the maximum number of evaluations previously stood at 72,255,000, it has now been reduced to 3,632,500.

Fig. 2. Comparison of the evolution of entropy for the former EvoRank-EG algorithm [20] (shown in red and dashed lines) and the one presented in this paper, Evo (solid black), trying to discover the combination FIHCA. The graph points are same-generation averages over ten runs.

Since there are several differences between both algorithms, what exactly is the key to this reduction? First, we should analyze the reason why the number of evaluations is so high in the first place. One of the mechanisms the algorithms have to avoid getting stuck without finding the solution is a reboot of the whole population when a certain number


of generations pass without finding a single combination to play [10]. When one game goes over a certain number of evaluations it is because this mechanism has been activated. This allows the game to proceed on some occasions, but obviously at the cost of restarting in a purely exploratory fashion, and also of completely losing the chance of finding the solution if the allotted number of generations (50) is spent without finding a consistent combination. When the number of evaluations reaches several millions, that is exactly what happens: the population keeps restarting until, by pure chance, it finds the (possibly unique) consistent combination. This all points to the cause: diversity is not kept up along evolution, so it is impossible to generate the needed solution when a certain point is reached.

We have tried to prove this point by making a comparison of both algorithms on a single combination. The EvoRank and Evo algorithms have each been run ten times, with the parameters shown in [20] and here (Table V), trying to find the same combination, FIHCA, to ensure that the problem was exactly the same for both.

The average same-generation entropy (measured as the Jensen-Shannon entropy) for both algorithms is plotted in Figure 2. The x axis has been limited, since the maximum generation for EvoRank was four times as big. The graph shows that entropy is bigger for Evo in two crucial phases of the algorithm: at the beginning, when it is trying to find as many combinations as possible to fill the set of consistent solutions, and at the end, when it is seeking the lone combination that constitutes the hidden one. In fact, entropy, on average, grows during evolution for Evo, while for EvoRank it shows some growth at the beginning, dropping afterwards. Even if this is just a single experiment, and the relationship between good results and entropy would have to be exhaustively proved at least for all possible sizes, it hints that the reason for obtaining better results with the algorithm presented in this paper is that it keeps diversity high, as intended.
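As a rough proxy for the diversity measure plotted in Figure 2 (the paper uses a Jensen-Shannon measure over the population; the simplified per-locus Shannon entropy below, our own construction, behaves similarly as a diversity indicator):

    from collections import Counter
    from math import log2

    def population_entropy(population):
        """Average per-position Shannon entropy of symbol frequencies."""
        size, length = len(population), len(population[0])
        total = 0.0
        for pos in range(length):
            counts = Counter(individual[pos] for individual in population)
            total -= sum((c / size) * log2(c / size) for c in counts.values())
        return total / length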


V. CONCLUSIONS AND FUTURE WORK

Once the basics of the evolutionary solution of MasterMind had been laid out in previous papers, in this one we have tried to enhance the results by trying a more general evolutionary algorithm, with new diversity-maintaining operators, and by fine-tuning it so that it improves the previously found results, doing so at the same time in a smaller number of evaluations, mainly by minimizing the maximum number of evaluations needed to find the solution.

We have shown in this paper that by using this more general evolutionary algorithm and fine-tuning the parameters we can obtain better results than before, and in less expected time besides: the average time is more than halved, and the worst time improves by orders of magnitude, making a game- or demo-oriented implementation simpler. It should also be noted that while before the expected number of evaluations was bigger than the total size of the search space (which is expected, since the combinations already evaluated are not memorized), it is now smaller, at least for the ℓ = 5 instances. This also implies that the number of evaluations grows more slowly than linearly with space size; however, there are too few points to really be able to make a model of how this number will increase.

The parameters needed to find the best solution point to an algorithm with a good amount of exploration, as shown by the fact that 2/3 of the new individuals are generated by the exploratory mutation and permutation operators; this quantity is consistent across all combinations, and although both the percentage of permutation/mutation and that of crossover have been increased in some experiments, the results were worse. The critical parameter seems to be the replacement rate, which is quite high, pointing again to an exploratory regime that can keep diversity high enough to eventually generate the single combination that is the solution. Please note that, since the consistent set size and the population are the same in this paper and in [20], the improvements in the solution must be due to the other changes in the algorithm.

It remains to be seen whether these results can be further improved by applying specific diversity-enhancing mechanisms such as crowding or sharing, or else by using again a coevolutionary algorithm as we did in previous papers [7]. It would also be interesting to perform an ANOVA analysis on the parameters to find their influence on performance.

ACKNOWLEDGMENTS

This work is supported in part by the CEI BioTIC GENIL (CEB09-0010) MICINN CEI Program (PYR-2010-13) project, by project NEMESIS (TIN2008-05941), awarded by the Spanish Ministry of Industry, Science and Technology, and by project P08-TIC-03903, awarded by the Andalusian Regional Government.

REFERENCES

[1] E. W. Weisstein, "Mastermind." From MathWorld–A Wolfram Web Resource. [Online]. Available: http://mathworld.wolfram.com/Mastermind.html
[2] S.-T. Chen, S.-S. Lin, and L.-T. Huang, "A two-phase optimization algorithm for mastermind," Computer Journal, vol. 50, no. 4, pp. 435–443, 2007.

[3] R. Focardi and F. Luccio, "Cracking bank PINs by playing Mastermind," in Fun with Algorithms, ser. Lecture Notes in Computer Science, P. Boldi and L. Gargano, Eds. Berlin Heidelberg: Springer-Verlag, 2010, vol. 6099, pp. 202–213.
[4] M. Goodrich, "On the algorithmic complexity of the Mastermind game with black-peg results," Information Processing Letters, vol. 109, no. 13, pp. 675–678, 2009.
[5] J. Stuckman and G.-Q. Zhang, "Mastermind is NP-complete," INFOCOMP J. Comput. Sci., vol. 5, pp. 25–28, 2006. [Online]. Available: http://arxiv.org/abs/cs/0512049
[6] G. Kendall, A. Parkes, and K. Spoerer, "A survey of NP-complete puzzles," ICGA Journal, vol. 31, no. 1, pp. 13–34, 2008.
[7] C. Cotta, J. Merelo Guervós, A. Mora García, and T. Runarsson, "Entropy-driven evolutionary approaches to the Mastermind problem," in Parallel Problem Solving from Nature PPSN XI, ser. Lecture Notes in Computer Science, R. Schaefer, C. Cotta, J. Kolodziej, and G. Rudolph, Eds., vol. 6239. Springer Berlin / Heidelberg, 2010, pp. 421–431. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-15871-1_43
[8] T. P. Runarsson and J. J. Merelo, "Adapting heuristic Mastermind strategies to evolutionary algorithms," in NICSO'10 Proceedings, ser. Studies in Computational Intelligence. Springer-Verlag, 2010, pp. 255–267; also available from ArXiV: http://arxiv.org/abs/0912.2415v1
[9] J. J. Merelo, "Genetic Mastermind, a case of dynamic constraint optimization," GeNeura Technical Report G-96-1, Universidad de Granada, 1996.
[10] J. Merelo, A. Mora, T. Runarsson, and C. Cotta, "Assessing efficiency of different evolutionary strategies playing Mastermind," in Computational Intelligence and Games (CIG), 2010 IEEE Symposium on, August 2010, pp. 38–45.
[11] J.-J. Merelo-Guervós, J. Carpio, P. Castillo, V. M. Rivas, and G. Romero, "Finding a needle in a haystack using hints and evolutionary computation: the case of Genetic Mastermind," in Late Breaking Papers at GECCO99, A. S. Wu and S. Brave, Eds., 1999, pp. 184–192.
[12] J. J. Merelo-Guervós, P. Castillo, and V. Rivas, "Finding a needle in a haystack using hints and evolutionary computation: the case of evolutionary MasterMind," Applied Soft Computing, vol. 6, no. 2, pp. 170–179, January 2006. [Online]. Available: http://dx.doi.org/10.1016/j.asoc.2004.09.003
[13] B. Kooi, "Yet another Mastermind strategy," ICGA Journal, vol. 28, no. 1, pp. 13–20, 2005.
[14] D. E. Knuth, "The computer as Master Mind," J. Recreational Mathematics, vol. 9, no. 1, pp. 1–6, 1976–77.
[15] L. Berghman, D. Goossens, and R. Leus, "Efficient solutions for Mastermind using genetic algorithms," Computers and Operations Research, vol. 36, no. 6, pp. 1880–1885, 2009.
[16] R. W. Irving, "Towards an optimum Mastermind strategy," Journal of Recreational Mathematics, vol. 11, no. 2, pp. 81–87, 1978–79.
[17] E. Neuwirth, "Some strategies for Mastermind," Zeitschrift für Operations Research. Serie B, vol. 26, no. 8, pp. B257–B278, 1982.
[18] A. Bestavros and A. Belal, "Mastermind, a game of diagnosis strategies," Bulletin of the Faculty of Engineering, Alexandria University, December 1986. [Online]. Available: http://citeseer.ist.psu.edu/bestavros86mastermind.html
[19] J.-J. Merelo and T. P. Runarsson, "Finding better solutions to the Mastermind puzzle using evolutionary algorithms," in Applications of Evolutionary Computing, Part I, ser. Lecture Notes in Computer Science, C. di Chio et al., Eds., vol. 6024. Istanbul, Turkey: Springer-Verlag, 7–9 Apr. 2010, pp. 120–129; EvoApplications 2010, held in conjunction with EuroGP 2010, EvoCOP 2010 and EvoBIO 2010.
[20] J. J. Merelo, C. Cotta, and A. Mora, "Improving and scaling evolutionary approaches to the MasterMind problem," in Proceedings EvoApplications'11, 2011, to appear.
[21] J.-J. Merelo-Guervós, P.-A. Castillo, and E. Alba, "Algorithm::Evolutionary, a flexible Perl module for evolutionary computation," Soft Computing, vol. 14, no. 10, pp. 1091–1109, 2010. [Online]. Available: http://www.springerlink.com/content/8h025g83j0q68270/fulltext.pdf
[22] B. Freisleben and P. Merz, "A genetic local search algorithm for solving symmetric and asymmetric traveling salesman problems," in Proceedings of the 1996 IEEE International Conference on Evolutionary Computation. IEEE, 1996, pp. 616–621.
[23] D. Goldberg and K. Deb, "A comparative analysis of selection schemes used in genetic algorithms," Foundations of Genetic Algorithms, vol. 1, pp. 69–93, 1991.
