Using Meta-Genetic Algorithms to tune parameters of Genetic Algorithms to find lowest energy Molecular Conformers

Zoe Brain1 and Matthew Addicoat2

1 School of Computer Science, Australian National University
2 Research School of Chemistry, Australian National University
[email protected]

Abstract

Determining the electronic structure of long chain molecules is essential to the understanding of many biological processes, notably those involving molecular receptors in cells. Finding minimum energy conformers, and thus the electronic structure of long-chain molecules, by exhaustive search quickly becomes infeasible as the chain length increases. Typically, resources required are proportional to the number of possible conformers (shapes), which scales as O(3^L) where L is the length. An optimized genetic algorithm that can determine the minimum energy conformer of an arbitrary long-chain molecule in a feasible time is described, using the tool PyEvolve. The method is to first solve a generic problem for a long chain by exhaustive search, then, using the pre-determined results in a look-up table, to use a Meta-GA to optimize the parameters of a simple GA through an evolutionary process to solve that same problem. By comparing the results using the tuned parameters obtained by this method with the results from exhaustive search on several molecules of comparable chain length, we have obtained quantitative measurements of an increase in speed by a factor of three over standard parameter settings, and a factor of ten over exhaustive search.

Introduction

In computational chemistry, there is a requirement to determine minimal energy conformers (shapes) of molecules such as dipeptides using a high level of theory, in order to determine their molecular properties. Typically there are thousands of such possible shapes for any particular molecule, and the calculation of energy for each would take O(10e3) CPU-seconds at 500 GFlops for a relatively simple level of chemical theory, but O(10e7+) for successively more complex levels. Traditionally, the method has been to determine a good subset at one level of theory, then use these as candidates for the next level, then take a further, smaller subset at that level, and so on until the required level of theory has been reached.

Various levels of theory are used to determine energies. These vary from the semi-empirical AM1 (Austin Model 1)[1] and PM3 (Parametrized Model 3)[2][3] methods often used on such computationally intensive problems, through the higher level B3LYP (Becke, three-parameter, Lee-Yang-Parr) density functional [4][5] whose formal scaling is to the fourth power, and the MP2 (Møller–Plesset 2nd order) method [6] which scales to the fifth power, to the CCSD (Coupled-cluster including Single and Double excitations) model which scales to the sixth power and, with the inclusion of perturbative Triples, i.e. CCSD(T), scales to the seventh power [7].

The computational resources required to determine the energies of all conformers of a general molecule are determined by the length L, and are typically O(3^L). Beyond length 10, the problems become infeasible using B3LYP/6-31+g(d,p) and exhaustive search techniques [7]. An increase of efficiency of one order of magnitude would therefore allow either an increase in length of 2, or one higher level of theory, while consuming the same or less computational resources.

The initial molecule chosen for experimentation is the dipeptide carnosine (Figure 1), as the exhaustive search results were already available from previous work. Further molecules were examined later. The landscape of conformer energies for a dipeptide of length 8 such as carnosine corresponds to an 8-dimensional manifold, with occasional gaps due to some molecular configurations resulting in infeasibly small inter-atomic distances.

Proc. of the Alife XII Conference, Odense, Denmark, 2010

Figure 1. Carnosine

Carnosine (β-alanyl-L-histidine) is a dipeptide found in several human tissues, particularly skeletal muscle, heart tissue and the brain [9]. Its functions in each of these tissues are not well understood, but studies have shown that it possesses potent antioxidant properties, protects against neuronal cell death and that its zinc salt promotes the healing of peptic ulcers [10]. Carnosine may be considered to have 8 rotatable bonds (labeled a..h in Figure 1). Work by Izgorodina et al. [11] indicates that when starting from a previously optimized structure, 120 degree resolution is generally sufficient to map sigma bonds. At this resolution, this yields 3^8 = 6561 possible conformers (shapes) of carnosine, some of which may be inaccessible as the combination of rotations places atoms at or near the same coordinates. In addition, it is known that in neutral polypeptides, rather than adopting its normal shape (Structure A, Figure 2), the carboxylic acid hydrogen may point away from the carboxylic acid group to form an intramolecular hydrogen bond with the amide oxygen atom (Structure B, Figure 2). As this does not correspond to a 120° rotation, these two structures are considered separately in this paper.
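The size of the conformer space described above can be checked by direct enumeration. The following is a minimal sketch only: the names `BONDS`, `POSITIONS` and `conformer_label` are illustrative and are not taken from the paper or from PyEvolve.

```python
from itertools import product

BONDS = "abcdefgh"       # the 8 rotatable bonds labeled a..h
POSITIONS = (1, 2, 3)    # three positions per bond at 120 degree resolution

def conformer_label(genome):
    """Format a tuple such as (1, 2, 1, ...) as a label such as a1b2c1..."""
    return "".join(f"{bond}{pos}" for bond, pos in zip(BONDS, genome))

# Every combination of positions over the 8 bonds, before any exclusions.
all_conformers = list(product(POSITIONS, repeat=len(BONDS)))

print(len(all_conformers))                 # 6561
print(conformer_label(all_conformers[0]))  # a1b1c1d1e1f1g1h1
```

Conformers with infeasibly small interatomic distances would still have to be filtered out of this list; that step needs geometry data and is not sketched here.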

Figure 2. Optimized A and B Structures of Carnosine

Simple GAs were a plausible candidate for finding the minimum energy conformers, but the best parameter values to be used in them were unknown. While there is considerable heuristic knowledge about these values for particular problem domains, there has been very little systematic research investigating the interaction between the different parameters that are used to define GAs. Research has been mainly confined to modifying one or two of the parameter values, keeping all the others constant. Work has recently concentrated on optimizing a particular GA for a particular problem, injecting more and more domain-knowledge into the genetic representation, and making the GA more and more specialized. This has been found to be a very fruitful line of research, with large degrees of optimization having been achieved. Most knowledge we have is on the effect of varying population size and mutation rate parameters in isolation, with the rest having been assigned arbitrary values [12][13]. Nannen's results [14], using 120 different combinations of Evolutionary Algorithm (EA) operators on 4 different generic problems and the generic information-theoretical metric of Shannon Entropy, found that the different components varied greatly in importance, but did not give practical optimum values for different classes of problem. Other methods used include statistical or theoretical analysis [15][16]. The use of Meta-GAs to optimize parameters and thus tune GAs was first proposed by Grefenstette [17] and continued by Friesleben and Hartfelder [18] in 1993. de Landgraaf [19] showed that the performance of Meta-GA optimized simple GAs was at least comparable to that of adaptive ones. We therefore use Meta-GAs to optimize the simple GAs that calculate the lowest-energy conformers.

Aim

Our object was to provide computational chemists with little or no experience in the use of GAs with a “turnkey” method of determining minimum energy conformers of molecules. What was needed was a set of default GA parameters that would give a reasonably well optimized computational factory for generating candidate low-energy conformers. As a single calculation for a dipeptide of length 8 using UB3LYP/6-31+g(d,p) takes approximately 25 minutes of CPU time at 0.5 TFLOPS on the National Computational Infrastructure in Australia, exhaustive search calculations beyond this level of theory become computationally infeasible; and even a single calculation of the energy of one length-8 conformer to the CCSD(T) level would take time on the limits of feasibility today. Previous computational studies by Diez [20] on carnosine have featured only two neutral or zwitterionic conformers; the only study to consider the full conformational landscape of carnosine was undertaken using the semi-empirical PM3 method [21].

Exhaustive Search Method


In order to produce the exhaustive search results, both A and B carnosine structures were constructed. Their geometries were optimized using the UB3LYP density functional and the 6-31+g(d,p) basis set [4][5]. All calculations were undertaken using the Gaussian09 suite [8]. The optimized structure was denoted carnosine a1b1c1d1e1f1g1h1 and had an energy of -796.150527 hartree. From this structure, internal coordinates for each conformer were generated. Single-point energy calculations, also using UB3LYP/6-31+g(d,p), were undertaken and the energies saved. The optimized geometries of both A and B structures are shown in Figure 2.

For the “A” structure of carnosine, the optimized structure was only the second lowest energy structure (∆E = 4.72 kJ mol-1). The global minimum, corresponding to a1b2c1d1e1f1g1h1, differed by a single rotation and had an energy of -796.152327 hartree. 597 possible conformers were excluded due to infeasibly small interatomic distances. The optimized “B” structure was similarly low in energy (∆E = 3.85 kJ mol-1) but was only the third lowest in energy. Again, the a1b2c1d1e1f1g1h1 conformer proved to be the global minimum, and the intervening conformer was the a2b2c3d1e1f1g1h1 conformer (∆E = 2.44 kJ mol-1). In this case 619 conformers were excluded based on interatomic distances.
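The relative energies quoted above follow from the hartree values by a unit conversion. The sketch below assumes the standard conversion factor of about 2625.4996 kJ mol-1 per hartree, which is not stated in the paper; the function name is illustrative. It reproduces the quoted ∆E values to within rounding of the tabulated energies.

```python
# Assumed conversion factor (not given in the paper): 1 hartree in kJ/mol.
HARTREE_TO_KJ_PER_MOL = 2625.4996

def delta_e_kj_per_mol(energy_hartree, minimum_hartree):
    """Energy of a conformer relative to the global minimum, in kJ/mol."""
    return (energy_hartree - minimum_hartree) * HARTREE_TO_KJ_PER_MOL

# Carnosine A: optimized structure a1b1c1d1e1f1g1h1 vs. global minimum a1b2c1d1e1f1g1h1
print(delta_e_kj_per_mol(-796.150527, -796.152327))

# Carnosine B: second and third lowest conformers vs. the global minimum
print(delta_e_kj_per_mol(-796.156592, -796.157523))
print(delta_e_kj_per_mol(-796.156054, -796.157523))
```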

The corresponding structures were also calculated using the HF/6-31g (Hartree-Fock)[22,23,24] model chemistry. This is computationally less intensive by a factor of 20. However, results differed significantly from those produced by the UB3LYP/6-31+g(d,p) model, indicating that this is a less desirable technique than using GAs for minimizing computational load. That we are close to the limits of computing feasibility is shown by the fact that this is the first time the conformational preference of the gas-phase structure of carnosine has been calculated to the UB3LYP/6-31+g(d,p) level of theory. The calculations for the conformers of carnosine-A took 2300 CPU hours on 2.93 GHz Intel Nehalem CPUs.

Meta-Genetic Algorithm Method

A Meta-GA was used to tune the parameters of a simple GA, which in turn determined the minimum energy conformer of the dipeptide carnosine. The parameters of a simple GA are shown in Table 1. Our Meta-GA genome was implemented as a 1D list of 12 integers between 1 and 1000, as shown in Table 2. The problem of dealing with multiple mutually exclusive choices was dealt with by using “winner take all” probability densities (PD). That is, given 3 possibilities, A, B and C, if the PD of A was 500, B was 900, and C was 600, then the highest (B in this case) would always be chosen.

Table 1: Canonical Parameters of a Simple GA

Parameter            Values                               Arguments
Crossover            OX, Uniform, Two Point, One Point    Probability
Mutators             Swap                                 Probability
                     Binary                               Probability
                     Gaussian                             Probability, Mean, Standard Deviation, Minimum, Maximum
                     Uniform                              Probability
Parental Selection   Tournament                           Tournament Size
                     Uniform, Rank, RouletteWheel
Survivor Selection   Elitism (True or False)
Population Size      Positive binary

Table 2: Genome Representation of GA Parameters

Parameter                               Range          Notes
Population Size                         5-1000         1-1000 with Floor of 5
Uniform X-Over Probability Density      1-1000         Exclusive with OX, 1-Pt, 2-Pt, None
One-Point X-Over Probability Density    1-1000
Two Point X-Over Probability Density    1-1000
No X-Over Probability Density           1-1000
Binary Mutator Probability              0.001-1.000    1-1000 thousandths
Swap Mutator Probability                0.001-1.000    1-1000 thousandths
Roulette Selector Probability Density   1-1000         Exclusive with Tournament, Uniform, Rank
Tournament Selector Probability Density 1-1000
Uniform Selector Probability Density    1-1000
Rank Selector Probability Density       1-1000

The swap mutator could be used on its own, or in addition to binary mutation. Tournament Size was left at the default value of 2. Mutator probability is defined in PyEvolve [25] as the proportion of the genome where a mutation is attempted, each mutation having a probability of the mutator probability. Thus a chromosome of length 4 and a mutator probability of 0.5 would have two of its genes selected randomly and possibly mutated, each with a probability of 0.5. Elitism was enabled.

No tuning of the Meta-GA itself was attempted: the default parameters of the PyEvolve toolset were used. These are Parent Selector: Rank; Tournament Size: 2; Swap: Enabled; Mutation rate: 0.02; Population Size: 80; Crossover: 1 Point; Crossover rate: 0.5. The Meta-GA termination condition was initially set to 20 generations, but later increased to 100 to confirm convergence. This Meta-GA was run 100 times, yielding 100 different optimized parameter sets, with the corresponding fitness (number of computations required using that set) for each.
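The “winner take all” decoding of the Meta-GA genome described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code and not a PyEvolve API: the function names and the order of the genes are hypothetical, and the sketch decodes only the eleven fields named in Table 2.

```python
def winner_take_all(densities):
    """Pick the option with the largest probability density (first one on ties)."""
    return max(densities, key=densities.get)

def decode_meta_genome(genes):
    """Turn eleven integers in 1..1000 into simple-GA settings.

    The gene order used here is hypothetical.
    """
    (pop, x_uniform, x_one, x_two, x_none,
     mut_binary, mut_swap,
     s_roulette, s_tournament, s_uniform, s_rank) = genes
    return {
        "population_size": max(5, pop),           # 1-1000 with a floor of 5
        "crossover": winner_take_all({
            "uniform": x_uniform, "one_point": x_one,
            "two_point": x_two, "none": x_none,
        }),
        "binary_mutation_rate": mut_binary / 1000,  # genes are thousandths
        "swap_mutation_rate": mut_swap / 1000,
        "selector": winner_take_all({
            "roulette": s_roulette, "tournament": s_tournament,
            "uniform": s_uniform, "rank": s_rank,
        }),
    }

# The example from the text: A=500, B=900, C=600 always selects B.
print(winner_take_all({"A": 500, "B": 900, "C": 600}))  # B
print(decode_meta_genome([3, 10, 900, 20, 30, 238, 1, 5, 700, 6, 7]))
```

Because only the largest density in each group matters, many different genomes decode to the same simple GA; the densities act as votes rather than as sampling probabilities.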


The termination condition for the conformer-determining GA was that the minimum energy conformer (known a priori) had been found. The fitness for a GA with that set of parameter values was the number of computations required to obtain this minimum. For the GA to determine minimum energy conformers, the genome was encoded as a simple vector of eight ternary numbers (a,b,c...g,h), each value corresponding to one of the three allowed positions of the corresponding bond. In this case, the difference between Binary and Gaussian mutators was not examined, as the differences between a flat distribution and a Gaussian distribution are negligible in the range 1..3. OX crossover was not appropriate for this representation, so was not used. Determining the energy for each conformer was just a matter of looking up the value previously determined by the exhaustive search method. This enabled experimentation to be performed using significant quantities of evaluations. To calculate these values any other way would have taken many orders of magnitude more time. On the supercomputer network used in the experiment, one such calculation took 20-30 CPU minutes.
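The scoring of one simple-GA run described above, counting look-ups until the known minimum is reached, can be sketched as follows. Everything here is an assumption made for illustration: the energy table is a toy stand-in rather than the exhaustive search data, distinct conformers are counted once each, and truncation selection with per-gene mutation replaces the PyEvolve operators used in the paper.

```python
import random
from itertools import product

POSITIONS = (1, 2, 3)                      # three allowed positions per bond
KNOWN_MINIMUM = (1, 2, 1, 1, 1, 1, 1, 1)   # a1b2c1d1e1f1g1h1

def toy_energy_table():
    """Hypothetical stand-in for the look-up table: energy rises with the
    number of bonds that differ from the known minimum (not real data)."""
    return {
        g: -796.152327 + 0.001 * sum(x != y for x, y in zip(g, KNOWN_MINIMUM))
        for g in product(POSITIONS, repeat=len(KNOWN_MINIMUM))
    }

def evaluations_to_minimum(table, pop_size=30, mut_rate=0.35, seed=0,
                           max_generations=2000):
    """Run one simple GA; return the number of distinct look-ups used to
    reach the known minimum, or None if it is not found."""
    rng = random.Random(seed)
    target = min(table, key=table.get)     # known a priori
    looked_up = set()

    def energy(genome):
        looked_up.add(genome)              # each distinct conformer counts once
        return table[genome]

    def mutate(genome):
        # each gene is redrawn uniformly with probability mut_rate
        return tuple(rng.choice(POSITIONS) if rng.random() < mut_rate else x
                     for x in genome)

    pop = [tuple(rng.choice(POSITIONS) for _ in target) for _ in range(pop_size)]
    for _ in range(max_generations):
        pop.sort(key=energy)               # lowest energy first
        if target in looked_up:
            return len(looked_up)
        parents = pop[: max(1, pop_size // 2)]   # truncation selection
        # elitism: the best individual survives unchanged
        pop = [pop[0]] + [mutate(rng.choice(parents)) for _ in range(pop_size - 1)]
    return None

table = toy_energy_table()
print(evaluations_to_minimum(table))
```

A Meta-GA would call a function like `evaluations_to_minimum` with each candidate parameter set and treat a smaller return value as a fitter set.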

After tuning of the GA parameters, each of the 100 fittest tuned GAs was run 100 times to gain some measure of reliability, as some of the associated parameters were probabilistic and the outcomes were therefore stochastic, not deterministic. Initial experiments [26] only looked at population size, mutation rate, and 1-point crossover rate with Elitism enabled, the other values being set to the PyEvolve defaults (Parent Selector: Rank; Tournament Size: 2; Swap: Enabled). These were applied to carnosine, and then to other molecules of comparable size to evaluate the general applicability of the technique.

Results

Exhaustive Search Calculations

Calculated energies represent the stabilization of the molecule compared to all of its constituent particles (nuclei, electrons) separated to infinity and thus are negative quantities. To use linear scaling within PyEvolve, positive raw scores are required; therefore the fitness of any given conformer is made equal to zero minus its energy, and the normal chemical problem of minimization becomes a maximization problem within PyEvolve. Figure 3 shows the negative energy (0 - E) of the 5970 non-excluded carnosine A conformers; 1288 of these conformers have an energy within 0.05 a.u. of the global minimum. This energy range is shown expanded in Figure 4. In each figure, “Conformer ID” represents the encoded genome, minus the alpha characters (i.e. a1b1c1d1e1f1g1h1 => 11111111), listed in numeric order. Thus the vertical series of points apparent in Figure 4 represent sets of conformers where the first five bonds (a-e) are conserved.

Figure 3. Energies of All Carnosine A Conformers

Figure 4. Energies below 796.1025 a.u. Scale shows 0-E

Some general conclusions about conformer stability can be drawn from Figure 3. The very highest energies, clustered around -791 a.u., occur in three sets of three, corresponding to genomes of the form a[1-3]b2c3d2e1f[1-3]g2h3. These conformers all have the imidazole ring in extremely close proximity to the terminal NH2 group. The second cluster of high energy structures, having energies of approximately -793.055 a.u., also place the imidazole and NH2 groups in close proximity. These conformers correspond to genomes of the form a[1-3]b1c3d3e3f[1-3]g2h3 or a[1-3]b3c2d3e3f[1-3]g2h3. In contrast, the lowest five energy structures all preserve the final 5 positions of the genome as their optimized (original) values, i.e. d1e1f1g1h1. These conformers span an energy range of 11.4 kJ mol-1, only just over the 10 kJ mol-1 range that is typically considered chemically relevant. The 10 lowest energy conformers of carnosine A are shown in Table 4. The conservation of this portion of the molecule is even more pronounced in the B structure of carnosine, where the 15 lowest energy structures all preserve the original histidine conformer, as shown in Table 5. This greater conservation is expected, given the stabilization provided by the intramolecular hydrogen bond present in carnosine B, which effectively fixes bonds (d-g).
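The fitness transformation (zero minus the energy) and the “Conformer ID” encoding used in the exhaustive search results can be written down directly. This is a small sketch with illustrative function names, not code from the paper.

```python
def fitness(energy_hartree):
    """Positive raw score for maximization: zero minus the (negative) energy."""
    return 0.0 - energy_hartree

def conformer_id(label):
    """Drop the bond letters from a genome label, e.g. a1b1c1d1e1f1g1h1 -> 11111111."""
    return int("".join(ch for ch in label if ch.isdigit()))

print(conformer_id("a1b1c1d1e1f1g1h1"))  # 11111111
# A lower energy gives a higher fitness, so the global minimum scores best.
print(fitness(-796.152327) > fitness(-796.150527))  # True
```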


Table 2. 10 Lowest Energy Conformers of Carnosine A

Genome              E(UB3LYP/6-31+g(d,p)) / hartree
a1b2c1d1e1f1g1h1    -796.152327
a1b1c1d1e1f1g1h1    -796.150527
a2b2c3d1e1f1g1h1    -796.149450
a1b1c2d1e1f1g1h1    -796.148919
a3b2c1d1e1f1g1h1    -796.147980
a1b2c1d1e1f2g3h1    -796.147800
a1b2c1d1e1f2g2h1    -796.147574
a1b2c2d1e1f1g1h1    -796.147441
a2b1c1d1e1f1g1h1    -796.147369
a1b1c1d1e1f2g2h1    -796.147337

Table 3. 15 Lowest Energy Conformers of Carnosine B

Genome              E(UB3LYP/6-31+g(d,p)) / hartree
a1b2c1d1e1f1g1h1    -796.157523
a2b2c3d1e1f1g1h1    -796.156592
a1b1c1d1e1f1g1h1    -796.156054
a1b3c2d1e1f1g1h1    -796.155025
a1b1c2d1e1f1g1h1    -796.155001
a1b2c2d1e1f1g1h1    -796.153013
a3b2c1d1e1f1g1h1    -796.152772
a2b1c1d1e1f1g1h1    -796.152660
a3b1c1d1e1f1g1h1    -796.152270
a2b2c1d1e1f1g1h1    -796.151997
a2b3c2d1e1f1g1h1    -796.151893
a3b1c2d1e1f1g1h1    -796.151554
a2b1c2d1e1f1g1h1    -796.151122
a1b1c3d1e1f1g1h1    -796.150818
a3b2c2d1e1f1g1h1    -796.150469

Initial Experiments: Population Size, Mutation rate, 1-D Crossover rate

The top 5 tunings of the GA are shown in Table 3. To verify the performance of the GA parameters, the top 5 sets were also used to determine the lowest energy conformer of the B structure of carnosine. Each set of parameters was used 100 times; results are shown in Table 4. All 5 GAs find the global minimum 100% of the time; the worst case required 1056 evaluations (16% of the 6561 possible conformers). The mean number of evaluations for all GAs was between 176 (2.7%) and 253 (3.9%).

Table 4. Results for Top 5 Parameter Sets, Carnosine B

Init Rank   Pop size   Mut rate   XOvr rate   Min Evals   Max Evals   Mean Evals
1           6          0.238      0.156       12          1116        175.74
2           2          0.225      0.005       14          886         190.66
3           31         0.403      0.977       62          899         252.96
4           11         0.341      0.786       22          682         181.83
5           32         0.365      0.810       64          1056        253.44

The close agreement of the two sets of mean evaluation counts, both between the different parameter sets, and the different molecules, suggests that the estimates of performance are reliable, and applicable to a broad range of molecular species.
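The percentages quoted for the evaluation counts are reproduced when the counts are taken relative to the full 3^8 = 6561 search space. The following worked check is a sketch with an illustrative function name.

```python
TOTAL_CONFORMERS = 3 ** 8   # 6561 possible conformers at 120 degree resolution

def percent_of_search_space(evaluations, total=TOTAL_CONFORMERS):
    """Evaluations expressed as a percentage of the full conformer space."""
    return 100.0 * evaluations / total

for evals in (176, 253, 1056):
    print(evals, round(percent_of_search_space(evals), 1))
```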

Figure 4. Computational Efficiency as a Function of Population Size and Mutation Rate for Carnosine-A. Hollow squares denote parameter sets that did not always find the global optimum.

Figure 4 shows the computational requirements for each pair (p,m) of population size and mutation rate. Crossover rate was not found to affect the GA's fitness. All of the pairs generated a global optimum energy in all 100 runs (success rate 1.00) except for the three points marked as hollow squares.

Table 5. Partly Unsuccessful Parameters

Population Size   Mutation rate   Success Rate   Crossover Probability   Mean Number of Evaluations (Successful Runs only)
28                0.172           0.29           0.474                   127.45
16                0.168           0.30           0.873                   82.13
36                0.179           0.31           0.050                   157.94

Table 3. Results for Top 5 Parameter Sets, Carnosine A

Init Rank   Pop size   Mut rate   XOvr rate   Min Evals   Max Evals   Mean Evals
1           6          0.238      0.156       12          888         218.52
2           2          0.225      0.005       12          1154        220.96
3           31         0.403      0.977       62          930         242.73
4           11         0.341      0.786       33          946         245.3
5           32         0.365      0.810       64          1056        256


A variety of different molecules were downloaded from the Cambridge Structural Database [28]. Molecules were selected to be close to the largest size where exhaustive search was considered feasible (approximately 50 atoms) but to contain a wide variety of structural motifs (linear, branching, planar regions) and chemical functional groups. Using the same technique on a variety of other molecules suggested that the optimum parameters of population size and mutation rate were valid in general for molecules of similar size. Three test molecules are illustrated in Figures 5-7 and the corresponding results in Figures 8-10. In Figures 8-10, hollow squares denote parameter sets that did not always find the global optimum.

Figure 5. Optimized Structure of Dawmoe

Figure 6. Optimized Structure of Exuduy

Figure 7. Optimized Structure of Ifevoe

Figure 8. Computational Efficiency as a Function of Population Size and Mutation Rate for Dawmoe.

Figure 9. Computational Efficiency as a Function of Population Size and Mutation Rate for Exuduy.

Figure 10. Computational Efficiency as a Function of Population Size and Mutation Rate for Ifevoe.

Subsequent Experiments: Tuning all parameters

The use of the swap mutator was found to be strongly deleterious to the reliability of the GA, without any compensatory increase in efficiency. When elitism and swap were both used, only 10 of the GAs found the global minimum 100% of the time. The use of elitism did not have a significant effect on reliability, with only 11 GAs being 100% successful when swap was employed without elitism. 16 GAs were less than 20% reliable.

Elitism strongly affected the efficiency of the GA. Without employing swap, when elitism was employed, the mean minimum, maximum and mean number of evaluations were 147.82, 2800.10 and 794.99 respectively. Not employing elitism raised these numbers to 157.25, 3457.90 and 991.76 respectively.


After running each parameter set 100 times, the 100 GA parameter sets were ranked according to their efficiency, such that the highest rank (100) has the lowest mean evaluation score, i.e. is the fittest parameter set. These rankings were then graphed against selector and crossover methods to determine which methods typically fared well or, conversely, which methods decreased the efficiency of the GA. These inverse rankings are shown in Figures 11 to 14 for the Tournament, Roulette, Uniform and Rank selectors respectively. Figure 15 compares all four selectors. The top 10 GA tunings were applied 100 times to the A- and B-carnosine datasets and the mean evaluations are shown in Table 6. The results appear consistent, suggesting the general applicability of these tunings.
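The inverse ranking described above, where the highest rank goes to the parameter set with the lowest mean evaluation count, can be sketched as follows. The function name and the labels `GA1` to `GA4` are illustrative; the four mean values are the first four A-carnosine entries of Table 6, so the highest rank here is 4 rather than 100.

```python
def inverse_ranks(mean_evals):
    """Map each parameter set to a rank; the highest rank has the fewest
    mean evaluations, i.e. is the fittest set."""
    worst_first = sorted(mean_evals, key=mean_evals.get, reverse=True)
    return {name: rank for rank, name in enumerate(worst_first, start=1)}

a_carnosine_means = {"GA1": 651.82, "GA2": 270.64, "GA3": 322.07, "GA4": 320.12}
print(inverse_ranks(a_carnosine_means))
```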

Often it is not only the minimum energy conformer that is of chemical interest, but all conformers within a given energy range, say 10 kJ mol-1. With this in mind, an alternate termination criterion was trialled, whereby the five lowest energy conformers were required to exist in the population. This fared very poorly, with success rates of only a few percent, and the original termination criterion based on a single raw score was restored.

Figure 11. Inverse Rankings of Tournament Parent Selector for different Crossover Methods

Figure 12. Inverse Rankings of Uniform Parent Selector for different Crossover Methods

Figure 13. Inverse Rankings of Roulette Wheel Parent Selector for different Crossover Methods

Figure 14. Inverse Rankings of Rank Parent Selector for different Crossover Methods

Figure 15. Comparison of all Parent Selectors for different Crossover Methods

Table 6. Performance of GAs with same parameter sets on different datasets (mean evaluations)

GA (ranked)   A-Carnosine Dataset   B-Carnosine Dataset
1             651.82                627.64
2             270.64                241.06
3             322.07                253.54
4             320.12                301.57
5             372.40                353.15
6             537.68                576.40
7             439.56                495.00
8             555.66                497.70
9             566.26                557.98
10            492.65                438.37


Conclusions

Unreliable parameter sets were found when mutation rate and population size were both low. This suggests that the algorithm degenerates into simple hill-climbing in those regions, and success depends on initial conditions. If the initial population contains values near the global optimum, performance is very good, but if not, the low mutation rate means that the result may be stuck in a local optimum.

Both Rank and Tournament parental selection far outperformed Uniform selection and Roulette-Wheel selection. Tournament selection plus uniform crossover appeared to be the most reliable. Tournament selection plus no crossover performed poorly, but Rank selection performed well with no crossover. Optimum population size was less than 100, and usually less than 50; optimum mutation rate was less than 0.7 and usually less than 0.5.

Examination of the populations revealed many duplicates of the lowest energy conformer. This suggests that incest prevention is required in order to obtain results containing sets of near-optima for this method.

No improvement on default values was observed using optimized selection/crossover/mutation values for the best-performing mutation rate/population size combinations, except for replacing the swap mutator with the binary mutator. PyEvolve using default values, except for a population size of ~30, a mutation rate of ~0.35, elitism and the binary mutator as described in [26], is therefore recommended for use by computational chemists to locate global minimum conformers.

Without effective removal of duplicates or incest prevention [27], a future implementation could work around the problem of finding sets of lowest energy conformers by searching for the lowest energy conformer, then, once that is identified, excluding it and looking for the next lowest energy conformer until the desired number of conformers has been identified. A lookup table with the results of each energy calculation means that later runs would undertake far fewer of these calculations.

Acknowledgments

This work was based on initial research funded in part by the Australian Government under the auspices of the Cooperative Research Centre for Advanced Automotive Systems (Auto-CRC). Supercomputer facilities were provided by the National Computational Infrastructure (NCI).

References

[1] Michael J. S. Dewar, Eve G. Zoebisch, Eamonn F. Healy, James J. P. Stewart (1985) Development and use of quantum mechanical molecular models. 76. AM1: a new general purpose quantum mechanical molecular model. Journal of the American Chemical Society 107 (13), 3902-3909.
[2] James J. P. Stewart (1989) Optimization of parameters for semiempirical methods I. Method. Journal of Computational Chemistry 10 (2), 209-220.
[3] James J. P. Stewart (1989) Optimization of parameters for semiempirical methods II. Applications. Journal of Computational Chemistry 10 (2), 221-264.
[4] Becke A.D. (1993) Density-functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 98, 5648-5652.
[5] Stephens P.J., Devlin F.J., Chabalowski C.F., Frisch M.J. (1994) Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields. J. Phys. Chem. 98, 11623-11627.
[6] Møller C., Plesset M.S. (1934) Note on an Approximation Treatment for Many-Electron Systems. Phys. Rev. 46, 618-622.
[7] Cizek J. (1966) On the correlation problem in atomic and molecular systems. J. Chem. Phys. 45, 4256.
[8] Frisch M.J. et al. (2004) Gaussian 09, Revision A.01. Gaussian, Inc., Wallingford CT.
[9] Quinn P.J., Boldyrev A.A., Formazuyk V.E. (1992) Mol. Aspects Med. 13, 379.
[10] Matsukura T., Tanaka H. (2000) Biochemistry (Moscow) 65, 961.
[11] Izgorodina E., Lin L., Coote M.L. (2007) Energy-Directed Tree Search: An Efficient Systematic Algorithm for Finding the Lowest Energy Conformation of Oligomeric Molecules. Phys. Chem. Chem. Phys. 9, 2507-2516.
[12] Yu-an Zhang, Makoto Sakamoto, Hiroshi Furutani (2008) Effects of Population Size and Mutation Rate on Results of Genetic Algorithm. Fourth International Conference on Natural Computation, vol. 1, pp. 70-75.
[13] Wolpert D., Macready W. (1997) No free lunch theorems for optimisation. IEEE Transactions on Evolutionary Computation 1 (1), 67-82.
[14] Nannen V., Smit S.K., Eiben A.E. (2008) Costs and Benefits of Tuning Parameters of Evolutionary Algorithms. Parallel Problem Solving from Nature 2008.
[15] Grefenstette J. (1986) Optimization of control parameters for genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics 16 (1), 122-128.
[16] Nakama T. (2008) Theoretical analysis of genetic algorithms in noisy environments based on a Markov Model. Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation GECCO08, pp. 1001-1008.
[17] Smit S.K., Eiben A.E. (2009) Comparing parameter tuning methods for evolutionary algorithms. Proceedings of the Eleventh Conference on Evolutionary Computation CEC09, pp. 399-406.
[18] Friesleben B., Hartfelder M. (1993) Optimization of Genetic Algorithms by Genetic Algorithms. In: Albrecht R.F., Reeves C.R., Steele N.C. (eds.) Artificial Neural Networks and Genetic Algorithms, pp. 392-399. Springer, Heidelberg.
[19] De Landgraaf W.A. (2006) Parameter Calibration Using Meta-Algorithms. Master's Thesis, Artificial Intelligence, Vrije Universiteit Amsterdam.
[20] Diez R.P., Baran E.J. (2003) Journal of Molecular Structure: Theochem 621 (3), 245-251.
[21] Klyuev S.A. (2006) Biofizika 51 (4), 669-672.
[22] Hartree D.R. (1928) Proc. Cambridge Phil. Soc. 24, 89, 111, 246.
[23] Fock V. (1930) Z. Physik 61, 126.
[24] Slater J.C. (1930) Phys. Rev. 35, 210.
[25] Perone C.S. (2009) PyEvolve: a Python Open-Source Framework for Genetic Algorithms. ACM SIGEvolution 4 (1).
[26] Addicoat M.A., Brain Z.E. (in press) Using a Meta-GA for Parametric Optimization of Simple GAs in the Computational Chemistry Domain. To appear in the Proceedings of the Genetic and Evolutionary Computation Conference 2010 GECCO10.
[27] Eshelman L.J., Schaffer J.D. (1997) Preventing premature convergence in genetic algorithms by preventing incest. In: Belew R.K., Vose M.D. (eds.) Foundations of Genetic Algorithms 4, pp. 115-122. Morgan Kaufmann, San Francisco.
[28] Fletcher D.A., McMeeking R.F., Parkin D. (1996) The United Kingdom Chemical Database Service. J. Chem. Inf. Comput. Sci. 36, 746-749.