A Survey of Recent Work on Evolutionary Approaches to the Protein Folding Problem

Garrison W. Greenwood Dept. of Computer Science Western Michigan University Kalamazoo, MI 49008

Jae-Min Shin Laboratory of Molecular Biology National Cancer Institute Bethesda, MD 20892

Byungkook Lee Laboratory of Molecular Biology National Cancer Institute Bethesda, MD 20892

Gary B. Fogel Natural Selection, Inc. 3333 North Torrey Pines Court La Jolla, CA 92037

Abstract- A problem of immense importance in computational biology is the determination of the functional conformations of protein molecules. With the advent of faster computers, it is now possible to use rules to search conformation space for protein structures that have minimal free energy. This paper surveys work done in the last five years using evolutionary search algorithms to find low energy protein conformations. In particular, a detailed description is included of some work recently started at the National Cancer Institute, which uses evolution strategies.

1 Introduction

Biological organisms contain thousands of different types of proteins. Proteins are responsible for transporting small molecules (e.g., hemoglobin transports O2 in the bloodstream), catalyzing biological functions, providing structure to collagen and skin, regulating hormones, and many other functions. Each protein is a sequence of amino acids, bound into linear chains, that adopts a specific folded three-dimensional shape. Each shape provides valuable clues to the protein's function. Indeed, this information is vital to the design of new drugs capable of combating disease. Regrettably, ascertaining the shape of a protein is a difficult, expensive task, which explains why few proteins have been categorized in this regard. Virtual protein models, created on computers, may provide a cost-effective solution to accurate prediction of protein shapes. Unfortunately, the protein folding problem (trying to predict the structure of a protein given only the protein's sequence of amino acids) is a combinatorial optimization problem, which so far has eluded solution because of the exponential number of potential solutions.

The purpose of this paper is two-fold. Previous papers have surveyed the use of evolutionary algorithms (EAs) in protein folding problems, but they have been written primarily from the perspective of biophysicists, a background many in the evolutionary computation (EC) community do not share. While this paper is not intended to be a tutorial, it hopefully presents sufficient information so that those in the EC community can understand the problem, appreciate its difficulty, and understand the details of how EAs have been employed. The survey is, for the most part, restricted to work performed in the last 2-3 years (see [1] for surveys on earlier work). Readers totally unfamiliar with this topic area may first want to review some excellent tutorials available in the general science literature [2, 3]. The second purpose of this paper is to present preliminary results of some work being conducted at the Laboratory of Molecular Biology at the National Cancer Institute, Bethesda, MD.

2 Folding a Protein

This section reviews the basic structure of proteins, followed by a discussion of how EAs have been used to solve three protein folding subproblems: minimalist models, side-chain packing, and docking. A discussion of the efficacy of these approaches is deferred until Section 3.

2.1 Protein Composition

An amino acid consists of a central carbon atom (denoted by Cα) which is bonded to an amino group (NH2), a carboxyl group (COOH), and a side-chain (denoted by R). The sequence of amino groups, Cα carbons and carboxyl groups is called the protein backbone. The difference between two amino acids is the structure and composition of the side-chains. Twenty amino acids are found in biological systems and 19 of them have the structure shown in Figure 1. Proteins differ only in the number of amino acids linked together and the sequential order in which these amino acids occur.

Figure 1: General structure of 19 of the 20 isolated amino acids. (The remaining amino acid, proline, has bonding between the side-chain and the amino group.) R denotes the location of the side-chain, which bonds to a central carbon atom denoted by Cα.

Amino acids link together when the carboxyl carbon of one amino acid binds with the amino nitrogen of the next amino acid. This binding releases a water molecule and the resulting bond is called a peptide bond. The joined amino acids are referred to as residues or peptides. The CO-NH group is planar and, when combined with the bordering Cα atoms, forms a peptide group. The product of condensing two amino acids is called a dipeptide. A third amino acid would condense to form a tripeptide, and so on. Longer chains are called polypeptide chains. Each polypeptide can fold into a specific shape. Protein molecules can contain one or more of these chains.

The primary structure of a protein's polypeptide chain is the sequence of amino acids. Different regions of this sequence tend to form regular, characteristic shapes called secondary structures. Three main categories of secondary structures are the α-helix, the β-strand or sheet, and the turns or loops which connect the helices and strands. Some studies suggest that certain residues will appear more often in helices than in strands, which implies a correlation between amino acid sequence and shape. Nevertheless, variations do exist, so this correlation is not completely understood. An aggregate of all of these localized secondary structures forms the tertiary structure, which is of prime interest because a protein's function is heavily influenced by its tertiary structure. Tertiary structures can also be combined as sub-units to form a larger quaternary structure.

Stability of a protein conformation is influenced by a number of factors, which can include van der Waals interactions, hydrogen bonding, and hydrophobic effects [4]. The polypeptide chain transforms from a disordered, nonnative state to an ordered, native state. This process is referred to as protein folding. The function of a protein is tied to its structure, so being able to quickly specify a structure from its amino acid sequence is of immense interest. Unfortunately, finding the final structure remains elusive, in part due to the astronomical number of possible conformations. X-ray crystallography (XC) and nuclear magnetic resonance (NMR) are invaluable tools, but only a relatively small number of proteins have been studied in this manner (see footnote 1).

2.2 Minimalist Models

Ab initio methods try to predict the fold without using structure information from any other protein for comparison. These methods explore an energy hypersurface (fitness landscape) for a minimal energy conformation, which is believed to correspond to the native state. Unfortunately, the enormous size of the energy hypersurface complicates the search process, which has led some researchers to use minimalist protein models.

The most primitive models do not consider the primary structure of the protein; residues are merely classified as hydrophobic, which mix poorly with water, or hydrophilic, which attract water molecules. Additionally, all residues are forced to occupy sites on a 2D square lattice with no more than one residue at any given site. This restriction means the polypeptide chain must form a self-avoiding walk on the lattice. The chain is folded by letting the next point turn 0° or ±90° at each point in the polypeptide chain. Many of the conformations created in this manner are discarded because they produce undesirable steric clashes. Greenwood [7] has shown that a far larger number of self-avoiding walks can be rendered by using a 2D torus rather than a 2D lattice. The 2D lattice or torus is adequate to investigate the propensity to form a hydrophobic core, but a 3D cubic lattice is required to investigate secondary structures.

EAs are normally used with off-lattice models, where residues are not required to occupy fixed, equally spaced sites and the side-chain is modeled only as a single, virtual atom. Bond angles and bond lengths are typically fixed, the φ and ψ dihedral angles are adjustable, and normally only the trans (ω = 180°) conformation for the peptide bond is considered (see Figure 2 for the definition of these angles).
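The 2D square-lattice model described in this subsection can be illustrated with a short sketch. This is not code from any of the surveyed papers: the move alphabet, the function names, and in particular the energy rule (minus one for every pair of hydrophobic residues that are lattice neighbours but not chain neighbours, a convention commonly used with such models) are assumptions made for illustration.

```python
# Minimal sketch of a 2D square-lattice hydrophobic/hydrophilic model.
# All names and the energy rule are illustrative assumptions.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk_coords(moves):
    """Return the lattice sites visited by a chain that starts at (0, 0)."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def is_self_avoiding(coords):
    """A walk is self-avoiding if no lattice site is occupied twice."""
    return len(set(coords)) == len(coords)


def hp_energy(sequence, coords):
    """Assumed energy: -1 per pair of H residues that sit on adjacent
    lattice sites without being consecutive in the chain."""
    index_of = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_of.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

For the four-residue chain "HPPH" folded with the moves "RUL", the two hydrophobic residues end up on adjacent sites, so `hp_energy("HPPH", walk_coords("RUL"))` evaluates to -1. A walk such as "RULD" returns to the origin and is rejected by `is_self_avoiding`.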
Footnote 1: The Protein Data Bank [5] contains a little over 7,000 proteins and peptides determined by XC and NMR. In contrast, it is estimated that there are some 80,000 genes in a human being, most of which code for a distinct protein. In addition, there are a similarly large number of proteins from other organisms, such as mouse and bacteria.

A common practice used with genetic algorithms (GAs) is to encode φ, ψ pairs as bit strings, with each pair restricted to values taken from a small dihedral library [8, 9]. This was the approach taken by Pedersen and Moult [10], although they also used library data for the side-chain dihedral angles (χ). Other approaches include allowing random changes in the dihedral angles of up to 5° [11]. Fitness is often measured in terms of the r.m.s. deviation from a structure determined by XC or NMR methods. For moderate length polypeptide chains, an r.m.s. deviation of 2.0 Å or less is generally considered excellent.
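As a rough illustration of the r.m.s. deviation used as a fitness measure, the sketch below computes it for two lists of atomic coordinates. It assumes the two structures have already been superimposed and list their atoms in the same order; the function name is hypothetical, and a real comparison would normally include an optimal superposition step that is omitted here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) points. Assumes the structures are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

With coordinates in Å, a candidate whose atoms are all displaced by 2 Å along one axis from the reference gives an r.m.s. deviation of 2.0 Å. An EA that maximizes fitness could use the negative of this value.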

2.3 Side-chain Packing

The greatest amount of success in protein folding studies uses the known structure of a homologous protein (i.e., similar in structure). Indeed, as the number of known structures increases, the probability of resolving the conformation of other unsolved proteins will likewise increase. Wilson et al. [12] list the three major aspects to homology-based modeling: (1) amino acid sequence alignment; (2) generating loop conformations as needed; and (3) predicting the conformations of the side-chains. The side-chain packing problem (SCPP) deals only with the last part of this modeling procedure. Because of the high density of side-chains in the protein interior, side-chain conformations are determined by packing considerations, which involve interactions of many side-chains at a time. But homology modeling is not the only reason this problem is of interest. An efficient solution to this problem is essential for most ab initio protein structure prediction procedures. It will also aid in determining permissible mutation sites for increased protein stability and for large-scale protein re-design.

Figure 2 shows the torsion angles found in the backbone. The φ, ψ torsion angles determine the polypeptide chain fold, while the χ torsion angles determine the packing of the side-chains. The principal difficulty in solving SCPP is the requirement to search over an extremely large solution space. The combinatorial nature of this problem makes complete enumeration infeasible. Usually some simplifying assumptions are made to help reduce the computational complexity. For example, typically all bond lengths and bond angles are fixed so that the only degrees of freedom are in choosing the torsion angles. The complexity is further reduced by fixing the φ, ψ torsion angles, which produces a rigid backbone conformation, and then concentrating just on predicting the side-chain conformation by choosing the χ torsion angles.
Unfortunately, this subproblem is not necessarily easy to solve. If on average there are r rotamer states per residue, an n-residue peptide chain has r^n possible conformations. More specifically, consider the side-chain Tyr in Figure 2(b). If each side-chain torsion angle were treated as a rigid rotation divided into 10° steps (so χ has 360°/10° = 36 states), then Tyr has 36^3 = 46,656 possible structures! And this number does not even consider any other residues in the peptide chain. Fortunately, things are not quite as bad as they may appear. Studies do indicate that side-chain dihedral angles tend to cluster around particular χ values (e.g., see [13]). Moreover, these studies show a relationship also exists with the backbone conformation. This has led to the development of several rotamer libraries.

Despite the extreme importance of SCPP, there has been only limited interest in this problem from the EC community; most homology work uses GAs to perform sequence alignments (e.g., see [14]). This is not to say no work has been done. Desjarlais and Handel [15] used a GA to search for low energy hydrophobic core sequences and structures, using a custom rotamer library as input. Each core position was allocated a set of bits within a binary string and the bit values encoded a specific residue type and set of torsion angles as specified in the rotamer library. The input rotamer library for each core position is thus a list of residue/torsion possibilities for the string location corresponding to the core position. Reproduction was performed with standard recombination and mutation operators, but an inversion operator was added to establish genetic "linkage" between pairs of bits. This GA was recently used in the de novo design of a small peptide chain [16].

Figure 2: Torsion angles for (a) a residue and (b) Tyrosine (Tyr) side-chain. Residues usually adopt the trans configuration (ω = 180°). The torsion angle χ1 indicates rotation about the bond connecting the side-chain to the Cα carbon; it is present in all residues except Pro and Gly. The exact number of χ torsion angles depends on the chemical makeup of the side-chain.

2.4 Docking

Docking problem solutions predict how two organic molecules will energetically and physically bind together. One molecule, called the receptor, contains "pockets" that form binding sites for the second molecule, which is called the ligand. Any solution must therefore describe both the shape of the receptor and the ligand as well as their affinity. An efficient means of solving the docking problem would be of enormous benefit. For example, replication of the AIDS virus depends on the HIV protease enzyme. If one can find a small molecule that will permanently bind to the active site in the HIV protease, the normal function of that enzyme can be prevented.

Autodock [17] is an existing docking software package, which is designed to investigate dockings between a macromolecule and a small substrate molecule. Rosin et al. [18] used a GA in conjunction with the Autodock software package. Two interesting aspects of their GA implementation are the use of Cauchy deviates for mutation and the incorporation of Lamarckian learning (local search with replacement) on a small fraction of the population each generation.
Mutation was not used to conduct local searches, but was reserved for the more conventional role of replacing lost alleles. Instead, local searches were conducted using a randomized hillclimber with an adaptive step size [19]. Individuals in the search space encoded Cartesian coordinates of the small molecule, a 4-dimensional quaternion (see footnote 2) indicating global orientation, and a set of torsion angles reflecting flexibility in the small molecule. In a direct competition, the authors were able to show that the GA, augmented with local search, significantly outperformed simulated annealing in all test cases.

Footnote 2: This is a vector giving an axis of rotation and the rotation angle.

Jones et al. [20] describe a GA-based ligand docking program, which permits a full range of ligand conformational flexibility with partial flexibility of the protein. Each chromosome encodes the internal coordinates of both the ligand and the protein active site, and a mapping between hydrogen-bonding sites. Internal coordinates were encoded as bit strings where each byte represented a torsion between ±180° in step sizes of 1.4°. Integer strings were used to identify possible hydrogen bonding sites. Fitness was determined by summing (1) the hydrogen-bonding energy between ligand and protein; (2) the pairwise interaction energy between ligand and protein atoms; and (3) the ligand steric and torsional energies. The GA used an island model where several small distinct populations were evolved, instead of evolving one large population. Reproduction operators included crossover, mutation, and a migration operator to share genetic material between populations. The authors claim this island model increases the effectiveness, but not the efficiency, of the GA. One hundred complexes from the Brookhaven Protein Data Bank [5] were used as test cases with a 71% success rate in identifying the experimental binding mode.

Raymer et al. [21] used a GA in an entirely different manner. Water molecules within the binding site will either be displaced by the ligand or remain bound. In either case, the energetics of the interaction are affected. The goal of Raymer et al. was to predict the conserved or displaced status of water molecules upon ligand binding. A k-nearest-neighbor (knn) classifier predicted the displaced status, and the GA determined optimal feature weight values for the classifier. Population members encoded feature weights as binary strings and two-point crossover was the primary reproduction operator. Fitness was based on the percent of correct predictions by the knn classifier. It is important to note that the GA was not involved in the prediction of water molecule status; its sole responsibility was to train the classifier that was responsible for the prediction.

Docking problems are one area where other types of evolutionary algorithms have been tried. Gehlhaar et al. [22] used evolutionary programming (EP) to attempt docking the inhibitor AG-1343 into HIV-1 protease. No assumptions were made regarding likely ligand conformations, nor ligand-protein interactions. The ligand was required to remain within a parallelepiped that included the active site plus a 2.0 Å cushion; an energy penalty was assessed to each ligand atom outside of this box. Each member of the population of candidate ligand conformations encoded six rigid body coordinates and the dihedral angles. AG-1343 has a large number of rotatable bonds, so a population size of 2000 was used. New members were created by additive Gaussian noise with a lognormal self-adaptation of the strategy parameters. The fitness function was the pairwise potential summed over all pairs of ligand-protein heavy atoms. These atoms interact through steric (van der Waals) and hydrogen-bonding potentials. The same functional form was used for both potentials, but different coefficients were used because a single hydrogen bond should have a larger weight than a single steric interaction. The authors were able to reproduce the crystal structure to within 1.5 Å in 34 out of 100 runs.
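The phrase "additive Gaussian noise with a lognormal self-adaptation of the strategy parameters" can be made concrete with a generic sketch. This is a textbook-style formulation, not the operator of Gehlhaar et al.: the function name, the learning rate `tau` and its default, and the choice to update the step sizes before perturbing the variables are all assumptions.

```python
import math
import random


def self_adaptive_mutation(x, sigma, rng, tau=None):
    """Return a mutated copy of the object variables x and step sizes sigma.

    Each step size is first rescaled by a lognormal factor and then used as
    the standard deviation of the Gaussian noise added to its variable.
    tau is an assumed learning rate; 1/sqrt(2*sqrt(n)) is a common default.
    """
    n = len(x)
    if tau is None:
        tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    # Lognormal update keeps every step size strictly positive.
    new_sigma = [s * math.exp(tau * rng.gauss(0.0, 1.0)) for s in sigma]
    # Additive Gaussian noise scaled by the updated step sizes.
    new_x = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma
```

In a docking setting, `x` would hold the rigid body coordinates and dihedral angles of a candidate ligand, and each entry of `sigma` would evolve along with the variable it perturbs.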

3 Discussion

There have been some notable successes in homology modeling in recent years: r.m.s. errors of under 1.0 Å can be achieved over large protein sets. This success has led some to believe that the combinatorial problem in SCPP is only minor [23]. Although there may be some justification for these claims, studies indicate that neglecting combinatorial packing leads to higher r.m.s. deviations (and higher energy values) that in a small number of cases produce completely unsatisfactory solutions [24]. It is therefore probably prudent to continue incorporating packing optimization.

It cannot be emphasized enough how important the energy (fitness) function is to the efficacy of the final results. A poorly defined energy hypersurface may be efficiently explored by an evolutionary algorithm, but if the conformations with the lowest energy are nothing like the native structure, then it is not clear what one should infer from the results. For example, one study [25] concluded that a GA search was successful, but none of the conformations generated were similar to the native structure! Ab initio predictions of this sort demonstrate only the search efficacy of EAs, not the accuracy of the predicted structures. Nevertheless, one should not conclude that only elaborate fitness functions are capable of leading to accurate predictions [11].

A couple of examples will illustrate the importance of a good fitness function. Rosin et al. [18] point out that some local minima on their energy hypersurface had better r.m.s. deviations from the crystal structure than did conformations with lower energy values. Hence, further optimization would not have produced more accurate structures. Jones et al. [20] are to be commended on their thorough analysis of the cases in which their docking program failed. One of the reasons for failure was an enforced requirement that the ligand must be hydrogen bonded to the binding site, which is not always observed. The second reason cited was that the fitness function underestimated the hydrophobic contribution to binding.

It is apparent that GAs are still the predominant EA used in protein folding studies. This fact seems somewhat surprising, since a number of researchers have pointed out that crossover, the primary reproduction mechanism used in GAs, is largely ineffective for protein folding studies. Crossover is particularly ineffective as the protein structure becomes more compact [10]. Moreover, several researchers comment on the inability of crossover to recombine the so-called building blocks that should produce better solutions [26, 27]. Evolution strategies (ES) and EP place emphasis on mutation as a reproduction mechanism. Perhaps a detailed look at the use of these paradigms is long overdue.

4 Recent Work at NIH

The Laboratory of Molecular Biology at the National Cancer Institute has recently begun to explore the ability of an ES to solve instances of SCPP. Our approach is to first develop an ES predictor that can find a reasonable backbone conformation and then attack SCPP. The first version of our ES predictor is running and the results are quite encouraging. This predictor uses a polypeptide model that has all backbone atoms represented plus the Cβ atom from the side-chain (see footnote 3). Energy is measured by r.m.s. deviation from a known crystal structure. This fitness function has an advantage in testing search algorithms because the function is exact in the sense that the global energy minimum exactly corresponds to the native structure.

The genome for our ES predictor is an integer array, which describes the φ, ψ torsion angles of each residue. Search operations are conducted with a (μ + λ_t)-ES, where, at generation t, μ parents produce λ_t offspring and parents compete equally with offspring for survival. However, reproduction is done in a different manner from the conventional (μ + λ)-ES. Each parent may generate up to 200 offspring, but culling of low-fitness progeny is done immediately after their genesis. This means the total number of offspring that can be candidates for survival varies from parent to parent. Furthermore, λ_k may not necessarily equal λ_m for m ≠ k. It is for this reason we adopt the notation (μ + λ_t). Conventional truncation selection chooses the μ parents for the next generation.

(φ, ψ)          Probability
(-65, -42)      0.370
(-87, -3)       0.147
(-123, 139)     0.274
(-70, 138)      0.139
(77, 22)        0.045
(107, -174)     0.025

Table 1: The discrete set of φ, ψ angles (in degrees) and their probability of occurrence. Only the trans (ω = 180°) configuration is supported.

New conformations are generated by stochastic modification of selected residue torsion angles (see below).
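A biased draw from the discrete set in Table 1 might be sketched as follows. The angle pairs and probabilities are copied from the table; everything else (the names, the use of an index as the integer gene, and the renormalization applied when the current state is excluded) is an assumption for illustration and not a description of the actual predictor.

```python
import random

# (phi, psi) pairs in degrees and their probabilities, copied from Table 1.
DIHEDRAL_LIBRARY = [
    ((-65, -42), 0.370),
    ((-87, -3), 0.147),
    ((-123, 139), 0.274),
    ((-70, 138), 0.139),
    ((77, 22), 0.045),
    ((107, -174), 0.025),
]


def sample_state(rng, exclude=None):
    """Draw an index into DIHEDRAL_LIBRARY with the Table 1 bias.

    If exclude is given, that index is never returned, so the draw always
    yields a different state. The remaining probabilities are implicitly
    renormalized by random.choices (an assumed design choice).
    """
    indices = [i for i in range(len(DIHEDRAL_LIBRARY)) if i != exclude]
    weights = [DIHEDRAL_LIBRARY[i][1] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

A residue whose gene is `i` then has the torsion angles `DIHEDRAL_LIBRARY[i][0]`. Because the helical pair (-65, -42) carries the largest weight, it is the state drawn most often.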
Mutation is now the only reproduction operator because in earlier tests we found that generally recombination was of little value, particularly as the structures become more compact. Mutation is performed over a randomly positioned set of k consecutive residues, with k typically set to three. The r.m.s. deviation from the crystal structure, which is proportional to fitness, is computed only over a window of consecutive residues, which includes the k residues that are mutated. The window size w is an adapted strategy parameter that will grow until it equals the full length of the polypeptide chain.

Footnote 3: The Cβ atom in the side-chain is directly bonded to the Cα atom.
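The reproduction scheme described in this section can be summarized in a hedged sketch. It is not the authors' implementation: the culling rule (discard any child that is less fit than its parent), the uniform redraw of states, and the omission of the adaptive window w are simplifying assumptions, and the fitness function is supplied by the caller.

```python
import random

N_STATES = 6          # size of the discrete (phi, psi) set
MAX_OFFSPRING = 200   # upper bound on offspring per parent, from the text


def mutate(genome, k, rng):
    """Redraw k consecutive residues starting at a random position.
    States are drawn uniformly here for brevity (an assumption)."""
    start = rng.randrange(len(genome) - k + 1)
    child = list(genome)
    for i in range(start, start + k):
        child[i] = rng.choice([s for s in range(N_STATES) if s != genome[i]])
    return child


def next_generation(parents, fitness, mu, rng, k=3):
    """One (mu + lambda_t) step: every parent breeds, weak offspring are
    culled at once, and truncation selection keeps the best mu."""
    pool = list(parents)
    for parent in parents:
        parent_fit = fitness(parent)
        for _ in range(MAX_OFFSPRING):
            child = mutate(parent, k, rng)
            # Assumed culling rule: drop any child worse than its parent.
            if fitness(child) >= parent_fit:
                pool.append(child)
    # Parents and surviving offspring compete equally for the mu slots.
    pool.sort(key=fitness, reverse=True)
    return pool[:mu]
```

Because the number of children that survive culling differs from parent to parent and from generation to generation, the size of the offspring pool is not fixed, which is the behaviour the (μ + λ_t) notation is meant to capture. Keeping the parents in the pool means the best fitness can never decrease from one generation to the next.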

In the more frequently encountered trans-peptide conformation, the torsion angles tend to cluster into six distinct regions in the Ramachandran (φ, ψ, ω) map [28]. Hence, the ES predictor mutates a torsion angle by choosing a different angle from a discrete set. However, these angles were not chosen with equal probability but with a bias derived from a protein library [8]. The discrete set of angles and the probability of selection are shown in Table 1. Fitness is measured by the negative of the r.m.s. deviation from the crystal structure with a penalty term for steric clashes.

The test case for our (μ + λ_t)-ES predictor was a Streptococcal Protein G, which has 61 residues. This protein contains one α-helix and four β-strands. Each generation manipulated a population of size μ = 200 with λ_t = 200 (maximum). The initial window size was w = 5. The predictor consistently found conformations with an r.m.s. deviation around 1.8 Å. Figure 3 depicts the results.

Figure 3: A ribbon diagram of Streptococcal Protein G (left) and the ES predicted structure (right). The single α-helix is clearly visible. Arrowheads on the sheets indicate orientation from the N-terminal to the C-terminal. Figure was generated by MolScript v2.1.1, Copyright (C) 1997-1998 [29].

5 Future Work

The current version of our ES predictor, although rudimentary, is capable of producing good results. However, its current form is not adequate to solve instances of SCPP and so a number of changes have been planned.

The first change is to model just a virtual backbone, composed only of Cα atoms, to properly orient the backbone. The bond angles and torsion angles are virtual in the sense they do not directly correspond to the φ, ψ angles in the current model (see Figure 2(a)). A Cα chain model is shown in Figure 4.

Figure 4: A virtual backbone model.

Proper placement of the backbone is crucial to the accurate prediction of side-chain conformations. It may appear a Cα chain model is a step backwards since a Cα chain has less detail than the backbone model. Actually, the Cα chain model is the appropriate level of detail needed to search for accurate backbone conformations. As the protein becomes more compact, the likelihood of steric clashes increases, making many conformations produced by a stochastic search invalid. Elofsson et al. [30] showed that local moves in torsion space improve the efficiency of search algorithms. Essentially, a local move changes torsion angles within a sequential window while leaving the positions of all Cα atoms outside of the window unchanged. We intend to analyze and then incorporate local moves into our ES predictor.

Once the Cα chain is placed properly, the chain can be decorated with the peptides. This permits introducing some flexibility into the backbone. A simple method of introducing flexibility only requires changing the peptide orientation angle (Figure 5), which seems to provide a good compromise between computational speed and the production of realistic folded structures [31].

Figure 5: The bonds forming the peptide group are shown by thick lines. These atoms lie in one plane (P) while the Cα(i+1), Cα(i+2) and Cα(i+3) atoms connected by the heavy dotted line lie in the other plane (Q). The peptide orientation angle α2 is the angle between the two planes. The bond lengths are not to scale.

The next anticipated change is full modeling of the side-chains. The bond angles and bond lengths will remain fixed so the only degree of freedom will be the χ torsion angles. Initially a discrete set of torsion angles (10° increments) will be used for the θ, γ angles in the virtual backbone, the peptide orientation angle, and the χ torsion angles. Eventually this discrete set will be replaced by continuous values. During each generation the ES predictor will change the θ, γ angles, but prior to computing the energy, the peptide orientation angles and χ torsion angles will be optimized. Hence the ES will become hierarchical.

Finally, the fitness function used in our original ES predictor (r.m.s. deviation from a crystal structure) must be replaced before ab initio prediction can be performed. A number of widely available energy functions do exist and we will investigate which one is most appropriate. A reasonable first choice would be a statistical potential [32, 33]. These changes should produce a predictor that is both effective and efficient.

Acknowledgements

The research of GWG was sponsored in part by NSF Grant ECS-1913449.

Bibliography

[1] D. E. Clark and D. R. Westhead. Evolutionary algorithms in computer-aided molecular design. J. Comput.-Aided Mol. Des., 10:337-358, 1996.
[2] F. M. Richards. The protein folding problem. Sci. Amer., 264:54-63, 1991.
[3] H. Chan and K. Dill. The protein folding problem. Physics Today, 46:24-32, 1993.
[4] N. D. Socci, W. Bialek, and J. N. Onuchic. Properties and origins of protein secondary structure. Phys. Rev. E, 49:3440-3443, 1994.
[5] E. E. Abola, J. L. Sussman, J. Prilusky, and N. O. Manning. Protein Data Bank Archives of Three-Dimensional Macromolecular Structures. In: Methods in Enzymology (C. W. Carter Jr. and R. M. Sweet, eds.), Vol. 277, Academic Press, San Diego, 1997.
[6] A. Robertson and K. Murphy. Protein structure and the energetics of protein stability. Chem. Rev., 97:1251-1267, 1997.
[7] G. W. Greenwood. Efficient construction of self-avoiding walks for protein folding simulations on a torus. J. Chem. Phys., 108:7534-7537, 1998.
[8] M. J. Rooman, J-P. Kocher, and S. J. Wodak. Extracting information on folding from the amino acid sequence: accurate predictions for protein regions with preferred conformation in the absence of tertiary interactions. Biochem., 31:10226-10238, 1992.
[9] J. R. Gunn. Sampling protein conformations using segment libraries and a genetic algorithm. J. Chem. Phys., 106:4270-4281, 1997.
[10] J. Pedersen and J. Moult. Protein folding simulations with genetic algorithms and a detailed molecular description. J. Mol. Biol., 269:240-259, 1997.
[11] S. Sun, P. D. Thomas, and K. A. Dill. A simple protein folding algorithm using a binary code and secondary structure constraints. Protein Engr., 8:769-778, 1995.
[12] C. Wilson, L. Gregoret, and D. Agard. Modeling side-chain conformation for homologous proteins using an energy-based rotamer search. J. Mol. Biol., 229:996-1006, 1993.
[13] N. Summers, W. Carlson, and M. Karplus. Analysis of side-chain orientations in homologous proteins. J. Mol. Biol., 196:175-198, 1987.
[14] C. Notredame, L. Holm, and D. Higgins. COFFEE: an objective function for multiple sequence alignments. Bioinfo., 14:407-422, 1998.
[15] J. Desjarlais and T. Handel. De novo design of the hydrophobic cores of proteins. Protein Sci., 4:2006-2018, 1995.
[16] G. Ghirlanda, J. Lear, A. Lombardi, and W. DeGrado. From synthetic coiled coils to functional proteins: automated design of a receptor for the calmodulin-binding domain of calcineurin. J. Mol. Biol., 281:379-391, 1998.
[17] G. Morris, D. Goodsell, R. Huey, and A. Olson. Distributed automated docking of flexible ligands to proteins: parallel applications of Autodock 2.4. J. Comp.-Aid. Mol. Des., 10:293-304, 1996.
[18] C. Rosin, R. Halliday, W. Hart, and R. Belew. A comparison of global and local search methods in drug docking. Proc. 7th ICGA, pages 221-228, 1997.
[19] F. Solis and R. Wets. Minimization by random search techniques. Math. Ops. Res., 6:19-30, 1981.
[20] G. Jones, P. Willett, R. Glen, A. Leach, and R. Taylor. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol., 267:727-748, 1997.
[21] M. Raymer, P. Sanschagrin, W. Punch, S. Venkataraman, E. Goodman, and L. Kuhn. Predicting conserved water-mediated and polar ligand interactions in proteins using a k-nearest-neighbors genetic algorithm. J. Mol. Biol., 265:445-464, 1997.
[22] D. Gehlhaar, G. Verkhivker, P. Rejto, C. Sherman, D. B. Fogel, L. J. Fogel, and S. Freer. Molecular recognition of the inhibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolutionary programming. Chem. Biol., 2:317-324, 1995.
[23] F. Eisenmenger, P. Argos, and R. Abagyan. A method to configure protein side-chains from the main-chain trace in homology modeling. J. Mol. Biol., 231:849-860, 1993.
[24] M. Vasquez. An evaluation of discrete and continuum search techniques for conformational analysis of side chains in proteins. Biopolymers, 36:53-70, 1995.
[25] S. Schulze-Kremer and U. Tiedemann. Parameterizing genetic algorithms for protein folding simulation. Proc. 27th Ann. Hawaii Int'l Conf. on Sys. Sci., pages 345-354, 1994.
[26] A. H. Kampen and L. M. Buydens. The ineffectiveness of recombination in a genetic algorithm for the structure elucidation of a heptapeptide in torsion angle space: a comparison to simulated annealing. Chem. & Intell. Lab. Sys., 36:141-152, 1997.
[27] N. Krasnogor, D. Pelta, P. E. M. Lopez, and E. de la Canal. Genetic algorithm for the protein folding problem, a critical view. Proc. Engr. of Intell. Sys. 98, pages 345-352, 1998.
[28] G. Ramachandran and V. Sasisekharan. Conformation of polypeptides and proteins. Adv. Protein Chem., 23:283-437, 1968.
[29] P. J. Kraulis. Molscript: a program to produce both detailed and schematic plots of protein structures. J. Appl. Cryst., 24:946-950, 1991.
[30] A. Elofsson, S. M. Le Grand, and D. Eisenberg. Local moves: an efficient algorithm for simulation of protein folding. Proteins, 23:73-82, 1995.
[31] Y. Wang, H. Huq, X. de la Cruz, and B. Lee. A new procedure for constructing peptides into a given Cα chain. Folding & Design, 3:1-10, 1997.
[32] S. Miyazawa and R. Jernigan. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules, 18:534-552, 1985.
[33] M. Sippl. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol., 213:859-883, 1990.