
Biochemical Society Transactions (2010) Volume 38, part 5

Computational optimization and biological evolution

Igor Goryanin
School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, Scotland, U.K., and Okinawa Institute of Science and Technology, 1919-1, Onna, Onna-son, Okinawa 904-0412, Japan

Abstract
Modelling and optimization principles have become key concepts in many biological areas, especially in biochemistry. Definitions of objective function, fitness and co-evolution, although they differ between biology and mathematics, are similar in a general sense. Although optimization and evolutionary computations have been successful in fitting models to experimental data and in making some biochemical predictions, they should be developed further to make more accurate real-life predictions, and to deal not only with one organism in isolation but also with communities of symbiotic and competing organisms. One of the future goals will be to explain and predict evolution not only for organisms in shake flasks or fermenters, but also in real, competitive multispecies environments.

If one way be better than another, that you may be sure is Nature’s way. Aristotle

Introduction
Traditionally, biological models are associated with the experimental data they describe. The success of the modelling is judged by criteria such as how well the model describes experimental data or, in the best-case scenario, how well it predicts de novo experimental observations or behaviour under unexplored environmental conditions. This is true for all engineering, technical and financial disciplines. Unfortunately, it is not enough for modelling in modern biology. In contrast with models in physics and chemistry, where the main question is ‘how?’, models in biology have always required additional considerations. Most of these relate to the major biological question ‘why?’, and should be viewed in an evolutionary context. Most often, this question is associated with the biological functions of an organism. The translation of evolutionary context and biological function into mathematical or computational notation can be explored within the optimization framework by introducing constraints, fitness criteria and objective functions [1].

Optimization in computer science and in biology
The term ‘objective function’ is defined in computational sciences as the performance index that quantifies the quality of a solution defined by a set of decision variables, and that can be maximized or minimized. Constraints are defined as requirements that must be met. In simpler terms, in linear programming for example, the objective is a linear function that one wishes to optimize (http://glossary.computing.society.informs.org). The essence of biological optimization is similar to the more rigorous mathematical definition: it is to calculate the most efficient solution to a given problem, and then to test the prediction. The concept has already revolutionized some aspects of biology, but it has the potential for much wider applications [2].

David Lack [3] pioneered the use of objective functions in biology with his concept of the optimal clutch size: the number of eggs that would produce the greatest number of offspring. The Watson and Crick prediction of the triplet code as the optimal way of encoding 20 amino acids with the four bases A, T, C and G is another successful example of the optimization methodology. In molecular biology, the most notable example is the optimization of metabolic processes in bacteria. It has been possible to predict the growth rate that Escherichia coli reaches on glycerol after hundreds of generations of adaptive evolution [4], and similar objective functions based on the concentration of a particular metabolite (lactic acid) were used by Fong et al. [5] to design an optimal E. coli strain in silico. The objective functions used included maximization of the growth rate and of energetic efficiency, and minimization of internal futile cycles.

Living organisms are too complex to pursue only one goal (or to have only one objective function). In most biological cases, the goal is multi-objective [6]: organisms minimize or maximize different objective functions simultaneously or sequentially as part of a strategy for survival, adaptation, proliferation or death. For better understanding, some questions can be reformulated in reverse: ‘which objective functions are being optimized in these cellular networks?’ [7], or ‘which objective functions were optimized in the past during evolution?’

Key words: co-evolution, computational optimization, evolution, modelling, objective function.
Abbreviation used: AA, arachidonic acid.

© The Authors Journal compilation © 2010 Biochemical Society

Biochem. Soc. Trans. (2010) 38, 1206–1209; doi:10.1042/BST0381206

Systems Analysis of Metabolism

A considerable strength of using optimization is that, once we understand why organisms are as they are, it should be possible to understand how they will respond to new conditions.
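To make ‘calculating the most efficient solution’ concrete, the minimal sketch below treats Lack's clutch-size problem and the codon-length argument as tiny optimization problems. It is an illustration under stated assumptions, not Lack's or Watson and Crick's analysis: the linear decline of chick survival with clutch size (`base_survival`, `decline`) is hypothetical, and the function names are ours. The codon part only shows the counting bound (the shortest word over four bases that can label 20 amino acids).

```python
def expected_fledglings(clutch_size: int, base_survival: float = 1.0, decline: float = 0.1) -> float:
    """Objective to maximize: eggs laid times the survival probability per chick.

    The linear decline of survival with clutch size is a hypothetical assumption
    chosen for illustration; it is not Lack's field data.
    """
    survival = max(0.0, base_survival - decline * clutch_size)
    return clutch_size * survival


def optimal_clutch(max_clutch: int = 12) -> int:
    """Exhaustive search over the single integer decision variable."""
    return max(range(1, max_clutch + 1), key=expected_fledglings)


def min_codon_length(n_amino_acids: int = 20, n_bases: int = 4) -> int:
    """Smallest word length k such that n_bases ** k >= n_amino_acids."""
    k = 1
    while n_bases ** k < n_amino_acids:
        k += 1
    return k


print(optimal_clutch())        # 5
print(expected_fledglings(5))  # 2.5
print(min_codon_length())      # 3 (4**2 = 16 < 20 <= 64 = 4**3)
```

Exhaustive search is adequate here because there is a single small integer decision variable; the same objective-plus-search pattern carries over to the metabolic examples, where linear or non-linear programming replaces enumeration.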

Some optimization problems are unique to biology
The concept of optimized self-destruction, or apoptosis, is particularly interesting: the system must keep the key processes and players intact to make sure that the programme succeeds. Another interesting optimization problem is pathogen invasion, in which the pathogen must keep its host under control and then, when the time comes, switch to a killing programme with a different objective function. Overall, biological concepts such as fitness, adaptation, homoeostasis and evolutionary pressure can all be expressed in terms of computational optimization (cost functions and constraints). Such functions can be expressed as multi-objective cost functions in both parameter and variable space. On the other hand, concepts of evolution borrowed from biology without proper attention to their biological meaning (genetic algorithms, membrane computing, etc.), although successful, can sometimes cause confusion in establishing an interdisciplinary dialogue. Several emerging and established disciplines target such interdisciplinary areas (evolutionary computing, artificial life, evolutionary biology, systems biology, etc.), but there has been no co-ordinated effort to agree even on common terminology, available tools and shared goals. So far, systems biology research, which originates from biochemistry and molecular biology, has paid a lot of attention to model construction, storage (http://www.biomodels.org), validation, standards (http://www.SBML.org, http://www.SBGN.org) and links to bioinformatics, but no objective functions have been catalogued and defined in a rigorous manner. Thermodynamic parameter values, which can be used as constraints for objective functions, can be found in quantitative biochemistry databases [8], whereas enzyme kinetic data can be obtained from BRENDA [9] or SABIO-RK [10]. Almost all systems biology models have been created for scientific research purposes, in which objective functions are often not defined, or are defined only implicitly.
During model construction, a modeller has some goal or criterion in mind, explicitly or implicitly. The goal comes from the initial biological question. What is the purpose of the modelling exercise? Why should the model be developed? What is the purpose of the engineered or evolved biological system? What could be the optimum parameter set to describe the available experimental data? Questions like these are why optimization is becoming increasingly important in modern computational biology research.
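The multi-objective cost functions mentioned in this section can be handled in two common ways: keeping the set of non-dominated (Pareto-optimal) solutions, or collapsing the objectives into a single weighted sum. The minimal Python sketch below shows both. The phenotype names, their (growth rate, energy yield) scores and the equal weights are hypothetical values chosen for illustration, and both objectives are assumed to be maximized.

```python
from typing import Dict, List, Tuple

# Each candidate is scored on several objectives, all to be maximized.
Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in candidates.items()
        if not any(dominates(t, s) for other, t in candidates.items() if other != name)
    ]


def weighted_sum(scores: Scores, weights: Scores) -> float:
    """Scalarize several objectives into one with fixed weights."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical phenotypes scored as (growth rate, energy yield); values are illustrative.
phenotypes = {
    "fast_grower": (0.9, 0.3),
    "efficient": (0.4, 0.9),
    "balanced": (0.7, 0.7),
    "poor": (0.3, 0.2),
}
print(pareto_front(phenotypes))  # ['fast_grower', 'efficient', 'balanced']
print(max(phenotypes, key=lambda n: weighted_sum(phenotypes[n], (0.5, 0.5))))  # balanced
```

The Pareto front keeps every trade-off that is not strictly worse than another, whereas the weighted sum commits to one compromise; which weights (if any) reflect a biological strategy is exactly the open question raised above.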

Classification of objective functions
To the best of our knowledge, there is no classification of objective functions in biology. Additional efforts are required from the community to develop standards, descriptive language and ontologies for objective functions. Previous reviews have surveyed the methods used for optimization in bioinformatics and systems biology [1], and the use of qualitative and quantitative constraints has also been addressed [11]. Some time ago, Mendes and Kell [12] highlighted two types of objective function, corresponding to two problems: how to rationally design and improve metabolic pathways to maximize the flux to products of interest and minimize the production of undesired by-products (metabolic engineering, biochemical evolution and synthetic biology), and how to calibrate a model so that it reproduces the experimental results as closely as possible [12a]. Later, Banga [13] summarized the different objective functions involved in inferring cellular networks, such as transcriptional regulatory networks [14], gene regulatory networks [15], signalling pathways and protein interaction networks [16]. Constraint-based optimization and its associated objective functions have also received great attention. Several classes of constraints are integrated into objective functions: experimental constraints, which usually come from experimental protocols; physicochemical constraints, which include conservation of mass and energy, space and size limits, time scales and the laws of thermodynamics; and biological phenotypic constraints, which incorporate the evolutionary question ‘why?’
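The second type of objective function described by Mendes and Kell, calibrating a model against data, is commonly written as a sum of squared residuals. The sketch below is a minimal illustration, not the method of [12] or [12a]: it fits the single rate constant of a hypothetical first-order decay model to synthetic, noise-free data using golden-section search, a simple one-dimensional optimizer chosen here for brevity. Realistic calibration problems have many parameters and rugged objectives, which is what metaheuristics such as that of [12a] are designed for.

```python
import math


def model(k: float, t: float) -> float:
    """Hypothetical first-order decay model: normalized concentration at time t."""
    return math.exp(-k * t)


def sse(k, times, observed):
    """Calibration objective: sum of squared residuals between model and data."""
    return sum((model(k, t) - y) ** 2 for t, y in zip(times, observed))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search.

    For clarity, both interior points are re-evaluated on every iteration.
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


times = [0.0, 1.0, 2.0, 4.0, 8.0]
# Synthetic "experimental" data generated with k = 0.5 and no noise (illustration only).
observed = [math.exp(-0.5 * t) for t in times]
k_hat = golden_section_min(lambda k: sse(k, times, observed), 0.01, 5.0)
print(round(k_hat, 4))  # 0.5
```

Because every residual here moves monotonically away from zero on either side of the true rate, the objective is unimodal and a bracketing search suffices; with noisy data and several parameters that guarantee disappears.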

Objective functions in biomedical modelling
Generally speaking, it is difficult to define a single, biochemically sound objective function for a vertebrate organism, and for humans in particular. Most probably, objective functions are organ-specific. For drug design problems, therefore, the main challenge is to define objective functions not only for normal states but also for disease states, and for different therapeutic areas. For infectious diseases, the strategy is relatively clear: the objective is to kill the pathogens without killing human cells. This concept is exploited heavily by the pharmaceutical industry when searching for targets, i.e. proteins that differ in sequence, structure and function between host and pathogen. In PK (pharmacokinetics)/PD (pharmacodynamics) modelling, scientists use the AUC (area under the concentration-time curve) as a criterion for optimizing drug efficacy and extending patients’ exposure to drugs [17]. The disease state, the normal healthy state and the transition between them can be reformulated as an optimization problem [18] in which the objective function assumes that the normal and disease states differ in the production of particular metabolites. Such optimization is not fully automated and has usually been done manually. In a model of the AA (arachidonic acid) pathway [19], the authors found optimal combinations of COX (cyclooxygenase)-selective and non-selective NSAIDs (non-steroidal anti-inflammatory drugs). Recently, resistance to inhibition has been defined in the cost function of a recursive genetic algorithm [20].
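As a reminder of what the AUC criterion measures, the sketch below computes it with the linear trapezoidal rule, one standard non-compartmental estimate. The sampling times and plasma concentrations are hypothetical, and a real PK/PD optimization would wrap a calculation like this inside a search over dosing parameters rather than use it on its own.

```python
from typing import Sequence


def auc_trapezoid(times: Sequence[float], conc: Sequence[float]) -> float:
    """Area under a concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(conc) or len(times) < 2:
        raise ValueError("need at least two paired time/concentration points")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt < 0:
            raise ValueError("times must be non-decreasing")
        # Each interval contributes the area of a trapezoid under the sampled curve.
        area += dt * (conc[i] + conc[i - 1]) / 2
    return area


# Hypothetical plasma concentrations (mg/L) sampled at 0, 1, 2, 4 and 8 h.
t = [0, 1, 2, 4, 8]
c = [0.0, 4.0, 3.0, 2.0, 0.5]
print(auc_trapezoid(t, c))  # 15.5 (mg*h/L)
```

Truncating at the last sample, as here, underestimates total exposure when the drug has not been eliminated; extrapolating the terminal phase is a common refinement.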

Evolution in Nature and computers
Biological evolution is the change in the inherited traits of a population of organisms through successive generations. These traits are controlled by genes, and the complete set of genes within an organism’s genome is called its genotype. Computer programs can simulate the evolutionary process, with genes and genotypes represented by objects in computer memory. The complete set of observable traits that make up the structure and behaviour of an organism is called its phenotype. In computers, phenotypes are represented by objective functions that quantify biological behaviour. Natural selection is the process by which favourable genetic mutations become, and remain, more common in successive generations. Selection happens spontaneously in biology; in silico, it is usually modelled with Monte Carlo methods that mimic random mutagenesis. For example, micro-organisms have evolved in such a way that their metabolic phenotypes ensure the most efficient conversion of carbon and energy into new cells [21].
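The Monte Carlo mutation-selection loop described above can be sketched as a (1+1) evolutionary algorithm: a bit-string genotype is randomly mutated and the mutant is kept only if its phenotype score does not decrease. This is a toy under stated assumptions, not a model from the cited work: the ‘OneMax’ score (the count of 1-alleles), the genome length and the per-bit mutation rate are hypothetical stand-ins for a quantified biological behaviour.

```python
import random


def phenotype_score(genotype):
    """Toy quantified phenotype (hypothetical): the number of 1-alleles ("OneMax")."""
    return sum(genotype)


def mutate(genotype, rate, rng):
    """Random mutagenesis: flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genotype]


def evolve(length=30, generations=2000, seed=1):
    """(1+1) evolutionary algorithm: one parent and one mutant per generation."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(length)]
    for _ in range(generations):
        child = mutate(parent, 1.0 / length, rng)
        # Selection: the mutant replaces the parent only if it is at least as fit.
        if phenotype_score(child) >= phenotype_score(parent):
            parent = child
    return parent


best = evolve()
print(phenotype_score(best), "of", len(best))
```

Accepting mutants of equal score lets the population drift across neutral variants, a crude analogue of neutral mutations; rejecting them would make the search stricter but no faster on this landscape.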

Survival of the fittest
Natural selection is the process by which heritable traits that make an organism more likely to survive and reproduce successfully become more common in a population over successive generations. It is a key mechanism of evolution. The process of biological evolution can be regarded as the equivalent of multi-objective constrained optimization in silico. The concept of fitness is central both to natural selection in biology and to computer optimization. Broadly speaking, individuals that are more ‘fit’ have a better potential for survival, as in the well-known phrase ‘survival of the fittest’. There are many definitions of biological fitness [22]. In general, fitness involves the ability of organisms to survive and reproduce in their environment [23]. In computer algorithms, these features are implemented as penalty and cost functions. However, there are some differences. In evolutionary biology, fitness landscapes are used to visualize the relationship between genotypes (or phenotypes) and reproductive success. It is assumed that every genotype has a well-defined replication rate (often referred to as fitness), and this fitness is the ‘height’ of the landscape. In contrast, in evolutionary computation, fitness (or optimization) landscapes are evaluations of an objective function over all solutions. Because fitness is maximized, whereas an objective function is often minimized, taking the inverse of an objective function turns it into a fitness function, and vice versa (Figure 1).

Figure 1 Fitness and objective function
Upper chart: evolutionary biology. 2, global fitness optimum; 1 and 3, local fitness optima. Fitness needs to decrease to move between optima. Lower chart: evolutionary computation. 2a, global minimum of the objective function; 1a and 3a, local minima. Monte Carlo-based random methods need to move between minima.

In a genetic algorithm, a fitness function is a particular type of objective function that prescribes the optimality of a solution (that is, a chromosome) so that the chromosome may be ranked against all the other chromosomes. Fitter chromosomes are allowed to breed and to mix their data by any of several techniques, producing a new generation that is expected to be better still. Newer methods include IEC (interactive evolutionary computation), or aesthetic selection, in which fitness is evaluated by humans. Human evaluation is usually necessary when the form of the fitness function is not known (for example, visual appeal [24], taste or attractiveness) or when the result of optimization should fit particular non-quantifiable datasets (for example, the transition between disease and normal states).

Finally, living organisms cannot be considered in isolation, so the concept of co-evolution is being explored both in biology and in computer science. In biology, co-evolution happens when two (or more) species influence each other’s evolution. The concept was described by Charles Darwin in On the Origin of Species [25] and in Fertilization of Orchids [26]. In a broader sense, co-evolution can be defined as a change in a biological (or computational) object triggered by a change in a related object. Co-evolution does not imply mutual dependence: the host of a parasite, or the prey of a predator, does not depend on its enemy for survival. Well-known examples in biology include mitochondria within eukaryotic cells, chloroplasts in plant cells [27], the gut microbiome [28] and symbiotic bacteria whose metabolic networks are adapted for co-operation with their insect hosts [29]. In computer science, co-evolutionary algorithms are a class of algorithms used for generating artificial life as well as for optimization, game learning and machine learning; examples include co-evolved sorting networks [30] and co-evolved in silico creatures [31].
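The ranking-and-breeding loop of a genetic algorithm, together with the inversion of an objective function into a fitness function, can be illustrated with a minimal sketch. Every specific here is an assumption chosen for brevity: the objective (squared distance from 3.0), the 16-bit encoding of a real number, truncation selection of the better half, one-point crossover, the mutation rate, and the 1/(1 + f) transform, which is one common way of ‘inverting’ a non-negative objective without dividing by zero.

```python
import random


def objective(x):
    """Hypothetical objective to minimize: squared distance of x from 3.0."""
    return (x - 3.0) ** 2


def fitness(x):
    """Turn the non-negative objective into a fitness to maximize: 1 / (1 + f)."""
    return 1.0 / (1.0 + objective(x))


def decode(bits, lo=-10.0, hi=10.0):
    """Map a bit-string chromosome to a real number in [lo, hi]."""
    value = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)


def genetic_algorithm(n_bits=16, pop_size=30, generations=60, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank chromosomes by fitness; the better half become parents (truncation selection).
        pop.sort(key=lambda ch: fitness(decode(ch)), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = parents + children  # elitist: parents survive, so the best never gets worse
    best = max(pop, key=lambda ch: fitness(decode(ch)))
    return decode(best)


x_best = genetic_algorithm()
print(round(x_best, 2), round(objective(x_best), 4))
```

Keeping the parents in the next generation (elitism) guarantees that the best fitness found never decreases, at the cost of some loss of diversity.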

Further cross-fertilization is needed
The main biological goal since Darwin has been to explain evolution, so the overall goal for modelling is to explain the history of evolution and why we are on planet Earth, and to predict the future. This cannot be done without a better understanding of the evolution of biological networks, from the cellular to the organism and population levels, in changing environments. Finally, it should be stressed that modern optimization methods are sometimes insufficient for gaining a deeper understanding of certain aspects of biology. During evolution, biological systems can cause changes in the environment or in host organisms. These evolutionary feedbacks (co-evolution) must be taken into account by novel computational methods. In the process, some organisms may de-evolve, losing some biological functions for the benefit of a fitter community or ecosystem. This phenomenon, known as biological altruism, is distinct from traditional notions of altruism because such actions appear to be evolutionary adaptations that increase overall fitness; as such, they should be computationally reproducible. Maynard Smith [32] writes, “paradoxically, it has turned out that game theory is more readily applied to biology than to the field of economic behaviour for which it was originally designed”. From the computer science point of view, evolution is the process of adapting biological processes to constantly changing objective functions and constraints in changing environments. New computational concepts combining evolutionary computation, game theory and novel optimization techniques are required. Computational methods should incorporate biological variability not only at the genomic level but also at the epigenetic and phenotypic levels, together with recursive optimization, hierarchical optimization and simultaneous fitting to several weighted objective functions.
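Maynard Smith's point can be illustrated with the Hawk-Dove game discussed in [32]. The sketch below integrates the replicator equation for the standard Hawk-Dove payoffs with a simple Euler scheme; when the resource value V is smaller than the cost of injury C, the population settles at the evolutionarily stable mixture in which a fraction V/C plays Hawk. The values V = 2 and C = 4, the starting frequency and the step size are illustrative assumptions.

```python
def hawk_dove_payoffs(p, V=2.0, C=4.0):
    """Expected payoffs to Hawk and Dove when a fraction p of the population plays Hawk.

    Standard Hawk-Dove payoffs: H vs H = (V - C) / 2, H vs D = V, D vs H = 0, D vs D = V / 2.
    """
    w_hawk = p * (V - C) / 2 + (1 - p) * V
    w_dove = (1 - p) * V / 2
    return w_hawk, w_dove


def replicator(p0=0.1, V=2.0, C=4.0, dt=0.01, steps=5000):
    """Euler integration of dp/dt = p (1 - p) (W_H - W_D), where W_H - W_D = (V - p C) / 2."""
    p = p0
    for _ in range(steps):
        w_h, w_d = hawk_dove_payoffs(p, V, C)
        p += dt * p * (1 - p) * (w_h - w_d)
    return p


print(round(replicator(), 3))  # 0.5, the evolutionarily stable Hawk frequency V / C
```

Here no single ‘best’ strategy exists: each strategy's payoff depends on what the rest of the population does, which is the kind of changing objective function that the text argues static optimization misses.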

Funding
Supported by the University of Edinburgh and the Okinawa Institute of Science and Technology.

References
1 Handl, J., Kell, D. and Knowles, J. (2007) Multiobjective optimization in computational biology and bioinformatics. IEEE Trans. Comput. Biol. Bioinform. 4, 279–292
2 Sutherland, W.J. (2005) The best solution. Nature 435, 569
3 Lack, D. (1947) The significance of clutch-size (part I–II). Ibis 89, 302–352
4 Ibarra, R.U., Edwards, J.S. and Palsson, B.Ø. (2002) Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420, 186–189
5 Fong, S.S., Burgard, A.P., Herring, C.D., Knight, E.M., Blattner, F.R., Maranas, C.D. and Palsson, B.Ø. (2005) In silico design and adaptive evolution of Escherichia coli for production of lactic acid. Biotechnol. Bioeng. 91, 643–648
6 Schuetz, R., Kuepfer, L. and Sauer, U. (2007) Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol. Syst. Biol. 3, 15
7 Nielsen, J. (2007) Principles of optimal metabolic network operation. Mol. Syst. Biol. 3, 58
8 Goldberg, R.N., Tewari, Y.B. and Bhat, T.N. (2004) Thermodynamics of enzyme-catalyzed reactions: a database for quantitative biochemistry. Bioinformatics 20, 2874–2877
9 Chang, A., Scheer, M., Grote, A., Schomburg, I. and Schomburg, D. (2008) BRENDA, AMENDA and FRENDA the enzyme information system: new content and tools in 2009. Nucleic Acids Res. 37, D588–D592
10 Wittig, U., Golebiewski, M., Kania, R., Krebs, O., Mir, S., Weidemann, A., Anstein, S., Saric, J. and Rojas, I. (2006) SABIO-RK: integration and curation of reaction kinetics data. Lect. Notes Comput. Sci. 4075, 94–103
11 Batt, G., Yordanov, B., Weiss, R. and Belta, C. (2007) Robustness analysis and tuning of synthetic gene networks. Bioinformatics 23, 2415–2422
12 Mendes, P. and Kell, D.B. (1998) Non-linear optimisation of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics 14, 869–883
12a Rodriguez-Fernandez, M., Egea, J.A. and Banga, J.R. (2006) Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics 7, 483
13 Banga, J.R. (2008) Optimisation in computational systems biology. BMC Syst. Biol. 2, 47
14 Wang, R.S., Wang, Y., Zhang, X.S. and Chen, L. (2007) Inferring transcriptional regulatory networks from high-throughput data. Bioinformatics 23, 3056–3064
15 Yeung, M.K.S., Tegner, J. and Collins, J.J. (2002) Reverse engineering gene networks using singular value decomposition and robust regression. Proc. Natl. Acad. Sci. U.S.A. 99, 6163–6168
16 Han, S., Yoon, Y. and Cho, K.H. (2007) Inferring biomolecular interaction networks based on convex optimization. Comput. Biol. Chem. 31, 347–354
17 Caldwell, G.W., Ritchie, D.M., Masucci, J.A., Hageman, W. and Yan, Z. (2001) The new pre-preclinical paradigm: compound optimisation in early and late phase drug discovery. Curr. Topics Med. Chem. 1, 353–366
18 Yang, K., Bai, H., Ouyang, Q., Lai, L. and Tang, C. (2008) Finding multiple target optimal intervention in disease-related molecular network. Mol. Syst. Biol. 4, 228
19 Goltsov, A., Maryashkin, A., Swat, M., Kosinsky, Y., Humphery-Smith, I., Demin, O., Goryanin, I. and Lebedeva, G. (2009) Kinetic modelling of NSAID action on COX-1: focus on in vitro/in vivo aspects and drug combinations. Eur. J. Pharm. Sci. 36, 122–136
20 Mohamad, M.S., Omatu, S., Deris, S. and Yoshioka, M. (2009) A recursive genetic algorithm to automatically select genes for cancer classification. IEE J. Trans. Electr. Electron. Eng. 4, 725–730
21 Price, N.D., Reed, J.L. and Palsson, B.Ø. (2004) Genome-scale models of microbial cells: evaluating the consequences of constraints. Nat. Rev. Microbiol. 2, 886–897
22 Barker, J.S.F. (2009) Defining fitness in natural and domesticated populations. In Adaptation and Fitness in Animal Populations (van der Werf, J., Graser, H.-U., Frankham, R. and Gondro, C., eds), pp. 3–14, Springer, Heidelberg
23 Orr, H.A. (2009) Fitness and its role in evolutionary genetics. Nat. Rev. Genet. 10, 531–539
24 Dawkins, R. (2006) The God Delusion, Houghton Mifflin, Boston
25 Darwin, C. (1859) On the Origin of Species, 1st edn, John Murray, London
26 Darwin, C. (1862) On the Various Contrivances by which British and Foreign Orchids are Fertilised by Insects, and on the Good Effects of Intercrossing, John Murray, London
27 Rand, D.M., Haney, R.A. and Fry, A.J. (2004) Cytonuclear coevolution: the genomics of cooperation. Trends Ecol. Evol. 19, 645–653
28 Nicholson, J.K., Holmes, E. and Wilson, I.D. (2005) Gut microorganisms, mammalian metabolism and personalized health care. Nat. Rev. Microbiol. 3, 431–438
29 Thomas, G.H., Zucker, J., Macdonald, A.J., Sorokin, A., Goryanin, I. and Douglas, A.E. (2009) A fragile metabolic network adapted for cooperation in the symbiotic bacterium Buchnera aphidicola. BMC Syst. Biol. 3, 24
30 Hillis, W.D. (1990) Co-evolving parasites improve simulated evolution as an optimisation procedure. Physica D 42, 228–234
31 Sims, K. (1994) Evolving 3D morphology and behavior by competition. Artif. Life Arch. 1, 28–39
32 Maynard Smith, J. (1982) Evolution and the Theory of Games, Cambridge University Press, Cambridge

Received 30 June 2010