Proceedings of the International Conference on Complex Systems

Nashua, NH, 21–26 Sept. 1997. Y. Bar-Yam (ed.), New England Complex Systems Institute (1997). Also on-line in the InterJournal.

Complex Dynamics of Molecular Evolutionary Processes

Christian V. Forst

Inst. of Molecular Biotechnology, Beutenbergstr. 11, 07745 Jena, Germany




Evolution is an extremely complex dynamical phenomenon that has to be partitioned into simpler systems for an adequate description. Peter Schuster suggested a partition of evolutionary dynamics into three such subsystems [35]: population dynamics, population-support dynamics, and genotype-phenotype mapping. In this paper we present an adequate description of all three subsystems and a structural and dynamical characterisation of their combination. Describing genotype-phenotype relations by RNA sequence-structure maps is the most delicate part of this approach. Such a relation plays a central role in evolution, because genetic mutation acts upon genotypes whereas selection operates on phenotypes. The set of all sequences that map into a particular structure is modelled as a random graph in sequence space. Population-support dynamics is modelled by a catalytic network, realized as a (large) random digraph with secondary structures as its vertex set. A population of catalytic RNA molecules exploring and utilising a large catalytic network behaves significantly differently from a deterministic description: hypercycles are able to coexist with, and survive against, a parasite with superior catalytic support. Switching between different dynamical organisations of the network is also observed. Evolutionary dynamics, in its full capability of describing interacting evolving species, provides a powerful tool for molecular evolutionary biology. Evolutionary experiments not only yield better models; these models can in turn be used to design optimal experimental strategies. This synthesis of theoretical model and experimental setup is the key to a comprehensive and complete description of molecular evolution.

1 Introduction

Molecular-biological dynamics, such as cellular metabolism, are readily interpreted as complex, nonlinear biochemical systems. Any comprehensive understanding of such biological phenomena requires an interpretation in evolutionary terms, as Theodosius Dobzhansky [6] formulated: "Nothing in biology makes sense except in the light of evolution". This sentence, a rephrasing of Galilei's famous quote [18, 28], is much stronger, since it postulates the existence of a formal language to describe and explain observations in nature. The research field of molecular biology was born in 1953, sparked by the outstanding discoveries of Francis Crick and James Watson [44]. Initial steps towards a molecular evolutionary biology were taken in the late sixties by the pioneering work of Sol Spiegelman [40], who developed serial transfer experiments as a new method of molecular evolution in the test tube. At about the same time, Manfred Eigen [7] formulated a kinetic theory of molecular evolution. Since then, studying the evolution of molecules in laboratory systems has become a research area of its own. This approach simplifies evolutionary systems as much as possible and makes them accessible to analysis by the conventional methods of physics and chemistry.

Evolutionary dynamics itself is a highly complex process. We therefore avoid the additional difficulties of spatiotemporal patterns and introduce a comprehensive model that tries to account for most of the relevant features of molecular evolution. Peter Schuster proposed, as essential building blocks of evolutionary dynamics, the interaction of three processes described in three different abstract metric spaces [35]:

• the sequence space of genotypes, which are DNA or RNA sequences,
• the shape space of phenotypes, and
• the concentration space of biochemical reaction kinetics.

The main goal of this paper is both to describe each subsystem and to characterise the properties that emerge from the combination of all three. We first introduce the graph topology induced in sequence space by genotype-phenotype maps (§2). By combining these genotype-phenotype relationships with a Darwinian "hill-climbing" scenario in §3, we observe new characteristics of the optimisation process. But formulating molecular evolution in terms of evolutionary dynamics has to take into consideration not only the approach to a steady state but also nonlinear dynamical phenomena such as oscillations or chaotic behavior in space and time. In §4 we construct both small "deterministic" and large random catalytic networks, and report the stability of hypercycles in competition with a superior parasite as well as the emergence and optimisation of hypercycles in a large system.

(Author's present address: University of Illinois, Beckman Institute, 405 N. Mathews Ave., Urbana, IL 61801.)

One great advantage of RNA molecules is that they allow relatively simple evolutionary experiments in the test tube. Here genotype (the sequence) and phenotype (the spatial structure) are two features of the same molecule [40].

2 Biomolecules

The function of biomolecules, especially peptides and nucleic acids, is determined to a significant extent by their tertiary structure in space. Active residues of these molecules are kept in precise positions by a large, spatially organised framework of interacting residues and backbone. As conserved as the active residues in, e.g., catalytic sites are, so flexible is the structural framework: complete motifs can be omitted while functionality remains (almost) unperturbed. Thus the structure of a biopolymer that is relevant in a given context is seldom described at atomic resolution. In order to detect phylogenetic relations, for example, protein structures are often considered similar when their polypeptide backbones roughly coincide. A large fraction of amino-acid residues can be exchanged without changing these coarse-grained structures, which are apparently the relevant ones in an evolutionary context. Rost and Sander showed that 25% pairwise sequence identity of residues is sufficient for folding into the same structure [33]; i.e., up to 75% sequence dissimilarity (in the best cases) is compatible with a conserved structure. Similar results are known for RNA structures, for which an adequate coarse-graining is given by the secondary structure. Secondary structures are commonly understood as lists of Watson-Crick (A-U and G-C) and wobble (G-U) base pairs that are compatible with unknotted, pseudoknot-free two-dimensional graphs (for a precise formal definition we refer to Waterman [43]). The relevance of RNA secondary structures for biomolecular function is strongly reflected in viral life cycles [46]. Replication of RNA molecules in the Qβ system¹ depends exclusively on the structural feature of a hairpin at the 5' end [3]. In particular, the kinetics of RNA replication by Qβ replicase and its dependence on structural features have been studied [4]. A different system, which has been examined extensively, is the internal initiation of translation in specific (+)-strand RNA viruses. This so-called IRES region (Internal Ribosomal Entry Site), a highly structured region close to the 5' end of the viral genome, is responsible for the success of genome translation in the host cell [29].
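The characterisation of RNA secondary structures above (Watson-Crick and wobble pairs, no pseudoknots) can be checked mechanically. The Python sketch below is illustrative only: the function name `is_secondary_structure`, the 0-based pair-list representation, and the choice not to impose further constraints (such as a minimal hairpin-loop size) are assumptions, not part of the original model.

```python
# Pairing rule from the text: Watson-Crick (A-U, G-C) and wobble (G-U) pairs,
# listed in both orientations.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_secondary_structure(seq: str, pairs: list[tuple[int, int]]) -> bool:
    """Hypothetical helper: check that `pairs` (0-based (i, j) with i < j) is a
    pseudoknot-free secondary structure of `seq` using only ALLOWED_PAIRS."""
    used = set()
    for i, j in pairs:
        if not 0 <= i < j < len(seq):
            return False                      # index out of range or misordered
        if i in used or j in used:
            return False                      # a base may pair at most once
        used.update((i, j))
        if (seq[i], seq[j]) not in ALLOWED_PAIRS:
            return False                      # violates the pairing rule
    # Pseudoknot-free: no two pairs (i, j), (k, l) may cross as i < k < j < l.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True


print(is_secondary_structure("GGGAAACCC", [(0, 8), (1, 7), (2, 6)]))  # True
print(is_secondary_structure("GAGACUCU", [(0, 4), (2, 6)]))          # False: pairs cross
print(is_secondary_structure("AAAACCC", [(0, 6)]))                   # False: A-C not allowed
```

The crossing test is the quadratic textbook check; it is sufficient for short illustrative structures.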

Figure 1: Compatible sequences with respect to a fixed secondary structure (shown for the structure .(((...))), with panels listing sequences corresponding to its unpaired part and to its base pairs). A sequence is called compatible with a given secondary structure if, for all base pairs in the structure, there are pairs of matching bases in the sequence. Sequences compatible with a structure do not, in general, fold into this structure; however, the structure will always be found among the suboptimal foldings.
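Compatibility in the sense of Figure 1 depends only on the base pairs of the structure, so it can be tested without any folding algorithm. A minimal Python sketch, assuming structures are given in dot-bracket notation (as the structure .(((...))) in Figure 1) and using the RNA pairing rule A-U, G-C, G-U; the helper names `parse_dot_bracket` and `is_compatible` are hypothetical.

```python
# RNA pairing rule: Watson-Crick and wobble pairs in both orientations.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Convert dot-bracket notation into a list of 0-based base pairs (i, j)."""
    stack = []
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)                 # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # close the most recent '('
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every base pair of `structure` can be formed by `sequence`."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in parse_dot_bracket(structure))


print(is_compatible("AGCUAAAAGU", ".(((...)))"))  # True
print(is_compatible("AAAAAAAAAA", ".(((...)))"))  # False: A-A cannot pair
```

As the caption notes, a compatible sequence need not fold into the structure; the check only tests this necessary condition.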

2.1 RNA Secondary Structures and Compatible Sets

RNA secondary structures and the induced sequence-structure relationship are a suitable and generic description of the genotype-phenotype mapping, which is important in molecular evolutionary biology.

Defining secondary structures independently of chemical or physical restrictions yields a general description based on contacts with respect to arbitrary alphabets $\mathcal{A}$ with arbitrary pairing rules. A pairing rule on $\mathcal{A}$ is given as a set of pairs of letters from the alphabet. As an extension of secondary structures, a general contact structure $c$ is determined by its set of contacts, omitting the trivial contacts between letters that are adjacent in the sequence [26]. A relevant question in studying the sequence-structure relation is how sequences have to be composed to fulfil the necessary conditions for folding into a desired structure. We therefore define the compatibility of a sequence with a given structure: a sequence $x$ is compatible with a structure $s$ if, for each base pair $(i, j)$ required by $s$, the letters $x_i$ and $x_j$ of $x$ form a pair allowed by the pairing rule (Fig. 1). $C(s)$ denotes the set of all sequences compatible with $s$. The number of compatible sequences is readily computed for secondary structures: with $n_u$ unpaired bases and $n_p$ base pairs it evaluates to $4^{n_u} \cdot 6^{n_p}$.

2.2 Covering Sequence-Space

The relation between RNA sequences and secondary structures is understood as a surjection $f_n$ from sequence space $Q^n_\alpha$ into shape space $S_n$. Here we present essential insights of a graph-theoretical approach characterising generic properties of such maps; a mathematical framework with proofs can be found elsewhere [32]. The set of all sequences folding into a given structure $s$ is called the neutral network $\Gamma_n(s)$ of $s$. $Q^n_\alpha$ denotes the generalised hypercube of dimension $n$ over an alphabet $\mathcal{A}$ of size $\alpha$ (i.e. $\mathcal{A}$ contains $\alpha$ letters), and $s \in S_n$ is a fixed secondary structure. Mathematically, $\Gamma_n(s)$ is the subgraph of $C(s)$ induced by $f_n^{-1}(s)$, where $f_n^{-1}(s)$ denotes the preimage of $s$ with respect to the mapping $f_n$. A sketch of these embeddings is shown in Fig. 2.

¹ Qβ phage is a virus that infects bacteria.

Remark 1. The graph of compatible sequences of a fixed secondary structure $s$ is

$$C(s) = Q^{n_u}_\alpha \times Q^{n_p}_\beta, \qquad (1)$$

where $\alpha$ is the number of different nucleotides and $\beta$ is the number of different types of base pairs that can be formed by different nucleotides.
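Equation (1) implies $|C(s)| = \alpha^{n_u} \beta^{n_p}$: each unpaired position independently takes one of $\alpha$ letters and each base pair one of $\beta$ pair types. For RNA ($\alpha = 4$, $\beta = 6$) this is the count $4^{n_u} \cdot 6^{n_p}$ stated above. The Python sketch below checks the product formula against brute-force enumeration for a short structure; the pair-list input format and the function names are illustrative assumptions.

```python
from itertools import product

ALPHABET = "ACGU"  # alpha = 4 nucleotides
# beta = 6 base-pair types: Watson-Crick and wobble pairs in both orientations.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_compatible_formula(n: int, pairs: list[tuple[int, int]]) -> int:
    """|C(s)| = alpha**n_u * beta**n_p for a structure of length n."""
    n_p = len(pairs)            # number of base pairs
    n_u = n - 2 * n_p           # number of unpaired positions
    return len(ALPHABET) ** n_u * len(ALLOWED_PAIRS) ** n_p


def count_compatible_brute_force(n: int, pairs: list[tuple[int, int]]) -> int:
    """Enumerate all alpha**n sequences and count the compatible ones."""
    return sum(
        all((seq[i], seq[j]) in ALLOWED_PAIRS for i, j in pairs)
        for seq in product(ALPHABET, repeat=n)
    )


# Structure "((..))": pairs (0, 5) and (1, 4), so n_u = 2 and n_p = 2.
pairs = [(0, 5), (1, 4)]
print(count_compatible_formula(6, pairs))      # 576 = 4**2 * 6**2
print(count_compatible_brute_force(6, pairs))  # 576
```

Brute force grows as $4^n$ and is only meant as a sanity check; the closed form is what makes $|C(s)|$ "readily computed".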

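[Figure 2: Sketch of the embeddings: the hypercube $Q^n_\alpha$ (sequence space; $4^n$ sequences of length $n$ for an alphabet of $\alpha = 4$ letters), the compatible set $C(s)$ of structure $s$, and the neutral network $\Gamma(s)$ within it, modelled as a random graph; $\lambda$ denotes the fraction of neutral neighbours.]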

Once we know how to construct a neutral network $\Gamma_n(s)$ for a single structure, we extend the description of a "folding landscape" to many structures. Given an ordered set of secondary structures $S_n$, we define a complete mapping by iterating the construction of the corresponding neutral networks with respect to this ordering. The preimage of the highest-ranked structure $s_1$ is assigned independently; for all other structures $s_i$, $i > 1$, the mapping depends on all previous assignments. Note that the given ranking is arbitrary. The actual ordering into common and rare structures essentially depends on $|C(s)|$, on the rank, and on the probabilities of accepting a sequence as an element of the preimage of a given structure.
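The ranked construction can be illustrated with a toy model. The Python sketch below is a simplification under explicit assumptions, not the construction used in the paper: all sequences of a short length are enumerated, each structure is given as a pair list, and a hypothetical per-structure acceptance probability decides whether a still-unassigned compatible sequence joins that structure's preimage. Because higher-ranked structures claim sequences first, each later preimage depends on all previous assignments, as described above.

```python
import random
from itertools import product

ALPHABET = "ACGU"
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_compatible(seq: str, pairs: list[tuple[int, int]]) -> bool:
    """True if every base pair (i, j) can be formed by seq."""
    return all((seq[i], seq[j]) in ALLOWED_PAIRS for i, j in pairs)


def build_ranked_map(n, ranked_structures, accept_probs, seed=0):
    """Hypothetical toy model: assign each sequence of length n to at most one structure.

    ranked_structures: (name, pairs) tuples, highest rank first.
    accept_probs: assumed probability, per structure, of accepting a
        compatible, still-unassigned sequence into its preimage.
    Returns a dict mapping each sequence to a structure name or None.
    """
    rng = random.Random(seed)
    sequences = ["".join(letters) for letters in product(ALPHABET, repeat=n)]
    assignment = dict.fromkeys(sequences)  # every sequence starts unassigned (None)
    for (name, pairs), p in zip(ranked_structures, accept_probs):
        for seq in sequences:
            # Sequences claimed by a higher-ranked structure are no longer eligible.
            if assignment[seq] is None and is_compatible(seq, pairs) and rng.random() < p:
                assignment[seq] = name
    return assignment


structures = [("s1", [(0, 5), (1, 4)]),  # "((..))", ranked first
              ("s2", [(0, 5)])]          # "(....)"
assignment = build_ranked_map(6, structures, accept_probs=[1.0, 1.0])
sizes = {name: sum(v == name for v in assignment.values()) for name, _ in structures}
print(sizes)  # {'s1': 576, 's2': 960}
```

With acceptance probabilities of 1.0 the result is deterministic: s2 receives only the 960 compatible sequences that s1 left over. Probabilities below 1 give random preimages whose sizes depend on the seed.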

3 Evolutionary Dynamics

A canonical approach to studying molecular evolution is to combine a genotype-phenotype mapping with a reaction scheme. We thus consider such a (bio)chemical reaction system and study the induced dynamics of a population of individuals with genotypes and phenotypes (related in a way characteristic of biopolymers) living in an artificial world. Error-prone autocatalytic and catalyzed replication, unspecific dilution due to limited resources, specific dilution due to predation, and alteration due to reactions between individuals occur and change the composition of the population over time. As an example, a possible scenario is shown in Fig. 3: On the phenotypic level a reaction system
