Journal of Heredity 2009:100(5):648–655 doi:10.1093/jhered/esp065 Advance Access publication August 7, 2009

© The American Genetic Association. 2009. All rights reserved. For permissions, please email: [email protected].

Transposable Elements and Factors Influencing their Success in Eukaryotes

ELLEN J. PRITHAM

Department of Biology, University of Texas, Arlington, 501 S. Nedderman Drive, Arlington, TX 76019. Address correspondence to Ellen J. Pritham at the address above, or e-mail: [email protected].

Abstract

Recent advances in genome sequencing have led to a vast accumulation of transposable element data. Consideration of the genome sequencing projects in a phylogenetic context reveals that despite the hundreds of eukaryotic genomes that have been sequenced, a strong bias in sampling exists. There is a general under-representation of unicellular eukaryotes and a dearth of genome projects in many branches of the eukaryotic phylogeny. Among sequenced genomes, great variation in genome size exists; however, little difference in the total number of cellular genes is observed. For many eukaryotes, the remaining genomic space is extremely dynamic and predominantly composed of a menagerie of populations of transposable elements. Given the dynamic nature of the genomic niche filled by transposable elements, it is evident that these elements have played an important role in genome evolution. The contribution of transposable elements to genome architecture and to the advent of genetic novelty is likely to be dependent, at least in part, on the transposition mechanism, diversity, number, and rate of turnover of transposable elements in the genome at any given time. The focus of this review is the discussion of some of the forces that act to shape transposable element diversity within and between genomes.

Key words: gene mapping, genome evolution, transposable element

Despite great variation in genome size, little difference in the total number of cellular genes is observed among eukaryotes (Gregory 2005; Feschotte and Pritham 2007b). In many instances, the cellular genes may represent just a small sliver of the total genomic space. These genic regions are the most stable part of an organism’s genome, largely due to purifying selection acting to preserve gene function. For many eukaryotes, the remaining genomic space is extremely dynamic and predominantly composed of a menagerie of populations of transposable elements (TEs) (for review Kidwell and Lisch 2001; Craig et al. 2002). The TEs can be viewed as genomic squatters, shacking up in the genome without providing any direct benefit to the host. TE persistence is then a reflection of both the ability to replicate faster than the host cell and of the balance between TE reinfection through outcrossing, horizontal transfer and TE loss through excision, sequence erosion, selection and drift (Craig et al. 2002). Given the dynamic nature of the genomic niche filled by TEs, it is evident that these elements have played an important role in genome evolution (Kidwell and Lisch 2002; Wessler 2006; Feschotte and Pritham 2007a). The contribution of TEs to genome architecture and to the advent of genetic novelty is likely to be dependent, at least in part, on the transposition mechanism, diversity, number and rate of turnover of TEs in the genome at any given time. The focus of this review is


the discussion of some of the forces that act to shape TE diversity within and between genomes.
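The balance described above, between TE gain (transposition and reinfection through outcrossing or horizontal transfer) and TE loss (excision, sequence erosion, selection, and drift), can be illustrated with a toy deterministic model. This is only a sketch and is not taken from the review: the function name `simulate_copy_number`, the per-generation rates, and the numerical values are all illustrative assumptions.

```python
def simulate_copy_number(n0, gain_rate, loss_rate, influx, generations):
    """Toy TE copy-number model (an illustrative assumption, not the review's).

    Each generation the copy number n is updated as
        n <- n + (gain_rate - loss_rate) * n + influx
    where gain_rate and loss_rate are per-copy rates and influx is a
    constant number of copies reintroduced per generation (for example
    by horizontal transfer or outcrossing).
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    n = float(n0)
    history = [n]
    for _ in range(generations):
        # Copy number cannot fall below zero.
        n = max(0.0, n + (gain_rate - loss_rate) * n + influx)
        history.append(n)
    return history


if __name__ == "__main__":
    # Hypothetical rates: loss slightly outpaces transposition.
    closed = simulate_copy_number(100, 0.01, 0.02, 0.0, 5000)
    with_influx = simulate_copy_number(100, 0.01, 0.02, 0.5, 5000)
    print(f"no reintroduction:   {closed[-1]:.3f} copies")
    print(f"with reintroduction: {with_influx[-1]:.3f} copies")
```

Under these assumed rates, a lineage with no reintroduction drifts toward zero copies, whereas a constant influx holds the copy number near influx / (loss_rate - gain_rate). Real TE dynamics are stochastic and far less regular than this sketch suggests.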

Transposition Mechanisms and TE Classification

TEs are classified based on the nature of the transposition intermediate (RNA or single- or double-stranded DNA [ssDNA or dsDNA]), their structural features, and by homology to known elements (Feschotte and Pritham 2007a; Wicker et al. 2007). Broadly, TEs are divided into 2 classes based on whether the transposition intermediate is RNA (class 1) or DNA (class 2) (Figure 1). Class 1 TEs or retrotransposons all require reverse transcriptase to copy their RNA into DNA and can be subdivided into 3 groups based upon mechanism of integration (Figure 1) (for review Eickbush and Malik 2002). The reverse transcriptase encoded by each of these groups shares 7 blocks of conserved sequences suggesting that they are related by descent and share a common ancestor, although in the very distant past (Xiong and Eickbush 1990). The diversity of structures and integration mechanisms of retrotransposons is a testament to their ability to adapt and change. The long terminal repeat (LTR) and the tyrosine recombinase (YR) retrotransposons are both flanked by LTRs but they differ

Pritham  TEs and Genome Evolution

Figure 1. TEs are broadly classified into 2 classes. Class 1 retrotransposons move using an RNA intermediate, and class 2 the DNA transposons utilize a DNA intermediate. The presence and orientation of repeated DNA structures flanking or within the TEs are indicated by arrows. The black arrows are TIRs. The gray arrows are direct repeats. The striped arrows indicate a repeated sequence or palindrome within the DNA. The gray-hatched boxes indicate the proteins encoded by autonomous TEs, the number of ORFs is so indicated.

in the mechanism of integration (for review Cappello et al. 1984; Eickbush and Malik 2002; Goodwin and Poulter 2004; Poulter and Goodwin 2005). The LTR retrotransposons utilize an integrase, which is evolutionarily related to the transposase encoded by cut-and-paste DNA transposons, whereas the YR retrotransposons utilize a tyrosine recombinase. The non-LTR and probably also Penelope-like retrotransposons transpose via a process termed target-primed reverse transcription (TPRT) and integration is mediated by either an apurinic/apyrimidinic or a restriction-like endonuclease (Eickbush and Malik 2002; Evgen’ev and Arkhipova 2005). Class 2 or DNA transposons transpose via a DNA intermediate. “Classic” DNA transposons are excised as a dsDNA intermediate and reintegrated elsewhere in the genome (for review Feschotte and Pritham 2007a). These “cut-and-paste” transposons are exemplified by a relatively simple structure typically consisting of a single ORF, encoding a transposase, flanked by terminal inverted repeats (TIRs) and they are usually less than 5 kb in size (Figure 1). Helitrons, which represent a second major subclass of DNA transposons, are most likely mobilized as ssDNA intermediates through a replicative, rolling-circle-like mechanism. They encode a putative protein with a central domain homologous to the rolling-circle replication proteins encoded by rolling-circle genetic elements (e.g., plasmids, phages) and a C-terminal domain related to the PIF1 group of DNA helicases. Plant Helitrons also encode 1–3 additional putative proteins homologous to ssDNA binding proteins (for review Kapitonov and Jurka 2006, 2007; Feschotte and Pritham 2007a). Finally, Mavericks represent a third subclass of DNA transposons recently identified in a wide range of eukaryotes. The Mavericks are distinguished from the other DNA transposons by their large size (ranging between 9 and 22 kb) and extensive coding capacity (9–20 open reading

frames [ORFs]), which include a gene encoding a viral-like DNA polymerase (Figure 1) (Feschotte and Pritham 2005; Kapitonov and Jurka 2006; Pritham et al. 2007). Mavericks also encode a retroviral-like integrase and therefore their transposition cycle involves integration of a dsDNA intermediate. TEs that utilize unknown transposition mechanisms are being discovered and described at an unprecedented rate due to the tremendous abundance of genome data now available for scrutiny (Kapitonov and Jurka 2001; Goodwin et al. 2003; Feschotte 2004; Feschotte and Pritham 2005; Feschotte and Pritham 2007a; Pritham et al. 2007).
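The classification scheme described in this section can be summarized as a small lookup structure. The sketch below only restates the groups named in the text; the class and function names (`TEGroup`, `te_class_of`, `groups_in_class`) and the short mechanism labels are illustrative choices, not established nomenclature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TEGroup:
    te_class: int      # 1 = retrotransposon, 2 = DNA transposon
    name: str
    intermediate: str  # transposition intermediate
    mechanism: str     # short label for the integration/mobilization route


# Groups as described in the text; Penelope-like elements are listed
# under TPRT because the text says they "probably" use it.
TE_GROUPS = [
    TEGroup(1, "LTR retrotransposon", "RNA", "integrase"),
    TEGroup(1, "YR retrotransposon", "RNA", "tyrosine recombinase"),
    TEGroup(1, "non-LTR retrotransposon", "RNA", "TPRT"),
    TEGroup(1, "Penelope-like retrotransposon", "RNA", "TPRT"),
    TEGroup(2, "cut-and-paste DNA transposon", "dsDNA", "transposase"),
    TEGroup(2, "Helitron", "ssDNA", "rolling-circle-like replication"),
    TEGroup(2, "Maverick", "dsDNA", "retroviral-like integrase"),
]


def te_class_of(intermediate):
    """Return 1 for an RNA intermediate and 2 for a DNA intermediate."""
    key = intermediate.strip().lower()
    if key == "rna":
        return 1
    if key in {"dna", "ssdna", "dsdna"}:
        return 2
    raise ValueError(f"unknown transposition intermediate: {intermediate!r}")


def groups_in_class(te_class):
    """Names of the groups listed above that belong to the given class."""
    return [g.name for g in TE_GROUPS if g.te_class == te_class]


if __name__ == "__main__":
    for te_class in (1, 2):
        print(te_class, groups_in_class(te_class))
```

Keeping the intermediate and the mechanism as separate fields mirrors the two criteria the text uses: class is decided by the intermediate, and class 1 groups are further split by integration mechanism.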

The Distribution of TEs in the Tree of Life

The genomes of hundreds of eukaryotes have been or are in the process of being sequenced, providing the opportunity to analyze a vast quantity of TE data. However, it should be noted that there is a strong bias in this data set toward animals and fungi (i.e., opisthokonts) and to a lesser degree apicomplexan (which include malaria parasites) and plant genomes. This bias is evident when the number and incidence of projects are considered in a phylogenetic context (see Figure 2). Surprisingly, many branches of the eukaryotic phylogeny have yet to be sampled. This sampling bias makes it difficult to draw, and should probably preclude, any broad generalizations about TE diversity and distribution in eukaryotes. Surveys of the TE populations within the sequenced genomes have made it clear that TEs are indeed widespread and persistent entities in metazoans, fungi, and plants. In addition, a positive correlation is often seen between genome size and TE content in these genomes (Feschotte and Pritham 2007b). However, the distribution and abundance of TEs in unicellular eukaryotic organisms are far less understood,


Figure 2. The distribution of eukaryotic genome projects mapped in a phylogenetic context. A survey was undertaken of eukaryotic genomes with sequencing projects completed or underway. Both the presence and the abundance of genome sequencing projects in a specific phylogenetic branch are indicated by a gray circle. The size of the circle is meant as a rough indication of the total number of projects: the smallest circle indicates a single project and the largest indicates more than 50 projects. A = supergroup Plantae, B = supergroup Excavate, C = supergroup Rhizaria, D = supergroup Unikonts, E = supergroup Chromalveolates (phylogeny redrawn from Keeling et al. 2005).

hampered by the paucity of sequencing projects and therefore the scarcity of data, as well as a less systematic and careful scrutiny of the sequenced genomes.

Genome Sequence Comparisons Reveal Patterns in TE Diversity

Examination of complete genome sequence data allows the analysis of an organism’s genome at a single point in time. Dramatic differences between the success of retrotransposons and DNA transposons are revealed when surveys of genome sequence data are undertaken (Figure 3). Variation occurs in terms of total number, composition, and location of TEs within and between genomes. For example, both the human and mouse genomes are dominated by retrotransposons (Lander et al. 2001; Waterston et al. 2002), whereas DNA transposons have been relatively more successful in the genome of the nematode Caenorhabditis elegans (Consortium 1998). In the budding and fission yeast genomes, only a few hundred LTR retrotransposons are found, despite 1 billion years of divergence (Hedges 2002). Are these


patterns purely stochastic or are they the result of evolutionary forces acting to influence TE success and shape genome architecture? If TE diversity and success were attributable solely to random gain and loss, then, given the rapid turnover of TEs and a constant rate of TE introduction and amplification, no trends should be apparent. However, some trends in TE composition do appear to be conserved. For example, the human, macaque, mouse, rat, and dog genomes share a strikingly similar pattern of TE composition, despite continuous and extensive lineage-specific TE activity (Pace and Feschotte 2007). Similarly, analysis of the sequenced genomes of 12 Drosophila species separated by up to 40 million years of evolution shows that retrotransposons predominate in all these species, whereas DNA transposons consistently represent less than 20% of the total TE content (Clark et al. 2007). What does the conservation of TE composition between species tell us, and can inferences be made about the history of a species when a difference in pattern is observed?


Figure 3. Variation of TE composition across genomes. For each species, the relative proportion of RNA and DNA TEs was calculated. The data were compiled either from the corresponding papers reporting draft genome sequences or from the following sources: nematode (C. Feschotte, personal communication); rice (Jiang et al. 2004; N. Jiang, personal communication); Entamoeba (Pritham et al. 2005); Giardia lamblia (Arkhipova and Morrison 2001); Trichomonas vaginalis (Pritham et al. 2007); and Ellen Pritham (unpublished data). The species are Hs = Homo sapiens, Mm = Mus musculus, the nematode Caenorhabditis elegans, Dm = Drosophila melanogaster, De = Drosophila erecta, Ag = Anopheles gambiae, Aa = Aedes aegypti, Ed = Entamoeba dispar, Eh = E. histolytica, Ei = E. invadens, Em = E. moshkovskii, Sc = Saccharomyces cerevisiae, Sp = Schizosaccharomyces pombe, At = Arabidopsis thaliana, Os = Oryza sativa japonica, Gi = Giardia lamblia, Tv = Trichomonas vaginalis.
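The relative proportions plotted in Figure 3 amount to a simple normalization of two tallies per genome. The sketch below shows one way such a calculation might be done; the function name `te_composition` is hypothetical, the example numbers are invented for illustration and are not data from the figure, and whether the original tallies were copy numbers or base pairs is not stated in the caption.

```python
def te_composition(retrotransposon_amount, dna_transposon_amount):
    """Return the relative proportions of RNA (class 1) and DNA (class 2) TEs.

    The two inputs can be copy numbers or base pairs occupied, as long as
    both use the same unit (the unit is an assumption of this sketch).
    """
    if retrotransposon_amount < 0 or dna_transposon_amount < 0:
        raise ValueError("amounts must be non-negative")
    total = retrotransposon_amount + dna_transposon_amount
    if total == 0:
        raise ValueError("no TEs tallied; proportions are undefined")
    return {
        "RNA": retrotransposon_amount / total,
        "DNA": dna_transposon_amount / total,
    }


if __name__ == "__main__":
    # Invented tallies for a hypothetical genome, not values from Figure 3.
    proportions = te_composition(800, 200)
    for te_type, fraction in proportions.items():
        print(f"{te_type} TEs: {fraction:.1%}")
```

Because the result is a proportion of total TE content rather than of the genome, two genomes with very different TE loads can show the same composition.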

Forces that Influence TE Diversity can be Subdivided into Three Groups based on the Scale at which They Act: Molecular, Genetic, and Environment

Molecular Properties of the TE

Molecular properties of the TE can function to influence TE activity and therefore accumulation in genomes, as well as the propensity of TEs to be vertically or horizontally transmitted. Some examples of molecular factors include transposition mechanism (Eickbush and Malik 2002), the pattern and timing of transposition (which can be influenced by tissue and temporal specific promoter regions or alternative splicing) (Rio 2002), TE autoregulation (Rio 2002), targeting (Devine and Boeke 1996; Eickbush and Malik 2002; for review Lesage and Todeschini 2005), and infectivity (or the ability to move cell to cell or between organisms) (Malik et al. 2000). The non-LTR retrotransposons provide an excellent example of how the mechanism of transposition can influence both the ability of TEs to colonize specific genomic niches and their ability to propagate horizontally, as well as limit the selective impact of new insertions. For non-LTR retrotransposons, integration is coupled to reverse transcription of the mRNA in a process termed TPRT (Christensen and Eickbush 2005). Due to the inherent instability of RNA, it is probable that the DNA form of the retrotransposon would be more likely to move horizontally than the RNA itself. Because the DNA is integrated directly into the nuclear genome, as it is reverse transcribed from RNA, the window of opportunity for horizontal movement appears to be temporally narrow for these elements (for review Eickbush and Malik 2002). In addition, for most non-LTR elements the TPRT process results in many copies that are truncated at the 5′-end, with the promoters being lost and further propagation inhibited. These “dead on arrival” copies are typically the most abundant product of TPRT. Thus, for a successful horizontal transfer event to occur, a rare complete non-LTR element would have to be transferred. Also, many non-LTR retrotransposons have target-site specificity, which minimizes the genomic space where integration can occur. For example, R2 and R4 elements target the 28S genes and CRE and NeSL are targeted to splice-leader sequences (Burke et al. 1987; Teng et al. 1995; Malik and Eickbush 2000; for review Eickbush and Malik 2002). Both TPRT and site-specific targeting are expected to reduce the potential of non-LTR retrotransposons to be moved between species by horizontal transfer. On the other hand, targeted integration into other highly repeated sequences minimizes the deleterious effects of these elements, which facilitates their vertical persistence (Malik et al. 1999). Indeed, in some cases non-LTR retrotransposons have even gone extinct in a number of mammals (Casavant et al. 2000; Grahn et al. 2005; Rinehart et al. 2005; Cantrell et al. 2008). In contrast, horizontal movement of both DNA transposons and LTR retrotransposons appears to be more frequent.
Recurrent waves of horizontal transfer of DNA transposons are thought to explain the diversity of recently active DNA transposons in the genome of the little brown bat (Myotis lucifugus) (Ray et al. 2008). The diversity of TEs in M. lucifugus deviates dramatically from the pattern observed in the genomes of other well-studied mammals where no recent DNA transposon activity has been reported and retrotransposons predominate (Pritham and Feschotte 2007; Ray et al. 2007, 2008). The compact eukaryotic genomes, like those of the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, provide insight into how the ability to target specific genomic regions has allowed LTR retrotransposons to persist in these highly compact, gene-rich genomes and highlight the complex interplay between molecular properties of the TE and genetic properties of the genome. Both genomes are small (~12.5 Mb) and have limited intergenic space, proving a hazardous terrain for TE insertion. TEs that target specific sites in the genome in effect narrow the scope of their mutagenic impact to these specific genomic locales. The Ty1 and Ty3 retrotransposons of S. cerevisiae target the regions upstream of genes transcribed by RNA polymerase III (Kim et al. 1998). For example, 90% of Ty1 elements are within the 750 bp flanking a tRNA gene. Ty5 retrotransposons, which naturally inhabit the genome of


S. paradoxus, actively target the silent chromatin located at the mating loci and near the telomeres (Zou et al. 1996). In S. pombe, the Tf1–2 elements have evolved a different targeting strategy whereby they integrate upstream of RNA polymerase II transcripts (Leem et al. 2008). Remarkably, the targeting strategy of all these retrotransposons has evolved independently and is mediated by direct interactions between host factors and the integrase proteins encoded by the elements (Gao et al. 2008). This illustrates how TEs can establish intimate and long-term relationships with their hosts. Targeting potential does not seem to be a universal feature of LTR retrotransposons, and the location and mechanism of targeting differ among elements, suggesting that this intrinsic property has evolved repeatedly at different evolutionary times.

Ecological Influences on Host TE Populations

The ecology of the host population and environment are also likely to play an indirect but important role in the success of TEs within genomes. Examples of these forces include parasite load, environmental quality, resource availability, and competition. These ecological components can affect the breeding system, effective population size of the host, and exposure to vectors of horizontal transfer, which in turn influence the ability of TEs to proliferate and fix in a genome (Wright and Schoen 1999; Arkhipova and Meselson 2000; Lynch and Conery 2003). Of course, these forces are not mutually exclusive and they are often interconnected. For example, effective population size will affect TE success directly by determining the likelihood of new insertions to reach fixation within a population and also indirectly by influencing other features of genome organization and gene structure, which determine whether an insertion is likely to be detrimental or largely neutral for the fitness of the host (Lynch 2007). In addition, none of these factors are static with respect to time.
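The dependence of fixation on population size can be made concrete with a standard population-genetic result that the review does not itself present: Kimura's diffusion approximation for the fixation probability of a new mutation. The sketch below assumes a diploid population whose effective size equals its census size N, and an insertion with additive selection coefficient s that starts as a single copy (frequency 1/(2N)). The function names and parameter values are illustrative.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for a new mutation at frequency 1/(2n).

    P = (1 - exp(-2s)) / (1 - exp(-4ns)), assuming effective size equals
    census size n (a simplifying assumption). s < 0 means deleterious.
    """
    if n <= 0:
        raise ValueError("population size must be positive")
    if s == 0:
        return 1.0 / (2.0 * n)  # neutral limit
    exponent = -4.0 * n * s
    if exponent > 700.0:
        # Strongly deleterious in a large population: effectively zero
        # (also avoids floating-point overflow in expm1).
        return 0.0
    # expm1(x) = exp(x) - 1; the two minus signs of the formula cancel.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_to_neutral(s, n):
    """Fixation probability divided by the neutral expectation 1/(2n)."""
    return fixation_probability(s, n) * 2.0 * n


if __name__ == "__main__":
    s = -0.001  # hypothetical, slightly deleterious TE insertion
    for n in (100, 10_000):
        print(f"N = {n}: {relative_to_neutral(s, n):.3g} x neutral")
```

With these assumed values, the same slightly deleterious insertion behaves almost neutrally in the small population but is very unlikely to fix in the large one, which is the direct effect of population size referred to above.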
Sex and recombination have been postulated to be of major import in the success of TEs in populations, and their influence has been investigated in a number of different systems (Arkhipova and Meselson 2000; Wright et al. 2003; Valizadeh and Crease 2008). Because TE insertions do not in general have a selective advantage, they will eventually be lost when sequence erosion and drift lead to a rate of loss higher than the rate of birth or of reinfection through zygote formation and horizontal transfer. Therefore it is expected that strictly asexual organisms will eventually be purged of TEs. The bdelloid rotifers are a widely diverged group of animals where meiosis, males, and sex have never been observed. Studies investigating TE distribution and diversity in the genome of bdelloid rotifers reveal the presence of a diverse assortment of DNA transposons and 2 families of retrovirus-like elements, but an apparent lack of non-LTR retrotransposons (Arkhipova and Meselson 2000; Arkhipova and Meselson 2005). Because DNA transposons and retroviruses, but not non-LTR retrotransposons, are thought to be prone to horizontal transfer, this biased TE composition was interpreted to support the theory that the lack of sex


does indeed result in a general purging of TEs that cannot be reintroduced horizontally. Recent studies from the same group show that bdelloid rotifers have been subject to recurrent and massive horizontal transfer events, a finding that may lend support to the explanation that the biased TE composition of their genomes is driven by the presumed asexuality of these organisms. However, breeding systems do not seem to be an overriding force in determining the diversity of TEs in other organisms. A survey of TEs in the ~20-Mb genomes of 4 related Entamoeba species reveals that a diverse set of TEs (including retrotransposons and DNA transposons) makes up between 5% and 8% of these genomes (Pritham et al. 2005). Entamoeba are unicellular amoebas that are either parasitic or free living. They reproduce asexually by binary fission and no sexual stage has ever been observed; however, a cryptic sexual stage has not been formally ruled out. Yet different species of Entamoeba display dramatic variation in the type and success of TEs populating their genomes. The genomes of the closely related extracellular human parasites E. histolytica and E. dispar are packed with non-LTR retrotransposons, whereas the genomes of the reptilian parasite E. invadens and the free-living E. moshkovskii have relatively few retrotransposons but host a wide diversity of DNA transposons. No evidence of horizontal transfer was detected and phylogenetic analysis suggested that many of the TE lineages detected in these 4 species were present in their common ancestor and most likely have been vertically inherited. Demographic factors like population bottlenecks are expected to play a role in TE diversity, both through alterations in the efficacy of natural selection and through the impact of genetic drift on genetic diversity.
In addition, it has been suggested that the mechanism of transposition, which leads to differential accumulation such as is seen between retrotransposons (copy and paste) and DNA transposons (cut and paste), predisposes them to differential success in a manner that depends on effective population size (Lynch and Conery 2003). Therefore, it may be that E. moshkovskii and E. invadens have an effective population size sufficient to allow for DNA transposon accumulation, whereas E. dispar and E. histolytica do not. Indeed, studies have revealed that E. dispar and E. histolytica have gone through recent population bottlenecks (Ghosh et al. 2000). It is clear that a single intrinsic or extrinsic factor is not sufficient to explain TE diversity within a genome.

Are TEs Ubiquitous Components of Eukaryotic Taxa?

Several published unicellular eukaryotic genome papers fail to report the presence of TEs in the respective genomes. Included in the list are the red alga Cyanidioschyzon merolae, supergroup Plantae (16.5 Mb) (Matsuzaki, Misumi et al. 2005); the Apicomplexans Babesia bovis (9.4 Mb) (Brayton et al. 2007), Cryptosporidium hominis (9.2 Mb) (Xu et al. 2004), C. parvum (9.09 Mb) (Abrahamsen et al. 2004),


Plasmodium falciparum (23.27 Mb) (Gardner et al. 2002), P. yoelii yoelii (20.17 Mb) (Carlton et al. 2002), and Theileria parva (8.35 Mb) (Bishop et al. 2005); and the Unikont Encephalitozoon cuniculi (2.8 Mb) (Katinka et al. 2001). However, because most of these organisms are only distantly related to the majority of the eukaryotic genome sequences available in the databases, the lack of reported TEs in some cases might reflect an inability to identify TEs based on sequence homology to known TE types. Closer inspections of these genomes with de novo repeat identification software (for review Feschotte and Pritham 2007b) might reveal the presence of novel TE families that have never been previously described or are only distantly related to known TEs. Nonetheless, it is noticeable that these genomes are all extremely small, ranging in size between 2.8 and 23.27 Mb. Among unicellular eukaryotes, there is a strong correlation between genome and cell size. The seeming dearth of TEs identified in these genomes may provide insight into the population demography of these species. For example, the lack of TEs coupled with the relatively small genome size might indicate that natural selection is effectively removing TEs from these genomes, perhaps due to a selective pressure to maintain cell size and therefore genome size. Another, not mutually exclusive explanation for the lack of detectable TEs in these species would be that a TE-depleted genome was inherited from a common ancestor that was itself TE free. For example, perhaps the common ancestral genome of all Apicomplexa was TE free (of the 6 Apicomplexan genomes published, no convincing reports of TEs have been made), due to a single demographic accident. However, even if the genome of the common ancestor suffered a massive TE extinction, what remains a puzzle is the ability of the genome to remain TE free.
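The genome sizes quoted in the list above can be tabulated to check the stated range and the count of Apicomplexan genomes. The sketch below uses only the species, group labels, and sizes given in the text; the variable and function names are illustrative.

```python
# (group label as used in the text, genome size in Mb as quoted in the text)
REPORTED_TE_FREE_GENOMES = {
    "Cyanidioschyzon merolae": ("Plantae", 16.5),
    "Babesia bovis": ("Apicomplexa", 9.4),
    "Cryptosporidium hominis": ("Apicomplexa", 9.2),
    "Cryptosporidium parvum": ("Apicomplexa", 9.09),
    "Plasmodium falciparum": ("Apicomplexa", 23.27),
    "Plasmodium yoelii yoelii": ("Apicomplexa", 20.17),
    "Theileria parva": ("Apicomplexa", 8.35),
    "Encephalitozoon cuniculi": ("Unikont", 2.8),
}


def summarize(genomes):
    """Return the species count, size range (Mb), and per-group counts."""
    sizes = [size for _, size in genomes.values()]
    per_group = {}
    for group, _ in genomes.values():
        per_group[group] = per_group.get(group, 0) + 1
    return {
        "species": len(genomes),
        "min_mb": min(sizes),
        "max_mb": max(sizes),
        "per_group": per_group,
    }


if __name__ == "__main__":
    summary = summarize(REPORTED_TE_FREE_GENOMES)
    print(summary)
```

The tally gives 8 species in total, 6 of them Apicomplexans, spanning 2.8 to 23.27 Mb, in agreement with the figures quoted in the text.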
Horizontal transfer of both LTR retrotransposons and DNA transposons is postulated to be a frequent occurrence and in fact it even appears necessary to explain the persistence of these TEs over long evolutionary time. It would seem that these unicellular organisms would be particularly susceptible to horizontal transfer due to the lack of a protected germline. If these genomes are indeed TE free, why and how they remain TE free is perplexing. Does their life history as obligate intracellular parasites preclude these organisms from coming in contact with the vectors, like viruses, that might act as intermediates for the horizontal transfer of TEs? Have they developed a particularly effective line of defense against genomic invaders? Paradoxically, apicomplexans appear to have lost the RNA interference machinery, which has been shown to help protect against TEs and viruses in animals and plants (Ullu et al. 2004). However, it is difficult to determine if this loss was secondary, as a result of the lack of threats posed by TEs or viruses, or if the loss was coincident with the loss of TEs. It is also worth mentioning that Apicomplexans seem to have a dearth of TEs despite having a sexual stage and going through meiosis. Seven of the 8 species in the list inhabit the cell of another organism and their existence depends on exploitation of that organism’s resources. The eighth organism, Cyanidioschyzon merolae, is an extremophile inhabiting an acidic hot spring (Misumi et al. 2005). Intracellular parasites

might be expected to be under a strong selective constraint to maintain cell size in order to occupy the cellular niche effectively. In addition, genome reduction is a general feature of intracellular pathogens. Most bacterial intracellular pathogens, in concert with having a reduced genome, are also depauperate in TEs, with the notable exception of some Rickettsiales species (Masui et al. 1999; Duron et al. 2005; Simser et al. 2005; Sanogo et al. 2007; Cordaux 2008). A key to this puzzle may be that, in addition to providing resources, the intracellular host environment acts as a shield from exposure to the vectors that are necessary for TE horizontal transfer, as mentioned above. Therefore, it might be reasonable to expect that an intracellular pathogen might initially lose its TEs through selection and drift and then maintain a TE-free genome as a side effect of the protection from vectors afforded by the intracellular environment. Careful examination of the genomes of unicellular eukaryotes will provide a better picture of the relative importance of different factors in explaining the pattern seen.

Some Unicellular Eukaryotic Genomes are Populated by TEs

Multiple phylogenetically diverse extracellular pathogens have had their genomes sequenced and display a variety of TEs that have accumulated with varied levels of success. For example, TEs have been identified in the genomes of Leishmania major, Trypanosoma brucei, T. cruzi (Kinetoplastids), Trichomonas vaginalis (Trichomonad), Giardia lamblia (Diplomonad) (Figure 2B), Entamoeba dispar, E. histolytica, E. invadens, E. moshkovskii, and E. terrapinae (Archamoebae; Figure 2D) and Perkinsus marinus (Alveolate; Figure 2E) (unpublished data). Most of these organisms are parasites and all have genomes of less than 200 Mb. Taking a closer look at unicellular eukaryotic genomes, and in particular very small ones with seemingly atypical life history traits (e.g., absence of sex, obligate intracellular parasitism), may be an excellent means to begin to decipher the relative importance of molecular, genetic, environmental, and population level forces in defining TE composition and success.

Conclusions

Recent advances in genome sequencing have led to a vast accumulation of TE data and allowed many interesting observations to be made. For example, a remarkable diversity of TEs has been uncovered, some with striking relationships to viruses, revealing a dynamic relationship between TEs and viruses. Consideration of the genome sequencing projects in a phylogenetic context reveals that despite the hundreds of eukaryotic genomes that have been sequenced, a strong bias in sampling exists. There is a general under-representation of unicellular eukaryotes and a dearth of genome projects in many branches of the eukaryotic phylogeny, especially in the supergroup Rhizaria.


This bias in sampling warrants an embargo on generalizations concerning the distribution and behavior of TEs in eukaryotes. Indeed, the 8 eukaryotic genomes listed above are apparently devoid of TEs altogether, upending the long held idea that TEs are ubiquitous in eukaryotic taxa. The population biology of the genome is reminiscent of an ecosystem where TEs are born, replicate, and die and new populations emerge, migrate, and colonize other locations, as well as become extinct. This metaphor is not entirely artificial as chromosomes and even regions of chromosomes can be compared with different ecological niches to which TEs are more or less well adapted. Migration of TEs can occur between chromosomes, as well as between individuals. In addition, there is competition between TEs for genomic resources (host factors) as well as transposition proteins, because the latter are generally encoded by a relatively small fraction of TEs within a given genome. The factors that govern the TE diversity and richness of a genome are complex and are likely to be a combination of properties intrinsic to the TE itself as well as extrinsic to the host. Natural selection is acting on those TEs that are the most fit, where fitness is a measure of the number of copies produced without adversely affecting the fitness of the host and/or the ability to colonize new environments. Understanding the role of these factors in influencing the diversity and differential success of TEs in genomes may allow observed patterns in extant genomes to provide a window into the past demographic history of the species in question. Small unicellular eukaryotic genomes are an excellent substrate to study the role of various factors in promoting TE diversity and composition.


Funding

National Institutes of Health, National Institute of Allergy and Infectious Diseases (NIH 5R01AI068908-02).

Acknowledgments

I thank Michael Lynch for the invitation to participate in the 2007 American Genetic Association Symposium: Mechanisms of Genome Evolution, and Cedric Feschotte for critical review and comments on the manuscript.

References Arkhipova I, Meselson M. 2000. Transposable elements in sexual and ancient asexual taxa. Proc Natl Acad Sci USA. 97:14473–14477. Arkhipova IR, Meselson M. 2005. Diverse DNA transposons in rotifers of the class Bdelloidea. Proc Natl Acad Sci USA. 102:11781–11786. Arkhipova IR, Morrison HG. 2001. Three retrotransposon families in the genome of Giardia lamblia: two telomeric, one dead. Proc Natl Acad Sci USA. 98:14497–14502. Burke WD, Calalang CC, Eickbush TH. 1987. The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme. Mol Cell Biol. 7:2221–2230. Cantrell MA, Scott L, Brown CJ, Martinez AR, Wichman HA. 2008. Loss of LINE-1 activity in the megabats. Genetics. 178:393–404.

654

Feschotte C, Pritham EJ. 2007b. Computational analysis and paleogenomics of interspersed repeats in eukaryotes. In: Stojanovic N, editor. Computational genomics. Norwich (UK): Horizon Scientific Press. p. 31–53. Gao X, Hou Y, Ebina H, Levin HL, Voytas DF. 2008. Chromodomains direct integration of retrotransposons to heterochromatin. Genome Res. 18:359–369. Ghosh S, Frisardi M, Ramirez-Avila L, Descoteaux S, Sturm-Ramirez K, Newton-Sanchez OA, Santos-Preciado JI, Ganguly C, Lohia A, Reed S, et al. 2000. Molecular epidemiology of Entamoeba species evidence of a bottleneck (Demographic sweep) and transcontinental spread of diploid parasites. J Clin Microbiol. 38:3815–3821. Goodwin TJ, Butler MI, Poulter RT. 2003. Cryptons: a group of tyrosinerecombinase-encoding DNA transposons from pathogenic fungi. Microbiology. 149:3099–3109. Goodwin TJ, Poulter RT. 2004. A new group of tyrosine recombinaseencoding retrotransposons. Mol Biol Evol. 21:746–759. Grahn RA, Rinehart TA, Cantrell MA, Wichman HA. 2005. Extinction of LINE-1 activity coincident with a major mammalian radiation in rodents. Cytogenet Genome Res. 110:407–415. Gregory TR. 2005. Synergy between sequence and size in large-scale genomics. Nat Rev Genet. 6:699–708. Hedges SB. 2002. The origin and evolution of model organisms. Nat Rev Genet. 3:838–849.

Pritham  TEs and Genome Evolution Jiang N, Feschotte C, Zhang XY, Wessler SR. 2004. Using rice to understand the origin and amplification of miniature inverted repeat transposable elements (MITEs). Curr Opin Plant Biol. 7:115–119.

Pritham EJ, Feschotte C, Wessler SR. 2005. Unexpected diversity and differential success of DNA transposons in four species of entamoeba protozoans. Mol Biol Evol. 22:1751–1763.

Kapitonov VV, Jurka J. 2001. Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA. 98:8714–8719.

Pritham EJ, Putliwala T, Feschotte C. 2007. Mavericks, a novel class of giant transposable elements widespread in eukaryotes and related to DNA viruses. Gene. 390:3–17.

Kapitonov VV, Jurka J. 2006. Self-synthesizing DNA transposons in eukaryotes. Proc Natl Acad Sci USA. 103:4540–4545. Kapitonov VV, Jurka J. 2007. Helitrons on a roll: eukaryotic rolling-circle transposons. Trends Genet. 23:521–529. Katinka MD, Duprat S, Cornillot E, Metenier G, Thomarat F, Prensier G, Barbe V, Peyretaillade E, Brottier P, Wincker P, et al. 2001. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature. 414:450–453. Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Pearlman RE, Roger AJ, Gray MW. 2005. The tree of eukaryotes. Trends Ecol Evol. 20:670–676. Kidwell MG, Lisch DR. 2001. Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution Int J Org Evol. 55:1–24. Kidwell MG, Lisch D. 2002. Transposable elements as sources of genomic variation. In: Craig NL, Craigie R, Gellert M, Lambowitz AM, editors. Moblie DNA II. Washington (DC): American Society for Microbiology Press. p. 59–90. Kim JM, Vanguri S, Boeke JD, Gabriel A, Voytas DF. 1998. Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 8:464–478. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. 2001. Initial sequencing and analysis of the human genome. Nature. 409:860–921. Leem YE, Ripmaster TL, Kelly FD, Ebina H, Heincelman ME, Zhang K, Grewal SI, Hoffman CS, Levin HL. 2008. Retrotransposon Tf1 is targeted to Pol II promoters by transcription activators. Mol Cell. 30:98–107. Lesage P, Todeschini AL. 2005. Happy together: the life and times of Ty retrotransposons and their hosts. Cytogenet Genome Res. 110:70–90. Lynch M. 2007. The origins of genome architecture. Sunderland (MA): Sinauer. Lynch M, Conery JS. 2003. The origins of genome complexity. Science. 302:1401–1404. Malik HS, Burke WD, Eickbush TH. 1999. 
The age and evolution of nonLTR retrotransposable elements. Mol Biol Evol. 16:793–805.

Ray DA, Feschotte C, Pagan HJ, Smith JD, Pritham EJ, Arensburger P, Atkinson PW, Craig NL. 2008. Multiple waves of recent DNA transposon activity in the bat, Myotis lucifugus. Genome Res. 18:717–728. Ray DA, Pagan HJ, Thompson ML, Stevens RD. 2007. Bats with hATs: evidence for recent DNA transposon activity in genus myotis. Mol Biol Evol. 24:632–639. Rinehart TA, Grahn RA, Wichman HA. 2005. SINE extinction preceded LINE extinction in sigmodontine rodents: implications for retrotranspositional dynamics and mechanisms. Cytogenet Genome Res. 110:416–425. Rio DC. 2002. P transposable element in Drosophila melanogaster. In: Craig NLe.a., editor. Mobile DNA II. Washington (DC): ASM. p. 484–518. Sanogo YO, Dobson SL, Bordenstein SR, Novak RJ. 2007. Disruption of the Wolbachia surface protein gene wspB by a transposable element in mosquitoes of the Culex pipiens complex (Diptera, Culicidae). Insect Mol Biol. 16:143–154. Simser JA, Rahman MS, Dreher-Lesnick SM, Azad AF. 2005. A novel and naturally occurring transposon, ISRpe1 in the Rickettsia peacockii genome disrupting the rickA gene involved in actin-based motility. Mol Microbiol. 58:71–79. Teng SC, Wang SX, Gabriel A. 1995. A new non-LTR retrotransposon provides evidence for multiple distinct site-specific elements in Crithidia fasciculata miniexon arrays. Nucleic Acids Res. 23:2929–2936. Ullu E, Tschudi C, Chakraborty T. 2004. RNA interference in protozoan parasites. Cell Microbiol. 6:509–519. Valizadeh P, Crease TJ. 2008. The association between breeding system and transposable element dynamics in Daphnia pulex. J Mol Evol. 66:643–654. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature. 420:520–562. Wessler SR. 2006. Transposable elements and the evolution of eukaryotic genomes. Proc Natl Acad Sci USA. 103:17600–17601.

Malik HS, Eickbush TH. 2000. NeSL-1, an ancient lineage of site-specific non-LTR retrotransposons from Caenorhabditis elegans. Genetics. 154:193–203.

Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, et al. 2007. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 8:973–982.

Malik HS, Henikoff S, Eickbush TH. 2000. Poised for contagion: evolutionary origins of the infectious abilities of invertebrate retroviruses. Genome Res. 10:1307–1318.

Wright SI, Agrawal N, Bureau TE. 2003. Effects of recombination rate and gene density on transposable element distributions in Arabidopsis thaliana. Genome Res. 13:1897–1903.

Masui S, Kamoda S, Sasaki T, Ishikawa H. 1999. The first detection of the insertion sequence ISW1 in the intracellular reproductive parasite Wolbachia. Plasmid. 42:13–19.

Wright SI, Schoen DJ. 1999. Transposon dynamics and the breeding system. Genetica. 107:139–148.

Misumi O, Matsuzaki M, Nozaki H, Miyagishima SY, Mori T, Nishida K, Yagisawa F, Yoshida Y, Kuroiwa H, Kuroiwa T. 2005. Cyanidioschyzon merolae genome. A tool for facilitating comparable studies on organelle biogenesis in photosynthetic eukaryotes. Plant Physiol. 137:567–585.

Xiong Y, Eickbush TH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362. Xu P, Widmer G, Wang Y, Ozaki LS, Alves JM, Serrano MG, Puiu D, Manque P, Akiyoshi D, Mackey AJ, et al. 2004. The genome of Cryptosporidium hominis. Nature. 431:1107–1112.

Pace JK 2nd, Feschotte C. 2007. The evolutionary history of human DNA transposons: evidence for intense activity in the primate lineage. Genome Res. 17:422–432.

Zhang X, Wessler SR. 2004. Genome-wide comparative analysis of the transposable elements in the related species Arabidopsis thaliana and Brassica oleracea. Proc Natl Acad Sci USA. 101:5589–5594.

Poulter RT, Goodwin TJ. 2005. DIRS-1 and the other tyrosine recombinase retrotransposons. Cytogenet Genome Res. 110:575–588.

Zou S, Ke N, Kim JM, Voytas DF. 1996. The Saccharomyces retrotransposon Ty5 integrates preferentially into regions of silent chromatin at the telomeres and mating loci. Genes Dev. 10:634–645.

Pritham EJ, Feschotte C. 2007. Massive amplification of rolling-circle transposons in the lineage of the bat Myotis lucifugus. Proc Natl Acad Sci USA. 104:1895–1900.

Corresponding Editor: Michael Lynch
