Chromosome Res DOI 10.1007/s10577-015-9483-7

REVIEW

Structural and functional liaisons between transposable elements and satellite DNAs

Nevenka Meštrović & Brankica Mravinac & Martina Pavlek & Tanja Vojvoda-Zeljko & Eva Šatović & Miroslav Plohl

© Springer Science+Business Media Dordrecht 2015

Abstract Transposable elements (TEs) and satellite DNAs (satDNAs) are typically identified as major repetitive DNA components in eukaryotic genomes. TEs are DNA segments able to move throughout a genome, while satDNAs are tandemly repeated sequences organized in long arrays. Both classes of repetitive sequences are extremely diverse, and many different TEs and satDNAs can coexist within a genome. Although they differ in structure, genomic organization, mechanisms of spread, and evolutionary dynamics, TEs and satDNAs can share sequence similarity and organizational patterns, indicating that complex mutual relationships can determine their evolution and ultimately define the roles they might play in genome architecture and function. Motivated by accumulating data on sequence elements that incorporate features of both TEs and satDNAs, here we present an overview of their structural and functional liaisons.

Keywords mobile elements · DNA transposons · retrotransposons · satellite DNA · tandem repeats

Abbreviations
CR centromeric retrotransposon
LTR long terminal repeat
MITE miniature inverted-repeat transposable element
satDNA satellite DNA
SINE short interspersed element
TE transposable element
THAP Thanatos-associated protein
TIR terminal inverted repeat
TSD target site duplication
UTR untranslated region

Responsible Editor: Maria Assunta Biscotti, Pat Heslop-Harrison and Ettore Olmo

N. Meštrović · B. Mravinac · M. Pavlek · T. Vojvoda-Zeljko · E. Šatović · M. Plohl (*)
Ruđer Bošković Institute, Bijenička 54, HR-10000 Zagreb, Croatia

General aspects of TEs and satDNAs

The most abundant repetitive DNA components in eukaryotes are transposable elements (TEs) and satellite DNAs (satDNAs), highly dynamic sequences whose evolution shapes the landscape of every plant, animal, and fungal genome (López-Flores and Garrido-Ramos 2012). TEs are the major component of interspersed repetitive DNA. Based on structural characteristics and mechanisms of transposition, they are divided into two classes: retrotransposons (class I) and DNA transposons (class II). Retrotransposons use RNA-mediated reactions and a copy-and-paste mechanism to transpose throughout a genome. They can be further subdivided into two groups depending on the presence or absence of long terminal repeat (LTR) sequences at their ends. DNA transposons use DNA-mediated processes to spread and are based on a transposase gene that is usually flanked by terminal inverted repeats (TIRs). This gene encodes the enzyme that recognizes the TIRs and moves the element to a new genomic location by a cut-and-paste mechanism. Both DNA transposons and retrotransposons can be autonomous or non-autonomous. Autonomous TEs move using their own transposition machinery, while non-autonomous elements use the machinery of their autonomous counterparts. A hallmark of most TEs is the target site duplication (TSD), a duplication of a short segment of genomic DNA at the insertion site. Extensive research in recent years has established TEs as genome builders and a major source of novelty that influences virtually every aspect of genome function, organization, and evolution. Briefly, TE activities contribute to genome size, epigenetic modifications, and centromere and telomere architecture and function. Highly related TE copies distributed throughout a genome promote ectopic recombination and duplication of large DNA segments, affect gene expression and transcript diversification, and provide components for functional sequences, both regulatory and coding. It must be noted, however, that only a limited fraction of TEs and their activities has been co-opted and is beneficial for the genome, while the majority of their activities are either neutral or deleterious (reviewed, for example, in Britten 2006; Feschotte and Pritham 2007; Slotkin and Martienssen 2007; Feschotte 2008; Cordaux and Batzer 2010; Bennetzen and Wang 2014; Piacentini et al. 2014). SatDNAs are the most abundant class of tandem repeats and are usually highly prevalent at and around centromeres, in regions with suppressed recombination (Talbert and Henikoff 2010). They are typically characterized by a sequential arrangement of repeat units (monomers) in long arrays of up to several megabases. Monomer lengths of 150–180 bp and 300–360 bp have been observed in many satDNAs and can be considered evolutionarily favored (Heslop-Harrison and Schwarzacher 2013).
Although the repertoire of satDNA families can change rapidly, and many unrelated satDNAs can populate a single genome (Meštrović et al. 2009), some satDNAs remain conserved over long evolutionary periods (Mravinac et al. 2002). Monomer sequences of a satDNA evolve in concert as a result of molecular drive, the process by which mutations are homogenized throughout a family of monomers in a genome and fixed in a population (Dover 1986). Theoretical models point to unequal crossing-over and gene conversion as the most widespread mechanisms in the dynamics of satDNA evolution (Stephan 1989). SatDNAs are involved in functional interactions needed for centromere stability and evolution (reviewed in Plohl et al. 2014). Recent studies also suggest that satDNA transcripts contribute to centromere and heterochromatin assembly and maintenance (Gent and Dawe 2012). The dual character of satDNA evolution, which simultaneously combines long-term stability of homogeneous arrays with the capacity for rapid replacement of individual satDNAs through extensive copy-number changes, is an important factor in genome resilience (Meštrović et al. 2006). A comprehensive overview of the transcription-related functional roles of satDNAs in the genome is presented in the review by Biscotti et al. (2015). Although TEs and satDNAs have mostly been investigated independently, a number of reports suggest links between these two repeat classes in the evolution of many eukaryotic genomes. In this review, we outline current knowledge about the structure, evolution, and putative functionality of sequences that show clear associations between mobile elements and tandemly repeated sequences.
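To make the TSD hallmark described above more concrete, the short Python sketch below treats it as a string operation. It assumes a simplified staggered-cut model: the two target strands are cut a fixed number of bases apart, the element is joined to the overhangs, and gap repair copies the single-stranded stretch, so the target segment ends up on both sides of the element. The functions insert_with_tsd and has_tsd and the toy sequences are hypothetical illustrations, not a tool from the cited literature, and real TSD lengths are element-specific.

```python
def insert_with_tsd(target: str, cut_pos: int, tsd_len: int, element: str) -> str:
    """Insert `element` into `target` and duplicate the target site.

    Simplified staggered-cut model (an assumption, not the mechanism of any
    particular TE family): the strands are cut `tsd_len` bases apart starting
    at `cut_pos`, the element is joined to the overhangs, and gap repair
    copies the single-stranded stretch, so target[cut_pos:cut_pos + tsd_len]
    appears on both sides of the inserted element.
    """
    if tsd_len < 0 or not 0 <= cut_pos <= len(target) - tsd_len:
        raise ValueError("cut site must lie within the target sequence")
    return target[:cut_pos + tsd_len] + element + target[cut_pos:]


def has_tsd(seq: str, elem_start: int, elem_end: int, tsd_len: int) -> bool:
    """True if the tsd_len bases left of seq[elem_start:elem_end] equal those to its right."""
    if tsd_len <= 0 or elem_start < tsd_len or elem_end + tsd_len > len(seq):
        return False
    return seq[elem_start - tsd_len:elem_start] == seq[elem_end:elem_end + tsd_len]


# Hypothetical toy data: uppercase = host DNA, lowercase = inserted element.
host = "CCCCGATCAAAA"
element = "tttttt"
product = insert_with_tsd(host, cut_pos=4, tsd_len=4, element=element)
start = 4 + 4  # the element begins right after the first copy of the target site
print(product)
print(has_tsd(product, start, start + len(element), 4))
```

Here the 4-bp host segment GATC flanks the inserted element on both sides (CCCCGATCttttttGATCAAAA), which is the signature a TSD check looks for.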

Intermingling of repetitive sequences in heterochromatic regions

Heterochromatin certainly represents the most common “meeting point” of TEs and satDNAs, and within this portion of the genome, these two classes of repetitive sequences are especially plentiful in (peri)centromeric regions in a wide range of species (Wong and Choo 2004; Heslop-Harrison and Schwarzacher 2011). Intensive intermingling of satDNAs and TEs is a widespread characteristic of plant centromeres (Jiang et al. 2003). In particular, the CR family, a Ty3/gypsy group of centromere-specific LTR retrotransposons, has counterparts in the thoroughly studied centromeres of grass species: CRR in rice (Cheng et al. 2002), CRM in maize (Zhong et al. 2002), cereba in barley (Houben et al. 2007), and CRW in wheat (Liu et al. 2008). The CR elements intermingle with centromere-specific satDNAs and also interact with the centromere protein CenH3, indicating their active participation in grass centromere function. Moreover, more recent reports on maize and wheat centromeres imply that CR elements might overpower the satellite sequences and take a leading role in kinetochore formation (Wolfgruber et al. 2009; Li et al. 2013). In Arabidopsis thaliana, it has been shown that the centromeric LTR retrotransposon Tal1 from the congeneric species Arabidopsis lyrata, when introduced into A. thaliana by transformation, preferentially targets the centromeric satDNA repeats of A. thaliana, despite significant differences between the A. lyrata and A. thaliana centromeric satellites (Tsukahara et al. 2012). One of the best-studied cases of TEs invading satDNA regions is that of the mammalian non-LTR L1 (LINE-1) elements, which, present in >500,000 copies, constitute 21 % of the human genome (Lander et al. 2001). L1s have been shown to prefer 5′-TTAAAA-3′ insertion sites (Feng et al. 1996; Jurka 1997), which could explain their extensive presence within (peri)centromeric (A+T)-rich alpha satDNA arrays. The occurrence of L1s within alpha satDNA was used to date and reconstruct the evolution of the human X chromosome (peri)centromere (Schueler et al. 2001). It was revealed that the human-specific L1Hs subfamily (the youngest one) is present only in the homogenized DXZ1 alpha satDNA fraction associated with the active centromere, while the older primate-specific L1P elements are enriched in the flanking centromere-incompetent alpha satellite arrays (Schueler et al. 2001, 2005). Beyond a passive association with alpha satellite evolution, L1 retroelements more probably play an active role in the formation and function of centromeric chromatin, as the functional human 10q25 neocentromere has been shown to be enriched for transcriptionally active L1 sequences (Chueh et al. 2005, 2009).
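As a rough illustration of why a preference for 5′-TTAAAA-3′ insertion sites would favor (A+T)-rich arrays, the Python sketch below counts candidate sites on both strands of a sequence (the same site read on the opposite strand appears as 5′-TTTTAA-3′). It is only a toy under stated assumptions: it counts exact motif matches in hypothetical sequences and does not model L1 endonuclease specificity or actual integration frequencies; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_sites(seq: str, motif: str = "TTAAAA") -> list[int]:
    """0-based start positions where `motif` occurs on either strand.

    A hit on the opposite strand is reported at the top-strand position of
    the reverse-complement match (TTTTAA for the default motif).
    """
    seq, motif = seq.upper(), motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in targets]


def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in seq."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


# Hypothetical toy sequences, not real alpha satellite monomers.
at_rich = "GCTTAAAACGTTTTAAGC"
gc_rich = "GCGCGGCCGCGCGGCCGC"
for name, s in [("A+T-rich", at_rich), ("G+C-rich", gc_rich)]:
    print(name, round(at_fraction(s), 2), motif_sites(s))
```

For these toy inputs the A+T-rich sequence yields sites at positions 2 and 10 (one on each strand), while the G+C-rich sequence yields none, which is the intuition behind the A+T-rich target preference discussed in the text.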
In contrast to humans, the Drosophila melanogaster centromere, modeled by a 420-kb region of the stable Dp1187 minichromosome, is based on different, mostly simple 5-bp-long AATAT and AAGAG satellite repeats, which are interspersed with transposon fragments as well as A+T-rich DNA (Sun et al. 2003). Interestingly, intact transposon copies have been found at different locations in the AATAT satDNA arrays. Human (Feng et al. 1996; Jurka 1997) and Drosophila (Sun et al. 2003) studies suggest that TEs might accumulate in (peri)centromeric regions because of their preference for integrating into A+T-rich segments, which are generally abundant in (peri)centric heterochromatin. For the plant CRM retrotransposons, it has been proposed that specificity for centromeric regions is most likely provided by a putative targeting domain of their integrase (Neumann et al. 2011). It could be speculated that satDNAs and TEs generally reside in heterochromatic regions because these gene-poor chromosomal domains might allow their propagation without hazardous effects on genome stability. On the other hand, targeted integration, association with epigenetic centromeric components, and transcriptional activity of these sequences point to their active participation in centromere structure and function (Neumann et al. 2011; Li et al. 2013; Quenet and Dalal 2014). Regardless of the reasons for their colocalization, the juxtaposition of satellite repeats and mobile elements in the same genomic regions undoubtedly indicates their interrelationship, which is evident in their concurrent amplification and in the formation of new sequences that integrate and/or combine building units of both types of repetitive elements.

Structural relationships between mobile elements and satellite DNAs

Satellite DNAs derived from mobile elements

Sequence homologies between satDNAs and transposons/retrotransposons identified in several species raise the possibility that satellite repeats can be derived from mobile elements. One of the first reported TE-derived satDNAs was pvB370 satDNA, with high sequence similarity to the LTR of the pDv mobile element, conserved in eight species of the Drosophila virilis group (Heikkinen et al. 1995). A recent genome-wide survey of TE participation in the formation of tandem repeats in the human genome found that one quarter of all minisatellites/satellites were derived from TEs (Ahmed and Liang 2012). Generally, TE-derived satDNAs are based on structural parts of mobile elements simply amplified in the form of tandem arrays (Fig. 1, Table 1). Although studies in different animal and plant species show a significant diversity of TE-derived satDNAs, some common trends can be observed. One is that monomers longer than the standard size (>500 bp) are frequently derived from LTRs and untranslated regions (UTRs) of retrotransposons. These satDNAs mostly have (peri)centromeric localization, but in contrast to “classical” satDNAs, those derived from retrotransposons usually occupy a single (peri)centromeric locus, indicating limited potential to spread to non-homologous chromosomes. In support of this, two maize chromosome-specific pericentromeric satDNAs with long monomers are entirely derived from the LTR and UTR of centromeric retrotransposons (CR) belonging to two different subfamilies, CRM1 and CRM4 (Sharma et al. 2013). Although these satDNAs originated from two separate events, they are derived from similar regions of their parent TEs. In addition, the Sobo satDNA, with a monomer size of 4.7 kb, in the pericentromeric region of potato chromosome 7 shares significant sequence similarity with the LTRs of a Sore1 retrotransposon (Tek et al. 2005). It was suggested that the Sobo monomer sequence was created by recombination-based excision of a genomic region between two LTRs. Similarly, in the potato Solanum tuberosum, whose genome contains both repeat-based and repeatless centromeres, all three satDNAs with extremely long monomers (≥3 kb) are each unique to an individual centromere (Gong et al. 2012). They consist of a truncated gag-coding domain surrounded by two LTRs of non-autonomous LTR retrotransposons classified as Ty3/gypsy elements. There are also other cases in which satDNAs arose from amplification of parts of LTR-retrotransposon genes. For example, the 3984-bp-long monomer of a rye B-chromosome satDNA contains a 530-bp fragment that shares high similarity with the gag coding region of the crwydryn retrotransposon (Langdon et al. 2000). In the wheat relative Aegilops speltoides and related species, a centromeric repeat with a 250-bp monomer shows 53 % amino acid sequence similarity to the gag region of the CR retrotransposon cereba from Hordeum vulgare (Cheng and Murata 2003). In contrast to other TE-derived satDNAs, this satDNA is located on all centromeres of the complement, probably because its monomer, close to the typical satDNA length, gives it a higher potential to spread (Heslop-Harrison and Schwarzacher 2013).

Fig. 1 Segments of TEs amplified as satDNA repeats. Possible sources of satDNA monomers are mapped on schematic diagrams of an LTR retrotransposon, LINE and SINE elements, and a DNA transposon, according to the most elaborated examples described in the literature. White segments represent genomic sequences unrelated to TEs

Table 1 Representatives of satDNAs derived from TEs

TE type | SatDNA | Species | TE description | Reference
LTR retrotransposons | CRM1TR, CRM4TR | Zea mays | CRM1 and CRM4; class of Ty3/gypsy retrotransposons | Sharma et al. 2013
LTR retrotransposons | Sobo | Solanum bulbocastanum | Sore1 | Tek et al. 2005; Gong et al. 2012
LTR retrotransposons | St3-58, St3-238, St18 and St3-294 | Solanum tuberosum | Different elements of the Chromovirus clade; Ty3/gypsy retrotransposons | Gong et al. 2012
LTR retrotransposons | E3900 | Secale cereale | crwydryn | Langdon et al. 2000
LTR retrotransposons | 250 element | Aegilops speltoides | cereba (from Hordeum vulgare); class of Ty3/gypsy-like retrotransposons | Cheng and Murata 2003
Non-LTR retrotransposons | HinfI satDNA | Gallus gallus domesticus | CR1 | Li and Leung 2006
Non-LTR retrotransposons | Cen2, Cen3, Cen4, Cen7 and Cen11 | Gallus gallus domesticus | CR1-C | Shang et al. 2010
Non-LTR retrotransposons | 18HT repeat | Drosophila melanogaster | TART and HeT-A; telomeric retrotransposons | Agudo et al. 1999
Non-LTR retrotransposons | satellite 1 | Xenopus laevis | SINE-like | Pasero et al. 1993
Non-LTR retrotransposons | Hy/Pol III | Hydromantes (European salamanders) | SINE-like | Batistoni et al. 1995
DNA transposons | Xstir | Xenopus laevis | Xmix MITE | Hikosaka and Kawahara 2004
DNA transposons | Ensat1, Ensat2 | Arabidopsis thaliana | Atenspm2; Atenspm2 and Arnold1 | Kapitonov and Jurka 1999; Lippman et al. 2004
DNA transposons | D1100 | Secale cereale | Tnr1 (MITE; originally identified in rice) | Langdon et al. 2000
DNA transposons | SGM | Drosophila guanche | SGM-IS (from D. subobscura and D. madeirensis) | Miller et al. 2000


Among animals, the HinfI satDNA detected in the pericentromeric region of chicken chromosome 4 includes a 307-bp-long part of the non-LTR retrotransposon CR1, composed of a partial coding region and the complete 3′-UTR (Li and Leung 2006). In molluscs, the NmE5 satDNA monomer arose from an NmE1 monomer variant through insertion of part of an LTR-like retrotransposon (Biscotti et al. 2008). In some cases, new satDNAs are created from a whole retrotransposon, from a combination of different truncated mobile elements, or from a mobile element and other genomic sequences. In the Xenopus laevis genome, for example, satellite 1 DNA was initially a SINE that was amplified in tandem more than 30,000 times at different locations in the genome (Pasero et al. 1993). The most prominent example is the potato St3-294 satDNA, with a monomer longer than 5.4 kb, which is derived from an entire non-autonomous LTR retrotransposon located subtelomerically on chromosome 9 (Gong et al. 2012). The centromeric region of the D. melanogaster Y chromosome contains a tandemly repeated unit created from the telomeric non-LTR retrotransposons TART and HeT-A, in which part of a TART element is combined with three consecutive truncated HeT-A elements (Agudo et al. 1999). Monomers of five centromere-specific satDNAs from the chicken genome are derived from unknown genomic sequences and retrotransposon-related parts. Despite differing in monomer length and nucleotide sequence, all of them share the same parts of ORF2 and the 3′-UTR of the CR1-C non-LTR retrotransposon, suggesting origin from a single progenitor sequence (Shang et al. 2010). Hy/Pol III satDNA from European salamanders, with arrays dispersed along chromosomes, is a composite element made by insertion of a SINE into an ancient retrotransposed Pol III RNA sequence (Batistoni et al. 1995). Besides retrotransposons, DNA transposons, and particularly their TIR sequences, can also serve as templates for satDNA formation. For example, the MITE-like Xstir satDNA of X. laevis is composed of internal regions of the TIRs of a MITE (Hikosaka and Kawahara 2004). In A. thaliana, the TIRs of the DNA transposon Atenspm2 were the basis for the formation of two satellites, Ensat1 and Ensat2 (Kapitonov and Jurka 1999). The complex monomer of Ensat1 shares similarity with a 499-bp-long part of the Atenspm2 transposon, whereas another 151-bp-long part of this transposon forms part of the Ensat2 monomer. The remaining segment of the Ensat2 monomer is 85 % identical to the internal portion of another transposon, Arnold1. Parts of mobile elements have also been identified as components of specific satDNA families distributed in closely related species. For example, D1100 satDNA from rye shows similarity to the TIR of the Tnr1 MITE, originally identified in rice (Langdon et al. 2000). The SGM satellite of Drosophila guanche, which makes up 10 % of the genome, was probably derived from a common ancestor of the SGM transposon present in Drosophila subobscura and Drosophila madeirensis (Miller et al. 2000). Interestingly, two coral genomes contain a 110-bp-long MITE-like element, probably derived from a piggyBac-like transposon, that forms high-copy tandem repeats, a feature quite unusual for MITEs (Wang et al. 2010). This MITE-like element with satDNA attributes shows that strict distinctions between the two classes of elements can vanish, and that some forms of TEs and satDNAs may simply be considered a single sequence type.

Tandem repeats as structural components of mobile elements

Tandem repeats have also been found as integral components of TEs. Available literature data suggest that tandem repeats are more frequent in DNA transposons than in retroelements (Fig. 2a, Table 2). This feature is particularly evident in the DINE-1 family, whose members are abundant and widespread in drosophilid genomes (Yang and Barbash 2008). DINE-1 elements are a group of short, non-autonomous, modularly structured DNA transposons. They were initially classified as a group of MITEs, but since they share features compatible with rolling-circle replication as the mechanism of spread, they were recently reclassified as Helentrons (Thomas et al. 2014). A common structural feature of many DINE-1 elements initially described in drosophilids is a central region with internal tandem repeats. When compared among species, these repeats vary in DNA sequence, copy number (2–10), and monomer length (50–500 bp) (Yang and Barbash 2008).
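To make the notions of monomer length and copy number concrete, the following Python sketch scans a sequence for perfect (mismatch-free) tandem arrays. It is a deliberately naive illustration under stated assumptions: real satDNA and DINE-1 internal repeats carry mismatches and indels, which this exact-match scan ignores. For each candidate period p, a run of L consecutive positions with seq[i] == seq[i + p] covers L + p bases, that is, (L + p) // p complete monomer copies; a primitivity check discards periods that are merely multiples of a shorter monomer. The function names and the toy sequence are hypothetical.

```python
def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter unit.

    Standard string trick: a non-empty string is primitive iff it does not
    occur inside its own doubling with the first and last characters removed.
    """
    return unit not in (unit + unit)[1:-1]


def perfect_tandem_arrays(seq: str, min_period: int, max_period: int,
                          min_copies: int = 2) -> list[tuple[int, int, int]]:
    """Return (start, period, copies) for perfect tandem arrays in `seq`."""
    if min_period < 1:
        raise ValueError("min_period must be at least 1")
    seq = seq.upper()
    n = len(seq)
    hits = []
    for p in range(min_period, max_period + 1):
        i = 0
        while i < n - p:
            if seq[i] != seq[i + p]:
                i += 1
                continue
            # Extend a run of positions where the sequence matches itself p bases ahead.
            start = i
            while i < n - p and seq[i] == seq[i + p]:
                i += 1
            run = i - start
            copies = (run + p) // p  # complete copies of the p-bp monomer in the run
            if copies >= min_copies and is_primitive(seq[start:start + p]):
                hits.append((start, p, copies))
    return hits


# Hypothetical toy sequence: three perfect copies of a 7-bp monomer between flanks.
demo = "TT" + "GATTACA" * 3 + "CC"
print(perfect_tandem_arrays(demo, min_period=7, max_period=7, min_copies=3))
```

For this toy input the call returns [(2, 7, 3)]: an array starting at position 2 whose 7-bp monomer is present in three copies. Widening the period range would also report short, chance arrays, which is one reason practical repeat finders add scoring thresholds.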
Fig. 2 a Schematic presentation of DNA transposons and LTR retrotransposons with internal tandem repeats that have been found amplified into long arrays of satDNAs. b A model of the possible correlation between the number of central tandem repeats in a DNA transposon and transposition and recombination rates. The presented scheme is a compilation of the model of Scalvenzi and Pollet (2014) and information given in Marzo et al. (2013) and Dias et al. (2014)

According to the classification in Thomas et al. (2014), the elements called PERI from seven species of the Drosophila buzzatii cluster (Kuhn and Heslop-Harrison 2011), MINE-1 and MINE-2 from Lepidoptera (Coates et al. 2010, 2011), DTC84 from the clam Donax trunculus (Šatović and Plohl 2013), pearl from the oyster (Gaffney et al. 2003), MgE from Mytilus galloprovincialis (Kourtidis et al. 2006a), and Tsp from the sea urchin (Cohen et al. 1985) are all DINE-1-like elements. Recently, two different MITE elements (terMITE1 and terMITE2) were described in termites, both also containing a variable number of internal tandem repeats, 16 and 114 bp long, respectively (Luchetti 2015). In plant genomes, element-specific central tandem repeats (60 or 240 bp) were found in two non-autonomous elements, Tnat1 and Tnat2, specific to A. thaliana (Noma and Ohtsubo 2000), while short tandem repeats (