Plant Molecular Biology 22: 829-846, 1993. © 1993 Kluwer Academic Publishers. Printed in Belgium.


BARE-1, a copia-like retroelement in barley (Hordeum vulgare L.)

Inari Manninen and Alan H. Schulman*

Institute of Biotechnology, University of Helsinki, P.O. Box 45 (Karvaamokuja 3A), SF-00014 Helsinki, Finland (* author for correspondence)

Received 8 December 1992; accepted in revised form 15 April 1993

Key words: transposon, barley, Hordeum vulgare, plant genes, retrovirus

Abstract

Retroviruses and retrotransposons make up the broad class of retroelements replicating and transposing via reverse transcriptase. Retroelements have recently been found to be ubiquitous in plants. We report here the isolation, sequence and analysis of a retroelement from barley (Hordeum vulgare L.) with all the features of a copia-like retrotransposon. This element is named BARE-1 (for BArley RetroElement 1) and is the first such element described for barley. BARE-1 is 12 088 bp long, with long terminal repeats (LTRs) of 1829 bp containing perfect 6 bp inverted repeats at their ends and flanked by 4 bp direct repeats in the host DNA. Between the LTRs is an internal domain with a derived amino acid sequence of 1285 residues, bearing homology to the gag, pro, int and rt domains of retroviruses and of both plant and non-plant copia-like retrotransposons. Cultivated barley contains about 5000 elements in the genome similar to the BARE-1 putative gag domain, but ten-fold more hybridizing to rt or LTR probes. The particular BARE-1 element reported here appears to be inactive, as the putative protein-coding domain is interrupted by four stop codons and a frameshift. In addition, the 3' LTR is 4% divergent from the 5' LTR and contains a 3135 bp insertion. Nevertheless, we have recently detected transcripts hybridizing to BARE-1 on northern blots, presumably from active copies. Analysis of BARE-1 expression and function in barley is currently underway.

Introduction

Retrotransposons, replicating through an RNA intermediate, form a major class of eukaryotic transposons [16, 77]. Retrovirus-like retrotransposons contain long terminal direct repeats (LTRs) and internal domains with open reading frames (ORFs) bearing homology to retroviral gene products [6, 13, 84]. Retrotransposons may be present in plants and animals in retroelement superfamilies of high copy number, with individual members retaining the canonical element to varying degrees of completeness, and so constitute an appreciable fraction of eukaryotic genomes [16, 38, 73]. In plants, at least seven retroelements have been well characterized to date [22], although only the maize Bs1 [35] and the tobacco Tnt1 [23, 58, 59] have been observed to actively transpose. When compared to the functional domains of retroviral polyproteins, the ORFs of Tnt1 and the Arabidopsis Ta1 [38] are arranged like that of copia [55], an organization which thus transcends both species and kingdoms. Surveys of plant genomes by PCR (polymerase chain reaction) amplification of conserved retroelement domains suggest that such structures are both diverse and ubiquitous [17, 31].

Recent reports have indicated the presence of retroelements in barley [17, 53, 54, 80]. In our search for transposable elements in barley, we have isolated a retroelement with all the identifiable regions of a copia-like retrotransposon. We name this BARE-1 (for BArley RetroElement 1), the first complete element of its kind described for Hordeum vulgare. We here present the organization and sequence of a complete BARE-1 element and describe its features in comparison to other known retroelements.

The nucleotide sequence data reported will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession number Z17327.

Materials and methods

Genomic library screening

A Hordeum vulgare L. cv. NK 1558 (Northrup King) genomic library in λEMBL3 (Sal I site), prepared by Clontech Laboratories, was screened against LTR-specific oligonucleotide and cloned probes. In each screening, one million plaques were plated on the host Escherichia coli NM538, blotted onto GeneScreen Plus membranes (Du Pont) and hybridized. For RNA probes (Riboprobe Gemini System, Promega) and cloned DNA probes, prehybridizations (from 6 h to overnight) and hybridizations (overnight) were performed at 42 °C in 50% formamide, 5× SSPE (0.9 M NaCl, 50 mM NaH2PO4 pH 7.4, 5 mM EDTA), 1% SDS, 10% dextran sulphate, 50 mM Tris-HCl pH 7.5, 5× Denhardt's solution (0.1% (w/v) Ficoll, 0.1% (w/v) polyvinylpyrrolidone, 0.1% BSA), and 100 µg/ml herring sperm DNA. 32P-labelled probe was added to the hybridization solution at 10^4 cpm/ml. The membranes were washed twice for 5 min at 25 °C in 2× SSC (300 mM NaCl, 30 mM sodium citrate pH 7.0), twice for 30 min at 60 or 65 °C in 2× SSC, 1% SDS, and twice for 30 min at 25 °C in 0.1× SSC.

Oligonucleotide probes were phosphorylated with [γ-32P]ATP [68]. Membranes were prehybridized (6 h) and hybridized (overnight) at 55 °C in 1% SDS, 1 M NaCl, 10% dextran sulphate, 0.05% sodium pyrophosphate, 50 mM Tris-HCl pH 7.5, 100 µg/ml herring sperm DNA. Membranes were then washed with 6× SSC, 0.05% sodium pyrophosphate three times for 15 min at 25 °C and once for 15 min at 55 °C.
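The wash and hybridization buffers above are given as multiples (0.1×, 2×, 5×, 6×) of a base recipe. As a minimal sketch of that arithmetic, the snippet below derives the 1× concentrations from the two definitions quoted in the text (2× SSC and 5× SSPE) and scales them. The function names and the 20× stock used in the test are hypothetical and not taken from the paper.

```python
# 1x component concentrations in mM, back-calculated from the text:
#   2x SSC  = 300 mM NaCl, 30 mM sodium citrate
#   5x SSPE = 900 mM NaCl, 50 mM NaH2PO4, 5 mM EDTA
ONE_X_MM = {
    "SSC": {"NaCl": 150.0, "sodium citrate": 15.0},
    "SSPE": {"NaCl": 180.0, "NaH2PO4": 10.0, "EDTA": 1.0},
}


def buffer_composition(name, fold):
    """Return component concentrations (mM) of `fold`x buffer `name`."""
    return {component: conc * fold for component, conc in ONE_X_MM[name].items()}


def stock_volume_ml(stock_fold, final_fold, final_volume_ml):
    """Volume of concentrated stock needed, from C1 * V1 = C2 * V2.

    The stock strength is an assumption supplied by the caller; the paper
    does not state which stock solutions were used.
    """
    if final_fold > stock_fold:
        raise ValueError("final concentration exceeds stock concentration")
    return final_fold * final_volume_ml / stock_fold
```

For example, `buffer_composition("SSC", 6)` gives the salt concentrations of the 6× SSC oligonucleotide wash.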

Subcloning

Lambda DNA from selected clones was prepared from liquid lysates, digested with Sal I and run on a 0.7% agarose gel in 1× TAE (40 mM Tris-acetate pH 7.8, 2.0 mM EDTA). Gel slices containing the selected DNA fragments were melted at 70 °C, extracted with an equal volume of phenol, and centrifuged for 5 min after freezing at -80 °C. The supernatant was extracted with 1:1 phenol/chloroform, followed by chloroform, and the DNA was precipitated with ethanol. The Sal I fragments were ligated to Sal I sites in pGem-3Zf+ (Promega) or pBluescript II SK+ vectors (Stratagene). Segments of larger Sal I fragments were then further subcloned. All DNA modifications and cloning steps were made according to standard methods [46, 68]. Overlapping deletions were made with exonuclease III after digestion with mung bean nuclease.

Sequencing

Deletion and restriction subclones were sequenced using SP6, T3 or T7 primers (Promega), and occasional gaps were bridged with specific primers. Sequencing was performed by the dideoxy method [67] using the Sequenase 2.0 kit (United States Biochemical). Taq polymerase (TaqTrack Sequencing Systems, Promega), dITP, or 7-deaza dGTP with Sequenase 2.0 were used to eliminate compressions.

Sequence analysis

Sequence data were analysed with the GCG Sequence Analysis Software Package Version 7.0 [12]. Data library searches were conducted with the FASTA, TFASTA and WORDSEARCH programs of the GCG package and with the BLAST program [1], against continuously updated versions of the EMBL, GenBank and SWISSPROT data libraries. Consensus eukaryotic promoter motifs [7] were identified with the aid of the program EUKPROM of the PC/GENE software (IntelliGenetics). Alignments were made with the GCG programs BESTFIT and PILEUP and adjusted by hand. PILEUP is based on the progressive method of Feng and Doolittle [15], and finds the best global alignment for multiple sequences, whereas BESTFIT optimizes local alignments between two sequences. Amino acids were grouped by their chemical similarity, taking into account previous classifications [2, 25, 40]. Sets of amino acids considered similar were (one-letter code): {I,L,M,V,A}; {S,T,P}; {R,K,H}; {W,Y,F}; {D,E}; {N,Q}; {G}; {C}. Alignments of the hypothetical translation of BARE-1 to other retroelements were aided by similar alignments reported previously [2, 23, 28, 35, 36, 47, 57, 61, 73, 78].

Translations were made from the following data library accessions and used in the alignments and sequence comparisons: FeLV (feline leukaemia virus, F6A), M18247; HIV-1 (human immunodeficiency virus type 1, HXB2), K03455; RTBV (rice tungro bacilliform virus), M65026; CoYMV (commelina yellow mottle virus), X52938; CaMV (cauliflower mosaic virus), J02046; Tnt1 (Nicotiana tabacum retrotransposon), X13777; Ty912 (Saccharomyces cerevisiae retrotransposon), M11351; del (Lilium henryi retrotransposon), X13886; copia (Drosophila melanogaster retrotransposon), X02559; BAP (Hordeum vulgare aspartic proteinase), X56136; bar121, bar29, and bar30 [17].
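The similarity sets listed above are what distinguish "% identity" from "% similarity" in the protein comparisons. The following is a minimal sketch of how the two figures could be computed for a pair of already aligned sequences. It assumes that gapped columns are left out of the denominator, which the text does not specify, and the function name is illustrative rather than part of the GCG package.

```python
# Similarity sets as listed in the text (one-letter amino acid code).
SIMILARITY_GROUPS = ["ILMVA", "STP", "RKH", "WYF", "DE", "NQ", "G", "C"]
GROUP_OF = {aa: index for index, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (% identity, % similarity) for two aligned protein sequences.

    Assumption: columns with a gap in either sequence are skipped, and
    identical residues also count as similar.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared
```

A value pair such as "25/46" in the protein comparisons corresponds to the two numbers returned here, rounded to whole percentages.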

Plant DNA preparation

DNA was extracted from dark-grown shoots as described previously [11], with the following modifications. Nucleic acids from 1 g tissue were redissolved in 200 µl of 50 mM Tris-HCl pH 8.0, 10 mM EDTA. RNA was precipitated with an equal volume of 5 M LiCl for 15 min on ice, then pelleted in a microcentrifuge for 10 min. DNA was precipitated from the supernatant with 2 volumes of ethanol at -20 °C for at least 60 min, pelleted and resuspended in 700 µl of TE (10 mM Tris-HCl pH 8.0, 1 mM EDTA). The DNA was re-precipitated with sodium acetate and isopropanol (5 min at 25 °C), pelleted, washed with 70% ethanol, dried and resuspended in 100 µl TE.

DNA blotting and hybridization

DNA was digested in the appropriate buffers supplied by the restriction enzyme manufacturers, separated on a 0.8 or 1% agarose gel (10-15 µg per slot) in TBE buffer (100 mM Tris pH 8.3, 100 mM H3BO3, 20 mM EDTA), and blotted onto Hybond-N membranes (Amersham) according to the manufacturer's instructions. Prehybridization, hybridization and washings were made as suggested by the manufacturer. Probes: LTR (cRNA, 32P-labelled), a 1356 bp Bsm I-Sna BI BARE-1 subclone (bases 685-2041); ORF (DNA, random-primed, 32P-labelled), the 1930 bp Sal I-Eco RI fragment of BARE-1 (bases 3784-5714); flanking sequence (DNA, random-primed, 32P-labelled), an 1100 bp deletion fragment located -785 to -1895 upstream from the beginning of the 5' LTR in the 224a clone, generated by exonuclease deletion of a larger Sal I fragment. Hybridized blots were washed as follows: 2× SSC, 0.1% SDS, twice for 10 min at 25 °C; 1× SSC, 0.1% SDS, 15 min at 65 °C; 0.1× SSC, 0.1% SDS, twice for 10 min at 65 °C.

Slot blots were made with a PR600 SlotBlot apparatus (Hoefer) onto Hybond-N membranes. The blots were probed with an entire 5' LTR subcloned into the pSL1180 vector (Pharmacia), excised from the plasmid, then gel-purified, and digoxigenin-labelled according to the manufacturer's instructions with the Genius Nonradioactive DNA Labeling and Detection Kit (Boehringer). Unhybridized probe was removed with the following washes: 2× SSC, 0.1% SDS, twice for 5 min at 25 °C; 0.1× SSC, 0.1% SDS, twice for 15 min at 65 °C. The probe was then detected immunologically according to the kit manufacturer's instructions. The slot blots for the RT (reverse transcriptase) region were probed with a fragment (bases 5714-7240) labelled as for the LTR probe. The putative gag region was probed with a 727 bp Acc I-Bst XI fragment, consisting of bases 3471-4198 of BARE-1.

Results

Isolation of BARE-1 clones

Sequence analysis of a clone, isolated from the barley cv. NK 1558 genomic library for other purposes, revealed a 224 bp region more than 80% identical to the 5' end of the 5' LTR of the WIS-2 retroelement [29, 54]. The library was then rescreened with the oligonucleotide 5'-GGATAGTCAGGGTCTTCTGG-3', from which 11 hybridizing plaques were purified and 4 subcloned. One clone (224a) contained an element with two LTRs bordering an internal domain having a derived amino acid sequence homologous to retroviral and retrotransposon gene products. This element, 12 088 bp, was named BARE-1 and is described below. A computer-generated restriction map for BARE-1 is presented in Fig. 1, together with a table of homologies to the sequences compared in detail here. The complete sequence in both directions was determined for 13271 bp containing the element, of the ca. 20 kb insert in the initial λEMBL3 clone.

LTR organization

The basic BARE-1 LTR consists of 1829 bp (5' LTR, bases 309-2137; 3' LTR, bases 7412-12397). The LTR is 3-6 times longer than those of many retroelements, but only 74 bp longer than those of WIS 2-1A [45]. The ends of the BARE-1 5' LTR contain a perfect 6 bp inverted repeat (IR), which begins with the 5' TG 3' canonical for retroviruses and retrovirus-like retrotransposons. The IR of the 3' LTR is imperfect by 1 bp. The IR of BARE-1 is completely homologous to that of the Arabidopsis Ta1 [78], Drosophila copia [14], yeast Ty912 [63], wheat WIS-2 [54], and Tnt1 [23] retrotransposons, and identical to the first 5 of the 9 bp IR of pea PDR1 [43]. Whereas the retroviruses bear the sequence 5' AA 3' at the host/LTR junction [77], this is more variable in plant retroelements, with BARE-1, PDR1, and WIS 2 all having a C at the junction but Ta1 having the retroviral dinucleotide. The BARE-1 LTR and internal region are not especially AT-rich, being only 51.8% AT (53.0% for the whole element) compared with 62.5% (58.7% for the whole element) for Tnt1 [59] or 64.2% for the whole del element in lily [73]. With the exception of the wheat retroelement WIS 2 LTR, to which it is 74% homologous (with higher homology in interspersed blocks), the BARE-1 LTR shows no similarity to the LTRs of any other known retroelement.

However, database searches with the 5' LTR sequence revealed that fragments of BARE-1-like retroelements have been inadvertently cloned from the barley genome previously, but not noted. A region upstream of the transcriptional start in a genomic clone for acyl carrier protein ACP III (bases 606-1, accession M58754, non-coding strand) is 72% homologous to the BARE-1 LTRs (bases 1107-1739 and 11345-11986). This clone was derived from a genomic library of cv. Bonus of Svalöf AB [27], distinct from cv. NK 1558 from which BARE-1 was obtained. In addition, a lambda genomic clone for nitrate reductase (accession X57845, bases 1-1083) from barley cv. Himalaya [69] is 89% homologous to the BARE-1 LTRs (bases 1072-2139, 5' LTR). This homology extends from the border of that lambda insertion to the 3' end of the LTR, and suggests that it is either a solitary LTR or the 3' terminus of a complete BARE-1-like element. The homologous region near the ACP III gene, however, covers only the middle portion of an LTR and is fairly divergent; it may be a nonfunctional, solitary LTR which has accumulated random mutations.

[Fig. 1: restriction map of clone 224a and summary table of homologies.]

Fig. 1. Map of BARE-1-containing region of clone 224a, and summary table of homologies to corresponding retrovirus regions. Selected restriction sites shown; most sites predicted but not tested. Restriction site numbering corresponds to accession Z17327. White, untranscribed; shaded, hypothetical transcript; darker shading, putative protein coding region (transcriptional direction marked with arrow); hatched, similar to rye R173 flanking region. GAG, putative gag domain; AP, proteinase; ED, endonuclease (integrase); RT, reverse transcriptase; RH, RNaseH. Features labelled: a and b, potential transcription start sites; c, 3' LTR insertion; d, poly(A) addition signal. Numbers within the coding region refer to predicted amino acids in each domain. Numbers in the table are % identity for DNA alignments, and % identity/% similarity for protein alignments.

The LTRs of functional retrotransposons contain the promoter motifs necessary for initiation of RNA synthesis by RNA polymerase II, as well as the 5' and 3' RNA processing signals and part of the structure necessary for reverse transcription. Each LTR is divided into three contiguous regions, organized 5' U3-R-U5 3' [76, 77], with U3 containing the TATA and CCAAT boxes as well as the cap and polyadenylation signals. For this discussion, only the positions of the signals in the BARE-1 5' LTR will be defined. Of the eight

putative TATA boxes in the BARE-1 LTR, one identical to the plant consensus [37] and similar to the experimentally determined TATA boxes of Tnt1 [58] is present at base 1656 with an associated putative cap signal at base 1689 but no potential CCAAT site downstream. A second putative TATA box is located at base 1292, with a cap site at base 1333 and a possible polyadenylation signal at base 1485 (for function in the 3' LTR). This signal, GATAA, differs from the plant consensus AATAAA [52] but is similar to the GATAAA for barley aspartic proteinase [66]. The Tnt1 element, however, has no consensus polyadenylation signal although it can be highly transcribed [58, 59]. The largest internal direct repeat found in the BARE-1 LTR was of 10 bp, repeated once (Fig. 2). Palindromes of 10 bp and 12 bp, and 10 bp and 11 bp inverted repeats each repeated once, were the largest internal IR sequences. The 10 bp IR repeats are in fact located at the two putative TATA boxes just discussed.

[Fig. 2 sequence panels: bases 2131-2160 (3' of the 5' LTR) and bases 7391-7420 (5' of the 3' LTR).]

Fig. 2. Top: putative initiation site on the RNA BARE-1 transcript for (-)-strand DNA synthesis, showing complementarity to the proposed methionyl-tRNA primer; numbering as per accession Z17327. Bottom: putative purine-rich initiation site for (+)-strand DNA synthesis on the DNA (-)-strand; purines are shaded.

The R domain is defined as the repeat present at both ends of the retroelement transcript [76, 77], generated by its location in between the start of transcription in the 5' LTR and the site of polyadenylation in the 3' LTR. On that basis, a potential R domain might start at base 1335 in the LTR and extend to around base 1508, being fairly long compared to those of retroviruses [76, 77]. The U5 region would then extend from the 3' end of R to the 3' end of the LTR, a length of ca. 629 bp, also quite long by retroviral standards. As a consequence of the reverse transcription mechanism of retrotransposons during their transposition, the 5' and 3' LTRs are regenerated from the same template and are therefore identical at the time of insertion. The 3' LTR of clone 224a reported here contains an insertion of 3135 bp in the presumptive U3 region, although the two LTRs are otherwise 96% identical. This insertion, which has no known homologies, and the other mutations have therefore occurred since the last transposition of this BARE-1 copy. Comparisons to LTRs from two other lambda clones revealed 92% and 94% identity to the LTR sequence reported here. Other plant retroelements sequenced have revealed 96% [38] and 100% (a recently transposed element) identity between the two LTRs.
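The terminal inverted repeat check described for the LTR ends (perfect in the 5' LTR, imperfect by 1 bp in the 3' LTR) can be sketched as a comparison of the first bases of an LTR with the reverse complement of its last bases. This is an illustration only: the function names are hypothetical, and the short sequences in the usage note are toy examples, not the BARE-1 LTR sequence from accession Z17327.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def terminal_ir_mismatches(ltr, length=6):
    """Count mismatches between the first `length` bases of an LTR and the
    reverse complement of its last `length` bases.

    0 means a perfect terminal inverted repeat of that length.
    """
    if len(ltr) < 2 * length:
        raise ValueError("sequence is shorter than two repeat lengths")
    left = ltr[:length].upper()
    right = reverse_complement(ltr[-length:])
    return sum(1 for a, b in zip(left, right) if a != b)
```

With a toy sequence such as `"TGTTGG" + "ACGTACGT" + "CCAACA"` the function returns 0, and changing the final six bases to `"CCAATA"` gives 1, the situation described for the 3' LTR.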

Insertion site

The BARE-1 sequence of 224a is flanked by 4 bp direct repeats, 5' GAAC 3', presumably generated upon its insertion. The insertion sites for the two clones for which we have external sequence, 24a and 224a (bases 1-308, accession Z17327), show no similarity. For clone 224a, however, a BLAST database search revealed a highly significant match (probability of random occurrence 3.4 × 10^-35), with 78.9% identity, to a region flanking a dispersed repetitive sequence of rye (R173-1, accession X64100). The R173 family is present in 15 000 generally solitary copies dispersed among all rye chromosomes and has a basic repeat unit of 3.5 kb [26, 64]. Although R173 itself did not appear to hybridize to wheat or barley DNA [26], bases 6192-7056 just adjacent to the right repeat of the R173 sequence in clone R173-1 were reported to have 84% homology to an internal region of the wheat retroelement WIS-2 [65]. As discussed below, BARE-1 has homology to the R173 flanking sequence not only outside the LTRs but also in its internal domain, and WIS 2 most probably is homologous to R173-1 in the same place.

In the BARE-1 insertion site of clone 224a, homology to clone R173-1 is not contiguous, but split in half: bases 6669-6364 adjacent to the repeat in clone R173-1 (accession X64100, complementary strand) show 78.9% homology to the 5' flanking region (bases 1-303 in BARE-1, accession Z17327), while bases 6357-6037 (also inverted from the accession) show 72.9% homology to the 3' flanking region (bases 12404-12717) of the retroelement (summarized, Fig. 1). We have not sequenced far enough out from the BARE-1 insertion in 224a to determine if the entire R173 repetitive element itself is preserved in the flanking regions. These homologies indicate that the homologous sequence in R173-1 was at some point interrupted through the insertion of a BARE-1 element or ancestor. Barley is thought to have separated from wheat and rye before the latter two diverged [18], and before a dramatic increase in R173 copy number occurred in rye [64]. The recent observation of only about 20 R173 copies in wheat (P. Langridge, personal communication) is consistent with this. The fraction of R173 elements adjacent to the common WIS 2 (and BARE-1) region has not yet been reported, and it is not yet clear if any such sequences in rye contain a retroelement insertion. Neither can we say at this point whether this region in barley (or wheat) is interrupted by a retroelement in every case, so the time of the insertion event cannot yet be estimated.

Primer binding sites

Reverse transcription of retroelement RNA transcripts has been found always to proceed from a tRNA primer bound at the 5' end of the internal domain, next to the 3' end of the 5' LTR [6, 76, 77]. Adjacent to the 3' side of the BARE-1 LTR is a potential site for (-)-strand priming (Fig. 2) having 15 of 19 bases complementary to the 3' end of the wheat initiator methionyl-tRNA [20], which is likely to be virtually identical to that of barley. The WIS 2-1A site [45] is quite similar, with 4 nucleotide differences, two increasing and two decreasing the match to the methionyl-tRNA. The (+)-strand DNA synthesis is initiated generally from a purine-rich sequence just 5' to the 3' LTR [6, 77]. A 10 bp polypurine tract is found just inside the BARE-1 3' LTR (Fig. 2). The extent and sequence of both primer regions are consistent with those reported for other plant retroelements [34, 38, 43, 73]; the polypurine tract is 5 purines longer than, but otherwise identical to, that of WIS 2-1A [45].
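Both priming sites are identified by simple sequence properties: the number of bases in the (-)-strand site that can pair with the 3' end of the tRNA, and the length of an uninterrupted purine run for the (+)-strand site. The sketch below shows one way to compute each. It assumes that only Watson-Crick pairs count as complementary (G-U wobble pairs are not counted), which the text does not state, and the function names and the sequences in the usage note are hypothetical toy data rather than the BARE-1 or wheat tRNA sequences.

```python
import re

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def complementary_positions(site_rna, trna_3prime_end):
    """Count positions of an RNA site that pair with a tRNA 3' end.

    Both sequences are written 5'->3'; the tRNA is reversed so that the two
    strands are compared antiparallel. Only Watson-Crick pairs are counted.
    """
    trna_reversed = trna_3prime_end.upper()[::-1]
    return sum(
        1 for a, b in zip(site_rna.upper(), trna_reversed) if (a, b) in WATSON_CRICK
    )


def longest_purine_tract(dna):
    """Return (0-based start, tract) of the longest run of A/G in `dna`.

    Returns (-1, "") when the sequence contains no purine at all; the first
    run wins when several share the maximum length.
    """
    runs = list(re.finditer(r"[AG]+", dna.upper()))
    if not runs:
        return -1, ""
    best = max(runs, key=lambda match: len(match.group()))
    return best.start(), best.group()
```

For instance, `complementary_positions("UGGUAUCAG", "CUGAUACCA")` returns 9 for a fully complementary toy pair, and `longest_purine_tract("CTTAGGAGGAAGTCCAG")` returns `(3, "AGGAGGAAG")`.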

BARE-1 internal domain

In retrotransposons, the region between the 5' and 3' LTRs generally bears strong homology to the retroviral gag, pro, int, and rt regions [22, 77]. The group of copia-like retrotransposons, which besides copia includes Ty1 and Ty2 of yeast [8, 81], Tnt1 of tobacco [23] and Ta1 of Arabidopsis [78], is organized 5' LTR-pro-int-rt-RNaseH-LTR 3'. The BARE-1 region between the LTRs revealed striking similarity to the tobacco Tnt1 [23] and Drosophila copia elements (Fig. 3) on the protein level, when the dot plots for the hypothetical translations of the barley element's forward reading frames were superimposed. The presence of such a long coextensive homology in BARE-1 identifies it as a copia-like retroelement. Using the Tnt1 and copia alignments as a guide, the putative protein products of BARE-1 were derived (Fig. 4). The first long ORF begins 1255 bp past the end of the 5' LTR, and extends 384 amino acid residues, being interrupted about midpoint by a stop codon. A frameshift then occurs, but the strong homology to Tnt1 and copia continues for 917 residues with three more stop codons. The assembled ORFs, coding for a putative polyprotein of 1301 amino acids and a molecular mass of 147 kDa, are followed by 6 stop codons in the 134 bp before the 3' LTR.

The putative leader sequence

The putative 5' leader sequence, estimated at 2057 bp, is quite long compared to the 60, 300, and 461 bp of the Bs1 [35], copia [14], and Tnt1 [58] transcripts respectively. It includes 18 start codons outside the LTR and 6 within the LTR but downstream of the first putative cap site. However, AUG triplets upstream of the actual start of translation are known not to interfere with retroelement translation either in vivo or in vitro [34], unlike for other genes [39]. A remarkable feature of the 5' leader is the presence, as in the flanking sequences, of homology to a rye R173 flanking region (Fig. 1). In the leader, unlike the flanking sequences discussed previously, the homology is in one continuous block. Bases 2969-3615 in BARE-1 show 76% homology to bases 6671-6035 in R173-1 (complementary strand, accession X64100) and likewise 71% homology to the regions outside the LTRs (Fig. 1). The similar divergence of the putative leader from both sequences bordering the LTRs and the R173-1

Fig. 3. DOTPLOTs of hypothetical BARE-1 polyprotein (x-axis) to: A, copia polyprotein; B, Tnt-1 translated open reading frame. Axes indicate amino acid position of the predicted polyproteins. A match of 12 amino acid residues within a window of 18 produces a dot; movement of the window over the sequences produces a line for long contiguous homology.
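The dot criterion stated in the Fig. 3 caption (12 matches within a window of 18) can be illustrated with a short sliding-window comparison. This is a minimal sketch and not the GCG implementation: it assumes that a match means an identical residue at the same offset in both windows, whereas the program used for the figure may have scored residue pairs with a comparison table. The function name is hypothetical.

```python
def dotplot_points(seq_x, seq_y, window=18, stringency=12):
    """Return (i, j) window start positions (0-based) that would give a dot.

    A dot is recorded when at least `stringency` of the `window` aligned
    positions starting at seq_x[i] and seq_y[j] carry identical residues.
    The triple loop costs len(seq_x) * len(seq_y) * window comparisons, so it
    is meant for illustration rather than for long sequences.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    points = []
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_x[i + k] == seq_y[j + k]
            )
            if matches >= stringency:
                points.append((i, j))
    return points
```

Runs of consecutive points along a diagonal, `(i, j), (i + 1, j + 1), ...`, correspond to the lines that the caption attributes to long contiguous homology.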

flanking sequence itself implies that two R173-1-like regions became associated with BARE-1 at different times. Furthermore, the 224a BARE-1 element does not appear to have been inserted into a putative leader sequence of another BARE-1 copy, since homology to the retroelement's internal domain in the flanking regions does not extend beyond the R173-1-like sequences. The report of R173-1-like homology also in WIS 2 [65] suggests that its presence in retroelements of the tribe Triticeae may be ancient.

The gag region

The presence of virus-like particles (VLPs), and therefore gag gene products, has not been demonstrated for plant retrotransposons as it has been in yeast and Drosophila [6], so the function of gag polypeptide products in possible plant retrotransposon encapsidation is unknown. The putative gag region of BARE-1 shows little similarity to its counterpart even in Tnt1 or copia (Fig. 1). However, a conserved motif, CysX2CysX4HisX4Cys, has been identified as invariant in all replication-competent retroviruses, plant pararetroviruses (DNA-packaging), and copia [9, 61] and has been found as well in the plant retroelements del 1-46 of lily [73] and Tnt1 of tobacco [23]. This domain, part of a larger family of similar nucleic acid-binding proteins [4], has been demonstrated for at least two retroviruses to mediate positioning of the tRNA (-)-strand primer and initiation of reverse transcription [60]. Analysis of the 5' end of the BARE-1 putative protein product reveals a single, strikingly well conserved canonical domain of this type (Fig. 5A) beginning at Cys 253, base 4148. Counting Cys 253 as residue 'n', this BARE-1 domain contains the invariant cysteines at n, n + 3, and n + 13, and histidine at n + 8. In addition, it displays the tyrosine at n + 2 common in single-domain mammalian retroviruses, a conserved glycine at n + 7, a tryptophan at n + 9 as in some mammalian and avian retroviruses, and the conserved aspartic acid or asparagine of mammalian retroviruses at n + 12 [9]. Many of these are likewise shared with the plant retroelements which have been analyzed (Fig. 5A). The BARE-1 putative nucleic acid-binding domain resembles that of feline leukemia virus (FeLV) more than the domain of the plant pararetroviruses. The amino acids of all positions can be found in one or more functional retroviruses or retrotransposons [9, 23]. The BARE-1 putative gag domain as a whole, hypothesized to extend over residues 1-281 (bases 3392-4234) by comparison with Tnt1 [23], is only 25% identical (46% similar) to Tnt1 and 20% identical (43% similar) to copia. This low similarity is not surprising, as the gag and env products are the most rapidly changing of all retroviral proteins [13, 49].

GAG
1     ~TLNF NTFLEKAKLK DDGSNFVDWA HNLIKLLLQA GKKDYVLNRA LGDEPPATAD QDVKNAWLTR KEDYSVVQRA VLYGLEPGLQ RRFERHGAYE
101   MFQELKFIFQ KNTRIERYET FDKFYACKME ENSSVSEHVL KMAGYSSRLA ELGIELPQEA ITDRILQSLP PSYKGFLLNY NMQGMNKSPG ELFAMLKVVE
201   *ELRKEHQVL MVNKTTSFKR NDKGKKGSSK KSGKPVANPT KKPKAGPKPE TECYYCKGMG HWKRNCPKYL ADKKAAKEKS G ~ D V
301   VFDTGSVAHI CNSKQELRNK RRLAKDEVTM RVGNGSKVDA IAVGTISLQL PSGLVMNLNN CYLVSALSMN IIWILFIARR LLV+FKSENN GCSVSMSNIF
401   YGHAPIVRGF FILNLDSDNT HIHNIETKRV R V N I L
501   DPMSVEARSG YHYFLTFTDD LSRYGYVYLM KHKSETFEKF KQFQSEVENH YNKKIKFLRS DRGGEYLSFE FGAHLRQCGI VSQLTPPGTP QCNGVSERRN
601   RTLLEMVRSM MYITDLPLSF WGYLLKTAAF TLNRAPSKSV EMTPYELWYG NRPKLSFLKV WGYDAYVKKL QPEYLEPKAE KCVFIGYPKE TVGYTFHLKS
701   EGKVFVAKNE AFLEKEFLSR ELSGRKIE
801   EPTNYEEAMM GPDSNKWLEA MKSEIGSMYE NKVWTLEVLP EGCKAIQNKW IFKKKTGADS NVTVYKA*LV AKGFSQVQGI DYDETFSPVA MLKSVRIMLA
901   IAAFFDYEIW QMDVKAAFLN GLLKEELYMM QPEGFVDPKN ANKACKLQGS IYGLVQASRS WNKRFGEVIK AFGFIQVVGE SCIYKKVSGS SVAFLILYVD
1001  DILLIGNGVE FLENIKDYLN KSFSMKDLGE AAYILGIKIY RDRS*RVIGL SQSTYLDKVL KRFKMEQSKK GLLPVLQGTR LSKTQCPATD KDIEHMSTVP
1101  YASAIGSIMY AMLCIRPDVS LAISMAGRFQ SNPGVDHWMA VKNILKYLKR TTEMFLVYGG DKELAVKGYV DASFDTDPDD SKSQTGYVFI LNGGVVSWCS
1201  SKQSVVADST CEAEYLAASE ATKEGVWMKQ LMTDLGVVSS ALNPITLFCD NMGVIALAKE PQFHKNTIRI KRRFNLIRDY VEEEDVNICK VHMDLNVAPA
1301  D*
AP    YLTSSRSSAW
ED    WHCRLGHIGV KRMKKLHTDG LLESLDTCEP CLMGKMTKTP FSGTMERASD LLEIIHTDVC
      PLIPLD GGARQGETPV VVMPG*EEVN DDDHETPDQV PVESRRSTRP RTTREWYGNP VLSIMLLDNN

Fig. 4. Hypothetical BARE-1 polyprotein. Symbols: *, stop codon; +, frameshift. Shaded arrows indicate beginning of coding domains labelled above the sequence. Abbreviations as in Fig. 1.

The proteinase region

The gene products of retroviruses and retrotransposons are expressed as polyproteins which then undergo endoproteolytic cleavage into functional units by the self-contained aspartic proteinase [82]. An alignment of the putative BARE-1 gene product with mammalian retroviruses, plant pararetroviruses, and retrotransposons (Fig. 5B) reveals a completely conserved aspartic proteinase active site (residue 303, base 4298) and a canonical C-terminal domain (residue 399, base 4584) [5, 57, 72, 75]. As expected, the region surrounding the active site is somewhat more similar to the aspartic proteinase domain of the tobacco Tnt1 than to a cellular barley aspartic proteinase [66]. The retroelement aspartic proteinases are, however, extremely diverse, so little sequence conservation outside the active site can be observed (Fig. 1) [13, 49]. Better sequence alignments might be constructed if more crystallographic tertiary structures besides that for the HIV-1 proteinase [41] were available.

The endonuclease region

Endonuclease activity appears to be both necessary and sufficient for integration of LTR-containing retroelement DNA into the genome [24]. An alignment of endonuclease sequences from diverse retroviruses and retrotransposons to the hypothetical BARE-1 polyprotein (Figs. 5C, D) reveals blocks of amino acids highly conserved or identical in BARE-1, despite the relatively low overall sequence similarity (Fig. 1). Zinc-finger


motifs of structure His-(3-4)-His-(20-32)-Cys-(2)-Cys [30, 51], which could be involved in the binding of endonuclease to DNA [36], could be identified in BARE-1 and the other aligned sequences (Fig. 5C), and are postulated for other retroelements as well [73]. In addition to this motif, a segment of the endonuclease domain is well conserved relative to those of other retroelement endonucleases (Fig. 5D). The del element, which is more similar to the yeast Ty3 and the Drosophila 17.6 and gypsy family of retroelements than to the yeast Ty1 and Ty2 and the Drosophila copia group [28, 47, 73], differs most from the other retroelements compared.

The reverse transcriptase-RNase H region

The putative RT/RNase H region of BARE-1 was aligned with retrotransposon, caulimovirus, and retrovirus RT/RNase H domains (Fig. 6). This part of BARE-1 is somewhat more similar to copia-like elements than the other coding regions are (Fig. 1). Clearly separating the RNase H domain, the RT domain, and the region connecting the two, which is evolving more than twice as fast as RT in retroviruses [49], was not possible in the absence of proteolytic and structural data, so these regions are presented together in Fig. 6. A comparison of these domains from Tnt1 and copia, the closest to BARE-1 with 63.0% and 56.6% similarity respectively, helped to define the putative RT-RNase H region in BARE-1 as lying between Asp729 and Asp1300 (bases 5574-7289), similar in size to that of HIV-1 [2]. Pairwise comparison [56] of the RT regions by PILEUP separates the RT/RNase H of BARE-1 and the copia-like retroelements from those of caulimoviruses and retroviruses, in agreement with previous phylogenies [13, 84]. The identification of conserved residues within very divergent RTs is complicated by the distinct alignments produced by different methods [2, 13, 84]. Nevertheless, our alignment reveals several conserved motifs identified by others.
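Spacing patterns such as the His-(3-4)-His-(20-32)-Cys-(2)-Cys zinc-finger motif and the YxDD block of reverse transcriptases can be written as regular expressions and searched for in a translated sequence. The following is a minimal Python sketch of such a scan, not the procedure used for the alignments in this paper. The names MOTIFS and find_motifs and the toy sequence are hypothetical and serve only as illustration.

```python
import re

# Hypothetical motif table: "." stands for any residue and {m,n} gives
# the allowed spacer length between the fixed residues.
MOTIFS = {
    # His-(3-4)-His-(20-32)-Cys-(2)-Cys
    "zinc_finger": re.compile(r"H.{3,4}H.{20,32}C.{2}C"),
    # Tyr, any residue, Asp, Asp
    "YxDD": re.compile(r"Y.DD"),
}


def find_motifs(protein):
    """Return {motif name: [(1-based start, matched text), ...]}."""
    hits = {}
    for name, pattern in MOTIFS.items():
        hits[name] = [(m.start() + 1, m.group())
                      for m in pattern.finditer(protein)]
    return hits


# Synthetic sequence built to contain both patterns; NOT a BARE-1 sequence.
toy = "MK" + "H" + "AAA" + "H" + "G" * 22 + "C" + "ST" + "C" + "LLYVDDML"

for name, found in find_motifs(toy).items():
    for start, text in found:
        print(name, start, text)
```

On the synthetic sequence the zinc-finger pattern matches from residue 3 and the YxDD pattern from residue 36. A scan of this kind only flags candidate sites; whether a hit is meaningful still depends on its position within an alignment.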
A region between Tyr864 and Ala889 is included in conserved block 2 of Xiong and Eickbush [84], within which the motif YKARLVAGF has been interrupted by a C→T transition at what would be Arg867, a position conserved as Arg or Lys in all RTs aligned here. The region between Met897 and Leu1002 is very highly conserved, and includes the invariant motif YxDD. The double Asp of this block has been conserved among at least 82 diverse RTs and other polymerases [13, 36, 84]. Site-specific mutagenesis of HIV-1 RT at three residues in this motif destroyed enzymatic activity in vitro [42].

The three RT-like sequences from barley that were cloned by PCR from primers spanning conserved domains [17] are surprisingly divergent from BARE-1 and from each other (Fig. 6). The bar29 sequence is most similar to the BARE-1 RT, with 55.6% identity over the length published, but has at most 33.3% identity to the other two PCR clones. On this basis, it is unlikely that bar29, bar30, and bar121 are members of a BARE-1 family of elements. However, another RT fragment (accession M94470), recently cloned from the barley genome by similar methods [80], is 89% identical and 96% similar to the BARE-1 putative RT over its length and may therefore be a member of a BARE-1 retroelement family.

Alignment of the BARE-1 hypothetical gene product to RNase H domains of other retroelements (Fig. 6) indicates several areas of strong conservation between them. Because of the diversity of the RT-bearing elements, it was not possible to achieve simultaneously the best local alignment for the conserved RNase H residues previously identified [10, 13, 36] and the best global alignment without adding many gaps between otherwise conserved residues in the retroelement classes. However, several highly conserved domains and residues are apparent. The region between Pro1099 and Val1118 is almost identical in all retrotransposons compared, and includes Gly1105, Tyr1109, and Asp1170, which are invariant in all examined retroviruses as well. In addition, residues Ile1107, Ala1206, Glu1223, and Asp1249 are either identical or conserved in all retroelements aligned, consistent with the alignment of Schödel et al. [70].

BARE-1 copy number

Southern hybridizations with probes to the LTR,

gag, proteinase, and endonuclease domains (not shown) show extensive hybridization even at high stringency (0.1 × SSC, 65 °C). Hybridizing fragments ranging from larger than 23 kb to smaller than 2 kb indicate a large family of related sequences. To quantify the prevalence of BARE-1, slot blots containing serial dilutions of the cloned 5' LTR as a standard and total genomic DNA of cv. Bomi were probed at high stringency (wash at 0.1 × SSC, 65 °C) with the entire LTR (data not shown). Densitometry indicated that 1 ng of barley DNA gave the same hybridization response as 74 pg of LTR standard. Since the plasmid standard was 5233 bp and contained one LTR, and the haploid genome of barley is about 5.5 pg or 5.0 × 10^9 bp [3], 7.1 × 10^4 plasmid copies gave the same response as one haploid genome equivalent of barley DNA. If all LTRs are present as pairs, the barley genome contains about 3.5 × 10^4 copies of BARE-1 based on the LTR frequency. However, a probe for the putative gag domain, among the most rapidly evolving in retroelements [13, 49], indicated only 4.9 × 10^3 copies (Fig. 7). A similar genome reconstruction experiment with a probe for the RT region gave an estimated 7.8 × 10^4 copies of RT-like sequences per haploid barley genome. In view of the occurrence of highly conserved RT domains and the presence of RT-like sequences outside the BARE-1 family (see above), this is probably an overestimate of the BARE-1 element number. The LTR copy number is likewise probably greater than the number of active elements, as solo LTRs have been observed, for example, with yeast retrotransposons.

Fig. 7. Slot blot probed with the putative gag domain for estimation of BARE-1 copy number. Row A, cloned gag standard: Slot 1, 10 ng; 2, 1 ng; 3, 100 pg; 4, 10 pg; 100 ng herring sperm DNA added as carrier to all samples. Row B, cv. Bomi total genomic DNA: Slot 3, 100 ng; 4, 10 ng. Sample in slot 4 had 100 ng carrier DNA added.

Discussion

From the fortuitous cloning of a genomic fragment with homology to retroelement LTRs, we have been able to isolate and characterize the first complete transposable element reported for barley. Analysis of this element, BARE-1, reveals putative promoter and RNA-processing motifs in the LTR which would be necessary for replicative and transpositional activity. Indeed, we have evidence (manuscript in preparation) that BARE-1 is transcribed in vivo and that the BARE-1 LTR can drive transient expression of reporter constructs. The hypothetical ORF is interrupted by a frameshift and four stop codons but nevertheless displays striking homology to the protein products that other retroelements require for function: a nucleic acid-binding polypeptide, an aspartic proteinase, an endonuclease, reverse transcriptase, and RNase H. The stop codons and frameshift make the BARE-1 copy presented here inactive, and the 4% sequence divergence between the two LTRs indicates that this element has not been recently mobile.

However, it may be the very inactivity of the cloned BARE-1 which has preserved its high similarity to other copia-like retroelements. Reverse transcriptase has reported error rates of 6 × 10^-4 to 3 × 10^-5 [62], whereas cellular DNA polymerase I yields a mutation rate at least 10^3-fold lower [21, 74]. Thus, inactive retroelements replicating as part of the chromosome might tend to change more slowly over time than active elements. These two replication paths would create two groups within any family of retroelements: slowly diverging inactive copies accumulating random mutations, and active elements with functionally essential amino acid residues conserved in the face of high mutability and rapid evolution [13]. The apparent diversity of retroelement sequences in higher plants [17, 53, 54, 80], as well as in-depth analyses of specific retroelement families [38, 79], supports this concept.
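The contrast between the two replication paths can be made concrete with a rough calculation of the expected number of misincorporations per copying event, taken as the per-base error rate multiplied by the length copied. The sketch below is illustrative only. It assumes the full 12 088 bp reported for this BARE-1 clone as the copied length and treats the cellular rate as exactly 10^3-fold lower, although the text says "at least". The function and constant names are hypothetical.

```python
# Expected errors per copying event = per-base error rate x length copied.
ELEMENT_LENGTH_BP = 12_088          # length of the BARE-1 clone (assumed copied in full)
RT_ERROR_RATES = (6e-4, 3e-5)       # range cited for reverse transcriptase [62]
FOLD_LOWER_CELLULAR = 1e3           # assumption: exactly 10^3-fold lower


def expected_errors(rate_per_base, length_bp):
    """Mean number of misincorporations in one copy of length_bp bases."""
    return rate_per_base * length_bp


def compare_paths(length_bp=ELEMENT_LENGTH_BP):
    """Return [(RT rate, errors via RT, errors via chromosomal replication)]."""
    rows = []
    for rate in RT_ERROR_RATES:
        via_rt = expected_errors(rate, length_bp)
        via_chromosome = expected_errors(rate / FOLD_LOWER_CELLULAR, length_bp)
        rows.append((rate, via_rt, via_chromosome))
    return rows


for rate, via_rt, via_chromosome in compare_paths():
    print(f"RT rate {rate:.0e}: {via_rt:.2f} errors per retrotransposition, "
          f"{via_chromosome:.5f} per chromosomal replication")
```

Under these assumptions one round of reverse transcription introduces roughly 0.4 to 7 changes per element, while one round of chromosomal replication introduces at most about 0.007, which is the asymmetry the argument above relies on.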

Based on quantitative slot-blotting, RT and BARE-1 LTR sequences appear to be present in the haploid barley genome in more than 10^4 copies. The rapid evolution of gag domains may account for the detection of only 4900 copies similar to the BARE-1 putative gag domain. The number of RT-containing elements has not been established, but appears to be greater than that of BARE-1. The barley elements bearing the PCR-amplified RT sequences recently reported [17, 80] may be no closer to BARE-1 than this retroelement is to copia. BARE-1 displays a general similarity to WIS 2; a rye-like sequence appears to be associated with at least some representatives of both; and a BARE-1-like barley RT region is closely related to wheat and oat sequences [80]. Wheat, barley, rye, and oats are known to share 23% of their moderately repetitive (10^1-10^5 copies) DNA, and about 17% of their total genomes [18, 50]. Thus, these cereals may share a large super-family of retroelements stemming from their common progenitor.

Furthermore, retroelements are extremely widespread not only in plants, but also in animals and fungi, and RT sequences or products are present in myxobacteria, some E. coli isolates [32, 33, 44], and even in mitochondria [71]. Trees based on similarities in retroelement gene products [17, 22, 28, 84] do not generally parallel phylogenies of the host organisms or genomes. Whether this broad but incongruous distribution, and the apparent antiquity of retroelements, result from their horizontal transfer between phyla and the persistence and tolerance of their parasitism, or instead from retroelements serving a more positive function in genome evolution [16, 19], remains open. McClintock [48] has proposed a role for transposons in the response to 'genome shock', and Ty elements can produce adaptive mutations in yeast [83]. We are therefore analyzing the transcription, function, and diversity of BARE-1 retroelements in an attempt to understand their role in barley.
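The LTR-based copy number quoted in the Results and summarized at the start of this Discussion follows from a simple genome-reconstruction calculation, which can be retraced as below. This is a sketch of the arithmetic only. It assumes that equal masses of double-stranded DNA contain equal numbers of base pairs, so that a mass ratio converts directly into a base-pair ratio. The function name is hypothetical, and the input values are those stated in the text.

```python
def copies_per_haploid_genome(standard_pg, genomic_pg, standard_bp, genome_bp):
    """Copies of the standard giving the same signal as one haploid genome.

    standard_pg : mass of cloned standard matching the genomic signal
    genomic_pg  : mass of genomic DNA loaded
    standard_bp : size of the plasmid standard (one probe target per plasmid)
    genome_bp   : haploid genome size
    """
    mass_fraction = standard_pg / genomic_pg     # standard-equivalent per unit genomic DNA
    return mass_fraction * genome_bp / standard_bp


# Values from the text: 74 pg standard per 1 ng (1000 pg) barley DNA,
# 5233 bp plasmid, 5.0e9 bp haploid genome.
ltr_copies = copies_per_haploid_genome(74, 1000, 5233, 5.0e9)
elements_if_paired = ltr_copies / 2              # two LTRs per intact element

print(f"LTR copies per haploid genome: {ltr_copies:.2e}")
print(f"elements if all LTRs are paired: {elements_if_paired:.2e}")
```

The result is about 7.1 × 10^4 LTR copies, or about 3.5 × 10^4 elements if every LTR belongs to a pair, in agreement with the figures reported. The same function applies to the gag and RT probes once the corresponding standard masses and plasmid sizes are known; those values are not given in the text and are therefore not filled in here.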

Acknowledgements

The authors thank Dr P. Langridge for papers in press, Anne-Mari Kakko for excellent technical assistance, and Prof. Mart Saarma for critical reading of the manuscript. This work was supported by a grant from the Ministry of Agriculture and Forestry of Finland.

References

1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 215: 403-410 (1990).
2. Barber AM, Hizi A, Maizel JV Jr, Hughes SH: HIV-1 reverse transcriptase: Structure predictions for the polymerase domain. AIDS Res Hum Retroviruses 6: 1061-1072 (1990).
3. Bennett MD, Smith JB: Nuclear DNA amounts in angiosperms. Phil Trans R Soc Lond B 274: 228-274 (1976).
4. Berg JM: Potential metal-binding domains in nucleic acid binding proteins. Science 232: 485-487 (1986).
5. Blundell TL, Cooper JB, Sali A, Zhu Z-Y: Comparisons of the sequences, 3-D structures and mechanisms of pepsin-like and retroviral aspartic proteinases. In: Dunn BM (ed) Structure and Function of the Aspartic Proteinases, pp. 443-453. Plenum Press, New York (1991).
6. Boeke JD, Corces VG: Transcription and reverse transcription of retrotransposons. Annu Rev Microbiol 43: 403-434 (1989).
7. Bucher P, Trifonov EN: Compilation and analysis of eukaryotic POL II promoter sequences. Nucl Acids Res 14: 10009-10026 (1986).
8. Clare J, Farabaugh P: Nucleotide sequence of yeast Ty element: evidence for an unusual mechanism of gene expression. Proc Natl Acad Sci USA 82: 2829-2833 (1985).
9. Covey SN: Amino acid homology in gag region of reverse transcribing elements and the coat protein gene of cauliflower mosaic virus. Nucl Acids Res 14: 623-633 (1986).
10. Davies JF, Hostomska Z, Hostomsky Z, Jordan SR, Matthews DA: Crystal structure of the ribonuclease H domain of HIV-1 reverse transcriptase. Science 252: 88-95 (1991).
11. Dellaporta SL, Wood J, Hicks JB: A plant DNA minipreparation: Version II. Plant Mol Biol Rep 1: 19-21 (1983).
12. Devereux JR, Haeberli P, Smithies O: A comprehensive set of sequence analysis programs for VAX and Convex systems. Nucl Acids Res 12: 387-395 (1984).
13. Doolittle RF, Feng D-F, Johnson MS, McClure MA: Origins and evolutionary relationships of retroviruses. Quart Rev Biol 64: 1-30 (1989).
14. Emori Y, Shiba T, Kanaya S, Inouye S, Yuki S, Saigo K: The nucleotide sequences of copia and copia-related RNA in Drosophila virus-like particles. Nature 315: 773-776 (1985).
15. Feng DF, Doolittle RF: Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25: 351-360 (1987).
16. Finnegan DJ: Eukaryotic transposable elements and genome evolution. Trends Genet 5: 103-107 (1989).
17. Flavell AJ, Smith DB, Kumar A: Extreme heterogeneity of Ty1-copia group retrotransposons in plants. Mol Gen Genet 231: 233-242 (1992).
18. Flavell RB, Rimpau J, Smith DB: Repeated sequence DNA relationships in four cereal genomes. Chromosoma 63: 205-222 (1977).
19. Georgiev GP: Mobile genetic elements in animal cells and their biological significance. Eur J Biochem 145: 203-220 (1984).
20. Ghosh HP, Ghosh K, Simsek M, RajBhandary UL: Nucleotide sequence of wheat germ cytoplasmic initiator methionine transfer ribonucleic acid. Nucl Acids Res 10: 3241-3247 (1982).
21. Gojobori T, Yokoyama S: Rates of evolution of the retroviral oncogene of Moloney murine sarcoma virus and its cellular homologues. Proc Natl Acad Sci USA 82: 4198-4201 (1985).
22. Grandbastien M-A: Retroelements in higher plants. Trends Genet 8: 103-108 (1992).
23. Grandbastien M-A, Spielmann A, Caboche M: Tnt1, a mobile retroviral-like transposable element of tobacco isolated by plant cell genetics. Nature 337: 376-380 (1989).
24. Grandgenett DP, Mumm SR: Unraveling retrovirus integration. Cell 60: 3-4 (1990).
25. Gribskov M, Burgess RR: Sigma factors from E. coli, B. subtilis, phage SP01, and phage T4 are homologous proteins. Nucl Acids Res 14: 6745-6763 (1986).
26. Guidet F, Rogowsky P, Taylor C, Song W, Langridge P: Cloning and characterization of a new rye-specific repeated sequence. Genome 34: 81-87 (1991).
27. Hansen L, von Wettstein-Knowles P: The barley genes Acl1 and Acl3 encoding acyl carrier proteins I and III are located on different chromosomes. Mol Gen Genet 229: 467-478 (1991).
28. Hansen LJ, Chalker DL, Sandmeyer SB: Ty3, a yeast retrotransposon associated with tRNA genes, has homology to animal retroviruses. Mol Cell Biol 8: 5245-5256 (1988).
29. Harberd NP, Flavell RB, Thompson RD: Identification of a transposon-like insertion in a Glu-1 allele of wheat. Mol Gen Genet 209: 326-332 (1987).
30. Hartshorne TA, Blumberg H, Young ET: Sequence homology of the yeast regulatory protein ADR1 with Xenopus transcription factor TFIIIA. Nature 320: 283-287 (1986).
31. Hirochika H, Fukuchi F, Hirochika R: Retrotransposons are ubiquitous in plants. In: Molecular Biology of Plant Growth and Development (Proc Third Int Congr Plant Mol Biol, Tucson, USA), Abst. 1758 (1991).
32. Inouye M, Inouye S: Retroelements in bacteria. Trends Biochem Sci 16: 18-21 (1991).
33. Inouye M, Inouye S: msDNA and bacterial reverse transcriptase. Annu Rev Microbiol 45: 163-186 (1991).
34. Jin Y-K, Bennetzen JL: Structure and coding properties of Bs1, a maize retrovirus-like transposon. Proc Natl Acad Sci USA 86: 6235-6239 (1989).
35. Johns MA, Babcock MS, Fuerstenberg SM, Fuerstenberg SI, Freeling M, Simpson RB: An unusually compact retrotransposon in maize. Plant Mol Biol 12: 633-642 (1989).
36. Johnson MS, McClure MA, Feng D-F, Gray J, Doolittle RF: Computer analysis of retroviral pol genes: Assignment of enzymatic functions to specific sequences and homologies with nonviral enzymes. Proc Natl Acad Sci USA 83: 7648-7652 (1986).
37. Joshi CP: An inspection of the domain between putative TATA box and translation start site in 79 plant genes. Nucl Acids Res 15: 6643-6653 (1987).
38. Konieczny A, Voytas DF, Cummings MP, Ausubel F: A superfamily of Arabidopsis thaliana retrotransposons. Genetics 127: 801-809 (1991).
39. Kozak M: Selection of initiation sites by eucaryotic ribosomes: effect of inserting AUG triplets upstream from the coding sequence for preproinsulin. Nucl Acids Res 12: 3873-3893 (1984).
40. Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol 157: 105-132 (1982).
41. Lapatto R, Blundell T, Hemmings A, Overington J, Wilderspin A, Wood S, Merson JR, Whittle PJ, Danley DE, Geoghegan KF, Hawrylik SJ, Lee SE, Scheld KG, Hobart PM: X-ray analysis of HIV-1 proteinase at 2.7 Å resolution confirms structural homology among retroviral enzymes. Nature 342: 299-302 (1989).
42. Larder BA, Purifoy DJM, Powell KL, Darby G: Site-specific mutagenesis of AIDS virus reverse transcriptase. Nature 327: 716-717 (1987).
43. Lee D, Ellis THN, Turner L, Hellens RP, Cleary WG: A copia-like element in Pisum demonstrates the uses of dispersed repeated sequences in genetic analysis. Plant Mol Biol 15: 707-722 (1990).
44. Lim D, Maas WK: Reverse transcriptase in bacteria. Mol Microbiol 3: 1141-1144 (1989).
45. Lucas H, Moore G, Murphy G, Flavell RB: Inverted repeats in the long-terminal repeats of the wheat retrotransposon Wis 2-1A. Mol Biol Evol 9: 716-728 (1992).
46. Maniatis T, Fritsch EF, Sambrook J: Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1982).
47. Marlor RL, Parkhurst SM, Corces VG: The Drosophila melanogaster Gypsy transposable element encodes putative gene products homologous to retroviral proteins. Mol Cell Biol 6: 1129-1134 (1986).
48. McClintock B: The significance of responses of the genome to challenge. Science 226: 792-801 (1984).
49. McClure MA, Johnson MS, Feng D-F, Doolittle RF: Sequence comparisons of retroviral proteins: Relative rates of change and general phylogeny. Proc Natl Acad Sci USA 85: 2469-2473 (1988).
50. McIntyre CL, Clarke BC, Appels R: Amplification and dispersion of repeated DNA sequences in the Triticeae. Plant Syst Evol 160: 39-59 (1988).
51. Miller J, McLachlan AD, Klug A: Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J 4: 1609-1614 (1985).
52. Mogen BD, MacDonald MH, Graybosch R, Hunt AG: Upstream sequences other than AAUAAA are required for efficient messenger RNA 3'-end formation in plants. Plant Cell 2: 1261-1272 (1990).
53. Moore G, Cheung W, Schwarzacher T, Flavell R: BIS 1, a major component of the cereal genome and a tool for studying genomic organization. Genomics 10: 469-476 (1991).
54. Moore G, Lucas H, Batty N, Flavell R: A family of retrotransposons and associated genomic variation in wheat. Genomics 10: 461-468 (1991).
55. Mount SM, Rubin GM: Complete nucleotide sequence of the Drosophila transposable element copia: homology between copia and retroviral proteins. Mol Cell Biol 5: 1630-1638 (1985).
56. Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48: 443-453 (1970).
57. Pearl LH, Taylor WR: A structural model for the retroviral proteases. Nature 329: 351-354 (1987).
58. Pouteau S, Huttner E, Grandbastien MA, Caboche M: Specific expression of the tobacco Tnt1 retrotransposon in protoplasts. EMBO J 10: 1911-1918 (1991).
59. Pouteau S, Spielmann A, Meyer C, Grandbastien M-A, Caboche M: Effects of Tnt1 tobacco retrotransposon insertion on target gene transcription. Mol Gen Genet 228: 233-239 (1991).
60. Prats AC, Sarih L, Gabus C, Litvak S, Keith G, Darlix JL: Small finger protein of avian and murine retroviruses has nucleic acid annealing activity and positions the replication primer tRNA onto genomic RNA. EMBO J 7: 1777-1783 (1988).
61. Qu R, Bhattacharyya M, Laco GS, de Kochko A, Rao BLS, Kaniewska MB, Elmer JS, Rochester DE, Smith CE, Beachy RN: Characterization of the genome of rice tungro bacilliform virus: Comparison with Commelina yellow mottle virus and caulimoviruses. Virology 185: 354-364 (1991).
62. Roberts JD, Bebenek K, Kunkel TA: The accuracy of reverse transcriptase from HIV-1. Science 242: 1171-1173 (1988).
63. Roeder GS, Farabaugh PJ, Chaleff DT, Fink GR: The origins of gene instability in yeast. Science 209: 1375-1380 (1980).
64. Rogowsky PM, Manning S, Liu J-Y, Langridge P: The R173 family of rye-specific repetitive DNA sequences: a structural analysis. Genome 34: 88-95 (1991).
65. Rogowsky PM, Liu J-Y, Manning S, Taylor C, Langridge P: Structural heterogeneity in the R173 family of rye-specific repetitive DNA sequences. Plant Mol Biol 20: 95-112 (1992).
66. Runeberg-Roos P, Törmäkangas K, Östman A: Primary structure of a barley-grain aspartic proteinase. Eur J Biochem 202: 1021-1027 (1991).
67. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74: 5463-5467 (1977).
68. Sambrook J, Fritsch EF, Maniatis T: Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989).
69. Schnorr KM, Juricek M, Culley D, Kleinhofs A: Analysis of barley nitrate reductase cDNA and genomic clones. Mol Gen Genet 227: 411-416 (1991).
70. Schödel D, Weimer T, Will H, Sprengel R: Amino acid similarity between retroviral and E. coli RNase H and hepadnaviral gene products. AIDS Res Hum Retroviruses 4: 9-11 (1988).
71. Schuster W, Brennicke A: Plastid, nuclear and reverse transcriptase sequences in the mitochondrial genome of Oenothera: is genetic information transferred between organelles via RNA? EMBO J 6: 2857-2863 (1987).
72. Skalka AM: Retroviral proteases: first glimpses at the anatomy of a processing machine. Cell 56: 911-913 (1989).
73. Smyth DR, Kalitsis P, Joseph JL, Sentry JW: Plant retrotransposon from Lilium henryi is related to Ty3 of yeast and the gypsy group of Drosophila. Proc Natl Acad Sci USA 86: 5015-5019 (1989).
74. Steinhauer DA, Holland JJ: Direct method for quantitation of extreme polymerase error frequencies at selected single base sites in viral RNA. J Virol 57: 219-228 (1986).
75. Tang J, Wong RNS: Evolution in the structure and function of aspartic proteinases. J Cell Biochem 33: 53-63 (1987).
76. Temin HM: Structure, variation and synthesis of retrovirus long terminal repeat. Cell 27: 1-3 (1981).
77. Varmus HE: Retroviruses. In: Shapiro JA (ed) Mobile Genetic Elements, pp. 411-503. Academic Press, New York (1983).
78. Voytas DF, Ausubel FM: A copia-like transposable element family in Arabidopsis thaliana. Nature 336: 242-244 (1988).
79. Voytas DF, Konieczny A, Cummings MP, Ausubel FM: The structure, distribution and evolution of the Ta1 retrotransposable element family of Arabidopsis thaliana. Genetics 126: 713-721 (1990).
80. Voytas DF, Cummings M, Konieczny A, Ausubel FM, Rodermel SR: copia-like retrotransposons are ubiquitous among plants. Proc Natl Acad Sci USA 89: 7124-7128 (1992).
81. Warmington JR, Waring RB, Newlon CS, Indge KJ, Oliver SG: Nucleotide sequence characterization of Ty1-17, a class II transposon from yeast. Nucl Acids Res 13: 6679-6693 (1985).
82. Wellink J, van Kammen A: Proteases involved in the processing of viral polyproteins. Arch Virol 98: 1-26 (1988).
83. Wilke CM, Adams J: Fitness effects of Ty transposition in Saccharomyces cerevisiae. Genetics 131: 31-42 (1992).
84. Xiong Y, Eickbush TH: Origin and evolution of retroelements based upon their novel reverse transcriptase sequences. EMBO J 9: 3353-3362 (1992).