Nucleic Acids Research, Vol. 19, No. 2 209

SRP-RNA sequence alignment and secondary structure

Niels Larsen+ and Christian Zwieb1,*
Department of Biostructural Chemistry, Institute of Chemistry, Århus University, Langelandsgade 140, 8000 Århus C, Denmark and 1Department of Molecular Biology, The University of Texas Health Center, PO Box 2003, Tyler, TX 75710, USA
Received August 30, 1990; Revised and Accepted December 3, 1990

ABSTRACT The secondary structures of the RNAs from the signal recognition particle, termed SRP-RNA, were derived by comparative analyses of an alignment of 39 sequences. The models are minimal in that only base pairs are included for which there is comparative evidence. The structures represent refinements of earlier versions and include a new short helix.

INTRODUCTION Interplay between the signal recognition particle (SRP), the signal sequence, ribosomes, and the SRP-receptor is required to translocate secretory proteins across biological membranes (1, 2). The extensively studied canine SRP is composed of six proteins and an RNA molecule, the SRP-RNA, which previously has been referred to as 7SL RNA (3). Typically, SRP-RNAs consist of about 300 nucleotides and are contacted directly by proteins SRP9, SRP14, SRP19, SRP68 and SRP72; a sixth protein, SRP54, assembles only in the presence of bound SRP19 (4). The RNA exhibits extensive secondary structure (5) and it is also folded in three dimensions, as judged by the size and shape of the SRP particle (240 × 60 Å) (6, 7). Understanding the function of SRP requires a detailed knowledge of the RNA structure. Comparative sequence analyses (8, 9) have proven particularly useful for determining secondary structures of small and large RNA molecules. Here, we apply this approach to the SRP-RNA, for which many sequences are now available. The compilation of the available SRP-RNA sequences includes those from 6 bacteria (eubacteria), 9 archaea (archaebacteria), and 24 eucaryotes. Since the last review of the structure and function of SRP-RNA (7), several archaeal sequences have appeared (10, 11, 12, 13, 14), and the similarity between bacterial 4.5S RNAs and the conserved portion of the other SRP-RNAs has been recognized (15, 16, 17).

SRP-RNA sequences The 39 known SRP-RNA sequences used in our analysis are listed in Table 1. Abbreviated and full names are shown, together with whether the sequence derives from DNA, RNA, or SRP. Experimental evidence that SRP is involved in protein translocation has only been provided for canine and several plant SRPs. Frequently, the precise ends of the RNA molecules were not determined, especially when the gene was sequenced. Four partial sequences from Triticum aestivum (C), Crysanthemum morifolium, Benincasa hispida, and Gynura aurantiaca are included. We also searched for additional SRP-RNA sequences in a merged version of the GenBank and EMBL data bases (December 1989). Motif files describing common primary and secondary structural features of the SRP-RNAs were constructed as inputs for the search program ANALYSEQP (18). Only one human pseudogene was found; it may not be expressed and is therefore not included here.

Sequence alignment In the alignment shown in Figure 1, the sequences are grouped as bacteria (top), archaea (middle) and eucaryotes (bottom) of each panel. In our alignment procedure, closest relatives were aligned first on the basis of primary structure similarity; each group of aligned sequences was then treated collectively and aligned against other groups. Sets of conserved nucleotides were then identified and used for aligning the more variable regions. Finally, where little or no primary structure similarity existed, common secondary structural elements were used as additional markers.
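The "closest relatives first" ordering of the alignment procedure can be illustrated with a small sketch. It is not the procedure used for Figure 1: the similarity measure (`difflib.SequenceMatcher.ratio`), the average-linkage grouping, and the function names `similarity` and `merge_order` are all assumptions chosen for illustration. The sketch only decides the order in which sequences or groups would be merged; it does not align them.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(a, b):
    """Crude primary-structure similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def merge_order(seqs):
    """Return the order in which groups would be merged, closest first.

    seqs: dict mapping a sequence name to its (unaligned) sequence.
    Each merge is reported as a pair of groups; a group is a tuple of names.
    """
    groups = [(name,) for name in seqs]
    order = []
    while len(groups) > 1:
        best = None
        for x in range(len(groups)):
            for y in range(x + 1, len(groups)):
                # Average similarity between all members of the two groups.
                score = mean(
                    similarity(seqs[a], seqs[b])
                    for a in groups[x]
                    for b in groups[y]
                )
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        order.append((groups[x], groups[y]))
        merged = groups[x] + groups[y]
        # Treat the merged group collectively from now on.
        groups = [g for k, g in enumerate(groups) if k not in (x, y)]
        groups.append(merged)
    return order
```

Calling `merge_order` on a dictionary of named sequences yields a list of group pairs; the first pair contains the two most similar sequences, and later pairs join progressively more distant groups.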

Comparative sequence analysis In our derivation of secondary structure, we distinguish clearly between base pairs that are supported by covariances and those that are merely not contradicted. A covariance is the observation that a base pair in one organism differs in both bases from the equivalent base pair in another organism. If the two different pairs are of Watson-Crick type, we observe a compensating base change (CBC). Covariances and CBCs support the existence of a base pair because, during evolution, random single mutations that introduce an unstable pairing would not generally have been compensated for by a further mutation that restored the stability, unless the pairing was required. Thus, such observations are positive evidence, and the more CBCs, the stronger the evidence. Negative evidence is a mismatch, which we define here as a pair that is neither a Watson-Crick pair nor a G-U pair. In contrast, sequence conservation provides neither positive nor negative evidence. For each base pair in Figure 2 we estimated positive and negative evidence by counting the number of CBCs and mismatches. The set of sequences that aligned unambiguously at a given alignment position was first identified; the most conserved base pair in this set was then found and the remaining pairs added as CBCs where they covaried. Our guideline was to consider a base pair supported if there was at least twice as much positive evidence as negative. As a general rule, when there was less, we preferred not to include a base pair. However, when a base pair was supported in one primary kingdom (now termed domain, 19) and disproven in others, we included it as specific for that group.

* To whom correspondence should be addressed
+ Present address: Department of Microbiology, University of Illinois, Urbana, IL 61801, USA

Secondary structure The derived secondary structure models are presented as diagrams in Figure 2a, b and c for a bacterium (Bacillus subtilis), an archaeum (Halobacterium halobium) and a eucaryote (Canis species), respectively. Supported base pairs are juxtaposed and connected with a line; bases of unsupported pairings at helical ends are placed adjacently with no line between them. Finally, when there is more negative than positive evidence, the base symbols are spaced apart with no line between them. In addition to the secondary structure diagrams, Watson-Crick and G-U pairs of supported helices of all SRP-RNAs are shown in reverse print in Figure 1. Below, the eight identified helices of the SRP-RNAs, numbered one to eight starting at the 5'-end, are discussed, and characteristics of each helix and the connecting bulges and loops, as well as their domain-specific peculiarities, are considered.

Table 1. Catalog of the SRP-RNA sequences used in this survey. Abbreviated and full species names are grouped as bacteria (top), archaea (middle) and eucaryotes (bottom) as in Figure 1. References containing sequence information appear in brackets. Sources of the isolates appear in the right column: DNA implies that the sequence of the gene was determined; RNA indicates that RNA was isolated from a crude mixture and either sequenced directly or from its cDNA; SRP shows that the RNA was isolated from a purified particle.

Bacteria
THE.THE.       Thermus thermophilus
LEG.PNE.       Legionella pneumophila
PSE.AER.       Pseudomonas aeruginosa
ESC.COL.       Escherichia coli
MIC.LYS.       Micrococcus lysodeikticus
BAC.SUB.       Bacillus subtilis
References for this group, in the order extracted (row assignment not recoverable): (33) (34) (35) (36) (315). Source entries: DNA.

Archaea
MET.VOL.       Methanococcus voltae
MET.FER.       Methanothermus fervidus
MET.THE.       Methanobacterium thermoautotrophicum
MET.ACE.       Methanosarcina acetivorans
HAL.HAL.       Halobacterium halobium
ARC.FUL.       Archaeoglobus fulgidus
SUL.SOL.       Sulfolobus solfataricus
PYR.OCC.       Pyrodictium occultum
THE.CEL.       Thermococcus celer
References for this group, in the order extracted (row assignment not recoverable): (10) (12) (13) (37) (13) (13) (14) (13). Source entries: DNA.

Eucaryotes
ZEA.M. A-H     Zea mays                       (38)           SRP
TRI.A. A-C     Triticum aestivum              (39)           SRP
CRY.MOR.       Crysanthemum morifolium        (40)           RNA
LYC.ESC.       Lycopersicon esculentum        (41)           RNA
CIN.HYB.       Cineraria hybrida              (40)           RNA
HOM.S. A       Homo sapiens                   (42)           RNA
HOM.S. B       Homo sapiens                   (43)           DNA
CAN.SPE.       Canis species                  (44)           SRP
RAT.RAT.       Rattus rattus                  (45)           RNA
XEN.LAE.       Xenopus laevis                 (46)           RNA
DRO.MEL.       Drosophila melanogaster        (47)           RNA
SCH.POM.       Schizosaccharomyces pombe      (48, 49, 50)   SRP
YAR.LIP.       Yarrowia lipolytica            (48)           SRP
BEN.HIS.       Benincasa hispida              (40)           RNA
GYN.AUR.       Gynura aurantiaca              (40)           RNA
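The evidence rule described under "Comparative sequence analysis" can be sketched in a few lines. This is a minimal illustration rather than the procedure actually used for Figure 2: the function names `count_evidence` and `is_supported` are hypothetical, the reference pair is taken to be the most frequent Watson-Crick pair, and each distinct alternative Watson-Crick pair is counted as one CBC (the text does not state how repeated observations were weighted).

```python
from collections import Counter

WATSON_CRICK = {"AU", "UA", "GC", "CG"}
WOBBLE = {"GU", "UG"}


def count_evidence(pairs):
    """Count CBCs and mismatches for one proposed base pair.

    pairs: two-letter strings such as "GC", one per sequence that aligns
    unambiguously at both positions of the proposed pair.
    Returns (number of CBCs, number of mismatches).
    """
    pairs = [p.upper() for p in pairs]
    wc = [p for p in pairs if p in WATSON_CRICK]
    # A mismatch is neither a Watson-Crick pair nor a G-U pair.
    mismatches = sum(
        1 for p in pairs if p not in WATSON_CRICK and p not in WOBBLE
    )
    if not wc:
        return 0, mismatches
    # Assumption: the most conserved pair is the most frequent WC pair.
    reference, _ = Counter(wc).most_common(1)[0]
    # Any other Watson-Crick pair differs from the reference in both bases,
    # so each distinct alternative counts as one compensating base change.
    cbcs = len(set(wc) - {reference})
    return cbcs, mismatches


def is_supported(pairs):
    """Apply the guideline: at least twice as much positive as negative evidence."""
    cbcs, mismatches = count_evidence(pairs)
    return cbcs > 0 and cbcs >= 2 * mismatches
```

Under these assumptions a column pair that is perfectly conserved is not "supported": conservation yields no CBCs, which mirrors the statement that conservation is neither positive nor negative evidence.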

Helix 1: Is present in archaea and in Bacillus subtilis, but may also exist in the remaining bacteria, since the terminal nucleotides of 4.5S RNA (positions 1 to 8 in Escherichia coli) may correspond to nucleotides 2 to 9 in Bacillus subtilis. The helix contains between three and nine base pairs which link the two ends of the SRP-RNA. Its variability in length suggests, as for many RNA molecules, that the main function may be to close the ends of the RNA.

Helix 2: Is universal and generally consists of three base pairs. It defines the 5'-domain of the SRP-RNA, which also includes helices 3 and 4. In archaea, helix 2 may be extended at its proximal end (the end with the base closest to the 5'-end) by up to three base pairs. More sequences will be required to provide conclusive evidence for this putative extension.

Helix 3: In eucaryotes, helix 3 consists of four proximal base pairs, two unpaired bases and a single distal base pair. Only the four proximal base pairs are present in the archaea, although there is some support for a fifth. The helix comprises only two base pairs in Bacillus subtilis, and is absent from the lower eucaryotes Schizosaccharomyces pombe and Yarrowia lipolytica.

Helix 4: Extends through all three domains except in Schizosaccharomyces pombe. A constant number of four base pairs occurs in the helix, with a variable number of nucleotides in the loop. The large loops of Bacillus subtilis and the archaea participate in tertiary interactions (discussed below). The animal sequences possess a loop of six or eight nucleotides, which apparently is compensated for by the loss of three or four nucleotides following helix 2. Plants and fungi exhibit only two nucleotides in the terminal loop and there is strong support for the last distal base pair of helix 4. Whether this is sterically feasible will have to be verified experimentally, but a similar two base loop was detected in the 23S-like mitochondrial rRNAs of higher eucaryotes (20).

Helix 4 and helix 2 are continuous on the 3' strand in all SRP-RNAs and may stack coaxially. Evidence for this concept is provided by the even distribution of CBCs along the seven continuous base pairs of the two helices. In most other RNA helices (e.g., those of 5S RNA), such changes generally occur with highest frequency at helix centers; in a coaxially stacked helix, by contrast, the ends are stabilized and could thus better accommodate base changes.

Helix 5: Consists of several base paired regions with a variable number of bulges. A highly variable region encompassing nucleotides at positions 48 to 97 and 253 to 298 (referring to the secondary structure of Canis species, Figure 2c) is separated from a conserved section (positions 102 to 127 and 223 to 250) by a micrococcal nuclease hypersensitive bulge. The former region varies not only in base composition, but also in size. Thus, for eucaryotic and archaeal SRP-RNAs (Figure 1), the alignment reveals that nucleotides corresponding to about one helical turn are lacking from the latter SRP-RNAs. The conserved part of helix 5 starts with a region of 6 or 7 base pairs, generally followed by a single bulged pyrimidine on the 5'-strand and four nucleotides on the 3'-strand. The next pyrimidine on the 5'-strand is base paired and followed by another small bulge, with one or two nucleotides on the 5'-strand and a single nucleotide on the other. The next bulge, composed of four or three and four to six nucleotides, is preceded by a four base pair helix. Four to six base pairs, sometimes containing a single nucleotide bulge, terminate helix 5.

Helix 6: Is exclusive to the eucaryotes and archaea. Its sequence is variable except for two bases in the terminal loop (Figure 3, left), which may be involved in protein binding, possibly SRP19 (21). The CBCs generally occur throughout most of the helix, and in the archaea Archaeoglobus fulgidus, Sulfolobus solfataricus, and Pyrodictium occultum also within the internal loop (Figure 1). This may reflect that helix 6 is continuously stacked.

Helix 7: This new helix is exclusive to eucaryotic SRP-RNAs and is supported by several covariances. In the plant SRP-RNAs, A-C pairs might occur in helix 7, as they do sporadically elsewhere in the SRP-RNA. Since such atypical A-C and A-G pairings appear occasionally in well supported secondary structural regions (Figure 1), it is likely that they base pair.

[Figure 1 alignment panels: the 39 SRP-RNA sequences in rows labeled with the abbreviations of Table 1, with helix numbers 1 to 8 marked above each group and nucleotide positions at the right margin; the aligned sequence text is not reproduced here.]

Figure 1. Alignment of SRP-RNA sequences. The abbreviated species names correspond to those listed in Table 1. Sequences are grouped as bacteria (top), archaea (middle) and eucaryotes (bottom). Helices are shown in reverse print for each group and are numbered from one to eight (starting from the 5'-end) in lines with blank species name fields. The alignment was created using the sequence editor ALMA (30), and a PostScript file was printed on a photosetting machine.

Helix 8: In contrast to helix 6, helix 8 is separated into three base paired regions and two internal loops (Figure 3, right). The nucleotides in the loops are highly conserved, perhaps reflecting their interactions with SRP-proteins, possibly SRP68 and SRP72 (21), or because tertiary structure forms in this region (22, 23). One A-G pairing is supported in the central helical region (Figure 1). The terminal loop of helix 8 usually contains four purines, but for plants six pyrimidines are present. In Escherichia coli, helix 8 contains the sequence GAAGCAGCCA (matching GAAGCAGCAU in Bacillus subtilis at positions 168 to 177, Figure 2a), which also occurs in the 23S rRNA at positions 1068-1077. It has been proposed that 4.5S RNA competes for an EF-G binding site using this decamer sequence (24), since it can interact with EF-G (25). The sequence AGCAG is highly conserved in the SRP-RNAs and in the 23S RNAs. However, since we could not detect any covariation among the SRP-RNA and rRNA sequences in the remainder of the decamer, the hypothesis cannot be universally valid.

Figure 2. SRP-RNA secondary structure diagrams of a) Bacillus subtilis (bacterium), b) Halobacterium halobium (archaeum) and c) Canis species (eucaryote). Perfect pairs are connected with a line, G-U and A-G pairs with a filled and an open circle, respectively. Paired bases involved in tertiary interaction are boxed. Helices are numbered from one to eight from the 5'-end. Bold arrows in Figure 2c indicate experimentally determined cut sites by micrococcal nuclease (31), while the arrows in Figures 2a and 2b point at the aligning nucleotides. Figures were edited with the editor EDSTRUC (32) and a PostScript file was printed on a photosetting machine.

The overall implications of the deduced SRP-RNA secondary structure will be addressed in the following:

Evidence for division of living matter into three major domains (bacteria, archaea and eucaryotes) is based primarily on the analyses of ribosomal RNA sequences. The archaeal SRP-RNA structures all resemble the single bacterial representative in their 5'-ends (Figure 1), and are distinct from the eucaryotes. On the other hand, they resemble the eucaryotes in that they possess helix 6; similar features can be discerned at the more detailed level. Any alternative grouping would include a mixture of these features, and is therefore not equally supported.

The RNA region corresponding to helix 8 and the conserved part of helix 5 is common to all SRP-RNAs and present in most bacteria as the 4.5S RNA. The large size of the SRP-RNA of Bacillus subtilis could be explained by the fact that the 4.5S RNA termini occur within a bulge of the Bacillus subtilis RNA and align precisely with the hypersensitive cutting sites of micrococcal nuclease in canine SRP-RNA (indicated in Figure 2); perhaps a bacterial 4.5S RNA originated by degradation of a larger bacterial SRP-RNA equivalent and insertion into the genome after reverse transcription (26). We do not know if the terminal sequences of the B. subtilis RNA were rescued and constitute a separate molecule in the other bacteria.

Figure 3. Secondary structure of helix 6 (left) and 8 (right) of Halobacterium halobium showing overall sequence conservation between the corresponding bases in archaea and eucaryotes: invariant bases are shown by letters, dot diameters are proportional to the extent of conservation. Only one Zea mays sequence (ZEA.M. A) and one Triticum aestivum sequence were used to avoid a biased high degree of conservation due to overrepresentation of these closely related sequences.

Tertiary interactions in SRP-RNA Of known higher order interactions, the five Watson-Crick base pairs that form between the loops of helices 3 and 4 in Bacillus subtilis and the archaea (marked by > > > > > < < < < < in Figure 1 and boxes in Figures 2a,b) are supported by conclusive phylogenetic evidence. Similar pairings occur between the loops of 23S rRNA helices 5 and 7 (bases 50-120 in Escherichia coli numbering), and the interaction may also be regarded as reminiscent of the pairing between the D- and T-loop of tRNA. We subjected the unpaired regions of the structure to a search for tertiary pairings using a recently developed computer program, CBCFOLD (27). This program is well suited for searching regions which cannot be aligned with certainty. Watson-Crick base pairs and G-U pairs were permitted and one out of five sequences was allowed to mismatch. No additional tertiary interactions were found for which there is conclusive phylogenetic evidence. A few weakly supported interactions were located, however, including the pairing of C9 with G14 and of C190 with G243 (Canis species numbering). The first is supported by three CBCs and is exclusive to eucaryotes; like the five base pair tertiary interaction, this pairing could contribute to a more tightly structured 5'-domain. The second interaction would involve backfolding of helix 8 to lie adjacent to helix 5. It is supported by several covariances within both archaeal and eucaryotic RNAs, but mismatches and non-covariant changes also occur. Such folding would bring the size of the structure in closer agreement with what is observed in the electron microscope (6). However, both pairings receive insufficient support at present to be included in the model.
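The kind of search described above (Watson-Crick and G-U pairs permitted, one sequence in five allowed to mismatch) can be sketched as a scan over pairs of alignment columns. This is not CBCFOLD, whose algorithm is not described here; the function name `candidate_pairs`, the requirement of at least two distinct pair types as a sign of covariation, and the treatment of gaps are assumptions made for illustration only.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def candidate_pairs(alignment, columns, allowed=(1, 5), min_distinct=2):
    """List column pairs that could form a phylogenetically supported pair.

    alignment: equal-length aligned sequences ('-' marks a gap).
    columns:   indices of unpaired columns to be searched.
    allowed:   (n, m), meaning n out of m sequences may mismatch.
    """
    num, den = allowed
    cols = sorted(columns)
    hits = []
    for a, i in enumerate(cols):
        for j in cols[a + 1:]:
            # Skip sequences with a gap at either position (assumption).
            observed = [
                (s[i] + s[j]).upper()
                for s in alignment
                if s[i] != "-" and s[j] != "-"
            ]
            if not observed:
                continue
            mismatches = sum(1 for p in observed if p not in CANONICAL)
            # Integer form of: mismatches / len(observed) > num / den
            if mismatches * den > num * len(observed):
                continue
            # Require covariation: more than one kind of allowed pair.
            distinct = {p for p in observed if p in CANONICAL}
            if len(distinct) >= min_distinct:
                hits.append((i, j))
    return hits
```

A pair of columns reported by such a scan is only a candidate; as in the text, the number of CBCs and the presence of non-covariant changes would still have to be weighed before a pairing is accepted.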

Interactions with ribosomal RNA SRPs associate with polysomes (28) during translocation of secretory proteins, and therefore we searched, using CBCFOLD, for phylogenetically supported SRP-RNA-rRNA interactions. For the large subunit, the pairing of U94-A95 (U206-U207 in Bacillus subtilis, Figure 2a) with U1273-A1274 (Escherichia coli numbering) receives some support. In Escherichia coli 23S rRNA, such pairing would involve a tract of bases that link structure domains 2 and 3; the nucleotides are reactive towards both nucleases and chemical reagents in the naked RNA (29), which suggests they are at the subunit surface and could potentially interact. We emphasize, however, that we have not found firm evidence for any RNA-RNA contacts with the ribosome.

Outlook Analysis of the available SRP-RNAs using the phylogenetic approach has provided a solid secondary structural model of the SRP-RNA. The data base is sufficiently large to identify major group-specific structural features clearly and even to reveal details between minor phylogenetic divisions. Currently, the data base is too small to prove possible intra- or intermolecular interactions; more sequences are needed. In the absence of sufficient amounts of material for structure determination by X-ray crystallography, biochemical methods (enzymatic and chemical modification, site-directed mutagenesis, cross-linking) and molecular modeling will be useful for resolving the three-dimensional structure of the RNA. The present sequence alignment and structures should form a solid basis for design and interpretation of such experiments. Ultimately, such experiments should also provide insight into how the SRP-proteins are assembled in the SRP, and how SRP interacts with the other participants of the translation-translocation machinery.

Data distribution The sequence alignment and secondary structure diagrams are available from the second author upon request. We invite submission of new SRP-RNA sequences (as hard copies or by electronic mail to 'zwieb %jason [email protected]. utexas .edu'); in return, submitters will receive the updated alignment.

ACKNOWLEDGEMENTS Niels Larsen was supported by the Bioregulation Research Center, Arhus University, under the Danish Biotechnology program. We thank Roger A. Garrett and Jens Nyborg for critical reading of the manuscript, and the many colleagues who made the SRP-RNA sequences of their chosen organisms available.


REFERENCES
1. Walter P. and Lingappa V. (1986) Ann. Rev. Cell Biol. 2, 499-516
2. Hortsch M. and Meyer D.I. (1986) Int. Rev. Cyt. 102, 215-242
3. Zieve G. and Penman S. (1976) Cell 8, 19-31
4. Walter P. and Blobel G. (1983) Cell 34, 525-533
5. Zwieb C. (1985) Nucleic Acids Res. 13, 6105-6124
6. Andrews D., Walter P. and Ottensmeyer P. (1987) EMBO J. 6, 3471-3477
7. Zwieb C. (1989) in: Prog. in Nucleic Acid Res. and Mol. Biol. 37, 207-234
8. Levitt M. (1969) Nature (London) 224, 759-763
9. Fox G.E. and Woese C.R. (1975) Nature (London) 256, 505-507
10. Kaine B.P. and Merkel V.L. (1989) J. Bact. 171, 4261-4266
11. Haas E.S., Brown J.W., Daniels C.J. and Reeve J.N. (1990) Gene 90, 51-59
12. Østergaard L., Larsen N., Leffers H., Kjems J. and Garrett R.A. (1987) System. Appl. Microbiol. 9, 199-209
13. Kaine B.P. (1990) Mol. Gen. Genetics 221, 315-321
14. Kaine B.P., Schurke C.M. and Stetter K.O. (1989) System. Appl. Microbiol. 12, 8-14
15. Struck J.C.R., Vogel D.W., Ulbrich N. and Erdmann V.A. (1988) Nucleic Acids Res. 16, 2719
16. Poritz M.A., Strub K. and Walter P. (1988) Cell 55, 4-6
17. Zwieb C. (1988) Endocyt. Cell Res. 5, 327-336
18. Staden R. (1988) Computer Appl. Biosci. 4, 53-59
19. Woese C.R., Kandler O. and Wheelis M.L. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 4576-4579
20. Egebjerg J., Larsen N. and Garrett R.A. (1990) in: Hill W., Dahlberg A., Garrett R.A., Moore P.B., Schlessinger J. and Warner J.R. (eds.), The Ribosome: Structure, Function and Evolution, pp. 168-179. American Society of Microbiology, Washington D.C.
21. Siegel V. and Walter P. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 1801-1805
22. Zwieb C. and Ullu E. (1986) Nucleic Acids Res. 14, 4639-4657
23. Zwieb C. and Schuler D. (1989) Biochem. and Cell Biol. 67, 434-442
24. Brown S. (1989) J. Mol. Biol. 209, 79-90
25. Moazed D., Robertson J. and Noller H.F. (1988) Nature (London) 334, 362-364
26. Lampson B.C., Viswanathan M., Inouye M. and Inouye S. (1990) J. Biol. Chem. 265, 8490-8496
27. Larsen N., in preparation
28. Gunning P.W., Beguin P., Shooter E.M., Austin L. and Jeffrey L. (1981) J. Biol. Chem. 256, 6670-6675
29. Andersen A., Larsen N., Leffers H., Kjems K. and Garrett R.A. (1986) in: van Knippenberg P.H. and Hilbers C.W. (eds.), Structure and Dynamics of RNA. Plenum Publishing Corporation.
30. Thirup S. and Larsen N. (1990) Proteins 7, 291-295
31. Gundelfinger E.D., Krause E., Melli M. and Dobberstein B. (1983) Nucleic Acids Res. 11, 7363-7374
32. Larsen N., unpublished
33. Struck J.C.R., Toschka H.Y. and Erdmann V.A. (1988) Nucleic Acids Res. 16, 9042
34. Brown S., Thon G. and Tolentino E. (1989) J. Bact. 171, 6517-6520
35. Toschka H.Y., Struck J.C.R. and Erdmann V.A. (1989) Nucleic Acids Res. 17, 31-36
36. Hsu L.M., Zagorski J. and Fournier M.J. (1984) J. Mol. Biol. 178, 533-550
37. Moritz A., Lankat-Buttgereit B., Gross H.J. and Goebel W. (1985) Nucleic Acids Res. 13, 31-37
38. Campos N., Palau J. and Zwieb C. (1989) Nucleic Acids Res. 17, 1573-1588
39. Marshallsay C., Prehn S. and Zwieb C. (1989) Nucleic Acids Res. 17, 1771
40. Sanger H.L., unpublished
41. Haas B., Klanner A., Ramm K. and Sanger H.L. (1988) EMBO J. 7, 4063-4074
42. Ullu E., Murphy S. and Melli M. (1982) Cell 29, 195-202
43. Ullu E. and Weiner A.M. (1984) EMBO J. 3, 3303-3310
44. Siegel V. and Walter P. (1986) Nature (London) 320, 81-84
45. Li W.Y., Reddy R., Henning D., Epstein P. and Busch H. (1982) J. Biol. Chem. 257, 5136-5142
46. Ullu E. and Tschudi C. (1984) Nature (London) 312, 171-172
47. Gundelfinger E.D., Di Carlo M., Zopf D. and Melli M. (1984) EMBO J. 3, 2325-2332
48. Poritz M.A., Siegel V., Hansen W. and Walter P. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 4313-4319
49. Brennwald P., Liao X., Holm K., Porter G. and Wise J.A. (1988) Mol. Cell Biol. 8, 1580-1590
50. Ribes V., Dehoux P. and Tollervey D. (1988) EMBO J. 7, 231-237