Sm protein-protein interactions - NCBI

The EMBO Journal vol.14 no.9 pp.2076-2088, 1995

snRNP Sm proteins share two evolutionarily conserved sequence motifs which are involved in Sm protein-protein interactions

Herbert Hermann, Patrizia Fabrizio, Veronica A.Raker, Kirani Foulaki, Horst Hornig, Hero Brahms and Reinhard Lührmann1

Institut für Molekularbiologie und Tumorforschung, Emil-Mannkopff-Straße 2, D-35037 Marburg, Germany

1Corresponding author

Communicated by R.Lührmann

The spliceosomal small nuclear ribonucleoproteins (snRNPs) U1, U2, U4/U6 and U5 share eight proteins, B', B, D1, D2, D3, E, F and G, which form the structural core of the snRNPs. This class of common proteins plays an essential role in the biogenesis of the snRNPs. In addition, these proteins represent the major targets for the so-called anti-Sm auto-antibodies which are diagnostic for systemic lupus erythematosus (SLE). We have characterized the proteins F and G from HeLa cells by cDNA cloning, and, thus, all human Sm protein sequences are now available for comparison. Similar to the D, B/B' and E proteins, the F and G proteins do not possess any of the known RNA binding motifs, suggesting that other types of RNA-protein interactions occur in the snRNP core. Strikingly, the eight human Sm proteins possess mutual homology in two regions, 32 and 14 amino acids long, that we term Sm motifs 1 and 2. The Sm motifs are evolutionarily highly conserved in all of the putative homologues of the human Sm proteins identified in the data base. These results suggest that the Sm proteins may have arisen from a single common ancestor. Several hypothetical proteins, mainly of plant origin, that clearly contain the conserved Sm motifs but exhibit only comparatively low overall homology to one of the human Sm proteins, were identified in the data base. This suggests that the Sm motifs may also be shared by non-spliceosomal proteins. Further, we provide experimental evidence that the Sm motifs are involved, at least in part, in Sm protein-protein interactions. Specifically, we show by co-immunoprecipitation analyses of in vitro translated B' and D3 that the Sm motifs are essential for complex formation between B' and D3. 
Our finding that the Sm proteins share conserved sequence motifs may help to explain the frequent occurrence in patient sera of anti-Sm antibodies that cross-react with multiple Sm proteins and may ultimately further our understanding of how the snRNPs act as auto-antigens and immunogens in SLE. Key words: pre-mRNA splicing/protein-protein interactions/ Sm proteins/snRNPs/systemic lupus erythematosus

Introduction

The four major small nuclear ribonucleoprotein particles (snRNPs), U1, U2, U4/U6 and U5, are essential components of the eucaryotic pre-mRNA splicing machinery (reviewed by Guthrie, 1991; Moore et al., 1993). The snRNPs contain as many as 40 distinct proteins that can be divided into at least two groups. The first group consists of the snRNP-specific proteins which are probably responsible for fulfilling the snRNP-specific functions in pre-mRNA splicing. The second group comprises the common snRNP proteins, also called Sm proteins (see below), which are present in every snRNP and which are essential for the biogenesis of the snRNP particles. In HeLa cells, at least eight tightly bound Sm proteins, denoted B' (29 kDa), B (28 kDa), D1 (16 kDa), D2 (16.5 kDa), D3 (18 kDa), E (12 kDa), F (11 kDa) and G (9 kDa) can be distinguished (reviewed by Lührmann et al., 1990). In high-TEMED, SDS-polyacrylamide gels, the G protein migrates as two closely spaced bands (tentatively denoted G and G') which, on immunoblots, cross-react with anti-G antibodies (Lehmeier et al., 1990; Heinrichs et al., 1992); the exact structural relationship of the two proteins is not yet clear. With the exception of F, which has an apparent pI value of 4.6, all other common proteins are basic (Woppmann et al., 1990). More recently, a 69 kDa protein, which also binds to the snRNP core and may therefore be considered an additional common protein, has been identified. However, the 69 kDa protein is more loosely bound and appears to interact transiently with snRNPs (Hackl et al., 1994). A protein denoted N, which is structurally highly related to B/B', has been identified in snRNPs from neural tissues (McAllister et al., 1988). The Sm proteins appear to be evolutionarily conserved. For example, anti-Sm auto-antibodies have been shown to precipitate spliceosomal snRNPs from all species thus far investigated. 
Furthermore, biochemical characterization of the protein composition of snRNPs from organisms as diverse as man, mouse, fly, plants and, most recently, yeast, has revealed the presence of a similar set of 6-8 proteins ranging in molecular weight between 10 and 30 kDa, which are shared by the spliceosomal snRNPs (reviewed by Lührmann et al., 1990; Fabrizio et al., 1994). Most striking in this respect have been the recent demonstrations that the human and yeast D1 and D3 proteins are highly homologous (Rymond, 1993; Lehmeier et al., 1994; Roy et al., 1995) and that the human D1 protein can functionally replace its yeast counterpart (Rymond et al., 1993). The common proteins play an essential role in the biogenesis of the snRNP particles. They associate in the cytoplasm with the Sm site of newly transcribed U1, U2, U4 and U5 snRNAs, thereby forming the common snRNP

© Oxford University Press

Conserved sequence motifs in Sm proteins

core RNP structure (Sm core) (reviewed by Mattaj, 1988). The Sm site consists of a single-stranded region with the consensus PuA(U)3-6NUGPu, which is often flanked by double-stranded stems (Branlant et al., 1982). The binding of the common proteins to the Sm site is essential for the hypermethylation of the snRNA cap structure to generate the m3G cap (Mattaj, 1986) and probably provides a binding site for the snRNA-(guanosine-N2)-methyltransferase (Plessel et al., 1994). In addition, the common proteins are involved in the formation of a nuclear localization signal on the snRNP core that is essential for the nuclear targeting of the snRNP particles (Mattaj and De Robertis, 1985; Fischer et al., 1993). U6 RNA, which in contrast to the other spliceosomal snRNAs is transcribed by RNA polymerase III, does not leave the nuclear compartment after being synthesized, and thus associates with U4 RNP in the nucleus to form the U4/U6 snRNP particle (Vankan et al., 1990). The assembly of the common proteins onto the snRNA's Sm site appears to occur in at least two steps and involves the formation of RNA-free protein heterooligomers. In particular, pulse-chase experiments have revealed that the snRNP proteins E, F, G and one or more of the D proteins form a 6S heterooligomeric protein complex which apparently binds as such to the Sm site (Fisher et al., 1985; Sauterer et al., 1990). More recently, using immunological and biochemical techniques, we have demonstrated a multitude of protein-protein interactions between Sm proteins. For example, we could detect specific interactions between D1 and D2 (Lehmeier et al., 1994) or B/B' and D3 (V.A.R., H.H. and R.L., in preparation). Significantly, each of the common proteins appears to be capable of forming an RNA-free protein heterooligomer with one or more of the other Sm proteins (V.A.R., H.H. and R.L., in preparation). 
The nature of the numerous protein-protein interactions in the snRNP core, as well as sequence motifs potentially involved in these protein-protein interactions, remains unknown. It is also not clear how the common proteins interact with the Sm sites of the U1, U2, U4 and U5 snRNAs. Given the evidence for a two-step assembly pathway of the snRNP core (see above), the Sm proteins E, F, G, and one or more of the D proteins, are prime candidates for the initial RNA-protein interactions at the Sm site. Consistent with this idea, a 15-25 nucleotide long stretch of RNA, which contains the Sm site and remains stably associated with the E, F, G and D proteins, is observed after extensive digestion of snRNPs with micrococcal nuclease (Liautard et al., 1982). Moreover, upon UV light irradiation of in vitro reconstituted U1 snRNP particles, the G protein is efficiently cross-linked to the AAU stretch within the 5'-terminal half of the Sm site (Heinrichs et al., 1992), suggesting that it is directly involved in the recognition of the Sm site. Finally, the common snRNP proteins have also attracted much attention because they represent the major autoantigens against which patients suffering from systemic lupus erythematosus (SLE) often produce the so-called anti-Sm auto-antibodies (Lerner and Steitz, 1979; Tan, 1979). The common proteins are, therefore, also denoted Sm proteins. While the three D proteins and B/B' are the major targets of anti-Sm antibodies, SLE patients also produce auto-antibodies against the E, F and G proteins
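As an illustration of how an Sm site consensus can be used to scan an RNA sequence, the following minimal sketch turns the consensus into a regular expression. It assumes the consensus is read as purine, A, three to six U residues, any nucleotide, U, G, purine; the function name `find_sm_sites` and the test sequence are hypothetical and are not taken from any snRNA data base entry.

```python
import re

# Assumed reading of the consensus: Pu A (U)3-6 N U G Pu.
# The lookahead lets overlapping candidate sites be reported as well.
SM_SITE = re.compile(r"(?=([AG]AU{3,6}[ACGU]UG[AG]))")


def find_sm_sites(rna):
    """Return (1-based start, matched text) for each candidate Sm site."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start() + 1, m.group(1)) for m in SM_SITE.finditer(rna)]


# Hypothetical single-stranded test sequence, for illustration only.
example = "GGCACAAUUUGUGGUAGCC"
print(find_sm_sites(example))
```

A match only indicates a sequence that fits the consensus; whether it is single-stranded and flanked by stems, as described above, has to be assessed separately.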

(Lerner et al., 1981; Habets et al., 1985; Lehmeier et al., 1990; Reuter et al., 1990). However, the chemical nature of the Sm epitopes is not yet clear (reviewed by Rokeach and Hoch, 1992). The cross-reaction of anti-Sm antibodies with multiple common proteins (the monoclonal anti-Sm antibody Y12, for example, reacts on immunoblots with B, B', D1, D3 and E; Lerner et al., 1981; Lehmeier et al., 1990) would suggest that the common proteins share certain structural features. Initial cDNA cloning of the human E, D1, D2, D3 and B/B' proteins (Stanford et al., 1988; Rokeach et al., 1988; van Dam et al., 1989; Lehmeier et al., 1994; reviewed by Rokeach and Hoch, 1992) has provided some information to support this. For example, B and B' arise from the same gene by alternative splicing and have identical sequences except for nine additional amino acids at the C-terminus of B' (van Dam et al., 1989). Moreover, significant homology (29% identity) was observed between D1 and D3 (Lehmeier et al., 1994). Homologous regions include the GR repeat at the C-terminus of D1 and D3 which has been suggested to be an Sm epitope in D1 (Hirakata et al., 1993). However, structural motifs shared by all of the common snRNP proteins have not yet been reported. As a prerequisite to address the various questions raised above concerning the structure and function of the common snRNP proteins, we have cloned the cDNAs for the human F and G proteins. Similarly to the D, B/B' and E proteins, the F and G proteins do not possess any of the known RNA binding motifs such as an RNA recognition motif. Alignment of the sequences of all of the human Sm proteins revealed that they share two homologous regions (Sm motifs 1 and 2) that are 32 and 14 amino acids long. The Sm motifs are evolutionarily highly conserved among all of the putative Sm protein homologues from various organisms which were identified in the data base. 
Finally, we provide experimental evidence that the Sm motifs play a role, at least in part, in Sm protein-protein recognition events. The data presented here may prove important for future studies aimed at determining the B-cell autoimmunizing epitopes of the Sm proteins.

Results

Isolation of cDNA clones encoding human snRNP proteins F and G

U1, U2, U4/U6 and U5 snRNPs were isolated from HeLa nuclear extracts by immunoaffinity chromatography with anti-m3G antibodies. The individual snRNP proteins were fractionated by preparative SDS-polyacrylamide gel electrophoresis, and the F and G proteins were electroeluted. The G protein often migrates as a doublet during SDS-PAGE; the two closely spaced G bands were eluted individually. Since the N-termini of the proteins were blocked, amino acid sequences were derived from various peptides generated from each protein by tryptic digestion or cyanogen bromide cleavage. Identical peptide sequences were obtained from the two G bands. Based upon the sequence of these peptides, primers were synthesized and cDNA clones were isolated from a HeLa λgt10 cDNA library using a RACE PCR procedure (Frohmann et al., 1988). The sequences of the cDNAs for the human F and G snRNP proteins are shown in Figures 1 and 2, respectively. The partial sequences that are underlined indicate the


  1  tctggccatttctcttgaaactgcggctcgggacctgcggtacctgctgtagtcac  56
 57  gaggaacgggcggcggctggtcggcagagagtagcctgcaacattcggccgtggtttac  115

  1  Met Ser Leu Pro Leu Asn Pro Lys Pro Phe Leu Asn Gly Leu Thr  15
116  ATG AGT TTA CCC CTC AAT CCC AAA CCT TTC CTC AAT GGA CTA ACA  160

 16  Gly Lys Pro Val Met Val Lys Leu Lys Trp Gly Met Glu Tyr Lys  30
161  GGA AAG CCA GTG ATG GTG AAA CTT AAG TGG GGA ATG GAG TAC AAG  205

 31  Gly Tyr Leu Val Ser Val Asp Gly Tyr Met Asn Met Gln Leu Ala  45
206  GGC TAT CTG GTA TCT GTA GAT GGC TAC ATG AAC ATG CAG CTT GCA  250

 46  Asn Thr Glu Glu Tyr Ile Asp Gly Ala Leu Ser Gly His Leu Gly  60
251  AAT ACA GAA GAA TAC ATA GAT GGA GCT TTG TCT GGA CAT CTG GGT  295

 61  Glu Val Leu Ile Arg Cys Asn Asn Val Leu Tyr Ile Arg Gly Val  75
296  GAA GTT TTA ATA AGG TGT AAT AAT GTC CTT TAT ATC AGA GGT GTG  340

 76  Glu Glu Glu Glu Glu Asp Gly Glu Met Arg Glu *  86
341  GAA GAA GAG GAA GAA GAT GGG GAA ATG AGA GAA TAG catcttttgtg  387

388  ggggattttttttatatatatttctagacaataaagatttgtttgtttttcaacttgaa  446
447  aaaaaaaaaaaaaaaa  462

Fig. 1. Nucleotide and predicted amino acid sequence of F. The peptide sequences obtained by microsequencing of the native protein are underlined. The potential poly(A) addition signal in the 3' untranslated region is underlined. The termination codon is indicated by an asterisk. Numbering starts with the first nucleotide or amino acid. The EMBL Data Library accession number of this sequence is X85372.
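The coordinates quoted for the F cDNA can be checked directly against the sequence in Figure 1. The sketch below is an illustration only, not part of the original analysis: it assumes the standard genetic code and the nucleotide sequence as printed in Figure 1, and the helper name `translate_from` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# F cDNA as printed in Figure 1 (5' UTR, open reading frame, 3' UTR).
UTR5 = ("tctggccatttctcttgaaactgcggctcgggacctgcggtacctgctgtagtcac"
        "gaggaacgggcggcggctggtcggcagagagtagcctgcaacattcggccgtggtttac")
ORF = ("ATG AGT TTA CCC CTC AAT CCC AAA CCT TTC CTC AAT GGA CTA ACA "
       "GGA AAG CCA GTG ATG GTG AAA CTT AAG TGG GGA ATG GAG TAC AAG "
       "GGC TAT CTG GTA TCT GTA GAT GGC TAC ATG AAC ATG CAG CTT GCA "
       "AAT ACA GAA GAA TAC ATA GAT GGA GCT TTG TCT GGA CAT CTG GGT "
       "GAA GTT TTA ATA AGG TGT AAT AAT GTC CTT TAT ATC AGA GGT GTG "
       "GAA GAA GAG GAA GAA GAT GGG GAA ATG AGA GAA TAG")
UTR3 = ("catcttttgtg"
        "ggggattttttttatatatatttctagacaataaagatttgtttgtttttcaacttgaa"
        + "a" * 16)  # poly(A) tract, nucleotides 447-462

CDNA = (UTR5 + ORF.replace(" ", "") + UTR3).upper()


def translate_from(dna, start):
    """Translate from a 1-based start; return (protein, 1-based stop position)."""
    residues = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            return "".join(residues), i + 1
        residues.append(aa)
    raise ValueError("no in-frame stop codon found")


protein, stop_pos = translate_from(CDNA, 116)
# First AATAAA hexamer downstream of the stop codon, as a 1-based position.
signal_pos = CDNA.find("AATAAA", stop_pos + 2) + 1

print(len(CDNA), len(protein), stop_pos, signal_pos)
print(protein)
```

Positions are reported 1-based to match the figure numbering.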

  1  tagacgccgggcctacagcgggag  24
 25  gctgaggaaagccgtgcgttgcgttccaaggacatctgtgagcccgcggagtatacacc  83

  1  Met Ser Lys Ala His Pro Pro Glu Leu Lys Lys Phe Met Asp Lys  15
 84  ATG AGC AAA GCT CAC CCT CCC GAG TTG AAA AAA TTT ATG GAC AAG  128

 16  Lys Leu Ser Leu Lys Leu Asn Gly Gly Arg His Val Gln Gly Ile  30
129  AAG TTA TCA TTG AAA TTA AAT GGT GGC AGA CAT GTC CAA GGA ATA  173

 31  Leu Arg Gly Phe Asp Pro Phe Met Asn Leu Val Ile Asp Glu Cys  45
174  TTG CGG GGA TTT GAT CCC TTT ATG AAC CTT GTG ATA GAT GAA TGT  218

 46  Val Glu Met Ala Thr Ser Gly Gln Gln Asn Asn Ile Gly Met Val  60
219  GTG GAG ATG GCG ACT AGT GGA CAA CAG AAC AAT ATT GGA ATG GTG  263

 61  Val Ile Arg Gly Asn Ser Ile Ile Met Leu Glu Ala Leu Glu Arg  75
264  GTA ATA CGA GGA AAT AGT ATC ATC ATG TTA GAA GCC TTG GAA CGA  308

 76  Val *  76
309  GTA TAA ataatggctgttcagcagagaaacccatgtcctctctccatagggcctgtt  365

366  tactatgatgtaaaaattaggtcatgtacattttcatattagactttttgttavataa  425
426  cttttgtaatagtcaaaaaaaaaaaaaaaaa  455

Fig. 2. Nucleotide and predicted amino acid sequence of G. The peptide sequences obtained by microsequencing of the native protein are underlined. The potential poly(A) addition signal in the 3' untranslated region is underlined. The termination codon is indicated by an asterisk. Numbering starts with the first nucleotide or amino acid. The EMBL Data Library accession number of this sequence is X85373.

various oligopeptides found by microsequencing of peptides derived from the gel-purified F and G proteins; these provide independent confirmation of the identity of the cDNA clones. Additional evidence that the two cDNAs encode bona fide F and G proteins was obtained from the SDS-PAGE migration behaviour and immunoreactivity of in vitro translated F and G proteins. In vitro translated F protein co-migrates with F protein isolated from HeLa cells and can be immunoprecipitated with an SLE patient serum (M.M.) that reacts on immunoblots with the F protein (Figure 3, lanes 2 and 4). Interestingly, G protein which has been translated in vitro from a single mRNA

migrates in high-TEMED, SDS-polyacrylamide gels as a closely spaced double band, and thus exhibits the same behaviour as the G protein from HeLa snRNPs (Figure 3, lanes 1 and 3). A rabbit antiserum raised against a synthetic peptide derived from the C-terminus of the G protein precipitates both G bands (Figure 3, lane 5), indicating that the two bands do not differ substantially in their C-terminal regions. Moreover, rabbit antisera raised against the recombinant G protein react on immunoblots with both bands of authentic G snRNP protein (not shown), supporting the idea that the two G bands represent proteins which are structurally related, if not identical.


Fig. 3. F and G proteins translated in vitro behave similarly to native F and G proteins. Lane 1, total proteins isolated from HeLa snRNPs and stained with Coomassie Blue, shown as a marker; lanes 2-3, 35S-labelled, in vitro translated F and G, respectively, equivalent to 40% of the amount used in the immunoprecipitation assays shown in lanes 4-7; lane 4, M.M. (an SLE patient serum) and F; lane 5, 3805 (a rabbit serum) and G; lane 6, a non-immune serum (NIS) with F; lane 7, NIS with G. Proteins were fractionated by electrophoresis on a high-TEMED, SDS-12.5% polyacrylamide gel and visualized by fluorography.

Primary structures of the human F and G proteins

The F cDNA is 462 nucleotides long and contains an open reading frame that encodes a predicted 86 amino acid protein with a calculated molecular mass of 9725 Da and a theoretical pI of 4.6 (Figure 1). The in-frame TAG stop signal at position 374-376 is followed by a putative polyadenylation signal (AATAAA) at position 417-422, 23 nucleotides upstream from the poly(A) tract. The F protein is particularly rich in acidic residues (16% of its residues are Asp or Glu) and is unique in this respect among the core snRNP proteins. We note that glutamic acid (11 residues) is more prevalent than aspartic acid (three residues). Most of these are located within the C-terminal half, with a cluster of six consecutive acidic residues close to the C-terminus (residues 76-81). The F protein is further enriched in aromatic amino acid residues, in particular tyrosines, which are largely concentrated (five out of seven) just before the middle of the protein's sequence. The G clone is 455 nucleotides long and has an open reading frame encoding a predicted protein of 76 amino acid residues, with a calculated molecular mass of 8496 Da and a theoretical pI of 8.9 (Figure 2). An in-frame TAA termination signal at position 312-314 is followed by a putative polyadenylation signal (AATAAA) at nucleotides 420-425. The G protein is rich in hydrophobic amino acids (34%), most of which are located in the C-terminal half (Figure 2). The N-terminal portion of the sequence contains a larger number of positively, as opposed to negatively, charged residues (eight K or R and two D or E residues within positions 1-33), while in the rest of the sequence, the opposite is true (two R and six D or E residues within positions 34-76). Neither F nor G possesses any significant homology with known consensus sequences, such as the RNA recognition motif (RRM), that might give a clue to their function (see also below).
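The residue counts quoted above can be reproduced from the predicted protein sequences by simple counting. The following sketch is illustrative only; it assumes the one-letter sequences obtained by translating the open reading frames in Figures 1 and 2, and the helper name `charge_counts` is hypothetical.

```python
from collections import Counter

# Predicted protein sequences (one-letter code) from Figures 1 and 2.
F = ("MSLPLNPKPFLNGLTGKPVMVKLKWGMEYKGYLVSVDGYMNMQLA"
     "NTEEYIDGALSGHLGEVLIRCNNVLYIRGVEEEEEDGEMRE")
G = ("MSKAHPPELKKFMDKKLSLKLNGGRHVQGILRGFDPFMNLVIDEC"
     "VEMATSGQQNNIGMVVIRGNSIIMLEALERV")


def charge_counts(seq, first, last):
    """Count (basic K/R, acidic D/E) residues in a 1-based inclusive range."""
    counts = Counter(seq[first - 1:last])
    return counts["K"] + counts["R"], counts["D"] + counts["E"]


f_counts = Counter(F)
acidic_f = f_counts["D"] + f_counts["E"]

# Length, Glu count, Asp count and percentage of acidic residues in F.
print(len(F), f_counts["E"], f_counts["D"], round(100 * acidic_f / len(F)))
# Residues 76-81 of F (the acidic cluster near the C-terminus).
print(F[75:81])
# Basic versus acidic residues in the two parts of G.
print(charge_counts(G, 1, 33), charge_counts(G, 34, 76))
```

Ranges are given as 1-based inclusive positions, matching the numbering used in the text.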

Northern blot analyses performed with radiolabelled F and G cDNAs and poly(A) RNA isolated from HeLa cells revealed mRNAs of ~500 nucleotides in each case, which correlates in both cases with the length of the respective isolated cDNA (data not shown).

Identification of two consensus sequence motifs shared by the human Sm proteins of spliceosomal snRNPs

With the cloning of the cDNAs for the human F and G proteins, the sequences of all human Sm snRNP proteins (B/B', D1, D2, D3, E, F and G) were now available for sequence comparison. We thus investigated, by sequence alignment, whether the core proteins, or subsets thereof, shared certain structural features or even sequence motifs (see also Introduction). The sequence alignments were performed manually and with the DNA Star computer program using the Clustal method. As shown in Figure 4, all Sm proteins share two regions of homology, which we shall refer to as Sm motifs 1 and 2. The N-terminal motif spans 32 amino acids, and 16 positions in this motif are occupied by identical or biochemically related amino acids in the different core proteins. Several positions consist of highly conserved amino acid residues. For example, positions 13, 22 and 23 are invariably occupied by glycine, methionine and asparagine, respectively, with the exception of D2, where at position 22 methionine is replaced by another sulphur-containing residue, namely cysteine. Positions that are occupied preferentially by hydrophobic amino acids (positions 1, 3, 11, 15, 18, 24, 26, 29 and 32) are found more frequently than positions with hydrophilic amino acids (8, 19, 23 and 31). Positions 19 and 31 are occupied exclusively (19) or preferentially (31) by acidic amino acids, while two others (11 and 18), are often occupied by aromatic residues. The second consensus sequence, Sm motif 2, which is closer to the C-terminus, is shorter, spanning only 14 amino acids (Figure 4). Central to this motif is the consensus sequence IRGXNI. Minor deviation from this consensus sequence is seen in protein E, where lysine replaces arginine, and F, where cysteine replaces glycine. 
The IRGXNI sequence is usually preceded by three hydrophobic amino acids and followed by two hydrophilic amino acids, most frequently N or D. The other four conserved residues of Sm motif 2 are generally occupied by hydrophobic amino acids (Figure 4; positions 1, 11, 13 and 14 in Sm motif 2). The two consensus sequences are separated by eight spacer amino acids in four out of seven core proteins, namely D1, D3, F and G, and by nine in the case of E. The spacer regions are somewhat longer in B/B' and D2 (18 and 22 residues, respectively) (Figure 4). It is also interesting to note that Sm motif 1 commences within the first 15-20 N-terminal amino acids of B/B', D1, D3, F and G. In D2 and E, on the other hand, Sm motif 1 is preceded by 30-40 amino acid residues.

The Sm motifs are evolutionarily highly conserved

Given that all human spliceosomal Sm proteins share the Sm motifs, we were interested in determining whether these consensus sequences were also conserved in snRNP

[Figure 4: sequence alignment of the human (H.sap.) Sm proteins B/B', D2, D1, D3, E, F and G, with the Sm motif 1 and Sm motif 2 regions and the 'core' consensus line marked.]

Fig. 4. Alignment of the Sm motifs 1 and 2 in the B/B', D2, D1, D3, E, F and G core proteins from Homo sapiens. Bold letters shown in white represent 100% conserved amino acids, and those in black, amino acids which are identical in >50% of the sequences; these residues are shown in the uppermost consensus sequence at the bottom ('core'). The positions of biochemically conserved residues in each motif are numbered and highlighted by the following vertical shading: light blue = uncharged, hydrophobic amino acids (L, I, V, A, F, W, Y, C, M; designated U in the consensus sequence of biochemically conserved residues at the bottom) as well as uncharged, hydrophobic amino acids plus T and S (designated Z in the consensus sequence at the bottom); green = acidic amino acids (D, E); dark blue = 100% conserved G or N; violet = basic amino acids (R, K); turquoise = 80% conserved glycine. The position of the last amino acid of each sequence is shown at the right. Sequences marked with a diamond are not shown in their entirety. Sequences were obtained from the following references: B/B', van Dam et al. (1989); D2 and D3, Lehmeier et al. (1994); D1, Rokeach et al. (1988); F and G, this work; and E, Stanford et al. (1988).

proteins from evolutionarily distant organisms. For this purpose, we carried out an extensive data base search for putative homologues of all of the human core proteins. A data base search with the human F and G sequences revealed several protein sequences from various species that displayed significant homology with either F or G. In Figure 5A, the protein sequences of putative F homologues from Caenorhabditis elegans, Drosophila melanogaster, the rice Oryza sativa and the Chinese cabbage Brassica campestris, which display 70%, 78%, 78% and 76% identity, respectively, with the human F sequence, are aligned (see Figure 5 legend for the data base accession numbers or published references). Interestingly, the data base search yielded a partial protein sequence from Saccharomyces cerevisiae, 48 residues in length, that displays conspicuous homology with the human F protein (54% identity) in the region encompassing amino acids 30-77 (Figure 5A). In fact, the sequence IRCNNVLYIR (residues 64-73) of the human F protein is completely conserved in the yeast protein (see also below). It is therefore highly likely that this protein sequence represents part of the yeast homologue of the human snRNP F protein. According to the DNA sequence, this yeast protein is located in the 5' untranslated region of a gene encoding a GDP binding protein, dolichol phosphate mannose synthase (Orlean et al., 1988). Recently, the putative yeast F protein gene which encodes this partial protein sequence has been cloned by B.Seraphin. He could show that this protein was stably associated with the spliceosomal U1, U2, U4/U6 and U5 snRNPs but not with the individual U6 snRNPs, reinforcing the idea that the partial protein sequence is part of the authentic F protein (Seraphin, 1995). 
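The residue ranges quoted for the human F protein are 1-based and inclusive, which is easy to get wrong when slicing a sequence programmatically. The short sketch below is only an illustration: it assumes the human F sequence as predicted from Figure 1, and the helper name `region` is hypothetical.

```python
# Human F protein (one-letter code), as predicted from the cDNA in Figure 1.
F = ("MSLPLNPKPFLNGLTGKPVMVKLKWGMEYKGYLVSVDGYMNMQLA"
     "NTEEYIDGALSGHLGEVLIRCNNVLYIRGVEEEEEDGEMRE")


def region(seq, first, last):
    """Return residues first..last, using 1-based inclusive numbering."""
    return seq[first - 1:last]


# Decapeptide reported as completely conserved in the yeast sequence.
print(region(F, 64, 73))
# Length of the human region aligned with the 48-residue yeast fragment.
print(len(region(F, 30, 77)))
```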
Consistent with the clear homology between the human F protein and its putative yeast counterpart, antibodies raised against the human recombinant F protein strongly cross-react on immunoblots with an 11 kDa protein present in UsnRNPs purified from yeast cellular extracts by anti-m3G immunoaffinity chromatography (P.F. and R.L., unpublished data). Three putative homologues of the human G protein core


were identified in the data base: one from the plant Medicago sativa, the second from the rice O.sativa and the third from S.cerevisiae. The two plant proteins display 54% (M.sativa) and 58% (O.sativa) sequence identity with their human counterpart. The yeast protein is 77 amino acids long and has previously been called SNP2. Since SNP2 and the human E protein resemble one another to some extent (38% identity), SNP2 was previously denoted an E-related protein (GenBank L31794). However, given the clearer resemblance to the human G protein (51% identity), it should be more appropriately considered a yeast G homologue. Recently, it has been demonstrated that this protein is indeed an integral protein in yeast spliceosomal snRNPs (Seraphin, 1995). With the exception of the D2 protein, we have also identified one or more highly homologous protein sequences for each of the previously published human core proteins, B/B', N, D1, D3 and E. The putative human E protein homologues from mouse, chicken, rice and Arabidopsis thaliana display between 64% (A.thaliana) and 100% (mouse) identity with their human counterpart. The yeast counterpart of the human D3 protein had previously been identified (Lehmeier et al., 1994; Roy et al., 1995). The yeast D3 sequence was originally published as a 5' flanking gene of the yeast PEP3 gene (Preston et al., 1991). The protein sequences of the putative human D1 homologues from rice and mouse display 78 and 99% identity with the human D1 protein sequence. The overall homology of the yeast D1 protein with that of human is considerably lower (35% identity; Rymond et al., 1993). Using a gene replacement strategy, it has been demonstrated that part of the human D1 protein can functionally substitute for its yeast counterpart (Rymond et al., 1993). Finally, Figure 5A shows an alignment of the N-terminal ~90 amino acids of the B and N homologues from man, mouse, rat and Drosophila, a region where these proteins are almost 100% conserved. 
The alignment of the sequences of all of the putative spliceosomal snRNP core proteins from various species reveals striking evolutionary conservation of the Sm motifs

[Figure 5. Panel A: sequence alignment of the B/B', N, D2, D1, D3, E, F and G proteins from various species, with the Sm motif 1 and Sm motif 2 regions and the 'core' consensus line marked. Panel B: sequences labelled Arabi. Z27273, S.cer. Z35787, O.sat. D15230 and B.cam. L33514.]

Fig. 5. (A) Alignment of the Sm motifs 1 and 2 in the core proteins B/B', N, D2, D1, D3, E, F and G from various species. As described in Figure 4, 'core' is the consensus sequence of identical residues present in >50% of the sequences. The consensus of biochemically conserved residues in each motif (U = uncharged amino acids; Z = U + S and T) is shown at the bottom. Vertical shading is described in Figure 4. The position of the last amino acid of each sequence is numbered on the right. Sequences marked with a diamond are not shown entirely; P indicates partial proteins. Abbreviations used are: H.sap., Homo sapiens; Mus, Mus musculus (mouse); Droso., Drosophila melanogaster; Rat., Rattus norvegicus; O.sat., Oryza sativa (rice); S.cer., Saccharomyces cerevisiae; G.gal., Gallus gallus (chicken); Arabi., Arabidopsis thaliana; Cele., Caenorhabditis elegans; B.cam., Brassica campestris (cabbage); M.sat., Medicago sativa. The sequences were obtained from the following published references or data base accession numbers: H.sap. B/B', van Dam et al. (1989); Droso. B, Brunet et al. (1993); Mouse B, Griffith et al. (1992); H.sap. N, Rokeach et al. (1989); Rat N, Schmauss and Lerner (1990); Mouse N, GenBank X63730; H.sap. D2 and D3, Lehmeier et al. (1994); S.cer. D3, Preston et al. (1991); H.sap. D1, Rokeach et al. (1988); Mouse D1, Mitsuda et al. (1992); O.sat. D1, expressed sequence tag (EST), GenBank D15099; S.cer. D1, Rymond et al. (1993); G.gal. E, Fautsch et al. (1992); H.sap. E, Stanford et al. (1988); Mouse E, Fautsch et al. (1992); O.sat. E, EST, GenBank D23404; Arabi. E, EST, GenBank Z29055 (note that the translation of this expressed sequence tag gave a frameshifted protein that was rectified by deleting one single nucleotide in the DNA sequence); H.sap. F, this paper; Droso. F, Vincent et al. (1990); Cele. F, Wilson et al. (1994); O.sat. F, EST, GenBank D22902; partial sequence for S.cer. F, Orlean et al. (1988); H.sap. G, this paper; M.sat. G, Hirt et al. (1992); O.sat. G, EST, GenBank D15363; S.cer. G, GenBank L31794. (B) Alignments of Sm motifs 1 and 2 of proteins that probably do not belong to the major Sm protein family. Abbreviations of the species names are as described above; the names in the figure are followed by GenBank accession numbers. The positions of biochemically conserved residues in each motif are highlighted as described for Figure 4.

1 and 2 (Figure 5A). Conservation is maintained across evolution at a number of positions within motifs 1 and 2 that are highly conserved among the human Sm proteins. Most notable is the 100% conservation of the glycine and asparagine residues at positions 13 and 23 within Sm motif 1 (Figure 5A). It is interesting, and not unexpected, that deviations from the human consensus sequence found in particular human Sm proteins are also observed in other species. For example, the human F protein is the only human protein lacking a glycine residue at position 8 in Sm motif 2 (in the consensus sequence block IRGXN); instead, it has cysteine (IRCNN). This major substitution is also seen in the putative F proteins from Drosophila, C.elegans, rice and yeast, each of which has the sequence IRCNN. A similar principle is seen with the E protein. In this protein, the otherwise universal arginine at position 7 of Sm motif 2 is replaced by lysine in all organisms for which a putative E homologue has been identified (man, mouse, chicken and several plants).
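The consensus rule stated in the Figure 5 legend (a residue is reported only when the same amino acid occupies more than 50% of the aligned sequences at a position) can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function name `majority_consensus`, the handling of gaps and the toy input rows are assumptions made for this example.

```python
from collections import Counter

def majority_consensus(aligned, threshold=0.5, gap="-", filler="."):
    """Return a consensus string for equal-length aligned sequences.

    A residue is emitted for a column only if it occurs in more than
    `threshold` of all sequences; otherwise `filler` is emitted.
    Gap characters never count as residues (an assumption of this sketch).
    """
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            consensus.append(residue if n / len(aligned) > threshold else filler)
        else:
            consensus.append(filler)
    return "".join(consensus)

# Toy rows modelled on the Sm motif 2 block quoted in the text
# (IRGXN in most proteins, IRCNN in the F family); not real alignment rows.
toy_rows = ["IRGNN", "IRGSN", "IRGDN", "IRCNN"]
print(majority_consensus(toy_rows))
```

With these toy rows the glycine column still passes the >50% rule despite the cysteine in the F-like row, while the variable fourth column is left open, which mirrors the IRGXN block described above.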

Sm motifs in proteins that possibly do not belong to the major Sm protein family

Three hypothetical proteins from the plants A.thaliana, O.sativa and B.campestris and one from the yeast S.cerevisiae that clearly contain the conserved Sm motifs 1 and 2, including the 100% conserved G and N residues at positions 13 and 23 of Sm motif 1, were found in the data base (Figure 5B; sequences are denoted by their accession numbers). Their protein sequences, however, exhibited only comparatively low resemblance to any of the human Sm proteins. For example, the Arabi. Z27273 protein displays 44% identity with the human F protein in a short overlap of 38 amino acids restricted to the N-terminal portion. However, when the entire sequence is compared, the identity drops to 32%. Given the strong evolutionary conservation of F between man, Drosophila, C.elegans, rice and cabbage (70-78%, see above), it is unlikely that this protein represents a true A.thaliana F homologue. A similar situation is observed when the protein sequence of Arabi. Z27273, as well as those of O.sat. D15230, B.cam. L33514 and S.cer. Z35787, are aligned with the sequence of any other human core protein: the degree of homology generally ranges only between 20 and 35%. Therefore, given the lack of direct experimental data indicating otherwise, we do not consider these four proteins to be true homologues of one of the eight human spliceosomal Sm proteins.
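The difference between identity over a short overlap and identity over the whole alignment can be illustrated with a small calculation. The sketch below assumes that identity is counted over columns in which both sequences carry a residue; this is one common convention and may not be the one used by the alignment program behind the figures above. The function name and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)

# Invented sequences: a well-conserved N-terminal stretch followed by a divergent tail.
toy_a = "ACDEFGHIKL"
toy_b = "ACDEFGWWWW"
print(percent_identity(toy_a[:6], toy_b[:6]))  # short overlap only
print(percent_identity(toy_a, toy_b))          # entire alignment
```

As in the Arabi. Z27273 comparison, the value for the short overlap is higher than the value for the full-length comparison, so the region compared should always be reported together with the percentage.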

The Sm motifs are essential for complex formation in vitro between the B and D3 proteins

The strong evolutionary conservation of the Sm motifs among all snRNP core proteins suggests that these sequences play an important role in the Sm proteins' structures and/or functions. We have been especially intrigued by the possibility that the Sm motifs may, at least in part, be involved in protein-protein interactions between the common proteins, since the structure and biochemical stability of the snRNP core is largely determined by interactions of this kind (see Introduction). Of particular interest in this respect is our finding that each of the snRNP core proteins is capable of forming an RNA-free protein heterooligomer with one or more of the other core proteins. For example, specific interactions have been observed between D1 and D2 (Lehmeier et al., 1994) as well as between the B or B' and D3 proteins (V.A.R., H.H. and R.L., in preparation). We have employed the B'-D3 interaction in order to test whether the Sm motifs are involved in protein-protein complex formation.

In the 240 amino acid long B' protein (B' is identical to B except that it is nine amino acids longer, see Introduction), the Sm motifs 1 and 2 are found in the amino-terminal domain of the protein, spanning amino acid positions 17-48 (Sm motif 1) and 67-80 (Sm motif 2) (see Figure 4). If the Sm motifs are essential for the interaction between D3 and B', the C-terminal region of B', consisting of amino acids 93-240 [B'(93-240)] and lacking the Sm motifs, should not be capable of interacting with D3. If, however, the Sm motifs are necessary and sufficient for this interaction, the N-terminal region of B' [B'(1-93)] should stably interact with D3. We have investigated the interaction between B' mutants and D3 by co-immunoprecipitation analysis. Amino acid residue 93 of B' was chosen as the end and beginning of the N- and C-terminal fragments of B', respectively, because the region surrounding amino acid 93 is proline-rich and appears to function as a linker between the N- and C-terminal domains (van Dam et al., 1989; see also Figure 4).

Figure 6 shows the specificity of complex formation between the full-length D3 and B' proteins, taking advantage of the observation that monoclonal antibody ANA125 precipitates full-length B' but not D3 (Figure 6, lanes 7 and 10). When B' and D3 were mixed prior to addition of mAb ANA125, the antibody precipitated not only B', but also D3 (lane 9). In contrast, when B' and D2 were tested together, D2 was not co-immunoprecipitated, which demonstrates the specificity of the B'-D3 interaction (Figure 6, lane 14). However, full-length B' protein generally gives rise to several proteolytic fragments which react with mAb ANA125 (see lanes 1 and 7). One of these fragments co-migrates with D3 on SDS-polyacrylamide gels and, thus, interferes with the analysis of B'-D3 complex formation. In order to circumvent this problem, we mixed in vitro translated, non-radioactive ('cold') B' protein with [35S]methionine-labelled D3 and incubated the proteins with mAb ANA125. As shown in lanes 11-13, D3 is efficiently co-immunoprecipitated with cold B' (compare lanes 11-13 with lane 10, in which D3 was tested alone with ANA125).

Monoclonal antibody ANA125 was used to investigate complex formation between D3 and the C-terminal fragment of B' [B'(93-240)], which still contains the epitope for ANA125. As shown in Figure 6, no co-immunoprecipitation of D3 with B'(93-240) was observed, even when increasing amounts of B'(93-240) were added to the assay (lanes 18-20). Although the truncated B' fragment contains 148 amino acids, in comparison to 126 in D3, the D3 polypeptide migrated slower than B'(93-240). This is due to the aberrant behaviour of D3 (which migrates as a polypeptide of 18 kDa but only has a predicted mass of 14 kDa; Lehmeier et al., 1990) rather than that of the truncated B' polypeptide.

Further investigation of the N-terminal region of B' involved in the B'-D3 interaction could not be carried out with mAb ANA125, since it reacts solely with the C-terminal part of B'. Instead, mAb ANA128, which during immunoprecipitation recognizes B, B', D3 and B'(93-240), but not the N-terminal part of B' (amino acids 1-93), was used [Figure 6, lanes 22 and 25, for D3 and B'(1-93)]. Thus, if B'(1-93) stably interacts with D3, it should be co-immunoprecipitated with D3 by mAb ANA128.
As shown in Figure 6, lanes 24-28, this is indeed the case, indicating that the truncated B'(1-93) can still participate in a protein-protein interaction with D3. Moreover, the degree of co-immunoprecipitation of B'(1-93) with the D3 protein is very efficient, as is that of D3 with the full-length B' (Figure 6; compare lanes 26-28 with 11-13). However, further truncation of the B' polypeptide, so that the Sm motif 2 was removed [B'(1-63)], almost completely abolished the interaction with D3 (Figure 6, lanes 29-33). Note that, although several degradation bands are present in both the B'(1-93) and B'(1-63) translation products (lanes 5 and 6; the major bands are marked with an arrow), only the major band of B'(1-93) was co-immunoprecipitated with D3 (lanes 24-28).

While it is possible that a prerequisite for the B'-D3 interaction is the formation of B' homooligomers, this idea is not supported by our results. In particular, the B'(1-93) fragment was not co-immunoprecipitated with B' by mAb ANA125 (data not shown), indicating that at least B' and B'(1-93) do not interact with each other.

Evidence that the Sm motifs found in D3 are likewise important for the B'-D3 interaction was obtained by ANA125 immunoprecipitation assays with full-length B' and truncated D3 fragments. The mAb ANA125 was used, since it recognizes B/B' but not D3 (Figure 7, lanes 6 and 8). The interaction between B' and D3 is shown, using either 35S-labelled B' (lane 7) or non-radioactive B' (lane 9). After deletion of the C-terminal region of D3, a truncated D3 containing both Sm motifs 1 and 2 [D3(1-90)] was still co-immunoprecipitated with B' [compare RIPAs of D3(1-90) with B', lanes 10 and 12-14, to D3(1-90) alone, lane 11]. However, when D3 was further truncated so that a part of the Sm motif 2 was removed, generating D3(1-68), co-immunoprecipitation with B' was no longer observed (lanes 15-19). Likewise, removal of the major part of the Sm motif 1 through an N-terminal deletion, producing D3(48-126), greatly reduced the co-immunoprecipitation with B' (lanes 20-24) as compared with that of D3(1-90), which contains both Sm motifs, with B' (lanes 12-14); however, D3(48-126) with B' (lanes 22-24) clearly is present above background [D3(48-126) alone, lane 21]. The contribution of individual residues in the Sm motif 2 to the B'-D3(48-126) interaction has to be further analysed by point mutagenesis.

Fig. 6. Interaction between D3 and truncated B' polypeptides depends on the presence of the Sm motif 2 region, as analysed in vitro by immunoprecipitation. Both Sm motifs 1 and 2 are present in the truncated B'(1-93), while the Sm motif 2 has been deleted in B'(1-63) and neither of the motifs is present in B'(93-240). The 35S-labelled, in vitro translated proteins shown in lanes 1-6 are equivalent to 40% of the amount used in the immunoprecipitation assays (lanes 7-33). Proteins used in each assay are shown above each lane. Non-radiolabelled, in vitro translated proteins are denoted with a 'c' (cold). The amount of translate added in lanes 11-13 (B'c), 18-20 [B'(93-240)], 26-28 (D3c) and 31-33 (D3c) is 0.5-, 1- and 2-fold higher than that used in lanes 9, 16, 24 and 29, respectively; the increasing amounts are depicted with a triangle. Immunoprecipitations were carried out with mAb ANA125 (lanes 9-16 and 18-21), mAb ANA128 (lanes 22 and 24-33) or, to control for the level of background precipitation, in the absence of antibody (lanes 17 and 23). ANA125 recognizes full-length B' (lane 7) and the fragment B'(93-240) [denoted B'(93-E), lane 16], but not D3 (lanes 10 and 21), B'(1-93) or B'(1-63), in immunoprecipitation; ANA128 recognizes full-length B', B'(93-240) and D3 (lane 22), but not B'(1-93) (lane 25) or B'(1-63) (lane 30), in immunoprecipitation. Proteins were fractionated by electrophoresis on a high-TEMED, SDS-12.5% polyacrylamide gel and visualized by fluorography.

Fig. 7. Interaction between B' and truncated D3 polypeptides depends on the presence of the Sm motifs 1 and 2, as determined in vitro by immunoprecipitation. The truncated D3(1-90) contains both Sm motifs 1 and 2, while the Sm motif 2 has been partially deleted in D3(1-68), and the Sm motif 1 has been almost completely deleted in D3(48-126). Proteins used in each assay are shown above each lane. The 35S-labelled, in vitro translated proteins shown in lanes 1-6 are equivalent to 40% of the amount used in the immunoprecipitation assays (lanes 7-33). Triangles depict increasing amounts of non-radiolabelled B' (denoted 'B'c') added to the assays, whereby the amounts used were 0.5-, 1- or 2-fold in comparison with that of 35S-labelled B' (see lane 6). Immunoprecipitations were carried out with mAb ANA125, which recognizes B' (lane 6) but not D3 (lane 8) or any of the truncated D3 polypeptides [D3(1-90), lane 11; D3(1-68), lane 16; and D3(48-126), denoted D3(48-E), lane 21]. Proteins were fractionated by electrophoresis on a high-TEMED, SDS-12.5% polyacrylamide gel and visualized by fluorography.
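The design of the B' truncation series can be made explicit by computing how much of each Sm motif a construct retains. The sketch below uses only the coordinates quoted in the text for B' (Sm motif 1 at residues 17-48, Sm motif 2 at residues 67-80; constructs 1-93, 1-63 and 93-240). The helper names are hypothetical, and D3 is left out because its motif coordinates are not stated in this section.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue positions

# Coordinates quoted in the text for the 240-residue B' protein.
B_PRIME_MOTIFS: Dict[str, Interval] = {
    "Sm motif 1": (17, 48),
    "Sm motif 2": (67, 80),
}
B_PRIME_CONSTRUCTS: Dict[str, Interval] = {
    "B'(1-93)": (1, 93),
    "B'(1-63)": (1, 63),
    "B'(93-240)": (93, 240),
}

def retained_fraction(motif: Interval, construct: Interval) -> float:
    """Fraction of the motif's residues that lie inside the construct."""
    m_start, m_end = motif
    c_start, c_end = construct
    overlap = min(m_end, c_end) - max(m_start, c_start) + 1
    return max(overlap, 0) / (m_end - m_start + 1)

def motif_table(motifs, constructs):
    """Map each construct name to the retained fraction of every motif."""
    return {
        name: {m: retained_fraction(span, region) for m, span in motifs.items()}
        for name, region in constructs.items()
    }

for construct, row in motif_table(B_PRIME_MOTIFS, B_PRIME_CONSTRUCTS).items():
    print(construct, row)
```

Under these coordinates B'(1-93) keeps both motifs, B'(1-63) keeps only Sm motif 1 and B'(93-240) keeps neither, which is the pattern described in the legend to Figure 6.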

Discussion

We have cloned and characterized the cDNAs for the human F and G Sm proteins and, thus, the structures of all core proteins associated with the spliceosomal snRNAs U1, U2, U4 and U5 are now available for comparison. None of the eight core proteins exhibits a canonical RNP consensus RNA recognition motif, suggesting that other types of protein-RNA interactions occur at the snRNA's Sm site. Alignment of all human core protein sequences revealed two conserved regions, 32 and 14 amino acids in length, which we term Sm motifs 1 and 2. The Sm motifs are evolutionarily highly conserved, indicating that the Sm proteins may have arisen from a common ancestor. Finally, we show that the amino-terminal regions of the B and D3 proteins, which contain the Sm motifs, are necessary for a specific B'-D3 interaction in vitro. Given that each of the remaining core proteins (i.e. D1, D2, E, F and G) also interacts specifically with one or more of the Sm core proteins, we propose that the Sm motifs play an important role in specific protein-protein recognition events.

cDNA cloning of F and G snRNP proteins

The authenticity of the F and G cDNAs described in this report is demonstrated in several ways. First, the peptide sequences derived from purified F and G proteins were all identified in the respective cDNA sequence (Figures 1 and 2). Second, the in vitro translated proteins co-migrated on SDS-polyacrylamide gels with authentic HeLa F or G proteins. In addition, the F and G proteins prepared by in vitro translation in the presence of total, purified core proteins were assembled in vitro into U1 snRNP cores (data not shown).

Given our previous finding that the authentic G protein present in purified U1, U2, U4/U6 or U5 snRNPs migrates in high-TEMED, SDS-polyacrylamide gels as a closely spaced doublet, which we tentatively denoted G and G', we were particularly interested in determining whether these were two distinct proteins. Several observations suggest that this is rather unlikely. For example, when we microsequenced the individually electroeluted G proteins, identical peptide sequences were obtained. Furthermore, antibodies raised in rabbits against recombinant G protein reacted with equal efficiency on immunoblots with both bands of authentic HeLa G protein (not shown). Finally, the in vitro translated G protein derived from a single cDNA clone also migrates in high-TEMED, SDS-polyacrylamide gels as a doublet. Based on these observations, we are inclined to believe that the two G protein bands represent conformational isomers of the same protein. The possibility remains, however, that the two proteins are differentially post-translationally modified, although there is at present no evidence for this.

Several protein sequences from evolutionarily distant organisms that display significant homology with either the human F or G protein were identified in the data base. Alignment of the protein sequences from the various species (Figure 5A) reveals that the overall structure of both proteins has been evolutionarily highly conserved, although the degree of sequence conservation appears to be slightly higher for F than for G. While the percentage of identical amino acid residues in the F protein homologues from organisms as diverse as man, worm, insect and plant ranges between 72 and 78%, this value ranges between 50 and 60% for the various G homologues. It is noteworthy that all putative F proteins share a high content (six or seven) of aromatic amino acid residues (five of these are 100% conserved) and a cluster of 5-7 negatively charged amino acid residues in their C-terminal regions (Figure 5A). As the F protein is the only acidic core protein, it may play an important structural role in the snRNP core by counteracting the high positive net charge of the other core proteins. In particular, it may stabilize and maintain the solubility of the protein heterooligomer consisting of E, F, G and the D proteins (see Introduction).
In view of our previous finding that the G protein could be cross-linked to an snRNA's Sm site, it is interesting to note that G does not possess any of the currently known RNA binding motifs such as the RNA recognition motif (for review, see Burd and Dreyfuss, 1994). It should be noted, however, that the G protein most probably does not interact with the Sm site as an individual protein, but rather as part of a pre-assembled heterooligomer consisting of E, F, G and, possibly, one or more of the D proteins (see Introduction). Therefore, it may be envisioned that a more complex Sm site-RNA recognition domain, involving regions not only of the G protein, but also of other core proteins, is generated upon protein complex formation (see also below). A similar situation holds true for the RNA binding proteins, SRP9 and SRP14, and for the ribosomal proteins, S6 and S18, which have been shown to require dimerization for their binding to SRP RNA (Strub and Walter, 1990) and to 16S rRNA (Mizushima and Nomura, 1970), respectively.

The presence of two evolutionarily conserved consensus sequences in all of the snRNP core proteins

Upon alignment of the protein sequences of the snRNP core proteins B/B', N, D1, D2, D3, E, F and G, two regions of homology, 32 and 14 amino acids long, could clearly be discerned in all of them. We have termed the two conserved consensus sequences Sm motifs 1 and 2. The two Sm motifs are strikingly evolutionarily conserved in all of the putative homologues of the snRNP core proteins identified in the data base (Figure 5A). About half of the positions in Sm motif 1 and even >80% of the positions in Sm motif 2 are occupied by biochemically and biophysically related amino acids. Two positions in Sm motif 1 (residues 13 and 23) which are occupied by glycine and asparagine, respectively, are even 100% conserved. It is interesting that deviations from the consensus sequences which are observed with particular human Sm proteins are also observed in other species. This is most striking for position 7 in the Sm motif 2 of the E proteins, where lysine has replaced the otherwise conserved arginine, and for position 8 in the Sm motif 2 of the F protein family, which contains cysteine instead of glycine (Figure 5A). This reinforces the idea that protein homologues from different species possess functional homology.

Upon close inspection of Figure 5A, it becomes clear that there are additional positions which are occupied by evolutionarily conserved residues and are, thus, characteristic for a given Sm protein. For example, the amino acids at positions 4-7 in Sm motif 1 are highly conserved among the homologues of a particular Sm protein from evolutionarily distant organisms; no correspondence, however, is seen at these positions when distinct Sm proteins are compared (Figure 5A). Similar patterns can be discerned at positions 16/17 and 20/21 within Sm motif 1 as well as at some positions immediately preceding Sm motif 1 (Figure 5A). In contrast to the situation described for the Sm motifs, the spacer region separating them is not strictly conserved among homologues of a particular Sm protein (Figure 5A). Therefore, these regions are less likely to play an important role in the individual functions of Sm proteins (see also below). The presence of identical or biochemically highly related amino acid residues in more than half of the positions within Sm motifs 1 and 2 strongly suggests that each motif folds into a similar structure even in different Sm proteins.
Secondary structure predictions for the Sm motifs of the various Sm proteins are consistent with this hypothesis (not shown). The possibility that in a given Sm protein the two regions comprising Sm motifs 1 and 2 stably interact with each other and thereby form a functional domain is not unlikely, but has to be tested experimentally. From a strictly biochemical point of view, this would require that the Sm motif-containing regions are largely resistant towards proteinase digestion only in the intact protein but not when tested individually.

Interestingly, data base searches revealed three hypothetical proteins from plants and one from yeast which clearly possess the conserved Sm motifs but display comparatively low overall homology with any of the human Sm proteins (Figure 5B). For example, the sequences of the Arabi. Z27273 and S.cer. Z35787 proteins display 32 and 37% identity with the human F protein, while the O.sat. D15230 and B.cam. L33514 proteins resemble the human G protein (33 and 35% identity) more than any of the other human Sm proteins. Yet, this degree of conservation is 20-30% lower than even that of evolutionarily distant proteins which we consider to be true homologues of human F and G proteins (see Figure 5A). It is interesting to note that the Arabi. Z27273 and the S.cer. Z35787 proteins share a distinctive feature with the D1 protein family, namely the sequence ELKN at positions 4-7 of Sm motif 1 (Figure 5). Recently, two distinct proteins have been characterized in the yeast S.cerevisiae which both possess the conserved Sm motifs and were each found associated with yeast U6 snRNPs (Seraphin, 1995; Cooper et al., 1995). The two U6 proteins from yeast are distinct from the four proteins in Figure 5B. It is therefore possible that the hypothetical proteins shown in Figure 5B may also be associated with snRNAs other than U1, U2, U4 and U5. In any case, this suggests that the Sm motifs define a broader protein family that may extend beyond the spliceosomal snRNP Sm proteins.

The Sm motifs are involved in protein-protein interactions

We have been unable to find any similarity between the Sm motifs and other known consensus sequences. Their potential function must, therefore, be discussed within the context of the structure and function of the snRNP core. One possibility would be that they represent a previously unobserved snRNA binding motif. However, this does not seem to be the most likely interpretation, for various reasons. First, the Sm site of the snRNAs does not possess repetitive sequence elements, which would be expected if it were recognized by more than one Sm protein. Secondly, only the G protein has as yet been shown to bind directly to the Sm site. Thirdly, nucleic acid binding motifs often contain a significant number of positively charged residues; however, there is little systematic occurrence of positive charges in either Sm motif. We note, however, that a positively charged RNA binding site could arise by the juxtaposition of individual positively charged side chains from different proteins in, for example, a pre-assembled E, F and G oligomer. While we cannot rigorously exclude that the Sm motifs interact with snRNA, based on present evidence it appears more probable that they are involved in protein-protein interactions, in particular as the structure of the snRNP core is determined largely by protein-protein contacts (see Introduction).

Our conjecture that the Sm motifs are involved in a number of core protein interactions is strongly supported by our co-immunoprecipitation studies with truncation mutants of the Sm proteins, B' and D3. Specifically, we could show that the amino-terminal 93 amino acids of B', containing the two Sm motifs, were sufficient for stable complex formation in vitro with full-length D3. The C-terminal part of the B' protein, on the other hand, which lacks the Sm motifs, failed to interact with D3 (Figure 6).
Likewise, amino acid deletions from the C-terminus of D3 up to position 90 did not significantly decrease the capability of D3 mutants to interact with B' (Figure 7). We cannot rule out that the C-terminal regions of the B' or D3 proteins are involved in additional aspects of the protein-protein interactions in the B'-D3 heterooligomer or as part of the entire snRNP core. While the above data indicate that the intact Sm motifs are sufficient for the interaction between D3 and B', we have further evidence that they are also essential. This idea is supported by our findings that B' and D3 mutants lacking Sm motif 2 are incapable of interacting with the wild-type D3 and B' proteins (Figures 6 and 7). Also, a D3 truncation mutant lacking Sm motif 1 did not stably bind to the B' protein. Since the deletion of entire Sm motifs represents a drastic change in protein structure, future studies using more subtle Sm motif mutants are necessary to determine specifically which amino acids are required for the interactions. The studies reported here are limited to the protein-protein interaction between B' and D3; nonetheless, it is likely that the Sm motifs play an equally important role in the specific heterooligomerization of the other core proteins as well.

In summary, our results strongly suggest that the Sm motifs are involved in Sm protein-protein interactions. These interactions, however, are of a specific nature. For example, we have previously shown that D2 specifically interacts with D1 but not with D3 (Lehmeier et al., 1994). Similarly, B' interacts strongly with D3 but not with D2 (Figure 6). Thus, the specificity of the various Sm protein interactions requires that non-identical residues within or abutting the Sm motifs are involved (see above). To fully comprehend the nature and specificity of the numerous Sm protein-protein interactions, the stoichiometrical relationships of the Sm proteins at each step of the core snRNP assembly pathway must be determined. It is possible that one or more of the Sm proteins forms homooligomers, which regulate further interactions with other Sm proteins. Nonetheless, if the Sm motifs were shown to be involved in homo- as well as heterooligomer interactions, this would not affect our principal conclusion that the Sm motifs are important for Sm protein-protein interactions.

The Sm motifs and autoimmunizing B-cell Sm epitopes

The presence of conserved motifs in all of the snRNP core proteins is highly interesting in view of the role which Sm proteins play in autoimmune diseases such as SLE (see Introduction). The most important autoimmunizing B-cell epitopes remain to be identified. A possible candidate is the GR repeat at the C-terminus of D1 (Hirakata et al., 1993). However, the fact that both polyclonal and monoclonal antibodies cross-react with various core proteins suggests that they share common structural elements which are important for the Sm autoimmunization process. It is therefore plausible that the Sm motifs comprise, at least in part, one or more Sm B-cell epitopes.

The primordial snRNP particle may have consisted of a set of identical core proteins

The strong evolutionary conservation of the Sm motifs within the set of human Sm proteins clearly suggests that these proteins may have originated from a single precursor, a 'primordial Sm protein'. This would imply that early snRNP particles contained several copies of a single protein. It can be envisaged that the diversification of Sm protein function (interaction with specific proteins, binding of m3G cap methyltransferase, nuclear localization signal; see Introduction) led to diversification of sequence and structure. This hypothesis would suggest that archaic species may have a simpler Sm protein composition; confirmation of this hypothesis awaits future biochemical investigation of snRNP particles from such species.

Materials and methods

Isolation of snRNP proteins and protein sequencing

U1-U6 snRNP particles were prepared from HeLa cell nuclear extracts by immunoaffinity chromatography with anti-m3G antibodies as described by Bach et al. (1989). Total snRNP proteins were separated preparatively by electrophoresis on high-TEMED, SDS-12.5% polyacrylamide gels (Lehmeier et al., 1990). The F and G snRNP proteins were electroeluted from the gel and cleaved by cyanogen bromide and/or trypsin, as described previously (Lehmeier et al., 1990). As the G protein generally migrates in high-TEMED, SDS-polyacrylamide gels as a closely spaced double band, each band was electroeluted and fragmented individually. Cyanogen bromide or tryptic peptides from F and G were separated by HPLC; peptide sequencing was carried out with a gas-phase sequencer (model 471A, Applied Biosystems) as described by Lehmeier et al. (1990). Peptide sequences obtained for F were MEYKGYLVSVDGY and (R/K)WGMEYK. The two G bands gave rise to identical peptide sequences of MVVIRGNSIIMLEALERV, MDKKLSLKLNGGRHVQGILR, MNLVI, (R/K)AHPPELK and (R/K)KFMDKK. The underlined peptide sequences indicate overlapping amino acid stretches.

Oligonucleotide design

Synthetic oligonucleotides used for the F and G RACE PCRs were: F1 (5'-TGTGAATTC-TGGGGNATGGA(G/A)TA(T/C)AA(A/G)GG-3') from the F peptide sequence WGMEYKG, F2 (5'-TTGTCGAC-CTATTCTCTCATTTCCCCATC-3') from the initial F cDNA sequence information, G1 (5'-TGTGAATTC-AA(A/G)TT(T/C)GA(T/C)AA(A/G)AA-3') from the G peptide KFMDKK, G2 (5'-TTGTCGAC-TTGTCCACTAGTCGCCAT-CTC-3') from the initial G cDNA sequence information, and the linker primers L1 (5'-CCATTGGATCC-TCTGCAGGAATTC-3') and L2 (5'-GAATTCCTGCAGAGGATCC-3'). Restriction sites incorporated into the primers (shown as underlined sequences) are EcoRI in F1 and G1, SalI in F2 and G2, and BamHI, PstI and EcoRI in L1 and L2.

Synthetic oligonucleotides used as 5' primers for producing truncated B' and D3 mutant cDNA by PCR were B'1F (5'-ACTTCCACCATGACGGTGGGCAAG-3'), B'2F (5'-ACTTCCACC-ATGGCTCGAGTTCCACTT-3'), D3-1F (5'-ACTTCCACC-ATGTCTATTG-GTGTGCCG-ATT-3') and D3-2F (5'-ACTTCCACC-ATGGTCACATACAGAGATGGC-3'); the 3' primers used were B'3R (5'-TGCAGTCGACCTA-CTCTTCCCTTTCTGC-3'), B'4R (5'-TGCAGTCGACCTAAGCAATACCAGTATC-3'), B'5R (5'-TGCAGTCGACCTAGGG-CCTTGGTGG-3'), D3-3R (5'-TGCAGTCGACTTA-GATTTTGCTGCCACGGATGTA-3'), D3-4R (5'-TGCAGTCGACTTA-GTTTT-TATTTTTCATGCTCTT-3') and D3-5R (5'-TGCA-GTCGACTTATCTTCGCTTTT-3'). Underlined sequences are linker regions containing either a sequence homologous to the 3' end of the expression cassette and an EcoRI restriction site (in the 5' primers) or a stop codon and a SalI restriction site (in the 3' primers).
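The relationship between a peptide sequence and a degenerate primer can be illustrated with a minimal Python sketch. It takes the peptide-derived part of the F primer (the sequence following the EcoRI-containing linker, as listed above), expands the bracketed alternatives and N into all individual oligonucleotides, and translates each with the standard genetic code. The function and variable names are illustrative assumptions and are not part of the original work.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_degenerate(seq):
    """Turn 'GA(G/A)N' style notation into a list of allowed bases per position."""
    positions = []
    i = 0
    while i < len(seq):
        if seq[i] == "(":
            end = seq.index(")", i)
            positions.append(seq[i + 1:end].split("/"))
            i = end + 1
        elif seq[i] == "N":
            positions.append(list("ACGT"))
            i += 1
        else:
            positions.append([seq[i]])
            i += 1
    return positions


def expand(seq):
    """Enumerate every non-degenerate oligonucleotide covered by the notation."""
    return ["".join(bases) for bases in product(*parse_degenerate(seq))]


def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Peptide-derived part of the F primer, without the 5' linker.
F1_CODING = "TGGGGNATGGA(G/A)TA(T/C)AA(A/G)GG"

variants = expand(F1_CODING)
peptides = {translate(v) for v in variants}
print(len(variants), peptides)
```

Under these assumptions the primer pool is 32-fold degenerate, and every member encodes WGMEYK followed by the first two nucleotides (GG) of the glycine codon, which is consistent with the stated peptide WGMEYKG.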

Isolation of the cDNA clones

A λgt10 cDNA library constructed from the poly(A)+ mRNA from HeLa S3 cells was screened for the F and G cDNA clones using rapid amplification of cDNA ends (RACE) PCR (Frohmann et al., 1988). The 3' RACE was carried out with the specific F1 or G1 primer and an oligo(dT) primer containing a SalI restriction site at its 5' end. The amplified DNA fragments (278 nucleotides for F and 350 nucleotides for G) were fractionated on a 1%, low temperature melting agarose gel, digested with EcoRI and SalI and inserted into Bluescript KS+ (Stratagene). Sequencing was performed using the Sequenase system (United States Biochemical). The sequence information obtained was used to design the oligonucleotides F2 (nucleotides 356-376 in Figure 1) and G2 (nucleotides 222-242 in Figure 2). In order to isolate the 5' regions of the cDNAs, a synthetic linker, obtained by hybridization of the oligonucleotides L1 and L2, was ligated to the cDNAs. For the 5' RACE PCR, the F2 (or G2) and L1 oligonucleotides were used as primers. The PCR products (nucleotides 1-376 for F and nucleotides 1-242 for G) were digested with EcoRI and SalI, cloned into Bluescript KS+ (Stratagene), and sequenced. The two fragments of the 5' and 3' RACE of the F and G cDNAs were then digested at an internal NspI site (for the F fragments) or SpeI site (for the G fragments), and ligated together to generate the full-length F or G clone. These were then digested with EcoRI and SalI and subcloned into Bluescript KS+ (pBLSF and pBLSG, respectively). Verification of the DNA sequences was obtained through the parallel production of several DNA fragments at each cloning step, whose sequences always proved to be identical.

Expression PCR

The cDNAs encoding the B' and D3 truncated mutants were generated from PCR amplification using either pBLSB' [containing a cDNA for B', which was cloned from a HeLa λgt10 cDNA library and proved to be identical to the previously published B' sequence from van Dam et al. (1989)] or pBLSD3 [described in Lehmeier et al. (1994)]. Expression PCR was carried out using an expression cassette (which contains a T7 RNA polymerase promoter) and 5' expression primer (which is directed against the 5' end of the expression cassette), provided in the ExpressAmp T7 System from BRL; the PCR conditions recommended by the manufacturer were used. The initial DNA fragments for the expression PCR were produced using the primers B'1F and B'3R for B'(1-63), B'1F and B'4R for B'(1-93), B'2F and B'5R for B'(93-240), D3-1F and D3-3R for D3(1-68), D3-1F and D3-4R for D3(1-90), and D3-2F and D3-5R for D3(48-126). After amplification, the fragments were fractionated on a 1%, low temperature melting agarose gel, excised from the gel and used as templates for the second PCR, in which the expression cassette was annealed to the 5' end of the DNA fragment produced in the first PCR. The amplification of the resulting cDNA templates was carried out using the 5' expression primer (BRL) and the 3' primer used in the first PCR. The products of the last PCR were purified over spin columns (QIAquick, Qiagen) and used directly for in vitro transcription.
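The 3' primers used for the truncated constructs are described as carrying a stop codon and a SalI restriction site. As a minimal sketch of how this design can be checked, the Python snippet below takes the B'3R and D3-3R sequences from the oligonucleotide list (with the hyphens removed, which is an assumption about how the printed sequences should be joined), forms the reverse complement to obtain the sense strand, and reads the three nucleotides immediately upstream of the SalI site. The helper names are illustrative only.

```python
SALI_SITE = "GTCGAC"              # SalI recognition sequence (palindromic)
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def describe_3prime_primer(primer):
    """Locate the SalI site on the sense strand and the codon just upstream of it."""
    sense = reverse_complement(primer)
    site_at = sense.find(SALI_SITE)
    if site_at < 3:
        raise ValueError("no SalI site preceded by a full codon was found")
    return {
        "sense": sense,
        "coding_part": sense[:site_at - 3],   # gene-specific 3' end of the insert
        "stop": sense[site_at - 3:site_at],   # codon directly before the site
        "site_at": site_at,
    }


# Primer sequences as printed, hyphens removed (assumed to be layout only).
B3R = "TGCAGTCGACCTACTCTTCCCTTTCTGC"
D3_3R = "TGCAGTCGACTTAGATTTTGCTGCCACGGATGTA"

for name, primer in (("B'3R", B3R), ("D3-3R", D3_3R)):
    info = describe_3prime_primer(primer)
    print(name, info["coding_part"], info["stop"], info["stop"] in STOP_CODONS)
```

For both primers the sense strand ends with a gene-specific stretch whose length is a multiple of three, followed by a stop codon (TAG for B'3R, TAA for D3-3R) and then the SalI site, in agreement with the description of the linker regions.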

Immunization and antisera

The open reading frame of F or G was subcloned into the BamHI and EcoRI sites of the expression vector pGEX-2T (Pharmacia). The recombinant plasmids were transformed into Escherichia coli HB101 cells, and the fusion proteins (GST-F or GST-G) were over-expressed and affinity-purified as described by Smith and Johnson (1988). Purified GST-G and GST-F fusion proteins were used for immunization of rabbits. In addition, an anti-G antibody (3805) was raised in rabbits against a synthetic carboxy-terminal peptide from G (MVVIRGNSIIMLEALERV), which was coupled to maleimide-activated keyhole limpet haemocyanin (Pierce) via an amino-terminal cysteine residue (W.Hackl, personal communication). A human patient serum exhibiting anti-F reactivity was obtained from patient M.M. (Reuter et al., 1990).

In vitro translation and radioimmunoprecipitation assays (RIPAs)

Plasmids with the cDNAs encoding D2 (pBLSD2; Lehmeier et al., 1994), D3, F or G were linearized with SalI, and the plasmid with the B' cDNA was linearized with HindIII (restriction enzymes were from New England Biolabs). The linearized plasmids for B', D2, D3, F and G, as well as the expression PCR products for the B' and D3 mutants, were used as templates for transcription with T7 RNA polymerase. One microgram of each in vitro transcribed mRNA was translated in either rabbit reticulocyte lysate (for B', D2 and D3) or wheat germ extract (for F, G and the truncated B' and D3 proteins) in a final volume of 75 µl, according to the manufacturer's instructions (Promega Biotech), in the presence of [35S]methionine (Amersham). Immunoprecipitations were carried out with 2-10 µl of translate. The in vitro translation of non-radiolabelled proteins was carried out in parallel with that of the corresponding 35S-labelled protein and, thus, the amounts of non-radiolabelled and 35S-labelled translation products of the same protein were considered to be equal. For the protein-protein interaction studies (shown in Figures 6 and 7), the proteins were first incubated for 15 min at 25°C, followed by 30 min at 4°C, and then an additional 30 min at 4°C together with the appropriate antibody. This mixture was then added to 10 µl of protein A-Sepharose (Pharmacia), which had been pre-incubated for 2 h with 1% bovine serum albumin in phosphate-buffered saline (PBS, 130 mM NaCl, pH 8.0). Incubation was continued in a total volume of 400 µl PBS for 2 h at 4°C with constant rotation. The Sepharose-bound antibodies were pelleted and washed five times with 1 ml of IPP150 (10 mM Tris-HCl, pH 8.0, 0.1% Nonidet P-40 and 150 mM NaCl). Proteins were fractionated on high-TEMED, SDS-12.5% polyacrylamide gels, which were subsequently treated with Amplify (Amersham) and analysed by fluorography, normally for 12 h.

Computer analysis

Comparisons of a query sequence and either a protein or a DNA data base were performed with the programs FASTA, TFASTA, BLASTP and TBLASTN (Pearson and Lipman, 1988; Altschul et al., 1990). Multiple sequence alignments were constructed with the Clustal method (Higgins and Sharp, 1989), using the DNA Star program, and optimized by visual inspection.

Acknowledgements

We would like to thank D.Williams, I.W.Mattaj, W.J.van Venrooij and H.Peter for their generous gifts of mAbs and patient sera. We are grateful to D.Meyer, I.Ochsner and M.Wicke for expert technical assistance, V.Buckow for typing and C.L.Will for critical reading of the manuscript. We also thank B.Seraphin and J.D.Beggs for communicating results prior to publication. This work was supported by grants from the Bundesministerium fur Forschung und Technologie, the Deutsche Forschungsgemeinschaft and the Fonds der Chemischen Industrie.

References

Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) J. Mol. Biol., 215, 403-410.
Bach,M., Winkelmann,G. and Luhrmann,R. (1989) Proc. Natl Acad. Sci. USA, 86, 6038-6042.
Branlant,C., Krol,A., Ebel,J., Lazar,E., Haendler,B. and Jacob,M. (1982) EMBO J., 1, 1259-1265.
Brunet,C., Quan,T. and Craft,J. (1993) Gene, 124, 269-273.
Burd,C.G. and Dreyfuss,G. (1994) Science, 265, 615-621.
Cooper,M., Johnston,L.H. and Beggs,J.D. (1995) EMBO J., 14, 2066-2075.
Fabrizio,P., Esser,S., Kastner,B. and Luhrmann,R. (1994) Science, 264, 261-265.
Fautsch,M., Thompson,M.A., Holocky,E.L., Schulz,P.J., Hallett,J.B. and Wieben,E.D. (1992) Genomics, 14, 883-890.
Fischer,U., Sumpter,V., Sekine,M., Satoh,T. and Luhrmann,R. (1993) EMBO J., 12, 573-583.
Fisher,D.E., Conner,G.E., Reeves,W.H., Wisniewolski,R. and Blobel,G. (1985) Cell, 42, 751-758.
Frohmann,M.A., Dush,M.K. and Martin,G.R. (1988) Proc. Natl Acad. Sci. USA, 85, 8998-9002.
Griffith,A.J., Schmauss,C. and Craft,J.E. (1992) Gene, 114, 195-201.
Guthrie,C. (1991) Science, 253, 157-163.
Habets,W.J., Berden,J.H.M., Hoch,S.O. and van Venrooij,W.J. (1985) Eur. J. Immunol., 15, 992-997.
Hackl,W., Fischer,U. and Luhrmann,R. (1994) J. Cell. Biol., 124, 261-272.
Heinrichs,V., Hackl,W. and Luhrmann,R. (1992) J. Mol. Biol., 227, 15-28.
Higgins,D.G. and Sharp,P.M. (1989) Gene, 73, 237-244.
Hirakata,M., Craft,J. and Harkin,J.A. (1993) J. Immunol., 150, 3592-3601.
Hirt,H., Gartner,A. and Heberle-Bors,E. (1992) Nucleic Acids Res., 20, 613.
Lehmeier,T., Foulaki,K. and Luhrmann,R. (1990) Nucleic Acids Res., 18, 6475-6484.
Lehmeier,T., Raker,V., Hermann,H. and Luhrmann,R. (1994) Proc. Natl Acad. Sci. USA, 91, 12317-12321.
Lerner,M.R. and Steitz,J.A. (1979) Proc. Natl Acad. Sci. USA, 76, 5495-5499.
Lerner,M.R., Boyle,J.A., Harkin,J.A. and Steitz,J.A. (1981) Science, 211, 400-402.
Liautard,J.P., Sri-Widada,J., Brunel,C. and Jeanteur,P. (1982) J. Mol. Biol., 162, 623-643.
Luhrmann,R., Kastner,B. and Bach,M. (1990) Biochim. Biophys. Acta, 1087, 265-292.
Mattaj,I.W. (1986) Cell, 46, 905-911.
Mattaj,I.W. (1988) In Birnstiel,B. (ed.), Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles. Springer Verlag, Berlin, pp. 100-114.
Mattaj,I.W. and De Robertis,E.M. (1985) Cell, 40, 111-118.
McAllister,G., Amara,S.G. and Lerner,M.R. (1988) Proc. Natl Acad. Sci. USA, 85, 5296-5300.
Mitsuda,T., Eisenberg,R.A. and Cohen,P.L. (1992) J. Autoimmun., 5, 277-287.
Mizushima,S. and Nomura,M. (1970) Nature, 226, 1214-1218.
Moore,M.J., Query,C.C. and Sharp,P.A. (1993) In Gesteland,R.F. and Atkins,J.F. (eds) The RNA World. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp. 303-357.
Orlean,P., Albright,C. and Robbins,P.W. (1988) J. Biol. Chem., 263, 17499-17507.
Pearson,W.R. and Lipman,D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444-2448.
Plessel,G., Fischer,U. and Luhrmann,R. (1994) Mol. Cell. Biol., 14, 4160-4172.
Preston,R.A., Manolson,M.F., Becherer,K., Weidenhammer,E., Kirkpatrick,D., Wright,R. and Jones,E.W. (1991) Mol. Cell. Biol., 11, 5801-5812.
Reuter,R., Rothe,S., Habets,W., van Venrooij,W.J. and Luhrmann,R. (1990) Eur. J. Immunol., 20, 437-440.
Rokeach,L.A. and Hoch,S.O. (1992) Mol. Biol. Rep., 16, 165-174.
Rokeach,L.A., Haselby,J.A. and Hoch,S.O. (1988) Proc. Natl Acad. Sci. USA, 85, 4832-4836.
Rokeach,L.A., Jannatipour,M., Haselby,J.A. and Hoch,S.O. (1989) J. Biol. Chem., 264, 5024-5030.
Roy,J., Zheng,B., Rymond,B.C. and Woolford,J.L. (1995) Mol. Cell. Biol., 15, 445-455.
Rymond,B.C. (1993) Proc. Natl Acad. Sci. USA, 90, 848-852.
Rymond,B.C., Rokeach,L. and Hoch,S.O. (1993) Nucleic Acids Res., 21, 3501-3505.
Sauterer,R., Goyal,A. and Zieve,G.W. (1990) J. Biol. Chem., 265, 1048-1058.
Schmauss,C. and Lerner,M.R. (1990) J. Biol. Chem., 265, 10733-10739.
Seraphin,B. (1995) EMBO J., 14, 2089-2098.
Smith,D.B. and Johnson,K.S. (1988) Gene, 67, 31-40.
Stanford,D.R., Kehl,M., Perry,C.A., Holicky,E.I., Harven,S.E., Rohleder,A.M., Rehder,K.,Jr, Luhrmann,R. and Wieben,E.D. (1988) Nucleic Acids Res., 16, 10593-10605.
Strub,K. and Walter,P. (1990) Mol. Cell. Biol., 10, 777-784.
Tan,E.M. (1979) Adv. Immunol., 44, 93-152.
van Dam,A., Winkel,I., Zijlstra-Baalbergen,J., Smeenk,R. and Cuypers,H.T. (1989) EMBO J., 8, 3853-3860.
Vankan,P., McGuigan,C. and Mattaj,I.W. (1990) EMBO J., 9, 3397-3404.
Vincent,W.S.,III., Goldstein,E.S. and Allen,S.A. (1990) Biochim. Biophys. Acta, 1049, 59-68.
Wilson,R. et al. (1994) Nature, 368, 32-38.
Woppmann,A., Patschinsky,T., Bringmann,P., Godt,F. and Luhrmann,R. (1990) Nucleic Acids Res., 18, 4427-4438.

Received on February 16, 1995