The EMBO Journal vol.3 no.9 pp.2101-2106, 1984

Sequence analysis of a split gene involved in fruiting from the fungus Schizophyllum commune

J.J.M.Dons*, G.H.Mulder, G.J.A.Rouwendal, J.Springer, W.Bremer and J.G.H.Wessels

Department of Plant Physiology, Biological Centre, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands

*Present address: Institute for Horticultural Plant Breeding, Mansholtlaan 15, Wageningen, The Netherlands

Communicated by M.Gruber

The sequence of a gene and its mRNA, which is abundantly expressed during fruiting body initiation in the Basidiomycete Schizophyllum commune, is described. This gene (1G2), the first to be analyzed in this group of fungi, contains an open reading frame coding for a polypeptide of 94 amino acids with a mol. wt. of 9842. A possible signal peptide of 20 residues and one possible glycosylation site were found. The sequence analysis was hampered by a sequence rearrangement in one of the cDNA clones, probably due to base pairing between short complementary sequences present at the 5' and 3' ends of the mRNA. The 5' untranslated leader sequence is 57 bp long and harbors a possible ribosome binding site close to the AUG start codon. A TATA box is found at position -31 upstream of transcription initiation. The 3' untranslated sequence is 200 bp long and contains the sequence -TATATAAT-, which most likely represents the polyadenylation signal. Some heterogeneity in the site of poly(A) addition was observed. The coding region of the gene is interrupted by three very small introns of 53, 49 and 49 bp, respectively. The 5' and 3' splice junctions are conserved: GTGAGT- and -AG-, respectively. Each intron contains a sequence partially complementary to its own 5' end. These sequences are compared with internal conserved sequences in yeast and filamentous fungi with regard to their possible role in splicing.

Key words: cloning artefact/fungal gene/Schizophyllum commune/sequence analysis/splicing

© IRL Press Limited, Oxford, England.

Introduction
In the life cycle of the Basidiomycete Schizophyllum commune, a dikaryotic mycelium (two haploid nuclei per cell) is formed by mating two compatible monokaryons (one haploid nucleus per cell) that are hetero-allelic for their two incompatibility genes (Raper, 1983). After growth for ~3 days in surface culture, dikaryotic but not monokaryotic mycelia develop fruiting bodies. During the formation of fruiting body primordia, a small number of novel abundant mRNAs appear in the RNA population (Hoge et al., 1982a). A cDNA clone corresponding to an mRNA that is strongly expressed during fructification was isolated from a cDNA library (Dons et al., 1984). This mRNA was 650 nucleotides long and coded for an in vitro translation product of mol. wt. 13 000. The sequence was expressed specifically in the dikaryon and not in the monokaryotic strains, and its concentration increased 20-fold in correlation with the establishment of fruiting body primordia. It was suggested that transcription of the gene involved (denoted the 1G2 gene) is conditioned by the existence of hetero-allelic incompatibility genes, but that full expression only occurs during fruiting body formation.

In our studies on gene expression in S. commune we have concluded that there is no detectable amount of heterogeneous nuclear RNA in this fungus (Zantinge et al., 1981; Hoge et al., 1982b). The same phenomenon has been described for other fungi (cf. Zantinge et al., 1981; Van Etten et al., 1981) and contrasts with plants and animals. In agreement with this, recent data on cloning and sequencing of genes from yeast and other fungi show that most protein-coding genes do not contain intervening sequences (Pikielny et al., 1983; Timberlake and Barnard, 1981). Introns, however, have been found in some genes from Saccharomyces cerevisiae (cf. Langford et al., 1984), in histone genes and an NADP-specific glutamate dehydrogenase gene from Neurospora crassa (Woudt et al., 1983; Kinnaird and Fincham, 1983) and in the exo-cellobiohydrolase gene from Trichoderma reesei (Shoemaker et al., 1983). In all these cases, however, the number of introns found is low (one or two) and they are relatively small compared with the introns found in plants and animals. Here we present a sequence analysis of the 1G2 gene from S. commune. A comparison of the sequence of a genomic clone with cDNA clones revealed the presence of three very small introns, all located within an open reading frame with a coding capacity for a polypeptide of 94 amino acids.

Results
Determination of the transcription orientation
The genomic clone g6D4 contains a 9-kbp insert on which the 1G2 gene is located. The position of the gene was previously mapped at ~2 kbp from one end of the insert by Southern hybridization using cDNA clone 7D5 as a probe (Dons et al., 1984). Figure 1 shows a partial restriction map of cDNA clone 7D5 and the corresponding part of the genomic clone g6D4.
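The orientation argument in this section, and several figures reported below under The coding region, can be re-checked with a short script. This is a sketch, not the authors' procedure (their analysis used the Staden programs): it splices the exon sequences as read from Figure 2, translates them with the standard genetic code, and scans both strands for the longest ATG-initiated open reading frame. Two assumptions apply: codon 2 is taken as CGC (arg), as both the amino acid label in Figure 2 and the Table I counts require, and the untranslated regions are left out, so the reverse-strand scan only approximates the check made on the full cDNA. The residue masses are standard average values, not necessarily those used in 1984.

```python
import re

# Exon sequences of the 1G2 gene as read from Figure 2 (spaces mark codons).
# Assumption: codon 2 is CGC (arg), as required by the Table I counts.
EXONS = [
    "ATG CGC TTC TCG CTC GCC ATC CTT GCT CTC CCC GTC CTC GCG GCT"
    " GCG ACT GCG GTT CCC CGC GGC GGC GCT TCC AAG TGC AAC AGC GGT"
    " CCC GTC CAG TGC TGC AAC ACC CTG GTC GAC",                   # exon 1, 120 bp
    "ACT AAG GAC AAG CAT CAG ACC AAC ATC GTC GGC GCC CTT CTG GGC"
    " CTT GAC CTC GGC AGC CTC ACC GGA CTT GC",                     # exon 2, 74 bp
    "C GGT GTG AAC TGC TCT CCC GTT CAG CGT GAT TGG CGT TGG GGG CAA"
    " CTC CTG CTC AAC TCA GAC CGT GTG CTG CGA GGG GAC CCA GTT T",  # exon 3, 89 bp
    "AA",                                                          # completes TAA
]

# Standard genetic code, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def translate(seq):
    """Translate the complete codons of seq in frame 0 (stop shown as '*')."""
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def longest_orf(seq):
    """Longest ATG...stop ORF (codons, stop excluded) over the 3 frames of one strand.

    ORFs that run off the end without a stop codon are not counted.
    """
    best = 0
    for frame in range(3):
        start = None
        for pos, aa in enumerate(translate(seq[frame:])):
            if aa == "M" and start is None:
                start = pos
            elif aa == "*" and start is not None:
                best = max(best, pos - start)
                start = None
    return best

cds = "".join(exon.replace(" ", "") for exon in EXONS)
protein = translate(cds)
assert protein.endswith("*") and "*" not in protein[:-1]
protein = protein[:-1]                                       # drop the stop
codons = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]   # sense codons only

# Average residue masses (Da); the result depends slightly on the table used.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
mw = sum(RESIDUE_MASS[aa] for aa in protein) + 18.0153      # plus one water

print("longest ORF, sense strand:", longest_orf(cds), "codons")
print("longest ORF, other strand:", longest_orf(revcomp(cds)), "codons")
print("length:", len(protein), "| met:", protein.count("M"),
      "| glu + tyr:", protein.count("E") + protein.count("Y"))
print("arg + lys vs asp:", protein.count("R") + protein.count("K"), "vs", protein.count("D"))
print("distinct codons:", len(set(codons)),
      "| A in third position:", sum(c[2] == "A" for c in codons))
print("Asn-X-Ser/Thr sites (residue numbers):",
      [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein)])
print(f"mol. wt. ~ {mw:.0f}")
```

Run as is, it should report a 94-codon ORF on the sense strand, a single methionine, no glutamic acid or tyrosine, 9 basic (arg + lys) against 6 acidic (asp) residues, 37 distinct sense codons with A in the third position only five times, one Asn-X-Ser/Thr site at residue 68 (the Asn-Cys-Ser at base positions 361-369), and a mol. wt. of about 9853, within about 0.1% of the 9842 quoted.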
Small deviations in the length of some restriction fragments, as well as the presence of additional restriction sites in the genomic clone (e.g., an extra SalI site and an SphI site), indicated the presence of small intervening sequences. Fragments of both clones were subcloned in pBR327 and in the M13 vectors mp10 and mp11 and used for sequence analysis. The sequencing strategy is outlined in Figure 1. The cDNA clone 7D5 contains an almost full-length copy of the mRNA: the insert length is 620 bp, and the mRNA length was determined by Northern blot analysis as 650 nucleotides including the poly(A) tail (Dons et al., 1984). A stretch of A residues was detected at one end of the 7D5 clone, and the transcription orientation thus seemed established. However, a sufficiently long open reading frame was present only in the opposite direction, which suggested that the 5'-3' orientation should be reversed as shown in Figure 1. Comparing the end of the cDNA clone with the genomic sequence, we

J.J.M.Dons et al.

[Figure 1: partial restriction maps of the genomic clone g6D4 and the cDNA clones 7D5, 5B9 and 1G2, with the sequencing strategy]

Fig. 1. The 1G2 gene of Schizophyllum commune: partial restriction maps and sequencing strategy. Only restriction sites mentioned in the text and sites used for end-labelling are indicated. The maps of the different clones are aligned at the HinfI and Sau3A sites. Transcription starts at position 1 and its direction is indicated with a heavy arrow. Long arrows indicate the approximate lengths of sequenced parts. Dots represent the end-labels used in the Maxam and Gilbert (1980) sequencing method. Abbreviations: Hf, HinfI; Hp, HpaII; S, Sau3A; Sl, SalI; Sp, SphI. ATG and TAA indicate the positions of the start and stop codons, respectively. (A)n, (G)n and (C)n represent the poly(A), poly(G) and poly(C) stretches.

found a small deviation of only four bp adjacent to the poly(A) tail. Owing to a cloning artefact (see Discussion), only these four bp, together with the poly(A) tail, were translocated to the opposite end of the sequence during synthesis of the double-stranded cDNA. In the correct orientation (Figure 1), the 5' end of the 7D5 sequence therefore starts with the sequence (G)16-(T)60-GAGT, which is not present at that position in the gene. To confirm that the assumed orientation was correct, a dot-blot hybridization of eight single-stranded M13 clones with 32P-labelled RNA was performed. Only clones containing the 3'-5' strand hybridized with this probe, which proves that the originally assumed orientation had to be reversed.

The coding region
The complete nucleotide sequence of the non-transcribed strand of the genomic DNA and the deduced amino acid sequence are shown in Figure 2. The open reading frame encodes a polypeptide of 94 amino acids with a calculated mol. wt. of 9842. The gene is interrupted by three small intervening sequences, all located within the coding region. It is remarkable that the stop codon TAA is generated by removal of the third intron. Glutamic acid and tyrosine are absent from the polypeptide. The NH2-terminal methionine is the only methionine present, which might explain the difficulties we sometimes encountered in visualizing the [35S]methionine-labelled product of the mRNA in hybrid-release translations. Although it is rather speculative to infer the iso-electric point of a protein from its amino acid sequence, the excess of three basic amino acids (6 arg + 3 lys versus 6 asp) suggests that this protein is basic. This might explain why we were unable to detect it in 2D-gel electrophoresis patterns within a pH range of 4-7. The NH2-terminal end of the polypeptide is highly hydrophobic (17 of the first 20 amino acids), and this possibly represents a signal peptide. Codon usage for the 1G2 gene is presented in Table I. Because of the small size of the polypeptide, calculations of codon preferences should be considered with some reservation. Of the 61 possible sense codons, 37 are used. Preferences apparently exist for CCC (pro), AAC (asn), GAC (asp), TGC (cys), GGC (gly), ATC (ile) and AGC (ser), which means that there is a bias favouring C in the third position, whereas A is seldom found there (only five of 94 codons). A possible N-glycosylation site, the sequence asn-cys-ser, is found at base positions 361-369.

The 5' and 3' non-coding regions
The location of the 5' end of the mRNA was determined by S1 mapping. A fragment covering the 5' end of the sequence and end-labelled at the SalI site (position 172) was hybridized to poly(A) RNA isolated from a fruiting dikaryon. S1 nuclease-resistant hybrids were analyzed on a denaturing gel, and a small cluster of three fragments (175, 176 and 177 nucleotides) was found. The transcription initiation site could thus be mapped and is indicated with an arrow at position 1 (Figure 2). This means that the mRNA contains a 57-nucleotide untranslated leader sequence. Upstream of the transcription start site we found a TATA box, the sequence -TATAAA- at position -31. The 3' non-coding region is ~200 bp long, and inspection of this part of the sequence did not reveal many remarkable stretches, except for a sequence starting 10 bp downstream from the stop codon. This sequence is an 11-bp direct repeat (positions 502-512 and 523-533) containing the palindromic sequence -CAACGTTG-, which could be involved in the formation of a secondary structure. As already mentioned in the section on the orientation of the sequence, the four bases adjacent to the poly(A) tail in cDNA clone 7D5 are -ACTC-. This sequence is found immediately downstream of the inverted repeat of 12 nucleotides at the 3' end of the sequence. Thus, in this case the poly(A) tail is added at position 717. By comparing the same part of the sequence as present in other homologous cDNA

Sequence of a split gene from Schizophyllum commune

[Figure 2: nucleotide sequence of the 1G2 gene from about position -250 to 750, with the deduced amino acid sequence. Deduced polypeptide (94 residues, one-letter code): MRFSLAILAL PVLAAATAVP RGGASKCNSG PVQCCNTLVD TKDKHQTNIV GALLGLDLGS LTGLAGVNCS PVQRDWRWGQ LLLNSDRVLR GDPV]

Fig. 2. Nucleotide sequence and deduced amino acid sequence of the 1G2 gene. The non-transcribed DNA strand (mRNA sequence) is shown. The transcription start site was chosen as position 1. The three introns are in small capitals, and internal sequences possibly involved in splicing are underlined. Possible control sequences are indicated: the TATA box (under- and overlined), the ribosome binding site (marked) and the polyadenylation signal (overlined). The two dashed boxes represent the 12-bp complementarity at the 5' and 3' ends involved in the translocation during cDNA cloning, and the dashed arrows indicate a palindromic sequence in the 3' untranslated region.

clones (Figure 3), we observed heterogeneity in the site at which the poly(A) tail is added. In cDNA clone 5B9 it started at a position 3 bp earlier, while in cDNA clone 1G2 we could not find the poly(A) tail at all: the sequence could be read up to position 728, at which site the poly(C) cloning tail was found. Polyadenylation usually occurs at a specific distance (10-30 nucleotides) downstream of a conserved hexanucleotide, -AATAAA-. Searching for a comparable signal sequence, we found the sequence TATATAAT ~20 nucleotides upstream of the poly(A) tail.

The intervening sequences
The coding region is interrupted by three very small introns of 53, 49 and 49 bp, respectively. Comparison of the intron sequences shows highly conserved consensus sequences at the 5' (GTGAGT) and 3' (AG) splice junctions. Thus the introns of S. commune obey the general GT/AG rule (Mount, 1982). There seems to be no conservation in the exon sequences bordering the introns. Internal sequences that might be involved in splicing are underlined in Figure 2 (see Discussion).

Discussion
The 1G2 gene analyzed in this report - the first gene from a

Table I. Codon usage in the 1G2 gene (-, codon not used)

TTT phe  -    TCT ser  1    TAT tyr  -    TGT cys  -
TTC phe  1    TCC ser  1    TAC tyr  -    TGC cys  4
TTA leu  -    TCA ser  1    TAA stop 1    TGA stop -
TTG leu  -    TCG ser  1    TAG stop -    TGG trp  2

CTT leu  4    CCT pro  -    CAT his  1    CGT arg  3
CTC leu  7    CCC pro  4    CAC his  -    CGC arg  2
CTA leu  -    CCA pro  1    CAA gln  1    CGA arg  1
CTG leu  4    CCG pro  -    CAG gln  3    CGG arg  -

ATT ile  -    ACT thr  2    AAT asn  -    AGT ser  -
ATC ile  2    ACC thr  3    AAC asn  5    AGC ser  2
ATA ile  -    ACA thr  -    AAA lys  -    AGA arg  -
ATG met  1    ACG thr  -    AAG lys  3    AGG arg  -

GTT val  3    GCT ala  3    GAT asp  1    GGT gly  2
GTC val  4    GCC ala  3    GAC asp  5    GGC gly  5
GTA val  -    GCA ala  -    GAA glu  -    GGA gly  1
GTG val  2    GCG ala  3    GAG glu  -    GGG gly  2

Basidiomycete to be sequenced - belongs to a family of genes that are abundantly expressed during the formation of fruiting bodies in the dikaryon of S. commune, but not in the coisogenic non-fruiting monokaryons grown under similar conditions (Dons et al., 1984). Its exact function in the fruiting process is not yet known, but the prevalence of the corresponding mRNA (~1100 copies/cell) at the time of fruiting, particularly in the fruiting structures (to be published), suggests an important role for the protein.

In general, studies on the organization and transcription of eukaryotic genes comprise a comparison of the genomic sequence with a cloned copy of the mRNA sequence, and cloning of an mRNA sequence involves a rather complicated series of syntheses. Although cDNA clones usually reflect the sequence of the mRNA faithfully, sequence rearrangements have been observed in a few cases, and several mechanisms for their generation were described by Fields and Winter (1981) and Volckaert et al. (1981). Inversion of part of the sequence was shown to be caused by the presence of short complementary sequences that base pair after the synthesis of the first cDNA strand. The gene investigated here happens to contain such a sequence complementarity at the extreme 5' and 3' ends: -GCAACTCTTGCA- and -TGCATGTGTTGC- (dashed boxes in Figure 2). Hairpin formation due to base-pairing between these sequences in the first cDNA strand probably led to the synthesis of a double-stranded cDNA molecule in which the 3' end of the sequence (four nucleotides and the poly(A) tail) was translocated to the 5' end, according to the model of Fields and Winter (1981). Consequently, the orientation of the transcript was initially interpreted incorrectly. Only after hybridization of the mRNA to single-stranded M13 clones could the correct orientation of the cDNA clone be established.

The open reading frame of the 1G2 gene starts with the first AUG found in the RNA and encodes a polypeptide of 94 amino acids with a calculated mol. wt. of 9842. This differs considerably from the apparent mol. wt. of 13 000 of the in vitro translation product synthesized by hybrid-release translation (Dons et al., 1984), probably owing to aberrant mobility of this polypeptide in SDS-polyacrylamide gels. The amino-terminal region is highly hydrophobic, a characteristic of signal peptides of secretory proteins. The exact length of the signal peptide remains unknown, since the in vivo translation product has not yet been identified. However, the synthesis of specific extracellular proteins of similar molecular weights by the dikaryotic mycelium of S. commune has been observed (de Vries and Wessels, 1984). Whether the 1G2 product is equivalent to one of these excreted proteins awaits further characterization using an immunological approach.

There is increasing evidence that fungi differ from plants and animals in many aspects, e.g., genome organization, transcription regulation and splicing. It is therefore tempting to compare the organization of the 1G2 gene of S. commune with the well-documented 'eukaryotic' organization. One of the well-defined sequences thought to act as a signal for eukaryotic transcription is the Goldberg-Hogness or TATA box, which is located ~30 residues upstream from the start site of transcription (Nevins, 1983). In the 1G2 gene we found the sequence -TATAAA- at this position (-31 to -25). S1 nuclease mapping revealed one defined 5' end of the 1G2 mRNA, in agreement with the presence of only one TATA box. The 5' leader sequence of the 1G2 mRNA has a length of 57 nucleotides and falls well within the range (20-80 nucleotides) in which >70% of eukaryotic mRNAs are found (Kozak, 1984). The open reading frame we found starts with the 5'-proximal AUG triplet of the sequence, as observed in nearly all mRNAs sequenced up to now. The sequences surrounding the translation initiation site are less conserved. According to a compilation by Kozak (1983), the sequence CCACC-AUG can be regarded as a consensus sequence for the eukaryotic translation initiation site; its most conserved feature is the presence of an A at position -3. The sequence CAACC-AUG found in the 1G2 gene closely resembles this consensus. Sargan et al. (1982) have suggested that this sequence might be involved in ribosome binding, owing to its complementarity with the base of the 3'-terminal hairpin structure found in all eukaryotic 17S or 18S ribosomal RNAs (van Charldorp and van Knippenberg, 1982). The hexanucleotide AAUAAA is located ~20 nucleotides from the poly(A) tail of eukaryotic mRNAs. This sequence is highly conserved and is thought to determine the location of the poly(A) tail. It is therefore notable that no such sequence is found in the 1G2 mRNA, but that an octanucleotide containing only U and A is found at approximately the same position. Up to now only two other exceptions have been observed, -AUUAAA- and -AAUAUA- (Nevins, 1983). We also found some heterogeneity in the site of poly(A) addition. A ragged 3' terminus like the one observed here has been described for the bovine prolactin mRNA (Sasavage et al., 1982) and might have been overlooked in other cases, since sequences are usually characterized using single clones.

A remarkable feature of the 1G2 gene is the presence of three extremely short introns of 53, 49 and 49 bp, respectively. As far as we know, these are the smallest introns found to date in a nuclear structural gene. The coding sequence is divided by the introns into segments of 120, 74 and 89 bp, respectively. The last alanine codon of the second exon is generated by removal of the second intron. Moreover, the stop codon TAA is generated by removal of the third intron. These features are difficult to reconcile with the notion that the exons represent functional domains of the protein. Sequences present at the splice junctions of intervening sequences are well conserved. The consensus sequences, based on the analysis of a large number of genes (Mount, 1982), are GTXXGT at the 5' end and AG at the 3' end. Apart from the yeast S. cerevisiae, only a rather small number of fungal genes have been analyzed so far. In most of these genes no introns were found, and therefore data on intron sequences are rather limited. Table II reviews the lengths of the introns as well as the conserved sequences found in fungal genes. It is remarkable that these genes contain only 1-3 introns and that the introns are small in the filamentous fungi, in contrast with the relatively large introns found in yeast genes. All introns obey the GT/AG rule, and in addition GT is always present at positions 5 and 6 of the 5' splice junction.

7D5   5'-GCTATATAATCTGCATGTGTTGCACTC-(A)n-(C)n-3'
1G2   5'-GCTATATAATCTGCATGTGTTGCACTCTTCGGTGGCGG-(C)n-3'
5B9   5'-GCTATATAATCTGCATGTGTTGCA-(A)n-(C)n-3'
g6D4  5'-GCTATATAATCTGCATGTGTTGCACTCTTCGGTGGCGGTCATTCAA-3'

Fig. 3. Heterogeneity in the polyadenylation site. The sequences at the 3' ends of the three cDNA clones are presented, including the poly(A) and poly(C) stretches. In reality the 7D5 cDNA clone does not contain the final part of the sequence (Figure 1); its 3' end was reconstructed according to the translocation observed in this clone. For comparison, the same part of the sequence as found in the genomic clone g6D4 is included. The possible polyadenylation signal is underlined. The sequence is numbered in accordance with the sequence presented in Figure 2.

Table II. Internal conserved sequences and base complementarities between the 5' ends and internal sequences in introns of fungi(a)

Organism, gene (intron): intron sequence (5'-3')
Saccharomyces cerevisiae, 16 proteins(b): GTATGT-(X)p-TACTAACA-(X)q-AG
Neurospora crassa, histone H3(c): GTAAGTT-(X)39-TGCTAACG-(X)11-AG
Neurospora crassa, histone H4(c) (1): GTAAGTT-(X)26-TTGTAACA-(X)14-AG
Neurospora crassa, histone H4(c) (2): GTACGTT-(X)37-AACTAACA-(X)14-AG
Neurospora crassa, NADP-GDH(d) (1): GTACGTC-(X)35-AGCTGACT-(X)26-AG
Neurospora crassa, NADP-GDH(d) (2): GTAAGTG-(X)3?-TGCTGACT-(X)10-AG
Trichoderma reesei, cellobiohydrolase(e) (1): GTAAGT-(X)4?-CAGCTGACTG-(X)9-AG
Trichoderma reesei, cellobiohydrolase(e) (2): GTGAGT-(X)36-CAGCTGACTG-(X)9-AG
Schizophyllum commune, 1G2 (1): GTGAGT-(X)33-ACGCAC-(X)6-AG
Schizophyllum commune, 1G2 (2): GTGAGT-(X)22-ACTTGC-(X)13-AG
Schizophyllum commune, 1G2 (3): GTGAGT-(X)27-ACTAAC-(X)8-AG

(a) Conserved internal sequences are as indicated by the various authors. Complementary bases are underlined. (b) Langford et al. (1984). (c) Woudt et al. (1983); in addition to the internal sequence shown, intron 1 contains the internal sequence GACTGAC, starting 24 bases from the 3' end, with better complementarity to the 5' end. (d) Kinnaird and Fincham (1983); in addition to the internal sequence shown, intron 1 contains the internal sequence CGCGGAC, starting 35 bases from the 3' end, with better complementarity to the 5' end. (e) Shoemaker et al. (1983).
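The base-pairing arguments in the Discussion (the terminal complementarity blamed for the cDNA rearrangement, and the partial complementarity between intron 5' ends and internal sequences in Table II) can be checked with a small helper. The sketch below counts antiparallel pairs between two short stretches, optionally admitting G·U pairs (written G·T here), and expands the intron notation of Table II to recover the intron lengths. It is an illustration under a stated simplification, not a folding algorithm: it assumes an ungapped alignment of equal-length stretches, which is only the simplest reading of "partial complementarity".

```python
import re

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "T"), ("T", "G")}  # G.U pairs, written in DNA letters

def paired_bases(strand_a, strand_b, wobble=False):
    """Count pairs when strand_a (5'->3') lies antiparallel to strand_b (5'->3').

    The first base of strand_a faces the last base of strand_b; no gaps allowed.
    """
    allowed = (WATSON_CRICK | WOBBLE) if wobble else WATSON_CRICK
    return sum((x, y) in allowed for x, y in zip(strand_a, reversed(strand_b)))

def intron_length(pattern):
    """Length of an intron written as in Table II, e.g. 'GTGAGT-(X)33-ACGCAC-(X)6-AG'."""
    total = 0
    for part in pattern.split("-"):
        spacer = re.fullmatch(r"\(X\)(\d+)", part)
        total += int(spacer.group(1)) if spacer else len(part)
    return total

S_COMMUNE_1G2 = [
    "GTGAGT-(X)33-ACGCAC-(X)6-AG",
    "GTGAGT-(X)22-ACTTGC-(X)13-AG",
    "GTGAGT-(X)27-ACTAAC-(X)8-AG",
]

for number, intron in enumerate(S_COMMUNE_1G2, start=1):
    five_prime, _, internal, _, three_prime = intron.split("-")
    print(f"intron {number}: {intron_length(intron)} bp, "
          f"GT...AG: {five_prime.startswith('GT') and three_prime == 'AG'}, "
          f"pairs with 5' end: {paired_bases(five_prime, internal)} "
          f"({paired_bases(five_prime, internal, wobble=True)} with G.U)")

# Terminal complementarity of the mRNA (dashed boxes in Figure 2).
print("5'/3' ends:", paired_bases("GCAACTCTTGCA", "TGCATGTGTTGC"), "of 12 positions pair")
```

For the three 1G2 introns this gives lengths of 53, 49 and 49 bp, and five, four (six if G·U pairs are allowed) and five paired positions out of six. Under the same ungapped alignment the two terminal 12-nt stretches pair at 10 of 12 positions, so the complementarity that drives the hairpin is substantial but not perfect.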
The nucleotides at positions 3 and 4 vary between species, but are the same (GA) in all three introns of the 1G2 gene. In plants and animals, small nuclear RNAs are thought to be involved in splicing, owing to complementarity between the 5' end of U1 RNA and the conserved 5' sequence of the intron (Lerner et al., 1980). Yeast introns contain an internal conserved sequence (ICS: -TACTAAC-), which resembles the 5' end of U1 RNA and was suggested to regulate the splicing of the yeast intron in cis (Langford and Gallwitz, 1983; Pikielny et al., 1983). This suggests that the splicing mechanism of nuclear genes in yeast deviates from that of plants and animals, explaining why foreign intervening sequences cannot be removed from hybrid transcripts (Langford et al., 1983) unless they contain a TACTAAC sequence by chance (Watts et al., 1983). Internal conserved sequences more or less resembling the yeast ICS have also been noted in introns from N. crassa (Kinnaird and Fincham, 1983; Woudt et al., 1983) and T. reesei (Shoemaker et al., 1983). The hypothesis that the splicing mechanism involves base-pairing between the 5' end of the intron and the ICS (Pikielny et al., 1983) has been criticized by Langford et al. (1984), based on the effects of induced mutations in the ICS. Yet it is remarkable that in all fungal introns examined so far, partial complementarity between the 5' end of the intron and an internal sequence can be observed (Table II). In the introns of the 1G2 gene of S. commune we were not able to find a well-conserved internal sequence. Nevertheless, each of these introns harbours near its 3' end a sequence with enough complementarity to the conserved 5' end to allow hairpin formation. Since only in yeast do both GTs at the 5' end pair with the conserved TACTAAC sequence, this sequence as such may not be needed in all cases; other internal sequences in fungal introns may serve a role in splicing through base-pair formation.

Materials and methods
Clones
The gene investigated in this study was denoted 1G2, after the cDNA clone originally obtained by differential screening of a cDNA library (Dons et al., 1984). The cDNA library was constructed using the G/C tailing procedure with plasmid pBR327 as a vector. cDNA clone 1G2, which had an insert length of only 290 bp, was used in a re-screening procedure. Other homologous clones were found, among them clone 5B9, containing the 3' end of the sequence, and clone 7D5, containing an almost full-length copy of the mRNA. All these clones were used in the sequence analysis. The sequence of the genomic DNA was determined using the genomic clone g6D4. This clone contains a 9-kbp PstI fragment inserted into pBR327. The 1G2 gene was mapped on this DNA fragment (Dons et al., 1984).

DNA sequence analysis
The chemical DNA sequencing method of Maxam and Gilbert (1980) and the dideoxy chain-termination method of Sanger et al. (1977) with M13 vectors mp10 and mp11 (Messing, 1983) were used. The sequencing strategy is outlined in Figure 1. The sequence was analysed with a computer program (Staden, 1983).

Dot-blot hybridization
Total RNA was isolated from a surface-grown mycelium of the dikaryon 4-39 (A41B41) x 4-40 (A43B43) after a cultivation time of 96 h, at which time the mycelium was covered with fruiting body primordia (Dons et al., 1984). RNA was isolated according to Hoge et al. (1982a) and end-labelled using [γ-32P]ATP and T4 polynucleotide kinase (Boehringer) according to Maxam and Gilbert (1980). Different single-stranded M13 clones containing fragments of the sense or non-sense strand of the 1G2 gene were dot-blotted onto Genescreen.
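The logic of the dot-blot test can be sketched as follows: an end-labelled mRNA can form a duplex only with a single-stranded clone that carries the complementary (non-sense, 3'-5') strand. The function below is a hypothetical illustration that uses an exact-match window as a crude stand-in for stable hybridization; the 20-nt threshold is an assumption, not a value from this paper. The fragment used is the start of the coding region as read from Figure 2.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def hybridizes(rna_probe, ss_clone, window=20):
    """True if some window-length stretch of the clone pairs perfectly with the probe.

    Hypothetical criterion: exact complementarity over `window` nucleotides.
    """
    probe = rna_probe.replace("U", "T")
    needed = revcomp(ss_clone)  # what the probe must contain to pair with the clone
    return any(needed[i:i + window] in probe
               for i in range(len(needed) - window + 1))

# First 13 codons of the 1G2 mRNA (Figure 2), written as RNA.
mrna_fragment = "AUGCGCUUCUCGCUCGCCAUCCUUGCUCUCCCCGUCCUC"
sense_clone = mrna_fragment.replace("U", "T")   # same polarity as the mRNA
antisense_clone = revcomp(sense_clone)          # the 3'-5' (non-sense) strand

print("sense clone:", hybridizes(mrna_fragment, sense_clone))
print("antisense clone:", hybridizes(mrna_fragment, antisense_clone))
```

It prints False for the sense clone and True for the antisense clone, mirroring the observation that only clones carrying the 3'-5' strand hybridized with the labelled RNA.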
The blots were hybridized with 32P end-labelled RNA using the 50% formamide, 42°C procedure described in the Genescreen manual (New England Nuclear), except that pre-hybridization was done in the presence of Escherichia coli tRNA (200 μg/ml) and poly(A) (5 μg/ml).

S1 nuclease mapping
The 5' end of the gene was expected to be present on a HindIII-SalI fragment (1300 bp) upstream from the SalI site at position 172. This fragment was isolated from RF-DNA of an M13 clone and 5' end-labelled with polynucleotide kinase and [γ-32P]ATP (Maxam and Gilbert, 1980). Hybridization of the labelled fragment with poly(A) RNA isolated from the fruiting dikaryon, and S1 nuclease treatment of the hybrids, were done according to Favaloro et al. (1980). The lengths of the S1-resistant fragments were determined on a 6% sequencing gel. Control hybridizations were performed with E. coli tRNA.
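Mapping the start site from the S1-resistant fragments is simple arithmetic once the position of the labelled 5' end is fixed. The sketch below assumes that the label sits on the antisense strand at position 176: SalI cuts G^TCGAC, so for the site at positions 172-177 the antisense strand's 5' end lies at the last base of the 4-nt TCGA overhang. That position, and the convention that numbering jumps from -1 to +1, are assumptions made for illustration, not statements from the paper.

```python
# Assumption: labelled 5' end of the antisense strand at position 176
# (SalI site GTCGAC at positions 172-177, cut G^TCGAC, 4-nt 5' overhang).
LABELLED_END = 176

def start_site(protected_length, labelled_end=LABELLED_END):
    """mRNA 5' end implied by an S1-resistant fragment of the given length."""
    position = labelled_end - protected_length + 1
    # Assumed numbering convention: ... -2, -1, +1, +2 ... (no position 0).
    return position if position > 0 else position - 1

for length in (175, 176, 177):
    print(length, "nt ->", start_site(length))
```

Under these assumptions the 176-nt fragment places the start exactly at position 1, and the 175- and 177-nt fragments within one nucleotide of it, consistent with the small cluster reported. The same numbering puts the SalI site at codons 39-40 (GTC GAC) of the open reading frame, which begins at position 58 after the 57-nt leader.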

Acknowledgements
We are indebted to Mr P.A.J.Vos for his help in sequencing and computer analysis of the clones. This work was supported in part by the Foundation for Fundamental Biological Research (BION), which is subsidized by the Netherlands Organization for the Advancement of Pure Research (ZWO).

References
de Vries,O.M.H. and Wessels,J.G.H. (1984) J. Gen. Microbiol., 130, 145-154.
Dons,J.J.M., Springer,J., de Vries,S.C. and Wessels,J.G.H. (1984) J. Bacteriol., 157, 802-808.
Favaloro,J., Treisman,R. and Kamen,R. (1980) Methods Enzymol., 65, 718-749.
Fields,S. and Winter,G. (1981) Gene, 15, 207-214.
Hoge,J.H.C., Springer,J. and Wessels,J.G.H. (1982a) Exp. Mycol., 6, 233-243.
Hoge,J.H.C., Springer,J., Zantinge,B. and Wessels,J.G.H. (1982b) Exp. Mycol., 6, 225-232.
Kinnaird,J.H. and Fincham,J.R.S. (1983) Gene, 26, 253-260.
Kozak,M. (1983) Microbiol. Rev., 47, 1-45.
Kozak,M. (1984) Nucleic Acids Res., 12, 857-873.
Langford,C.J. and Gallwitz,D. (1983) Cell, 33, 519-527.
Langford,C.J., Nellen,W., Niessing,J. and Gallwitz,D. (1983) Proc. Natl. Acad. Sci. USA, 80, 1496-1500.
Langford,C.J., Klinz,F.-J., Donath,C. and Gallwitz,D. (1984) Cell, 36, 645-653.
Lerner,M.R., Boyle,J.A., Mount,S.M., Wolin,S.L. and Steitz,J.A. (1980) Nature, 283, 220-224.
Maxam,A.M. and Gilbert,W. (1980) Methods Enzymol., 65, 499-560.
Messing,J. (1983) Methods Enzymol., 101, 20-78.
Mount,S.M. (1982) Nucleic Acids Res., 10, 459-472.
Nevins,J.R. (1983) Annu. Rev. Biochem., 52, 441-466.
Pikielny,C.W., Teem,J.L. and Rosbash,M. (1983) Cell, 34, 395-403.
Raper,C.A. (1983) in Bennett,J.W. and Ciegler,A. (eds.), Secondary Metabolism and Differentiation in Fungi, M.Dekker, NY, pp. 195-238.
Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
Sargan,D.R., Gregory,S.P. and Butterworth,P.H.W. (1982) FEBS Lett., 147, 133-136.
Sasavage,N.L., Smith,M., Gillam,S., Woychik,R.P. and Rottman,F.M. (1982) Proc. Natl. Acad. Sci. USA, 79, 223-227.
Shoemaker,S., Schweickart,V., Ladner,M., Gelfand,D., Kwok,S., Myambo,K. and Innis,M. (1983) Biotechnol., 1, 691-696.
Staden,R. (1983) in Work,T.S. and Burdon,R.H. (eds.), Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 10, Elsevier Biomedical Press, Amsterdam, pp. 311-373.
Timberlake,W.E. and Barnard,E.C. (1981) Cell, 26, 29-37.
Van Charldorp,R. and van Knippenberg,P.H. (1982) Nucleic Acids Res., 10, 1149-1158.
Van Etten,J.L., Dahlberg,K.R. and Russo,G.M. (1981) in Turian,G. and Hohl,H.R. (eds.), The Fungal Spore: Morphogenetic Controls, Academic Press, NY, pp. 277-302.
Volckaert,G., Tavernier,J., Derynck,R., Devos,R. and Fiers,W. (1981) Gene, 15, 215-223.
Watts,F., Castle,C. and Beggs,J. (1983) EMBO J., 2, 2085-2091.
Woudt,L.P., Pastink,A., Kempers-Veenstra,A.E., Jansen,A.E.M., Mager,W.H. and Planta,R.J. (1983) Nucleic Acids Res., 11, 5347-5360.
Zantinge,B., Hoge,J.H.C. and Wessels,J.G.H. (1981) Eur. J. Biochem., 113, 381-389.

Received on 22 June 1984
