Microbiology (1994), 140, 123-132

Printed in Great Britain

Nucleotide sequence and secondary structures of precursor 16S rRNA of slow-growing mycobacteria

Yuan-en Ji,1 M. Joseph Colston2 and Robert A. Cox1

Author for correspondence: Robert A. Cox. Tel: +44 81 959 3666. Fax: +44 81 906 4477.

Laboratory of Developmental Biochemistry1 and Leprosy and Mycobacterial Research2, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK

Slow-growing mycobacteria have a single ribosomal RNA (rrn) operon, with the genes for 16S, 23S and 5S rRNA being present in that order. The transcription start site of the rrn operon of Mycobacterium tuberculosis was identified in Escherichia coli. PCR methodology was used to amplify parts of the rrn operon, namely the leader region and the spacer-1 region separating the 16S rRNA and 23S rRNA genes of Mycobacterium avium, Mycobacterium paratuberculosis, Mycobacterium intracellulare, 'Mycobacterium lufu', Mycobacterium simiae and Mycobacterium marinum. The amplified DNA was sequenced. The sequence data, together with those obtained previously for Mycobacterium leprae and M. tuberculosis, were used to identify putative antitermination signals and RNase III processing sites within the leader region. Notable features include a highly conserved Box B element and a sequence of 31 nucleotides which is common to all eight slow-growers which were scrutinized. A secondary structure for mycobacterial precursor 16S rRNA was devised, based on sequence homologies and homologous nucleotide substitutions. The 18 nucleotides at the 5'-end of spacer-1 have the capacity to bind sequences close to the 5'- and 3'-ends of mature 16S rRNA, suggesting that secondary structure is important to the maturation process. All the slow-growers, including M. leprae, conform to the same scheme of secondary structure. The scheme proposed for M. tuberculosis is a variant of the main theme. The leader and spacer sequences may prove a useful supplement to 16S rRNA sequences in establishing phylogenetic relationships between very closely related species. 'M. lufu' appears to be a close relative of M. intracellulare.

Keywords: slow-growing mycobacteria, rrn operon, precursor 16S rRNA, secondary structure

INTRODUCTION

Mycobacteria are Gram-positive, acid-fast bacteria which include the human pathogens Mycobacterium leprae and Mycobacterium tuberculosis. The slow growth of these pathogens is also a feature of many other members of this genus. There is a broad correlation between the rate of growth of a bacterium and the number of its ribosomes (Bremer & Dennis, 1987; Winder & Rooney, 1970). In turn, the number of ribosomes is determined by the production of rRNA. The rate at which mature rRNAs

The EMBL accession numbers for the nucleotide sequence data reported in this paper are X74054 to X74063.

0001-8363 © 1994 SGM

are produced depends on factors which include the number of ribosomal RNA (rrn) operons present, the strength of their promoters and the efficiency with which the operons are transcribed and processed. The latter two factors are important to M. leprae, M. tuberculosis and other slow-growing mycobacteria which have a single rrn operon (Bercovier et al., 1986). This operon has the classical structure (Liesack et al., 1990, 1991; Sela & Clark-Curtiss, 1991; Suzuki et al., 1988; Kempsell et al., 1992), viz. leader region, 16S rRNA gene, intergene spacer-1, 23S rRNA gene, intergene spacer-2, 5S rRNA gene (Fig. 1). The transcription process is fully efficient if each initiation event leads to a complete copy of the operon, precursor


[Fig. 1 diagram not reproduced. Labels in the original: 16S rRNA, 23S rRNA, 5S rRNA, S1a, S1b, base pairing, pre-16S rRNA, pre-23S rRNA, pre-5S rRNA. Scale bar: 200 base pairs or nucleotides.]

Fig. 1. Strategies used in this investigation. (a) rrn operon of M. tuberculosis and other slow-growing mycobacteria. The M. tuberculosis 1.2 kbp PstI fragment subcloned into plasmid pEJ106 used to identify the start site of transcription, and the regions amplified and sequenced (see Methods), are indicated. The arrowheads show the locations of the primers used. (b) The transcript, pre-rRNA, of the rrn operon (approx. 5.3 kbp). The interaction through base-pairing between the 5'-flanking (leader) region and part of spacer-1 (spacer-1a, S1a), which brings the 5'- and 3'-ends of 16S rRNA together, is shown schematically. The interaction between the remaining part of spacer-1 (spacer-1b, S1b) and spacer-2, which brings the 5'- and 3'-ends of 23S rRNA together, is also indicated. (c) Processing of pre-rRNA yielding intermediate products. The interactions between leader and spacer-1a sequences in pre-16S rRNA and between spacer-1b and spacer-2 in pre-23S rRNA are represented by dots. The structure of pre-16S rRNA is the focal point of this investigation.

rRNA (pre-rRNA), from which mature 16S, 23S and 5S rRNA can be produced. Premature termination, which would lower the efficiency of pre-rRNA production, is reduced by means of antitermination signals incorporated in the transcript (see for example Berg et al., 1989). Three such signals or elements (namely Box B, Box A and Box C, in that order) were found to be present in the leader region (sequences upstream from the 5'-end of 16S rRNA) of transcripts of the rrn operon of M. leprae and also M. tuberculosis (Kempsell et al., 1992). Mature rRNA species are derived from pre-rRNA by successive steps of cleavage by nucleases. For example, part of the leader sequence interacts with spacer-1 sequence to form a long base-paired stem which is cleaved to generate precursor 16S rRNA (pre-16S rRNA), from which mature 16S rRNA is subsequently produced by further RNase action (Young & Steitz, 1978; for a review see Apirion & Miczak, 1993).

Here we present the sequence of the leader region and the inter-gene region, spacer-1, of pre-rRNA of the slow-growing Mycobacterium avium, Mycobacterium paratuberculosis, Mycobacterium intracellulare, Mycobacterium marinum, Mycobacterium simiae and 'Mycobacterium lufu' and compare them with the sequences of M. leprae and M. tuberculosis (Kempsell et al., 1992). The results provide insight into sequence elements important to the transcription of the rrn operon and into the processing of pre-rRNA and pre-16S rRNA. The particular strain of M. simiae investigated is as effective as M. leprae in protecting mice against infection with M. leprae (Singh et al., 1989), and 'M. lufu' shares with M. leprae great sensitivity to diaminodiphenyl sulphone (Dapsone, DDS; Portaels, 1980).

METHODS

Materials. All general chemicals were of analytical grade. Polynucleotide kinase and T4 ligase were supplied by Pharmacia. Sequenase (USB) sequencing kit was purchased from Cambridge Biosciences. [35S]dATPαS and [γ-32P]ATP were from Amersham. Geneclean kit was purchased from Bio 101. AmpliTaq DNA polymerase was supplied by Perkin Elmer Cetus. Oligonucleotide primers were prepared with an automated DNA synthesizer (model 370A, Applied Biosystems).

Table 1. The mycobacterial strains used in this study

Species*                                  Strain                  Approx. time for colonies to develop on agar medium (d)
Mycobacterium avium (M.av.)               NCTC 8559               14
Mycobacterium paratuberculosis (M.pa.)    Clinical isolate        > 28
Mycobacterium intracellulare (M.in.)      NCTC 10682              14
'Mycobacterium lufu' (M.lu.)              †                       14
Mycobacterium simiae (M.si.)              TMC 5135 (M. habana)    10
Mycobacterium marinum (M.ma.)             Clinical isolate        7
Mycobacterium leprae (M.le.)              Armadillo tissue        Non-culturable
Mycobacterium tuberculosis (M.tu.)        H3A                     21

* The shortened name, shown in parentheses, is used in Fig. 3. M. habana was first described following its isolation from cases of pulmonary tuberculosis (Valdivia Alvarez et al., 1971), and has been found to be closely related to M. simiae serovar 1 (Meissner & Schroder, 1975); currently isolates are formally referred to as M. simiae (Bruckner & Colonna, 1993). 'M. lufu' is an environmental mycobacterium isolate from Zaire.
† The strain was kindly provided by Dr S. R. Pattyn, Institute of Tropical Medicine, Antwerp, Belgium.

Bacterial strains and cultures. Mycobacterial strains are listed in Table 1, together with an indication of the growth rate of the parent mycobacterium. All mycobacterial cultures were maintained on Lowenstein-Jensen agar slopes and grown for use in liquid culture at 37 °C on Modified Dubos medium (Dubos & Davis, 1946). To modify mycobacterial cell walls for subsequent genomic DNA isolation, glycine was added to 200 mM after 2 weeks growth and incubation continued for a further 2 weeks, after which time the cells were harvested. Other bacterial cells were grown on LB medium (Maniatis et al., 1982) at 37 °C. Competent cells of Escherichia coli strain DH5αF' were used for transformation with the recombinant plasmids pUC8, pUC18 and pBluescriptII

[...] Tween 80, 10 mM EDTA, 1 mM 2-mercaptoethanol and left at -20 °C for 16 h. An equal volume of chloroform was added to the suspension and mixed vigorously. After centrifugation, the supernatant was removed to a fresh tube, overlaid with 2.5 vols ice-cold ethanol and mixed by inversion. The DNA precipitate was recovered by centrifugation and washed with 70% (v/v) ethanol. This was then resuspended in 10 mM Tris/HCl (pH 7.6), 10 mM EDTA, 100 mM NaCl, 1.0% (w/v) SDS, 0.5 mg proteinase K ml⁻¹ at 37 °C for 4-6 h. The SDS was removed by incubation at 4 °C

for 2 h followed by centrifugation. The supernatant was subjected to two rounds of chloroform extraction; the DNA was then precipitated and resuspended in TE buffer by conventional means.

Isolation of RNA from E. coli. Exponential-phase cells (10 ml) were collected and resuspended in 15 mM Tris/HCl, 0.45 M sucrose, 8 mM EDTA, 1% (w/v) SDS and 100 µg proteinase K ml⁻¹. After incubation for 15 min on ice, the protoplasts were harvested and resuspended in 10 mM Tris/HCl, 10 mM NaCl, 1 mM sodium citrate and 1.5% (w/v) SDS, mixed gently and incubated at 37 °C for 5 min. SDS, protein and DNA were precipitated by adding 250 µl saturated NaCl. After centrifuging, RNA was precipitated with ice-cold 100% ethanol. The precipitated RNA was washed with 70% ethanol, and then redissolved in diethyl pyrocarbonate-treated water. The RNA was stored at -70 °C until use.

Primer extension. The oligonucleotide primer, 5'-CTA TTG AGT TCT CAA ACA AC-3', which is complementary to nucleotides 37-56 (Fig. 3), was end-labelled with 32P by means of T4 polynucleotide kinase, then gel-purified and dissolved in double-distilled H2O. The 32P-labelled primer (10 ng) was added to 7 µl H2O containing 20-30 µg total RNA. The mixture was denatured at 97 °C for 10 min and 2 µl of 2× annealing buffer (50 mM PIPES pH 6.4, 2 M NaCl) was added. The nucleic acids were annealed at 52 °C for 30 min, then at 42 °C for 20 min, after which 90 µl reverse transcriptase buffer (60 mM NaCl, 10 mM Tris/HCl pH 8.3, 10 mM DTT, 8 mM MgCl2, 1 mM [dCTP, dGTP, dATP and dTTP] and 50 µg actinomycin D ml⁻¹) containing 100 units Avian Myeloblastosis Virus (AMV) reverse transcriptase (Pharmacia) ml⁻¹ was added to each reaction. The reaction mixture was incubated at 42 °C for 1 h, extracted twice with phenol/chloroform, ethanol-precipitated and washed twice with 70% ethanol. The extension products were separated on a 12% polyacrylamide-urea gel and visualized by autoradiography.
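The relationship between the primer-extension primer and its target can be checked with a short sketch. The code below is illustrative only: the function name `reverse_complement` and the constants are not from the paper. It derives the sense-strand sequence to which the primer anneals; the 20-nucleotide result spans the same number of positions as nucleotides 37-56 and appears to coincide with part of the mycobacterial Box A/Box C sequence listed in Table 3.

```python
# Illustrative sketch (not the authors' code): derive the sense-strand
# target of the primer-extension primer quoted in Methods.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


# Primer as given in the text, with the spacing removed.
PRIMER = "CTATTGAGTTCTCAAACAAC"

# Sense-strand sequence (5'->3') that the primer is complementary to.
TARGET = reverse_complement(PRIMER)

if __name__ == "__main__":
    print(len(PRIMER), TARGET)
```

Because the primer is written 5'->3', the reverse complement (rather than the plain complement) is what reads in the same direction as the leader sequence in Fig. 3.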
DNA polymerase chain reaction (PCR). Bacterial DNA (1-100 ng) was subjected to PCR in a total volume of 50 µl, with 1 unit of Taq polymerase (Advanced Biotechnologies), 50 mM KCl, 10 mM Tris/HCl pH 8.3, 1.5 mM MgCl2, 0.01% gelatin, 100 pmol of each of two appropriate primers and


200 µM of each dNTP (dATP, dCTP, dGTP, dTTP). The 50 µl mixture was covered by 50 µl of light mineral oil. The relevant gene fragment coding for spacer-1 was synthesized using primer combination 1 (5'-GCC AAG GCA TCC ACC ATG C-3' and 5'-ATT GAC GGG GGC CCG CAC AAG CG-3') and the gene fragment coding for the leader region was synthesized using primer combination 2 (primer L1 5'-GGG TTG CCC CGA AGC G-3' and 5'-CAC TGC TGC CTC CCG TAG G-3'). Amplification was achieved using 36 cycles. The reaction mixture was heated to 94 °C for 1 min, kept at 58 °C for 1 min, then heated to 72 °C for 2.5 min, and this cycle was repeated 35 more times. Finally, the solution was heated to 94 °C for 15 s, then kept at 58 °C for 1 min and at 72 °C for 5 min.

PCR cloning. The products of PCR were separated by agarose gel electrophoresis. DNA was recovered from the gel by using glass milk (Vogelstein & Gillespie, 1979). DNA was resuspended in 20 µl water. Bluescript plasmid was digested with EcoRV and incubated with Taq polymerase (1 unit per µg plasmid per 20 µl vol.) using standard buffer conditions (50 mM KCl, 10 mM Tris/HCl pH 8.3, 1.5 mM MgCl2 and 200 µg BSA ml⁻¹) in the presence of 2 mM dTTP for 2 h at 70 °C. The addition of a single thymidine at the 3'-end of each restricted vector facilitates ligation with PCR products, which are found to have an overhanging adenosine residue at the 3'-end (Clark, 1988). After phenol extraction and ethanol precipitation, the vector was ready for cloning. The ratio of vector to insert was kept at 1:1; 1 µl 10× ligase buffer (100 mM Tris/HCl pH 8.0, 75 mM MgCl2, 10 mM ATP, 100 mM DTT and 2 mg gelatin ml⁻¹) and 1 µl ligase enzyme (1 unit µl⁻¹) were added and the mixture was incubated overnight at 16 °C.

Sequencing double-stranded DNA. At least three colonies from the same ligation reaction were selected, cultured and plasmid DNA isolated. Sequences of the inserted DNAs were determined.
This procedure reduces the possible errors caused by PCR amplification (Saiki et al., 1988). The DNA was denatured in 0.2 mM EDTA (30 min at 37 °C). The mixture was neutralized with 3 M sodium acetate and DNA was precipitated with ethanol. DNA was resuspended in 7 µl H2O, then 2 µl Sequenase reaction buffer and 1 µl 3 µM primer were added. The annealing reaction was heated to 65 °C for 2 min and slowly cooled to 37 °C, then 5.5 µl of labelling mix containing 25 mM DTT, 10 µCi (370 kBq) [35S]dATPαS, 1:5 dNTP mixture and 2 units T7 DNA polymerase was added. This mixture was divided equally into four tubes each containing 2.5 µl of an appropriate termination mix (ddNTP), which were incubated at 37 °C for 5 min. The reactions were stopped by addition of 4 µl stop mix and samples were analysed by denaturing polyacrylamide gel electrophoresis on 8 M urea/6% polyacrylamide/0.5-2.5% TBE buffer-gradient gels.

Computer-aided analysis of the alignment of nucleotide sequences of the leader and spacer regions was performed using PILEUP, which is part of the GCG sequence analysis software package (Devereux et al., 1984). PILEUP performs automatic multiple sequence alignments according to the relationship between the sequences.
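The reasoning behind sequencing at least three clones per ligation can be illustrated with a majority-vote consensus: a misincorporation introduced by PCR is unlikely to occur at the same position in independent clones. The paper does not state how the clone sequences were reconciled, so the sketch below is an assumption, and the function name `consensus` and the example reads are hypothetical.

```python
# Illustrative sketch, assuming clone reads of equal length that are already
# aligned position by position; this is not the authors' stated procedure.
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the per-position majority base; 'N' where no base has a strict majority."""
    if not reads:
        raise ValueError("at least one read is required")
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must have equal length")
    result = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        # Accept the base only if more than half of the clones agree on it.
        result.append(base if count > len(reads) / 2 else "N")
    return "".join(result)


if __name__ == "__main__":
    # Hypothetical reads: the third clone carries a single PCR-style error.
    print(consensus(["ACGTAC", "ACGTAC", "ACTTAC"]))
```

With three clones, a single aberrant read is outvoted at every position; positions where all three disagree are flagged rather than guessed.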

RESULTS

The 5'-end of pre-rRNA of M. tuberculosis

The transcription start site of the rrn operon is marked by the 5'-end of pre-rRNA. An attempt was made to identify this site by the primer extension method using M. tuberculosis rRNA (25 µg per assay) as the substrate. No product was detected. This negative result suggests that

processing rapidly follows transcription so that the ratio pre-rRNA:rRNA is likely to be inversely proportional to the doubling time of the bacterium. For slow-growers like M. tuberculosis it appears that pre-rRNA is not sufficiently abundant to be detected by our procedures. An alternative strategy was used: the promoter region and part of the 16S rRNA gene of the mycobacterial rrn operon were cloned into a plasmid which contained a cat gene without its promoter, as described in Methods. The recombinant plasmid pJY104 was used to transform E. coli and transformants were identified by

[Fig. 2 image not reproduced. Lanes in the original: sequencing ladder T, C, G, A and primer-extension lanes 1, 2, 3.]

Fig. 2. Identification of the start of transcription of the rrn operon of M. tuberculosis expressed in E. coli. The isolation of RNA, primer extension analysis and electrophoresis of the products are described in Methods. Lane 1, extension products from RNA (12 µg) isolated from E. coli transformed with plasmid pJY104 with the promoter in the correct orientation to allow expression of the cat gene. The faster-moving component corresponds to free primer. Lane 2, as for Lane 1 but using 24 µg RNA as substrate. The faster-moving component corresponds to a partial product terminating close to the end of the Box B sequence (nucleotide 19, Fig. 3). Lane 3, extension products using RNA (20 µg) isolated from E. coli transformed by the parent plasmid, which does not contain the M. tuberculosis promoter. The size of a product was deduced from the sequencing ladder obtained by using the same primer for the sequencing reactions as for primer extension. The arrow indicates primer extension products of interest.

[Fig. 3 alignment not reproduced. The original shows the aligned 5'-flanking, 16S rRNA coding and spacer-1 sequences of the species studied, annotated with helices L1 to L4 of the leader, helices S1 to S4 of spacer-1, and the stem and stem (invariant) regions.]

Fig. 3. Sequences of the 5'-flanking, 16S rRNA coding and spacer-1 regions of the rrn operon of several slow-growing mycobacteria. The sequences were aligned using PILEUP (see text). Except for M. leprae and M. tuberculosis, gene sequences (see Fig. 1) were amplified and sequenced as described in Methods. (a) See Liesack et al. (1990, 1991), Sela & Clark-Curtiss (1991). (b) See Kempsell et al. (1992). † The available sequence for the rrn operon of M. bovis (nucleotides 95-1902) (Suzuki et al., 1988) is the same as the M. tuberculosis sequence. * Denotes the consensus site for the start of transcription (see text). •, Deletion of nucleotide residue. Nucleotide positions are measured from the 5'-end of primer L1. Positions 1-216 are 5'-flanking sequences. Nucleotides 3-216 (L1-L214) inclusive comprise the leader region of pre-rRNA, which extends from the 5'-end, the start of transcription, to nucleotide 1 of the 16S rRNA coding region. Positions 1753-1902 (S1-S150) span spacer-1a sequences. Box A, Box B and Box C are putative antitermination elements implicated in transcription of the gene. The possible secondary structures of the transcript (see Fig. 4a, b) are delineated by vertical divisions and horizontal arrows. Stem sequences have the potential to form base-pairs with spacer-1 sequences (and vice versa). Stem (invariant) marks a 31 nucleotide sequence which is identical in all eight species and which may interact with the complementary, c (invariant), sequence of spacer-1a. The 16S rRNA coding region (nucleotides 217-1752) is boxed and it is numbered within the box

their resistance to chloramphenicol. The RNA fraction was isolated and used as a substrate in the primer extension assay. The results presented in Fig. 2 indicate a leader region of 191 nucleotides. The transcription start site is found at the 5'-end of the Box B element identified previously (Kempsell et al., 1992). This site also corresponds to the third nucleotide from the 5'-end of primer L1, which was used to amplify the leader region of rrn sequences (see below).

Nucleotide sequence of the 5'-flanking, 16S rRNA and spacer-1 regions of pre-rRNA

Two parts of the rrn operon of slow-growing mycobacteria (see Fig. 1) were investigated because of their importance to pre-rRNA structure. Both parts were amplified by PCR, and the products were cloned and sequenced as described in Methods. One part, of approx. 572 bp (positions 1-572, Fig. 3), comprised 5'-flanking sequences (approx. 216 bp) and part of the 16S rRNA coding region (approx. 356 bp). The other part, of approx. 936 bp (starting from position 1133, Fig. 3), comprised part of the 16S rRNA coding region (approx. 616 bp), the spacer-1 region (approx. 280 bp) and the 5'-end of the 23S rRNA coding region (approx. 40 bp). The principal results are summarized in Fig. 3.

I. The 16S rRNA coding region. The sequence data are considered in three sections. One section reveals that the 5'-terminal sequence is highly conserved. The 5'-terminal residue of mature 16S rRNA (position 1 in the boxed part of Fig. 3) corresponds to the 5'-terminal residue reported for M. bovis on the basis of primer extension experiments (Suzuki et al., 1988). Another section confirms that the 3'-terminal sequence, the anti-Shine-Dalgarno sequence, implicated in the binding of mRNA to the ribosome (Shine & Dalgarno, 1974), is identical in all the species studied.

The remaining partial sequence data for M. avium, M. paratuberculosis, M. intracellulare, M. simiae and M. marinum are in accord with the data of Rogall et al. (1990b), who amplified and sequenced the coding region of 16S rRNA downstream from the 5'-end and upstream from the 3'-end. The mature 16S rRNAs of the species studied are very closely related in nucleotide sequence and there are approximately 20 differences on average in the sequence of one species compared with another (see Table 2). A high proportion of these differences are located in helix 10 of the variable 2 (V2) region (see Fig. 3 of Kempsell et al., 1992, for a scheme for 16S rRNA secondary structure). In general, the sequence of helix 10 is a characteristic feature

Fig. 3 (legend continued). according to the scheme for mature M. tuberculosis 16S rRNA (Kempsell et al., 1992). Position 1 was assigned by analogy with the 5'-end reported for the very closely related 16S rRNA of M. bovis (Suzuki et al., 1988). The broken line marks the hypothetical 5'-end of the 16S rRNA coding region (see Fig. 4a, b). Positions 179-202 of the 16S rRNA coding region comprise helix 10 according to the scheme for 16S rRNA secondary structure proposed in Fig. 3 of Kempsell et al. (1992). Helix 10 provides a useful signature sequence which is characteristic of a particular species (Rogall et al., 1990a).


Table 2. Comparison of properties of the leader region of mycobacteria

                                        Sequence differences
Species               Leader     Leader*   First 150 nucleotides   16S rRNA       Sequence similarity
                      (bases)              of spacer-1 region      (helix 10)†    of 16S rRNA (%)‡
M. avium              187        0         0                       0              100
M. paratuberculosis   183        4         §                       0              99.9
M. intracellulare     189        10        5                       6              99.4
'M. lufu'             188        16        9                       2              ‖
M. simiae             205        44        22                      6              96.3
M. marinum            198        45        §                       6              98.6
M. leprae             207        64        29                      4              > 98.5
M. tuberculosis       191        82        34                      5              98.6

* See Fig. 3. † See Fig. 3. ‡ See Rogall et al. (1990b). § The sequence of the 16S rRNA gene is not available. ‖ Not available.

of a particular 16S rRNA. Data for helix 10 are presented as evidence which confirms the identity of the mycobacterial species studied.

II. 5'-flanking region. The principal features are as follows. (i) The Box B antitermination element (see Fig. 3) is conserved among all the species studied, since an appropriate product was obtained by PCR in each case. On the basis of our previous work (Cox et al., 1991) we infer that there are unlikely to be more than four mismatches between primer L1 and its target. (ii) A stretch of 31 nucleotides was found to have an identical sequence in each of the species studied. This region includes the Box A and Box C antitermination signals and the RNase III cleavage sites which were identified previously (Kempsell et al., 1992). (iii) Downstream from the conserved section of 31 nucleotides the leader region has the capacity to fold back upon itself to form five helices (helix L1 to helix L5) as judged by sequence homologies and homologous substitutions. M. tuberculosis was found to be exceptional since it appears to form three rather than five helices. The partial sequence of the leader region of M. bovis (Suzuki et al., 1988) is identical with the corresponding sequence found for M. tuberculosis. Sequence data for other members of the M. tuberculosis family are not yet available. In all of the species studied, including M. tuberculosis, the sequence 5'-(T/C)(T/C)CG-3' (nucleotides 110-115, Fig. 3) was found in the hairpin loop region of helix L1. The sequence might be implicated in a tertiary interaction (see below).

III. The spacer-1 region. The spacer-1 region of the rrn operon was amplified, cloned and sequenced as described in Methods. This region, which extends from the 3'-end of 16S rRNA to the 5'-end of 23S rRNA, comprises approximately 280 bp. The sequence data for the part of the pre-rRNA transcript (approx. 145 nucleotides) implicated in the interaction with the leader region are presented in Fig. 3. The number of nucleotide differences, compared with respect to M. avium, is given in Table 2 for each of the species studied. The principal features are set out below. (i) There are two purine-rich tracts within the first 18 nucleotides at the 5'-end of spacer-1. The first tract at the 5'-terminus has the potential to interact with the anti-Shine-Dalgarno sequence (Shine & Dalgarno, 1974) at the 3'-end of 16S rRNA, and the second tract has the potential to interact with the 5'-end of 16S rRNA. Thus it is possible that these first 18 nucleotides of spacer-1 play a key role in the processing of pre-rRNA. (ii) Sequence homologies and homologous substitution provide evidence for the potential of spacer-1 to form three helices, viz. helix S2, helix S3 and helix S4. (iii) The hairpin loop region of helix S4 contains the sequence CGGG (nucleotides 1830-1833, Fig. 3), which is complementary to the sequence 3'-GC(T/C)(T/C)-5' found in the hairpin loop region of helix L1 of the leader region. This complementarity suggests the possibility of interaction between helix L1 and helix S4 in pre-rRNA. (iv) The putative stem formed by interaction between the leader and spacer-1 regions is stabilized by 50 or more base-pairs. The spacer-1 sequence complementary to the leader sequences includes a highly conserved region of 31 nucleotides, with 27 positions being the same in all cases. Two transitions and two deletions account for the four differences. One difference (U ↔ C transition) leads to an extra A·C mismatch in the putative RNase III processing site.

A scheme for the secondary structure of pre-16S rRNA

The data presented in Fig. 3 for the leader and spacer-1 regions of pre-rRNA form the basis for a scheme of secondary structure which is supported by sequence

[Fig. 4 diagrams not reproduced. The original secondary-structure drawings, including (a) M. leprae and (b) M. tuberculosis, are labelled with the 5'- and 3'-ends, helices L1 to L5 and Lm of the leader, and helices S1 to S4 of spacer-1.]

Fig. 4. Possible scheme for the secondary structure of part of the transcript of the rrn operon of slow-growing mycobacteria, and comparison with B. subtilis. (a) M. leprae, (b) M. tuberculosis, (c) B. subtilis (rrnO). The schemes for (a) and (b) are based on the data of Fig. 3. Scheme (a) is typical of the slow-growers studied and (b) is the only variation. Scheme (c) is based on the sequence data of Ogasawara et al. (1983). The 5'- and 3'-ends of 16S rRNA are shown by arrows and the 16S rRNA sequences are boxed. Part of the loop of helix L1 and of helix S4 is boxed to indicate a possible interaction between the two motifs in the tertiary structure. The capacity of the 5'-terminal sequence of spacer-1 to interact with both the 5'- and 3'-ends of 16S rRNA is illustrated. L10, etc., refer to the leader region starting from the consensus 5'-end of pre-rRNA. S10, etc., refer to the spacer sequences starting from the 3'-end of mature 16S rRNA (see also Fig. 3). In (a) and (b) arrows between residues L40 and L50, and between S115 and S120, indicate RNase III cleavage sites. The arrow following S130 indicates the continuation of the polynucleotide chain. In (c) the direction of the polynucleotide chain is shown by arrows which point towards the 3'-end.
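The loop-loop complementarity proposed between helix L1 and helix S4 can be checked with a small sketch. The code is illustrative and makes two assumptions that are not stated in the paper: the (T/C) positions of the L1 loop are written as U or C in the RNA, and G·U wobble pairs are accepted alongside Watson-Crick pairs. The function name `can_pair_antiparallel` is hypothetical.

```python
# Illustrative sketch: test whether the helix S4 loop motif can pair with
# every variant of the helix L1 loop motif when the strands are antiparallel.
# Assumption: G-U wobble pairs count as base pairs, as is usual for RNA helices.
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair_antiparallel(strand_a: str, strand_b: str) -> bool:
    """True if two equal-length RNA strands, both written 5'->3', can pair at every position."""
    if len(strand_a) != len(strand_b):
        return False
    return all((a, b) in PAIRS for a, b in zip(strand_a, reversed(strand_b)))


# Loop motif of helix S4 (5'->3') as given in the text.
S4_LOOP = "CGGG"

# The four RNA variants of 5'-(T/C)(T/C)CG-3' in the loop of helix L1.
L1_LOOP_VARIANTS = [first + second + "CG" for first in "UC" for second in "UC"]

if __name__ == "__main__":
    for variant in L1_LOOP_VARIANTS:
        print(variant, can_pair_antiparallel(S4_LOOP, variant))
```

Under these assumptions all four variants pair with CGGG; where the L1 loop carries U rather than C, the pairing relies on G·U wobble.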

homologies and homologous substitutions. The scheme allows for a wide variation in nucleotide sequence. For example, M. avium and M. leprae have leader regions of 187 nucleotides and 207 nucleotides, respectively, there being a minimum of 64 nucleotide differences. In contrast, the 16S rRNA sequences are more than 98.5% similar (see Table 2), there being three differences in helix 10 of the V2 region of the mature 16S rRNA (see Table 2). This scheme offers insight into the processing of pre-rRNA to yield 16S rRNA. The scheme proposed for M. leprae (see Fig. 4a) is common to all the species studied with the exception of M. tuberculosis (see Fig. 4b), which has two fewer helices in the leader region. The scheme has two novel features. First, the 18 nucleotides at the 5'-end of spacer-1 have the ability to bring the 5'- and 3'-ends of 16S rRNA close together. Secondly, sequences in the hairpin loop region of helix L1 and helix S4 are complementary and therefore have the potential to interact. The high proportion (more than 75%) of

nucleotides involved in base-paired interactions is also noteworthy.

DISCUSSION

The methods used in this study make the leader region of the rrn operon as readily available as 16S rRNA sequences. It is now feasible to amplify the leader region, the 16S rRNA coding region, the spacer-1 region and the 5'-end of 23S rRNA in one or two simple steps. Previously, it was necessary to follow the traditional cloning procedures in order to obtain the sequence of the leader region. The results presented above extend our knowledge of the rrn operon of slow-growing mycobacteria, including its transcription start site, and provide further insight into the phylogeny of slow-growing mycobacteria, including M. leprae and M. tuberculosis.
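The comparisons that underlie Table 2 (numbers of differences and percentage similarity) can be sketched as follows. This is a minimal illustration, not the authors' method: the similarity values in Table 2 are taken from Rogall et al. (1990b) and may use a different denominator. The sketch assumes the 1536 alignment positions of the boxed 16S rRNA coding region (nucleotides 217-1752 of Fig. 3) as the length, and the function names are hypothetical.

```python
# Illustrative sketch: count differences between aligned sequences and
# convert a difference count into a percentage similarity.


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions that differ between two aligned sequences of equal length.

    A gap symbol opposite a base counts as one difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_similarity(differences: int, length: int) -> float:
    """Percentage of identical positions given a difference count."""
    return 100.0 * (1 - differences / length)


# Alignment coordinates of the 16S rRNA coding region quoted in the Fig. 3 legend.
CODING_START, CODING_END = 217, 1752
CODING_LENGTH = CODING_END - CODING_START + 1

if __name__ == "__main__":
    # About 20 differences on average between species, as stated in Results.
    print(CODING_LENGTH, percent_similarity(20, CODING_LENGTH))
```

On this assumption, about 20 differences over the coding region corresponds to a similarity of a little under 99%, which is in line with the values above 98.5% quoted in the text, whereas 64 differences in a leader of roughly 200 nucleotides represents far greater divergence.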


[Fig. 5 sequences; the -35 box, the -10 box and the transcription start site (1) are marked in the original figure.]
M. tuberculosis  5' GGTCTTGACTCCATTGCCGGATTTGTATTAGACTGGCAGGGTTG 3'
M. leprae        5' GGACTTGACTCCTCTGCTGGATCTGTATTAATCTGGCTGGGTTG 3'
B. subtilis      5' AAAGTATTGACCTAGTTAACTAAAATGTTACTATTAAGTAGTCG 3'

Fig. 5. Sequence comparison of the promoters and transcription start sites of rrn operons of M. tuberculosis (Kempsell et al., 1992), M. leprae (Sela & Clark-Curtiss, 1991) and B. subtilis (Ogasawara et al., 1983). 1, Transcription start site inferred by Kempsell et al. (1992) by analogy with the start site for other rrn operons and tRNA genes. This site is within one nucleotide of the start sites observed for M. tuberculosis (see Fig. 2) and M. leprae (Sela & Clark-Curtiss, 1991) shown in bold type.

Table 3. Comparison of the invariant Box A and Box C motifs of mycobacterial rrn operons with those of other species

Species*                    Nucleotide sequence† (Box A followed by Box C)
Gram-positive
  Actinomycetes
    Mycobacteria (a)        5'-TGTTGTTTGA GAA↓CT↓CAATA GTGTGTTTGG T-3'
    Streptomyces (b)        5'-CGTTCCTTGA GAA CT CAACA GCGTGCCAAA A-3'
  Bacilli
    B. subtilis (c)         5'-AGTTCCTTGA AAA CT AAACA AGACAAAACA A-3'
  Mycoplasma
    M. capricolum (d)       5'-CGATCCTTGA AAA CT AAACA AAACAAAAAT A-3'
    M. hyopneumoniae (e)    5'-AGATCCTTCA AAA CT AGGCA TATAAAAAAA A-3'
Gram-negative
  E. coli (f)               5'-AGCTCCTTAA CA N14 CATAAAAGGA C-3'

* For details see: a, Fig. 3; b, Wezel et al. (1991), Pernodet et al. (1989); c, Ogasawara et al. (1983); d, Iwami et al. (1984); e, Taschke & Herrmann (1986); f, Li et al. (1984).

† The vertical arrows indicate RNase III cleavage sites in M. smegmatis (data not shown) and B. subtilis (Ogasawara et al., 1983).

Functional aspects of the rrn operon

I. Transcription start site. The primer extension experiment (Fig. 2) shows that the rrn operon of M. tuberculosis, expressed in E. coli, has a single promoter which is situated close to the Box B element. A similar location was reported for the promoter activity of the M. leprae rrn operon (Sela & Clark-Curtiss, 1991). There are clear homologies between the promoters of the rrn operons of M. tuberculosis and M. leprae. There are also homologies between the mycobacterial promoters and the second promoter (P2) of the rrnO operon of Bacillus subtilis (Fig. 5). Although the rrnO operon has two promoters, examination of the transcripts revealed that P2 is predominant (Ogasawara et al., 1983). Further study of the mycobacterial promoters in a mycobacterial expression system is needed to confirm the deductions made from the expression of the mycobacterial rrn operon in E. coli.
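The promoter homologies described here can be checked directly against the sequences printed in Fig. 5. The following is a minimal Python sketch, assuming that the three 44-nucleotide strings transcribed from the figure are accurate and using the pentamer TTGAC purely as an illustrative marker for the -35 box. The function names are hypothetical and are not part of the original study.

```python
# Promoter sequences as printed in Fig. 5 (5' to 3'), transcribed without gaps.
PROMOTERS = {
    "M. tuberculosis": "GGTCTTGACTCCATTGCCGGATTTGTATTAGACTGGCAGGGTTG",
    "M. leprae": "GGACTTGACTCCTCTGCTGGATCTGTATTAATCTGGCTGGGTTG",
    "B. subtilis": "AAAGTATTGACCTAGTTAACTAAAATGTTACTATTAAGTAGTCG",
}

# Assumption: TTGAC is used only as a convenient marker for the -35 box.
MINUS_35_MARKER = "TTGAC"


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def marker_offsets(sequences: dict[str, str], marker: str) -> dict[str, int]:
    """Return the 0-based index of the first marker occurrence (-1 if absent)."""
    return {name: seq.find(marker) for name, seq in sequences.items()}


if __name__ == "__main__":
    tb = PROMOTERS["M. tuberculosis"]
    lep = PROMOTERS["M. leprae"]
    diffs = count_differences(tb, lep)
    print(f"M. tuberculosis vs M. leprae: {diffs} differences in {len(tb)} positions")
    print(marker_offsets(PROMOTERS, MINUS_35_MARKER))
```

With the strings as transcribed, the two mycobacterial promoters differ at 8 of 44 positions. The marker lies two nucleotides further from the 5'-end in the B. subtilis string than in the mycobacterial strings, so a position-by-position comparison with B. subtilis would first require the sequences to be brought into register.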

II. Production and processing of pre-16S rRNA. Pre-16S rRNA is likely to be produced by RNase III, which introduces a double cleavage in the stem (see Fig. 4a, b). Taking M. leprae as an example, pre-16S rRNA is likely to comprise 16S rRNA with approximately 160 nucleotides upstream from the 5'-end and approximately 113 nucleotides downstream from the 3'-end. Two other RNases are implicated in the maturation process: one which cleaves RpU to generate the 5'-end, and another which cleaves UpA to generate the 3'-end of 16S rRNA. This pathway, leading to mature 16S rRNA, is in accord with that proposed for E. coli (for a review see Apirion & Miczak, 1993). However, except for RNase III, few details are known about the enzymes that are involved.

(i) Generation of the 5'-end of 16S rRNA. The potential 5'-end of 16S rRNA is marked by helix L5, helix S1 and helix S2, as illustrated in Fig. 4(a). In each case there is either a GpU or an ApU bond in which the G or A residue is not base-paired but the U residue is paired with an A residue in the first base-pair in a run of either six or seven base-pairs. We propose that the 5'-end of 16S rRNA is generated by the cleavage of this RpU bond. The predicted 5'-end of 16S rRNA is pUUUUUGU... for M. leprae, M. marinum, 'M. lufu' and M. intracellulare, and is pUUUUGU for M. tuberculosis, M. bovis, M. avium, M. paratuberculosis and M. simiae. The 5'-end of M. bovis 16S rRNA, determined by the primer extension method (Suzuki et al., 1988), was reported as pUUUGU. The difference of one U residue between the observed and predicted sequences is within experimental error.

(ii) Generation of the 3'-end of 16S rRNA. The sequence of the anti-Shine-Dalgarno element at the 3'-end of 16S rRNA and the contiguous purine-rich tract at the 5'-end of spacer-1 are the same in all the slow-growers studied. The two regions have the capability of interacting to form a helical loop, as shown in Figs 4(a) and 4(b) for M. leprae and M. tuberculosis respectively. According to this scheme, the 3'-end of 16S rRNA of the slow-growers is generated by the cleavage of a UpA bond, the U residue being located within a loop of non-base-paired residues.

III. Comparison of mycobacteria with other genera. The single rrn operon of slow-growing mycobacteria has features in common with the rrnO operon of B. subtilis (Ogasawara et al., 1983). The rrnO operon is also transcribed mainly from a single promoter (P2), the leader region comprises 176 nucleotides, and the Box A element and RNase III processing site leading to pre-16S rRNA formation are homologous with the mycobacterial sequences. Other Gram-positive bacteria, including Streptomyces and Mycoplasma, share homologies with mycobacteria and B. subtilis (see Table 3). These homologies extend over part of the invariant 31 nucleotide stretch present in mycobacteria.
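The conservation summarized in Table 3 can be made explicit column by column. The Python sketch below takes the first ten-nucleotide block of each row of Table 3 as transcribed and reports a majority-rule consensus together with the positions that are invariant across all six rows. Treating this block as a unit is an assumption made for illustration, since the exact extent of Box A is not recoverable from the table as reproduced here, and the helper names are hypothetical.

```python
from collections import Counter

# First ten-nucleotide block of each Table 3 row (5' to 3'), as transcribed.
FIRST_BLOCKS = {
    "Mycobacteria": "TGTTGTTTGA",
    "Streptomyces": "CGTTCCTTGA",
    "B. subtilis": "AGTTCCTTGA",
    "M. capricolum": "CGATCCTTGA",
    "M. hyopneumoniae": "AGATCCTTCA",
    "E. coli": "AGCTCCTTAA",
}


def column_counts(sequences: list[str]) -> list[Counter]:
    """Tally the nucleotides found at each position of equal-length sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return [Counter(column) for column in zip(*sequences)]


def consensus(sequences: list[str]) -> str:
    """Majority-rule consensus; ties resolve to the nucleotide seen first."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(sequences))


def invariant_positions(sequences: list[str]) -> list[int]:
    """0-based positions at which every sequence carries the same nucleotide."""
    return [i for i, c in enumerate(column_counts(sequences)) if len(c) == 1]


if __name__ == "__main__":
    blocks = list(FIRST_BLOCKS.values())
    print("consensus:", consensus(blocks))
    print("invariant positions (0-based):", invariant_positions(blocks))
```

For the six blocks as transcribed, positions 2, 4, 7, 8 and 10 (counting from 1) are invariant, and the mycobacterial row is the only one that departs from the consensus at positions 5 and 6.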

The sequence near the 5'-end of spacer-1 of B. subtilis is capable of binding both the 5'- and 3'-ends of 16S rRNA (see Fig. 4c) in a structure which is comparable with the corresponding structure of mycobacteria (see Fig. 4a, b). A number of strategies may be envisaged for highlighting the 5'- and 3'-ends of 16S rRNA in preparation for enzymic cleavage from pre-16S rRNA. For example, it is possible that in E. coli it is sequences at the 3'-end of the leader region which play a key role in presenting the 5'- and 3'-ends of 16S rRNA for processing.

Phylogeny

Comparison of 16S rRNA nucleotide sequences is an accepted method for establishing phylogenetic relationships in mycobacteria (see for example Rogall et al., 1990b). Analysis of 16S rRNA sequences and the ranking of relatedness of the leader and spacer-1 sequences by PILEUP (see Fig. 3) are in accord for M. avium, M. paratuberculosis and M. intracellulare. The PILEUP procedure places 'M. lufu' as a close relative of M. intracellulare (Table 2 and Fig. 3). However, there are major differences between the ranking of M. leprae, M. marinum, M. tuberculosis and M. simiae based on 16S rRNA sequences (Rogall et al., 1990b) and the ranking of M. simiae, M. marinum, M. leprae and M. tuberculosis given by PILEUP (Table 2, Fig. 3). Particular points of interest include the following.

(1) M. leprae, although an extremely slow-grower, has a typical mycobacterial rrn operon (see Figs 3 and 4 and also Sela et al., 1989; Estrada-G et al., 1989; Teske et al., 1991).

(2) Ranking according to 16S rRNA sequence data sets M. simiae apart as being the most distantly related to the other slow-growers, whereas PILEUP (Fig. 3) places M. simiae between 'M. lufu' and M. marinum. The close relationship between M. simiae and M. leprae suggested by this ranking is compatible with the observation that the M. habana strain of M. simiae very effectively protects mice against infection by M. leprae (Singh et al., 1989).

(3) M. marinum and M. tuberculosis are as closely related as are M. avium and M. intracellulare (see Table 2). However, the supplementary data reveal a different situation. In total there are 10 nucleotide differences in the leader regions of M. avium and M. intracellulare but many more (82) differences between M. marinum and M. tuberculosis, which are spread throughout the region (see Fig. 3).

(4) The leader sequence of M. tuberculosis (and M. bovis) downstream from nucleotide 147 (see Fig. 3) diverges from the sequences of the other slow-growers studied to an extent which sets M. tuberculosis apart, particularly with respect to the putative secondary structure of the leader region (compare Figs 4a and 4b).

Thus, sequence data for the leader and spacer regions may modify and refine the view of interspecies relationships provided by comparison of 16S rRNA sequences alone. If the 16S rRNA sequences represent the hour hand of a clock measuring evolutionary time then the leader and spacer regions may represent the minute hand. Finally, the accessibility of the leader and spacer regions of the rrn operon to amplification by PCR has revealed new targets for species-specific probes and has extended the range of options for identifying slow-growing mycobacteria.

ACKNOWLEDGEMENTS

This investigation received support from the British Leprosy Relief Association (LEPRA). Yuan-en Ji is supported by a grant from LEPRA. We thank Mr Joe Brock for his help in preparing the illustrations.

REFERENCES

Apirion, D. & Miczak, A. (1993). RNA processing in prokaryotic cells. BioEssays 15, 113-120.

Berg, K. L., Squires, C. & Squires, C. L. (1989). Ribosomal RNA operon anti-termination: function of leader and spacer region Box B-Box A sequences and their conservation in diverse microorganisms. J Mol Biol 209, 345-358.

Bercovier, H., Kafri, O. & Sela, S. (1986). Mycobacteria possess a surprisingly small number of ribosomal RNA genes in relation to the size of their genome. Biochem Biophys Res Commun 136, 1136-1141.

Bremer, H. & Dennis, P. P. (1987). Modulation of chemical composition and other parameters of the cell growth rate. In Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, pp. 1527-1542. Edited by F. C. Neidhardt and others. Washington, DC: American Society for Microbiology.

Bruckner, D. A. & Colonna, P. (1993). Nomenclature for aerobic and facultative bacteria. Clin Infect Dis 16, 578-605.

Clark, J. M. (1988). Novel non-templated nucleotide addition reactions catalyzed by prokaryotic and eukaryotic DNA polymerases. Nucleic Acids Res 16, 9677-9686.

Cox, R. A., Kempsell, K., Fairclough, L. & Colston, M. J. (1991). The 16S ribosomal RNA of Mycobacterium leprae contains a unique sequence which can be used for identification by the polymerase chain reaction. J Med Microbiol 35, 284-290.

Devereux, J., Haeberli, P. & Smithies, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 12, 387-395.

Dubos, R. J. & Davis, B. D. (1946). Factors affecting the growth of tubercle bacilli in liquid media. J Exp Med 83, 409-423.

Estrada-G, I. C. E., Colston, M. J. & Cox, R. A. (1989). Determination and evolutionary significance of nucleotide sequences near to the 3'-end of 16S ribosomal RNA of mycobacteria. FEMS Microbiol Lett 61, 285-290.

Iwami, M., Muto, A., Yamao, F. & Osawa, S. (1984). Nucleotide sequence of the rrnB 16S ribosomal RNA gene from Mycoplasma capricolum. Mol & Gen Genet 196, 317-322.

Kempsell, K. E., Ji, Y-E., Estrada-G, I. C. E., Colston, M. J. & Cox, R. A. (1992). The nucleotide sequence of the promoter, 16S rRNA and spacer region of the ribosomal RNA operon of Mycobacterium tuberculosis and comparison with Mycobacterium leprae precursor rRNA. J Gen Microbiol 138, 1717-1727.

Li, S. C., Squires, C. L. & Squires, C. (1984). Antitermination of E. coli rRNA transcription is caused by a control region segment containing nut-like sequences. Cell 38, 851-860.

Liesack, W., Pitulle, C., Sela, S. & Stackebrandt, E. (1990). Nucleotide sequence of the 16S rRNA from Mycobacterium leprae. Nucleic Acids Res 18, 5558.

Liesack, W., Sela, S., Bercovier, H., Pitulle, C. & Stackebrandt, E. (1991). Complete nucleotide sequence of the Mycobacterium leprae 23S and 5S rRNA genes plus flanking regions and their potential in designing diagnostic oligonucleotide probes. FEBS Lett 281, 114-118.

Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982). Molecular Cloning: a Laboratory Manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory.

Meissner, G. & Schroder, K.-H. (1975). Relationship between Mycobacterium simiae and Mycobacterium habana. Am Rev Respir Dis 111, 196-200.

Ogasawara, N., Moriya, S. & Yoshikawa, H. (1983). Structure and organization of rRNA operons in the region of the replication origin of the Bacillus subtilis chromosome. Nucleic Acids Res 11, 6301-6318.

Pernodet, J.-L., Boccard, F., Alegre, M.-T., Gagnat, J. & Guerineau, M. (1989). Organization and nucleotide sequence analysis of a ribosomal RNA cluster from Streptomyces ambofaciens. Gene 59, 33-46.

Portaels, F. (1980). Unclassified mycobacterial strains susceptible to dapsone isolated from the environment in central Africa. Int J Lepr 48, 330.

Rogall, T., Flohr, T. & Böttger, E. C. (1990a). Differentiation of Mycobacterium species by direct sequencing of amplified DNA. J Gen Microbiol 136, 1915-1920.

Rogall, T., Wolters, J., Flohr, T. & Böttger, E. C. (1990b). Towards a phylogeny and definition of species at the molecular level within the genus Mycobacterium. Int J Syst Bacteriol 40, 323-330.

Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B. & Erlich, H. A. (1988). Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239, 487-491.

Sela, S. & Clark-Curtiss, J. E. (1991). Cloning and characterization of the Mycobacterium leprae putative ribosomal RNA promoter in Escherichia coli. Gene 98, 123-127.

Sela, S., Clark-Curtiss, J. E. & Bercovier, H. (1989). Characterization and taxonomic implications of the rRNA genes of Mycobacterium leprae. J Bacteriol 171, 70-73.

Shine, J. & Dalgarno, L. (1974). The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc Natl Acad Sci USA 71, 1342-1346.

Singh, N. B., Lowe, A. C. R. E., Rees, R. J. W. & Colston, M. J. (1989). Vaccination of mice against Mycobacterium leprae infection. Infect Immun 57, 653-655.

Suzuki, Y., Nagata, A., Ono, Y. & Yamada, I. (1988). Complete nucleotide sequence of the 16S rRNA gene of Mycobacterium bovis BCG. J Bacteriol 170, 1631-1636.

Taschke, C. & Herrmann, R. (1986). Analysis of transcription and processing signals of the 16S-23S rRNA operon of Mycoplasma hyopneumoniae. Mol & Gen Genet 205, 434-441.

Teske, A., Wolters, J. & Böttger, E. C. (1991). The 16S rRNA nucleotide sequence of Mycobacterium leprae: phylogenetic position and development of DNA probes. FEMS Microbiol Lett 80, 231-238.

Valdivia Alvarez, J., Suarez Mendez, S. & Echemendia Font, M. (1971). Mycobacterium habana: probable especie dentro de las micobacterias no clasificadas. Bol Hig Epidemiol 9, 65-73.

Vogelstein, B. & Gillespie, D. (1979). Preparative and analytical purification of DNA from agarose. Proc Natl Acad Sci USA 76, 615-619.

Wezel, G. P. V., Vijgenboom, E. & Bosch, L. (1991). A comparative study of the ribosomal RNA operons of Streptomyces coelicolor A3(2) and sequence analysis of rrnA. Nucleic Acids Res 19, 4399-4403.

Winder, F. G. & Rooney, S. A. (1970). Effects of nitrogenous components of the medium on the carbohydrate and nucleic acid content of Mycobacterium tuberculosis BCG. J Gen Microbiol 63, 29-39.

Young, R. A. & Steitz, J. A. (1978). Complementary sequences 1700 nucleotides apart form a ribonuclease III cleavage site in Escherichia coli ribosomal precursor RNA. Proc Natl Acad Sci USA 75, 3593-3597.

Received 21 May 1993; revised 17 July 1993; accepted 22 July 1993.