Proc. Natl. Acad. Sci. USA Vol. 82, pp. 8080-8084, December 1985 Developmental Biology

Structure, evolution, and regulation of a fast skeletal muscle troponin I gene (muscle genes/exon organization/functional domains/homologous sequences/coordinate transcription)

ALBERT S. BALDWIN, JR.*, ELLEN L. W. KITTLER, AND CHARLES P. EMERSON, JR.

Department of Biology, Gilmer Hall, University of Virginia, Charlottesville, VA 22901

Communicated by Oscar L. Miller, Jr., August 8, 1985

ABSTRACT The complete structure of a quail fast skeletal muscle troponin I gene was determined by nucleotide sequence comparison of troponin I genomic and cDNA sequences. This 4.5-kilobase troponin I gene has eight exons. The actin-binding domain of troponin I is encoded by a single exon, whereas the troponin C-binding domain is split into at least two exons. The exon organization of the fast troponin I gene suggests that gene conversion directs the nonrandom conservation of the carboxyl-terminal halves of troponin I isoforms and that the amino-terminal extension of the cardiac isoform originated by splice-junction sliding. Comparison of the structure of the troponin I gene with the structures of other contractile protein genes reveals homologous sequences in their 5' flanking regions and similar large introns that separate protein-coding exons from 5' nontranslated exons. These common structural features may function to coordinate the activation of contractile-protein genes during myogenesis.

Troponin I is a family of three muscle-specific myofibrillar proteins involved in the calcium regulation of contraction in cardiac and in skeletal muscle (1). Troponin I proteins have multiple functional domains that are distinct and bind with high affinity to actin (2) and troponin C (3) (see Fig. 1). The interactions of these domains regulate actomyosin ATPase activity in resting and contracting muscle (4). Troponin I also interacts functionally with other muscle proteins, including troponin T (5). The specific domains involved in these other interactions have yet to be identified.

Amino acid sequence studies have revealed that avian and mammalian muscles differentially express three related troponin I isoforms specific to cardiac and to fast and slow skeletal muscles (1). Comparison of their amino acid sequences indicates that these three troponin I isoforms are encoded by separate genes that arose by gene duplication prior to the divergence of birds and mammals more than 250 million years ago. However, the evolution of these troponin I protein isoforms has been strikingly nonrandom (Fig. 1). Their amino-terminal halves are highly divergent, whereas their carboxyl-terminal halves are highly homologous (1). Furthermore, the cardiac isoform has a 26-residue amino-terminal extension. The functional significance and evolutionary origin of the divergent and homologous sequences in troponin I isoforms are unknown.

The troponin I gene is one of the set of muscle-specific genes that are coordinately activated during the differentiation of embryonic myoblasts (6). A previous report described the isolation and nucleotide-sequence analysis of cDNA clones encoding quail fast muscle troponin I and other regulated muscle protein mRNAs (7). Here we report the complete structure of the quail fast skeletal muscle troponin I gene and its upstream transcriptional promoter. The structure of this troponin I gene now provides a basis for examining the evolutionary origins of the multiple functional domains of troponin I, the evolutionary origins and divergence of troponin I protein isoforms, and the molecular basis of the coordinate transcriptional regulation of troponin I and other muscle genes during muscle development.

MATERIALS AND METHODS

Isolation and Mapping of the Troponin I Gene. The isolation of overlapping quail fast skeletal muscle troponin I cDNA clones has been described (7). One of these clones, cC120, was used as a hybridization probe (8) to screen a genomic library of ~20-kilobase (kb) partial EcoRI restriction fragments of quail embryo DNA cloned in the λ Charon 4A vector (9-12).

DNA Sequence Analysis. Genomic and cDNA nucleotide sequences were determined by the method of Maxam and Gilbert (13). Sequences were 95% confirmed by sequencing both DNA strands. DNA sequences were edited and analyzed by use of the Stanford Molgen computer system programs, a VAX 11/750 computer, and the GenBank data base (distributed by Bolt, Beranek and Newman, Inc., Cambridge, MA).

Nuclease S1 Mapping. The 5' and 3' gene transcript boundaries were mapped by nuclease S1 analysis (14), as described (15).

RESULTS

Cloning of the Quail Fast Skeletal Muscle Troponin I Gene. Partial DNA sequence analysis of three cDNA clones isolated from a cDNA library of quail myofiber-specific RNAs revealed that these clones encode the fast skeletal muscle isoform of troponin I (7). The cDNA clones, cC106, cC112, and cC120, are overlapping and encode all of the protein-coding region of the troponin I mRNA as well as 5' and 3' nontranslated sequence. Troponin I genomic clones were isolated by screening an embryonic quail genomic DNA library in λ Charon 4A, using cC120 as the probe. One genomic clone recovered in this screen, gClTnI4 (previously referred to as λQETnI4), has a 16-kb genomic DNA insert that includes the complete troponin I gene.

The Nucleotide Sequence of the Troponin I Gene. The nucleotide sequence of the region of gClTnI4 homologous to fast muscle troponin I cDNA clones was compared with the nucleotide sequences of the cDNA clones cC106, cC112, and cC120. The sequences of these cDNA clones are overlapping and include 43 base pairs (bp) of 5' nontranslated sequence, the entire 546-bp of fast skeletal TnI protein coding sequence, and 140 bp of 3' nontranslated sequence. Fig. 2 shows the

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: bp, base pair(s); kb, kilobase(s).
*Present address: Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139.

[Figure 1: schematic alignment of the fast skeletal, slow skeletal, and cardiac troponin I isoforms from the NH2 to the COOH terminus, with intron positions 2-7 marked on the fast isoform and the TnC-binding and actin-binding domains indicated.]

FIG. 1. Amino acid sequence comparison of rabbit cardiac and fast and slow skeletal troponin I protein isoforms. These data, based on the studies of Wilkinson and Grand (1), show troponin I protein sequences aligned for maximum homology. With the exception of two residues, the chicken fast skeletal troponin I sequence is identical to the conserved residues of the three rabbit isoforms. Residues shared by all isoforms are shown as vertical lines, deletions are shown as ( ), and the positions of introns in the quail fast skeletal troponin I gene are numbered and shown as v (see Fig. 3). The actin-binding and troponin C (TnC)-binding domains were established by peptide-affinity binding studies (1).

sequence of 4500 bp of genomic gClTnI4 DNA homologous to the TnI cDNA clones. An unambiguous exon arrangement of the troponin I gene was derived by comparing genomic DNA and cDNA sequences and by nuclease S1 mapping of the 5' and 3' gene boundaries. The quail fast skeletal muscle troponin I gene is dispersed in eight exons (Figs. 2 and 3). In all cases introns contain consensus 5' and 3' splice junctions (16). Genomic troponin I exon sequences and cDNA sequences are identical, establishing that the gClTnI4 genomic segment is a functional fast skeletal muscle troponin I gene. The quail troponin I protein sequence derived from the nucleotide sequence (Fig. 2) is identical to that of the chicken fast skeletal muscle troponin I protein (1). Genomic blotting experiments showed that the restriction fragments of troponin I sequences in gClTnI4 match those in quail genomic DNA, indicating that fast troponin I is a single-copy gene (data not shown). A computer search of the protein-coding potential of the intron sequences of the fast muscle troponin I gene did not reveal additional exons encoding amino acid sequences

CTGCTGGGATAGGCTGGGAATATTCCAGGTCATTCCATCGCCTCTGCTTGGCTGTGGTATGGGCTCTGGGTGACTGCACAGCTTGGGGTGCCTCCATACCAG CAGTGGCTCCATAGCTTTGAGTAGTTTTGCTTTTTTTTCTTAATAAAAACTTAACATCTGGTGTTAAAATATTTCACATGG

GGAC-GACCAGCAAATAT

TTTCCTGCATGCTAATCTGA

GGTAAAAAGGATCCAGCCTGAATACTCAGAATGCTCTCCACGTGCGCCAGGGCTGATGTTGTGGGATATGCAATGCTGAGGCTCTTCCACCAGTTTCCCCAG

jC~n GCCC TfA-AAGAGAAGGTGG Exonl GATTTTCTGCTGAGCCACCCGAGAAGCC TTAACCTCCTTCCCCAGCCAGCTGCTT GCGGCTTGCCAGTCAGTTCTGGCCAGCCGGCGGGGCTCTCTGTTCCTTCTCCGGACTCG GGCAGAGCTGCCTCTCCTGTGGAGCTGCCTAGCCTGCCCCAGCCAAGGGAGGTGCATGGGGGCTCTTGCACA ~~IGGTCTGAAGGGGTCT

G/GTGGCCCTATGCAGTCTCTACTGCCAGCTGTTCTACCGGGTGGAGGGAGCACTTTGCCACCTAATGTGGGGAGTAAAGCTGGTGAGGAATCTCTTGGG

C~A-TCGGGTCTGATGGTCACCTCGTCCTTGCTGTCAGGGGTCCGTGATTGGAGCTCGTCTTCCCACTGCCTCTTCCCTTAGTTTATCCCACTTGTCTTG

CGCTCTCTTAGTTCCCTACTCCTCCTAGAGGAAAAGCATCCCTGCGTGGCACTTGCGTTCTCCCATCGCACCCTCTCTTGCCTCTGTGTTAGCTCTTCCTCC CAAATTGCCCCTCAGCACCTGAGTATCCGT---

22 bp

---GGACCGTAACTCAAGGGCTGGAGTCCCTTGTACTGACTATACCCTGCTATGGGACTTTTCTGGCATTG

CTTCCCACGCCCTTGCGTTTCAGGGATAGGGAATCCAGCAAAATACTCCTGTCCTCATCCGTGTGACTTTGCCACTGTCCCCCACCTCCCGAGTCCGGGGCT GGCTGCGTCTGAGGAGA---

23 bp

---CACCTTGTGCAGCTCCCCAGCCATTTCCAGAAGCACTATCAGAGCACTTCCCCCCACCCCTTGCTCTTCCCAGCAATGTG

TGTGCCACACATTTCTAGATAAGGTTCCTCGGGAGCTGGCCCGCAGCCTTACCTCTCCTCGGAAACACCCTGTAACATTTGCCTCTAGGTTGCAGACGCTGTT ATTAGGGGTTCATTCTCTTAAGGGATGGCTCATTTGGCTTGACAGTGCCCTGCTTACCCATGATGGTATGAGAGGAAAACTGCTTAATAATAAAGCGGATTTC AGGGAAAACTGGACCGAAGGTAACTTCCCTAGAAATTCTCCAGCCTTTCCAACTAGCAATGCTTTCGCCCATCTTCATATCTGTCTGAGGTTCTCCCTGCCCA TGCAATTACACCATTCGGCAACCTTGTACCTAGGTCATGCCTGCTCTGTCCTTTGCCAAAGACCCCCAGCTTTGCTGATCTTGTCATGACGTGTCGTGCA TTTCCCCTTTGGAGGCAAAAGATTAAGCCTGAGTATCTTGAGCCTCCGAAGCACACAGAGTGAAGTGCTTTTTCTTCCTCTTACTCATCAGTAAAACGCATAC GTCCATCAAGAAAAGCAAGCTATAGAGTACGTCACATCTCTTTTTCAGCCTGCACACTTGTCCGGGCATAGTGGCTGAGCGGCAGTGCTTCCTTCTCAT ATAGATGCAGCTAGGTGGAATGAGATGGAAATGGCAGCGTTTCCTGGCGCAGGGAAGCTGCATCAAGGCCGTGCACATGATAATTTGAATGTTTTGTTC Met Ser As ExonII GACACAG/GTTTGTAACCATCTAAAGCAAG ATG TCT GA/GTAAGTTCCTATTTTCTGTTTCCAGTATGTCGCTGGTGTTCAAACTTTTGTTCAGCTGCCAGCTTTTACATACATACT ExonIII p Giu Glu ATTCTGCTTGTTTTCAATGAGGATGCTAACTGCTTTTTTICTCCTCTTTCTCTTCCTTCCCTTCCTTGCCCATCCGCTCTCCCTGCACAG/T GMA GAG/GTAAGTGTGTCCTGCTGGGC

TTTTCCCTTTAGCAGAGCTCATAGCATCGATCCTCGCTTTGTAGGAAAAGCAGTTATGGCGTGGTGGGAGGAAGGAATGTCCCATGGTCTGGCTTTCCTGATACCTTCTCGAGAAGMAGA

GAAGACGAGCAAATCCTTTTAAGGGCTGCTGCAGTGGATGTTTCCCCTTTCTCGTCCCTACCGCCCGAGGGGTGCGAGAACGCTGCCGAACAAACCAAG CCACACTCAATGCCACAGCGGGATCAAAAATCCCCTATGTGCTATGCCAATGGTGCATGGGGCCCTTGGCCGGTGCGCAGGAGGGTGGGAGGCACCTG ACAAATAGCATTTTCGAAGAGAACGCCACGGCTGGAGACTCGAGAATTGAGCAGATTCTCTGTTAAGGTGCTGATTGTCTAGCTAACAAGATGAGCAG

ExonIV Lys Lys Arg Arg A AGGGGA TTGGGACAAATTGTACAGCTGGGGATTGGTGTGAGAGGAAAATGAGGATCCTCATACTACCTGTTTATTTTCCAMGGCAGGA Ala Ala Thr Ala Arg Arg Gin His Lou Lys GCA GCC ACC GCC CGG CGG CAG CAC CTG AAG/GTACGTGGCCCTGCTGGGGCTGGGCGCGGGTGTGTCCTGATCATCCCATCTCCCACCCAGCAGTTGTCACTCCACCTGCC


TCCAAAGCCATGTCGACGTGTGGGTTCCTATACCTTCTCTCACCTACCAGTACACATAAGACGGTGCGGGGGAGCTGGGTCAGCTCATCCGACCTCCCAGCG 3203 ExonV Ser Ala Met Leu CTCT C 33191 AGTGG TCCCATGATCCCTTTGTCAGGACCTGGATTTGCATGAGACTCCATCTTCTCTTGCCCGCCCTGTTTCTCCTTCGTTGGCTCAG Gin Leu Ala Val CAG CTT GCT GTC Leu Pro Gly Ser CTC CCA GGA TCC

Thr ACT Met ATG

Glu Ilie Giu Lys Giu Ala Ala Ala Lys Giu Val Giu Lys Gin Asn Tyr Leu Ala Giu His Cys Pro Pro Leu Ser GMA ATA GAA AAA GAA GCA GCT GCT AMA GAA GTG GMA MG CAA AAC TAC CTA GCA GAG CAT TGC CCT CCT CTG TCC Gin Glu Leu Gin CAG GAA CTT CAG/GTMAGAGCTGCTTTAGCCCTTTCAGAAAGTATTAGGTTCCACTACTCAGCCTGCTGCATTTAGATGTCCTTCCCATCTTGACCA

TTTCCTGTCCCTGGGGGCCTCTCTTCACCATGGAG6'C-A-GGACAGGGACTGGCACCTCTGGGCTGTGTTGGGTGGTGGCTGGAGTACAAGGATATGAGGGCAGCATTTGTGTGAAGGTG

GGTGCCAAAGCCAGAGCTGGTTGCAAATACTTTCACATCTCAGTTTTTGGATTGATGGTGTCTGACCTTCTCAGCTTCGGAGCTGGTTGCAAATACTTT TCATCTCAGTTTTTGGATTAATGGTGGCTGATCTTCTCAGCCTTGGGATAAGAATGTAAGGGGCTTTCAAGATAAGGAGAGGAATGGTCAGAGAGGGAAT ExonVI Giu Leu Cys Lys Lys Leu His Ala Lys Ilie Asp CCTAGCGTGCTCTATTATTCAACATCCGTCTACTTTTCCTCTCTATTATTCAACATCCGTCTACTTTTCCTCCCAG/GAA CTG TGC AAA AAG CIT CAC GCC MAG ATA GAT Ser Val Asp Glu Glu Arg Tyr Asp Thr Giu Val Lys Leu Gin Lys Thr Asn Lys Glu7 ICA GIG GAT GAG GMA AGG TAT GAC ACA GAG GIG AAG CIA CAG AAG ACT MAC AAG GAG/GTGAGGTCAGCATGCGAGCATCTGGCTCATCCTTTGTTCCTCTC ExonVII Lou Glu 7Asp Leu Sor Gin Lys Lou Phe Asp Lou Arg Gly CATGCTGCTTGCCATGGTCTCTCICACTCATGCCCTGCTCTTTCTGCCCGCTTCTCTCIIGCCCACAG/CTG GAG GAC CIG AGC CAG AAG CIG TTT GAC CIG AGG GGC Lys Phe Lys Org Pro Pro Lou Arg Arg Val Arg Met Sor Ala Asp Ala MetLou Arg Ala Lou Lou Gly Sor Lys His Lys Val Asn Hot AAG TIC MAG AGG CCA CCC CTG CGC AGG GIG CGI ATG ICC GCI GAT GCC AIG CTG CGI GCC CTG CIA GGC ICC AAG CAC AMA GTC MAC AIG Asp Lou Arg Ala Asn Lou Lys Gin Val Lys Lys Glu Asp Thr Glu Lys GAC CII CGG GCC AAC CTG AAG CAA GTC AAG AAG GAG GAC ACA GAG AAG/GTACCACTTTCATCCCATTAAGGCATAAGCTTCCAACTTTTGGGGATGACATCTCC

CTGCAGCAGACATGACTTATTCAGTATGTCTTCCGCCTCTGTTTCTCCTCAAC~'C-TATGTATTGTAACTCGAGTCCGCCCGAAAACATACGCAGGGTGA

CTGGGGTCCCAGCTAGTTTTCAAGGAGGTGAGTGTGAGCATTGAATAAGGTCCCAAGGTGATGGGAAAAGGCACTTTTGGTGTCATACCCATGGGTTCA ExonVIII Glu Lys Asp Lou Arg Asp Val Gly Asp Trp Arg Lys Asn Ilie Glu Glu Lys Ser Gly Met Glu Gly Arg TCGTCATGCACACTTATCCCTCTGCAG/GAG AAG GAC CIC CGT GAT GIG GGT GAC TGG AGG AAG AAC All GAG GAG MAA ICC GGC AIG GAG GGC AGG Lys Lys Met Phe Glu Ala Gly Glu Ser * AAG AAG ATG TTT GAG GCT GGC GAG ICC TAA GCACTGGTCTCTCCACTCTTGCCATTTCCGCCCTCTTCCATCCCCTCCTGAGCATGGCCACAGCTGTGAGCATGGCCACC TCCCTGCCCTGAACCTCAACACGTCCTCACCATGCATTGAACCCACTGACCGGGTGCTGTTTCTCTGGCTTTGTGAAGAGCTGCAGTCTGAAAGAGCAGTGTA AA T-AA GCITTCATG GGAYGAGYGGGGATGTGGCCTGCTCTGGTGGGG8CTGAGGGTGCTTAGGGCTGTGGGAACACACTAAGGATAC


FIG. 2. Nucleotide sequence of the quail fast skeletal muscle troponin I gene. The derived troponin I amino acid sequence is shown above the nucleotide sequence; the termination codon is indicated by three asterisks. The GT and AG intron splice junctions are underlined. The major transcription start site is indicated by a horizontal arrow, and poly(A)-addition sites are marked by arrowheads. The upstream "TATA," "CCATT," and muscle gene homologous sequence and the downstream AATAAA polyadenylylation consensus sequence are boxed. Two short gaps (22 and 23 bp) in the sequence of the first intron are indicated.
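The split codon at the exon II/exon III junction in Fig. 2 can be checked with a short sketch. It joins the coding part of exon II to the 7-bp exon III and translates the result. The sequence fragments are read from Fig. 2, on the assumption that the "GMA" printed in exon III is an extraction error for GAA (Glu). The helper names and the five-codon table are illustrative only and are not part of the original analysis.

```python
# Minimal codon table: only the five codons needed for this example.
CODON_TABLE = {
    "ATG": "Met",
    "TCT": "Ser",
    "GAT": "Asp",
    "GAA": "Glu",
    "GAG": "Glu",
}

# Fragments read from Fig. 2 (assumption: "GMA" in exon III stands for GAA).
EXON_II_CODING = "ATGTCTGA"   # ATG TCT GA, ends inside the Asp codon
EXON_III = "TGAAGAG"          # T GAA GAG, the 7-bp exon
INTRON_2_START = "GTAAGT"     # first bases of intron 2
INTRON_2_END = "CTGCACAG"     # last bases of intron 2


def follows_gt_ag_rule(intron_start, intron_end):
    """Return True if the intron begins with GT and ends with AG."""
    return intron_start.startswith("GT") and intron_end.endswith("AG")


def translate(cds):
    """Translate a coding sequence whose length is a multiple of three."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3)]


# Removing intron 2 joins GA (exon II) to T (exon III), restoring codon GAT.
spliced = EXON_II_CODING + EXON_III
peptide = translate(spliced)

print(len(EXON_III))
print(follows_gt_ag_rule(INTRON_2_START, INTRON_2_END))
print("-".join(peptide))
```

Under these assumptions the spliced fragment reads Met-Ser-Asp-Glu-Glu, which agrees with the amino acids printed above exons II and III in Fig. 2.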

[Figure 3: map of the gene on a 0-5 kb scale showing exons I-VIII, the TATA homology, the ATG and TAA codons, and the AATAAA sequence. Codons per exon as labeled: II, 1-2; III, 2-4; IV, 5-18; V, 19-61; VI, 62-91; VII, 92-150; VIII, 151-182. Hatched regions mark the troponin C-binding and actin-binding domains.]

FIG. 3. The intron/exon organization of the fast troponin I gene. Exon sequences are shown as blocks and introns as lines. The positions of the TATA homology upstream, the ATG translation initiation codon, the TAA stop codon, and the AATAAA polyadenylylation consensus sequence are indicated. The amino acid codons in exons are numbered below. The positions of the troponin C- and actin-binding domains are indicated by the hatched areas.
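The codon ranges printed under the exons in Fig. 3 are enough to work out which exons encode each functional domain. The sketch below does this by interval overlap, using the domain positions given in the Results (residues 10-22 for the troponin C-binding domain and 98-119 for the actin-binding domain). The assignment of the seven printed ranges to exons II-VIII, in order, is an assumption, and the function names are illustrative.

```python
# Codon ranges per coding exon, as printed in Fig. 3 (exon I is nontranslated).
# Assumption: the seven ranges correspond to exons II-VIII in order.
EXON_CODONS = {
    "II": (1, 2),
    "III": (2, 4),
    "IV": (5, 18),
    "V": (19, 61),
    "VI": (62, 91),
    "VII": (92, 150),
    "VIII": (151, 182),
}

# Domain positions (amino acid residues) as stated in the Results.
DOMAINS = {
    "troponin C-binding": (10, 22),
    "actin-binding": (98, 119),
}


def exons_encoding(domain_range, exon_codons):
    """Return the exons whose codon range overlaps the domain range."""
    start, end = domain_range
    return [
        name
        for name, (first, last) in exon_codons.items()
        if first <= end and last >= start
    ]


def split_codons(exon_codons):
    """Return (codon, upstream exon, downstream exon) for codons shared by
    two consecutive exons, i.e. codons interrupted by an intron."""
    names = list(exon_codons)
    shared = []
    for left, right in zip(names, names[1:]):
        if exon_codons[left][1] == exon_codons[right][0]:
            shared.append((exon_codons[left][1], left, right))
    return shared


for domain, residues in DOMAINS.items():
    print(domain, exons_encoding(residues, EXON_CODONS))

print(split_codons(EXON_CODONS))

# Highest numbered codon times three nucleotides per codon.
print(EXON_CODONS["VIII"][1] * 3)
```

With these inputs the troponin C-binding domain overlaps exons IV and V and the actin-binding domain falls within exon VII alone. The 182 numbered codons correspond to 546 nucleotides, the same figure quoted for the protein-coding sequence.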

of cardiac or slow muscle troponin I isoforms.

The 5' and 3' Boundaries. Upstream of the genomic sequences homologous to the 5' nontranslated sequences of cC106 are the sequences TTTTATA and TAAA, similar to the TATA consensus sequence present in the upstream promoter regions of most eukaryotic, polymerase II-dependent structural genes (17). Identification of this region as the troponin I promoter was further established by nuclease S1 mapping from a Bgl I site in the 5' nontranslated sequence of both the genomic DNA and cC106 (15). The predominant 5' mRNA terminus, accounting for ~60% of the S1-protected fragments, is located at the G residue 30 nucleotides downstream of the first T of the sequence TTTTATA (Fig. 2). Another 5' terminus was detected by S1 analysis ≈43 nucleotides upstream of the major terminus and 29 nucleotides from a TAAA sequence, suggesting that the fast muscle troponin I gene has two transcription start sites. The predominant 5' terminus detected by S1 analysis has been confirmed by primer-extension analysis, but the putative upstream second terminus has not been detected by this method and requires further investigation.

The 3' boundary of the troponin I gene was identified by the presence of an AATAAA sequence, characteristic of the polyadenylylation consensus sequence of eukaryotic polymerase II-dependent structural genes (18), located 190 nucleotides downstream of the TAA translation stop codon (Fig. 2). S1 mapping analysis (15) reveals that troponin I RNAs have three termini that map 12, 15, and 43 nucleotides 3' of the final A of the AATAAA sequence (Fig. 2). Thus, the 5' and 3' boundaries of this fast skeletal muscle troponin I gene are within a 4500-bp region that encodes 830 bp of troponin I mRNA sequence dispersed in eight exons. The troponin I mRNA encoded by this gene includes 82 bp of 5' nontranslated sequence from the predominant transcription start site, 546 bp of protein-coding sequence, and ≈200 bp of 3' nontranslated sequence.

Intron/Exon Arrangement of the Troponin I Gene. Exons of the quail fast skeletal muscle troponin I gene range in size from 300 bp for exon VIII to only 7 bp for exon III, and all are bounded by GT/AG splice junctions. To our knowledge, exon III is the smallest yet reported for any gene (19). The first exon encodes only nontranslated RNA sequence and is separated from the translation start codon in the second exon by a relatively large (1700-bp) first intron (Figs. 2 and 3). The first five exons and their associated introns comprise ~80% of the gene sequence. This gene organization concentrates the DNA encoding the carboxyl-terminal half of the protein (Fig. 3). The carboxyl-terminal 32 amino acids, the translation stop codon (TAA), and the entire 3' nontranslated sequence are included in exon VIII.

Troponin I proteins have two known functional domains. The troponin C-binding domain of fast muscle troponin I is located between amino acid residues 10 and 22 (1). This domain is encoded by exons IV and V (Figs. 1 and 3) and thus is split by an intron. Actin-binding domains and actomyosin ATPase-inhibition domains of fast troponin I are located in the region of amino acids 98-119 (1). These domains are encoded entirely within exon VII (Figs. 1 and 3).

5' Flanking DNA: Sequence Homology with Muscle α-Actin Genes. Nuclease S1 mapping identified a major transcription start site for troponin I mRNA (Fig. 2). Upstream of this site is a sequence, TTTTATA, that is similar to the TATA homology (17) found in approximately the same location in most RNA polymerase II-dependent genes. Two sequences, CCATT and CCAT, similar to the CCAAT homology (23), are located 100 and 80 nucleotides upstream of the major start site (Fig. 2). Thus the immediate 5' flanking DNA of the troponin I gene resembles the promoter regions of other eukaryotic genes.

Troponin I gene expression is coordinately regulated with other muscle-specific genes during myogenesis (6), and thus it is of interest to examine whether these coexpressed genes share upstream muscle-specific sequence homologies. A computer search of troponin I 5' flanking DNA reveals a sequence homologous to a 20-bp sequence found 100 bp upstream of the mRNA start site of both chicken and rat skeletal muscle α-actin genes (20, 24, 25). This homologous troponin I sequence is located 329 bp upstream of the major mRNA transcription start site, matches the chicken α-actin sequence in 12 of 17 positions, and is related to sequences in the 5' flanking regions of a skeletal muscle myosin light chain 3 gene (21) and a cardiac myosin heavy chain gene (22) (Fig. 4).

DISCUSSION The structure of a quail fast muscle troponin I gene has been determined unambiguously by comparing nucleotide sequences of fast skeletal muscle troponin I cDNAs (Fig. 2) with homologous troponin I genomic nucleotide sequences. This sequence comparison, along with nuclease S1 mapping to define the 5' and 3' transcribed gene boundaries, demonstrates that the 830-bp troponin I mRNA is encoded by a 4.5-kb gene comprised of eight exons (Fig. 3). The exon sequences of this quail troponin I gene are identical to those of fast skeletal muscle cDNAs and encode a protein identical to the chicken fast troponin I isoform (1). The structure of troponin I, as well as other vertebrate myosin and actin muscle genes (21, 22, 24-27), now provides a basis for understanding the evolution of contractile protein gene families and their muscle-specific regulation. The structure of the fast troponin I gene reveals a different exon organization for the functional actin-binding and troponin C-binding domains of the troponin I protein. The troponin I actin-binding domain is encoded exclusively within one exon, exon VII (Fig. 3), and thus exhibits an exon-domain relationship common to many proteins (28, 29). It will be of interest to determine whether the variety of genes

[Figure 4: alignment of 5' flanking sequences from the chick α-actin (-100), rat α-actin (-100), quail troponin I (-329), chick myosin light chain 3 (-256), and rat cardiac myosin heavy chain (-411) genes, divided into regions 1, 2, and 3, with a consensus sequence below.]

FIG. 4. Sequence homologies upstream of quail fast troponin I, chicken and rat skeletal muscle α-actin (20), chicken skeletal myosin light chain 1 (21), and rat cardiac myosin heavy chain (22) genes. Sequences are aligned directly with the chicken actin gene homologous sequence and are subdivided into two homologous regions (1 and 3, boxed) that flank a more variable central core sequence region (designated 2). Uppercase letters represent nucleotides identical with nucleotides in the chicken actin sequence. Lowercase letters indicate variability at these positions. Numbers to the left of the sequences show distance (bp) from the 5' border of homologous sequences to transcription start sites. The consensus sequence below shows the most prevalent nucleotide at each position in the five homologous muscle gene sequences and the variable nucleotides at each position.
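The display convention of Fig. 4 (uppercase for identity with the reference sequence, lowercase for a mismatch, and a consensus built from the most prevalent nucleotide at each position) can be sketched as follows. The three 10-nucleotide sequences are hypothetical toy data chosen for illustration and are not the sequences of Fig. 4; the function names are likewise assumptions.

```python
from collections import Counter

# Hypothetical, pre-aligned toy sequences of equal length (not data from Fig. 4).
REFERENCE = "GCCGAAATAT"
OTHERS = {
    "toy gene A": "GGACAAATAT",
    "toy gene B": "GCCGGAATCT",
}


def mark_against_reference(reference, sequence):
    """Uppercase positions identical to the reference, lowercase the rest."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        base.upper() if base.upper() == ref.upper() else base.lower()
        for ref, base in zip(reference, sequence)
    )


def count_identities(reference, sequence):
    """Number of aligned positions at which the two sequences agree."""
    return sum(
        ref.upper() == base.upper() for ref, base in zip(reference, sequence)
    )


def consensus(sequences):
    """Most prevalent nucleotide at each aligned position."""
    columns = zip(*(seq.upper() for seq in sequences))
    return "".join(Counter(column).most_common(1)[0][0] for column in columns)


for name, seq in OTHERS.items():
    marked = mark_against_reference(REFERENCE, seq)
    matches = count_identities(REFERENCE, seq)
    print(f"{name}: {marked} ({matches} of {len(REFERENCE)} positions)")

print("consensus:", consensus([REFERENCE, *OTHERS.values()]))
```

The same identity count, applied to the real aligned sequences, is the kind of tally behind a statement such as "matches in 12 of 17 positions". Ungapped, position-by-position comparison is a simplification: the Discussion notes that single-base shifts and inversions increase the extent of homology.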

encoding other actin-binding proteins have protein domains and exons similar in structure to troponin I exon VII, consistent with an exon-shuffling model of protein evolution (30). It also will be of interest to compare exon structure of the actin-binding domains of the genes encoding the slow skeletal and cardiac troponin I isoforms, since the amino acid sequences of actin-binding domains of these three troponin I isoforms are highly conserved (Fig. 1). In contrast to the actin-binding domain, the troponin C-binding domain of the fast skeletal muscle troponin I gene (1) is split between exons IV and V. The exact borders of the troponin C-binding domain are uncertain and might extend into exon III (1). The troponin C-binding domain is located in a region of variable and conserved amino acids in the three troponin I isoforms (Fig. 1). The partial conservation of amino acid residues among all isoforms in the region split by exons IV and V of the fast troponin I gene suggests that the troponin C-binding domain originated prior to the duplications of the ancestral gene that gave rise to this gene family. Future comparative analysis of the exon organization of cardiac and slow troponin I genes in this troponin C-binding-domain region, as well as the other homologous protein-coding regions, should reveal the evolutionary history of the troponin I gene family and the origins of the functionally specialized domains of the cardiac, fast, and slow troponin I isoforms of vertebrate muscles. Structural comparison of cardiac, fast, and slow troponin I proteins indicates that these isoforms are evolving nonrandomly. The amino-terminal halves of these isoforms are highly divergent, whereas their carboxyl-terminal halves are highly homologous over long stretches of protein sequence having no known functional or structural importance (ref. 1 and Fig. 1). 
The cardiac troponin I isoform also has a 26 amino acid amino-terminal extension perhaps functionally important in cardiac muscle function through phosphorylation of a serine in this extension-peptide region (31). The exon structure of the quail fast troponin I gene suggests that gene conversion might maintain the homology of the carboxyl termini of these three isoforms and that exon-junction sliding may have played a role in the origin of the cardiac isoform amino-terminal extension. The introns of the fast troponin I gene are distributed unevenly along the RNA coding sequence, leading to a heterogeneous distribution of exon sizes (Fig. 3). Exon sizes

range from 7 bp to 300 bp. Only two of the eight exons (exons V and VII) are in the 140-bp size class, the most prevalent in eukaryotic genes (32). The first five exons are flanked by the majority of intron DNA (Fig. 3). This gene organization disperses the sequences encoding the amino-terminal half of the protein over 3.5 kb of DNA and concentrates the exons encoding the carboxyl-terminal half of the protein into ≈1 kb of DNA. This organization could favor gene conversion at the 3' ends of the isoform genes, thereby maintaining the striking homology of the carboxyl-terminal halves of the isoforms. Gene conversion is one mechanism known to act on gene families to maintain their sequence homogeneity (33). Gene conversion has been proposed to account for sequence homologies in two γ-globin genes (34) and in the variable regions of immunoglobulin genes (33). Such a gene-conversion mechanism predicts that the exon/intron organization of the three troponin I isoform genes is similar in their carboxyl-terminal gene regions and that sequences of intron 7 of these genes are more homologous than those of introns in the more divergent 5' regions. In this regard, it also will be of interest to determine whether the troponin I isoform genes are closely linked, an organization that might enhance such gene-conversion events. The intron/exon organization of the fast troponin I gene also suggests an origin for the 26-residue amino-terminal peptide in the cardiac troponin I isoforms (Fig. 1). If the cardiac troponin I gene is shown to have an intron positioned immediately downstream of the ATG initiation codon, similar to intron 2 of the fast troponin I gene, then the cardiac amino-terminal extension likely arose as an insertion by a splice-junction-sliding mechanism that created an exon larger by 78 bp than the corresponding exon in the fast muscle gene. This type of mechanism has been proposed to account for insertions in the serine protease family (35).
Alternatively, the amino-terminal differences between the cardiac and skeletal muscle isoforms may have evolved by exon insertion or by exon loss.

The troponin I gene shares two structural features with evolutionarily unrelated but muscle-specific contractile-protein genes. These shared features may be functionally and evolutionarily significant in directing their muscle-specific expression. First, troponin I and other muscle genes have a homologous sequence upstream of their promoters. This 17-bp sequence is located 100 bp upstream of the transcription start site in both chicken and rat α-actin genes (20) and at -329 in the fast troponin I gene (Figs. 2 and 4). These sequences align into three regions: a G+C-rich region and an A+T-rich region that flank a 4-bp central core region of more variable sequence (Fig. 4). Single-base shifts and inversions in these alignments increase the extent of homology of these sequences. Homologous sequences are also located at -411 in the 5' upstream region of the cardiac myosin heavy chain gene (22) and at -256 of the myosin light chain 3 gene (21). The identification of homology between cardiac and skeletal muscle gene promoters is not unexpected, since cardiac and skeletal genes are coexpressed in embryonic skeletal muscle (36-38). A homologous sequence has not yet been identified in the flanking regions of myosin light chain 1 and myosin light chain 2 gene promoters (21, 26), but data on the 5' flanking sequences of these genes are limited. Based on the distant location of this sequence from the troponin I gene, this sequence could be further upstream than the published sequences of these genes. Extensive computer analysis of mammalian library gene sequences revealed that these homologies in the 5' flanking regions of muscle genes are muscle-specific and are not found in nonmuscle gene sequences. Furthermore, recent gene-transfection studies show that the region of the troponin I homologous sequence at -329 is required for the muscle-specific expression of troponin I genes (15) and that muscle-specific regulatory sequences also are localized in the skeletal α-actin gene promoter region (39). We suggest that these homologous sequences are cis-acting transcriptional control elements (see ref. 40 for review) involved in the coordinate control of muscle genes, analogous to consensus promoter sequences involved in the regulation of heat shock (41) and steroid-induced genes (42).
The second structural feature that the troponin I gene shares with α-actin (24, 25, 27), cardiac myosin heavy chain (22), myosin light chain 2 (26), and myosin light chain 3 genes (21) is that a large intron separates the promoter and first exon encoding nontranslated RNA sequences from the protein-coding region in the second exon. This gene organization could facilitate recombinational genome "shuffling" of muscle gene promoters and their transcriptional regulatory elements and raises the possibility that the first intron of muscle genes has a regulatory function.

We thank Irene Althaus and Maggie Ober for their expert assistance with the DNA sequence and computer analysis of the troponin I gene and Dr. William Pearson for generously providing his sequence homology computer programs and his expertise for analysis of the troponin I promoter sequence homology. We thank Dr. Stephen Konieczny for generously providing his data on the 5' and 3' boundaries of the troponin I gene and for extensive discussion and thank Gladys Bryant for preparing this manuscript. Data handling and analysis were made possible in part by the use of hardware from the University of Virginia Clinical Research Center, Grant MO1 RR 00847. This investigation was supported by a research grant from the National Institutes of Health.

1. Wilkinson, J. M. & Grand, R. J. A. (1978) Nature (London) 271, 31-35.
2. Potter, J. D. & Gergely, J. (1974) Biochemistry 13, 2697-2703.
3. Head, J. F. & Perry, S. V. (1974) Biochem. J. 137, 145-154.
4. Wilkinson, J. M., Perry, S. V., Cole, H. A. & Trayer, I. P. (1972) Biochem. J. 127, 215-228.
5. Horwitz, J., Bullard, B. & Mercola, D. (1979) J. Biol. Chem. 254, 350-355.
6. Devlin, R. B. & Emerson, C. P., Jr. (1978) Cell 13, 599-611.
7. Hastings, K. E. M. & Emerson, C. P., Jr. (1982) Proc. Natl. Acad. Sci. USA 79, 1153-1157.
8. Rigby, P. W. J., Dieckmann, M., Rhodes, C. & Berg, P. (1977) J. Mol. Biol. 113, 237-251.
9. Lawn, R. M., Fritsch, E. F., Parker, R. C., Blake, G. & Maniatis, T. (1978) Cell 15, 1157-1174.
10. Benton, W. D. & Davis, R. W. (1977) Science 196, 180-182.
11. Maniatis, T., Hardison, R. C., Lacy, E., Lauer, J., O'Connell, C., Quon, D., Sim, G. K. & Efstratiadis, A. (1978) Cell 15, 687-701.
12. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517.
13. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
14. Berk, A. J. & Sharp, P. A. (1977) Cell 12, 721-732.
15. Konieczny, S. F. & Emerson, C. P., Jr. (1985) Mol. Cell. Biol. 5, 2423-2432.
16. Sharp, P. A. (1981) Cell 23, 643-646.
17. Goldberg, M. (1979) Dissertation (Stanford Univ., Stanford, CA).
18. Proudfoot, N. J. & Brownlee, G. G. (1976) Nature (London) 263, 211-214.
19. Tate, V. E., Finer, M. H., Boedtker, H. & Doty, P. (1983) Nucleic Acids Res. 11, 91-104.
20. Ordahl, C. P. & Cooper, T. A. (1983) Nature (London) 303, 348-349.
21. Nabeshima, Y., Fujii-Kuriyama, Y., Muramatsu, M. & Ogata, K. (1984) Nature (London) 308, 333-338.
22. Mahdavi, V., Chambers, A. P. & Nadal-Ginard, B. (1984) Proc. Natl. Acad. Sci. USA 81, 2626-2630.
23. Benoist, C., O'Hare, K., Breathnach, R. & Chambon, P. (1980) Nucleic Acids Res. 8, 127-142.
24. Chang, K. S., Rothblum, K. M. & Schwartz, R. J. (1985) Nucleic Acids Res. 13, 1223-1237.
25. Zakut, R., Shani, M., Givol, D., Newman, S., Yaffe, D. & Nudel, U. (1982) Nature (London) 298, 857-859.
26. Nudel, U., Calvo, J. M., Shani, M. & Levy, Z. (1984) Nucleic Acids Res. 12, 7175-7186.
27. Fornwald, J. A., Kuncio, G., Peng, I. & Ordahl, C. P. (1982) Nucleic Acids Res. 10, 3861-3876.
28. Steinmetz, M., Moore, K. W., Frelinger, J. G., Sher, B. T., Shen, F.-W., Boyse, E. A. & Hood, L. (1981) Cell 25, 683-692.
29. Jung, A., Sippel, A. E. & Schutz, G. (1980) Proc. Natl. Acad. Sci. USA 77, 5759-5763.
30. Gilbert, W. (1978) Nature (London) 271, 501.
31. Solaro, R. J., Moir, A. J. G. & Perry, S. V. (1976) Nature (London) 262, 615.
32. Naora, H. & Deacon, N. J. (1982) Proc. Natl. Acad. Sci. USA 79, 6196-6200.
33. Baltimore, D. (1981) Cell 24, 592-594.
34. Slightom, J. L., Blechl, A. E. & Smithies, O. (1980) Cell 21, 627-638.
35. Craik, C. S., Rutter, W. J. & Fletterick, R. (1983) Science 220, 1125-1129.
36. Minty, A. J., Alonso, S., Caravatti, M. & Buckingham, M. E. (1982) Cell 30, 185-192.
37. Toyota, N. & Shimada, Y. (1983) Cell 33, 297-304.
38. Hallauer, P. L. & Emerson, C. P., Jr. (1985) J. Cell. Biochem. 9, 65.
39. Melloul, D., Aloni, B., Calvo, J., Yaffe, D. & Nudel, U. (1984) EMBO J. 3, 983-990.
40. Davidson, E. H., Jacobs, H. T. & Britten, R. J. (1983) Nature (London) 301, 468-470.
41. Pelham, H. R. B. (1982) Cell 30, 517-528.
42. Grez, M., Land, H., Giesecke, K., Schutz, G., Jung, A. & Sippel, A. E. (1981) Cell 25, 743-752.