
Vol. 260, No. 13, Issue of July 5, pp. 8203-8213, 1985. Printed in U.S.A.

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1985 by The American Society of Biological Chemists, Inc.

Evolution and Heterogeneity of the α-/β-type and γ-type Gliadin DNA Sequences*
(Received for publication, August 2, 1984)

Thomas W. Okita‡, Valerie Cheesbrough, and Christopher D. Reeves
From the Institute of Biological Chemistry, Washington State University, Pullman, Washington 99164-6340

Near full length cDNA clones for both α-/β- and γ-type gliadins were isolated and studied for sequence diversity. Based on restriction site polymorphism and cross-hybridization studies, α-/β- and γ-type clones could be divided into five and three homology classes, respectively. Clones representing each of the different classes were sequenced and compared. Sequence divergence between the classes was due to single-base substitutions and to duplications or deletions within or near direct repeats. Thus, through numerous duplications and subsequent divergence, the gliadin multigene family encodes a polymorphic set of polypeptides differing in both isoelectric point and molecular size. Southern blot analysis of wheat DNA suggested that the number of genes encoding the α-/β-type gliadins was extremely large (>100 copies/haploid genome). Inasmuch as hybridization patterns were the same using DNA isolated from seeds or leaves, amplification or rearrangement of DNA does not occur during development. The complete coding sequence of a γ-gliadin was similar to that observed for the α-/β-gliadins, but with several notable differences. Comparison of γ-type gliadin cDNA sequences showed that, unlike the conserved dodecamer repeat common to all the α-/β-gliadins, the tandem repeat unit differed among γ-gliadin clones.

The alcohol-soluble proteins of wheat endosperm, the gliadins, are highly polymorphic and number 40 or more on two-dimensional polyacrylamide gels (1, 2). The major mRNA transcripts that accumulate during the middle stages of seed development encode the α-/β-type and γ-type gliadins (3). Recent chemical (4-8) and genetic (1, 9) evidence indicates that the various gliadins are related and encoded by at least three gene subfamilies: those encoding the α-/β-type, γ-type, and ω-type gliadins (8). DNA and protein sequencing (10, 11) has shown that gliadins have a unique primary structure displaying six peptide domains. These domains consist of a signal peptide (domain I), an N-terminal region composed of a tandem repeat that is rich in glutamine and proline residues (domain II), and two polyglutamine stretches (domains III and V) interdigitated by two regions of biased amino acid composition (domains IV and VI) (10). However, it is not clear whether the structural domains displayed by the few gliadins so far studied are a universal feature or whether some polypeptides lack one or more domains.

Gene loci for the α-/β-type subfamily are clustered on the short arms of homeologous chromosomes 6A, 6B, and 6D, whereas the γ- and ω-type subfamilies are clustered on the short arms of chromosomes 1A, 1B, and 1D (9). The ω-type gliadin polypeptides are distinguished by a low content of sulfur-containing amino acids. In addition to these monomer proteins, an aggregate protein fraction labeled the "low molecular weight glutenins" (12) is also encoded on the short arms of chromosomes 1A, 1B, and 1D and appears to be tightly linked to the γ- and ω-gliadins (12). This evidence, together with the amino acid compositional data (13), suggests that the low molecular weight glutenins could be related to the γ-gliadins.

We have made an extensive comparative analysis of cloned gliadin messenger RNA sequences in order to understand the evolution of these genes and to aid our studies of developmental expression and gene organization within this highly diverse but related family. Sequence comparison of cDNA clones containing the complete coding sequence for the α-/β-type and γ-type gliadins shows that although their sequences can be divided into closely related homology classes, all of the clones code for a nearly identical primary structure containing six peptide domains. Diversification of the coding sequences has occurred through base substitutions and DNA segmental mutations (14), the latter being responsible for the size heterogeneity of the encoded polypeptides. Furthermore, unlike the conservation of the repeated dodecamer peptide observed in domain II of the α-/β-gliadins, the repeated peptide is variable among the γ-gliadins.

* This is Scientific Paper 6893, Project 0590, of the College of Agriculture Research Center, Washington State University. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

‡ This work was supported in part by National Science Foundation Grant PCM-8215772. To whom correspondence should be addressed.

EXPERIMENTAL PROCEDURES AND RESULTS¹

RESULTS AND DISCUSSION

Hybridization Analysis of Gliadin Sequences-Gliadin mRNAs coding for a set of translation products in the size range expected for unprocessed α-/β-gliadins or γ-gliadins were found to be eluted collectively from either pTO-A10 (α-/β-gliadin cDNA clone) or pB48 (γ-type cDNA clone) immobilized on nitrocellulose (10). Thus, the gliadins are presumably encoded by at least two large subfamilies of closely related genes. The extent of homology among and between α-/β- and γ-type gliadin cDNA clones was determined by hybridizing purified, nick translation-labeled cDNA inserts to nitrocellulose filters containing equivalent amounts of each plasmid (Figs. 7 and 8). At moderate hybridization criteria (Tm - 18 °C) there is extensive cross-hybridization among all the sequences within one or the other subfamily and, therefore, extensive homology exists among the members within each subfamily. However, although there was partial hybridization between α-/β- and γ-type clones, the reaction was always significantly less than that observed between members of the same subfamily.

At high stringency (Tm - 8 °C) the members of both subfamilies were resolved into additional hybridization classes. Five classes were seen for the α-/β-type subfamily. Class A-III, consisting of clones A201 and A1235, and class A-V, consisting of a single clone, A42, displayed the least cross-hybridization with the other classes. Members of classes A-II and A-IV cross-hybridized with each other at Tm - 8 °C, but could be distinguished by washing the filters at the Tm (Fig. 7B), which was 57 °C under the conditions of the experiment as determined from the thermal elution profile. Thermal elution profiles were obtained by hybridizing 32P-labeled A212 cDNA with dots containing the clones A207 (class A-I), A212 (class A-II), A1235 (class A-III), A735 (class A-IV), and A42 (class A-V), then individually washing the dots containing the hybrids at progressively higher temperatures and determining the proportion of 32P-labeled probe released at each step using liquid scintillation counting. Relative to the self-hybrid, the ΔTm values of the cross-hybrids of A42, A735, A207, and A1235 were similar, differing by no more than 2.5 °C for the most divergent species, A42 and A1235. However, under these same conditions, which did not saturate the DNA dots (15), the extent of hybridization differed at Tm - 14 °C. Relative to the total cpm measured for the A212 self-hybrid, the extent of cross-hybridization was about 69% for A735, 47% for A207 and A42, and 33% for A1235.

¹ Portions of this paper (including "Experimental Procedures," part of "Results," Figs. 1-6, and Tables I and II) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are available from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, MD 20814. Request Document No. 84M-2401, cite the authors, and include a check or money order for $8.00 per set of photocopies. Full size photocopies are also included in the microfilm edition of the Journal that is available from Waverly Press.

Heterogeneity of Wheat Gliadin Genes

FIG. 7. Blot hybridization analysis of α-/β-gliadin cDNA sequences. Plasmid DNA was denatured and applied to replicate nitrocellulose filters in the patterns shown in the keys at the bottom of each set. In panel A, the filter was hybridized to the nick-translated 32P-labeled cDNA insert indicated at the top. Hybridization and washing temperatures are indicated at the left. In panel B, the filters were hybridized with the indicated probes at Tm - 11 °C for 12 h, washed at the Tm (determined under the experimental conditions to be 57 °C) for 15 min, and autoradiographed. [Figure image not reproduced; panel A temperature labels: Tm - 18 °C and Tm - 8 °C.]

Class A-I contained 10 of the 20 α-/β-type gliadin sequences analyzed and was, thus, the largest class of this subfamily. However, the relative intensity of cross-hybridization at high stringency among the members of class A-I differed depending on the specific class A-I probe used (results not shown). This suggests that most of the cDNA clones within this class were derived from distinct mRNA sequences and were not independent isolates of the same DNA sequence. This was verified by the observation that, with the same restriction endonuclease, different clones yielded different restriction fragment sizes on polyacrylamide or agarose gels.

The γ-type gliadin subfamily resolved into three hybridization classes (Fig. 8). Sequence B10-48 formed its own class, B-III, because it did not cross-hybridize with other γ-gliadin probes at high criteria. Similarly, sequences B3-12 and B11-33 formed class B-I, while class B-II contained the remaining seven clones. However, even within class B-II, differences in relative cross-homology are evident between probes B4-19 and B7-46. The grouping of the gliadin clones into homology classes was further substantiated by restriction enzyme mapping and DNA sequencing studies (see Miniprint Supplement).

Size Estimation of α-/β-Type Gliadin Gene Subfamily-Fig. 9 depicts the results of a reconstruction experiment in which wheat leaf DNA was digested with several restriction enzymes

FIG. 8. Blot hybridization analysis of γ-type gliadin cDNA sequences. Hybridization was carried out with the nick-translated 32P-labeled cDNA inserts indicated at the top of the figure. Hybridization and washing temperatures are indicated at the left. [Figure image not reproduced; temperature labels at left: 37 °C and 47 °C.]

and probed with nick-translated 32P-labeled A207, a member of homology class A-I. An amount of clone A207 DNA equivalent to 2, 10, and 20 copies was run alongside the wheat DNA fragments. A complex pattern of wheat DNA fragments containing α-/β-type gliadin gene sequences was observed. The HindIII digest of wheat DNA gave at least 10 distinct fragments together with a large cluster of fragments between 13-23 kb² which could not be resolved (Fig. 9). Many of the molecular weight classes of restriction fragments contained more than one copy. For instance, the 3.9- and 3.0-kb fragments of the HindIII digest each contain 40-60 copies/haploid genome. The hybridization pattern and signal intensities were similar when wheat endosperm DNA was analyzed under the same experimental conditions. This indicates that amplification and/or rearrangement of gliadin DNA segments does not occur during seed development.

The band patterns were not significantly altered when filters were washed at a higher criterion (0.1 x SSPE, 0.1% SDS, 68 °C versus 2 x SSPE, 0.1% SDS, 68 °C). The hybridization patterns obtained with A1235, A212, and A735 as probes were similar to that displayed by A207 (Fig. 9), although some differences in the band intensities were observed. Probe A42, however, gave an altered pattern in which only the high molecular weight classes (1.7-30 kb) were evident when the filters were washed at a high criterion in 0.1 x SSPE, 0.1% SDS at 68 °C (results not shown). This is consistent with the cross-hybridization studies and further supports our conclusion that A42 represents a unique α-/β-gliadin sequence.

FIG. 9. Estimation of gliadin gene copy number by Southern blotting. High molecular weight DNA from wheat leaves was digested with HindIII, BamHI, and EcoRI. Resulting fragments (5 µg) were resolved on 0.6% agarose gels and transferred to nitrocellulose. Lanes labeled 2, 10, and 20 on the same gel contained EcoRI-digested A207 representing 2, 10, and 20 copies of the insert sequence per haploid wheat genome. Southern blots were hybridized with nick translation-labeled A207 cDNA insert, washed, and autoradiographed. [Figure image not reproduced.]

² The abbreviations used are: kb, kilobase; bp, base pairs; SDS, sodium dodecyl sulfate; SSPE, 0.18 M sodium chloride, 1 mM ethylenediaminetetraacetic acid, 10 mM sodium phosphate, pH 7.0.

Structural Analysis of DNA Sequencing-DNA sequence analyses (see Miniprint Supplement) of cDNA clones representing the various homology classes show that all of the gliadins share the same primary structure of six peptide domains. Comparison of the gliadin cDNA clones indicated that this family of genes has diverged through time by tandem duplication/deletion of DNA segments and by point mutations, resulting in genes that code for polypeptides of different molecular size and charge. Almost all of the DNA segmental mutations detected in the gliadin sequences are restricted to the CAA/G repeats (domains III and V) and certain regions of domain II containing the tandem repeat. The duplication/deletion of direct repeats and nontandem repeats that we have observed for the gliadin coding sequences has also been demonstrated in many instances in Escherichia coli (16, 17) and in several eukaryotic gene families (12, 17). Repeated sequences, especially those arranged in tandem, are highly prone to mutations, which can occur by unequal crossing over during general recombination or by slippage and mispairing during replication/repair, as first suggested by Efstratiadis et al. (18).

Point mutations have also contributed to divergence within both subfamilies. Although these mutations are randomly distributed throughout the cDNA sequences, most of those in the coding region are silent or lead to conservative amino acid replacements. Certain point mutations are far less frequent than would be expected if they occurred randomly. For example, 30% of all the codons are those for glutamine (CAA or CAG), yet of the 45 point mutations observed in these codons, 29 (64%) are silent and 9 (20%) lead to conservative replacements. The termination codons TAA and TAG have not been observed in glutamine-rich regions. The strongly biased usage of codons CCA and CCG for proline is also noteworthy (Table II, Miniprint Supplement). The few mutations that result in a radical amino acid replacement are scattered throughout the polypeptides, yet all of the sequences have maintained a similar low overall percentage of charged amino acids. Possibly, this is a feature essential for efficient self-assembly of gliadins into protein bodies.

It is interesting to note that the tandemly repeated peptide unit is a highly conserved dodecamer within the α-/β-gliadin subfamily, but is not conserved between different γ-gliadin cDNAs (see Miniprint Supplement). In clone B48 the consensus peptide is 8 amino acids long (3), whereas clone B11-33 contains a 14-amino acid consensus sequence nearly identical to that seen for a γ-gliadin clone from the wheat cultivar Chinese Spring (19). This 14-amino acid repeat is also seen in clone B3-12, although it is contained within a larger 112-bp duplication separated by additional short (15-bp) duplications. Furthermore, the duplications and deletions in B3-12 have led to an apparent shift in the reading frame relative to B11-33. Unfortunately, B3-12 was missing 5' sequence including the initiator codon. The high degree of homology between clones B3-12 and B11-33 in the C-terminal region suggests that these genes are a product of a recent duplication event. Thus, it appears that duplications and deletions within the N-terminal region may be more frequent and less precise in the γ-gliadin subfamily than in the α-/β-gliadin subfamily.

FIG. 10. Comparison of α-/β- and γ-gliadin cDNA sequences at the 5' end and comparison of the derived amino acid sequence of B11-33 to the N-terminal amino acid sequence of a "low molecular weight glutenin". The α-/β-gliadin sequence is derived from clone A42 (10), the γ-gliadin sequence is from clone B11-33, and the N-terminal amino acid sequence (AG) is of the "aggregated gliadin fraction" from Shewry et al. (13). Dashes indicate gaps introduced to maximize homology. [Sequence alignment not recoverable from the extracted text.]

FIG. 11. Degree of sequence homology between α-/β- and γ-gliadins. Diagram (not to scale) depicts the domain structure of these proteins. The per cent sequence homology detected in localized regions between clones A42 (α-/β-type) and B11-33 (γ-type) is indicated. [Figure image not reproduced; legible values: 71%, 64%, 56%.]

Fig. 10 compares the derived N-terminal amino acid sequence of B11-33 to the N-terminal sequence of an "aggregated gliadin fraction" (13, 20). This fraction has been shown to be equivalent to the low molecular weight glutenins (12). Since the low molecular weight glutenins and the γ- and ω-gliadins are located on the short arms of chromosomes 1A, 1B, and 1D, the relationship of these proteins is important to clarify (12). Our sequence data (Fig. 10) indicate that low molecular weight glutenins and γ-gliadins are related. Only 4 of 15 N-terminal amino acids differ, three of which could be the result of a single base change. Therefore, the data shown

here together with results from genetic (9) and amino acid compositional comparisons (13) suggest that the low molecular weight glutenins are a variant form of γ-gliadins.

The regions of homology between γ- and α-/β-gliadin sequences were discussed in a previous publication (3). Sequencing of the full length γ-gliadin clone B11-33 has extended the comparison by demonstrating that both the 5' untranslated regions and the signal peptides are homologous between the two subfamilies. Fig. 10 shows a direct comparison of the DNA sequences at the 5' ends of B11-33 and the α-/β-gliadin clone A42 (10). When gaps are introduced to maximize homology there is about 70% relatedness. The differences are due to single base changes that are either silent or lead to conservative amino acid replacement within the signal peptide region.

As summarized in Fig. 11, there is a significant degree of sequence homology in localized regions of the α-/β- and γ-type gliadins. Sequence divergence between these two subfamilies is most profound in the tandem repeat region (domain II), although all of the peptide repeats are high in glutamine and proline residues. Recent studies have shown that the same octapeptide repeat observed for clone B48 (3) is present in B-hordein and C-hordein (21), barley proteins analogous to the γ-gliadins and ω-gliadins (22). Thus, the cumulative evidence supports the notion that all of the prolamine genes evolved in the Triticeae from a common ancestor (10, 23). The mechanisms and evolutionary pressures involved in this divergence remain to be elucidated.

REFERENCES

1. Wrigley, C. W., and Shepherd, K. W. (1973) Ann. N. Y. Acad. Sci. 209, 154-162
2. Mecham, D. K., Kasarda, D. D., and Qualset, C. O. (1978) Biochem. Genet. 16, 831-853
3. Okita, T. W. (1984) Plant Mol. Biol. 3, 325-332
4. Bietz, J. A., Huebner, F. R., Sanderson, J. E., and Wall, J. S. (1977) Cereal Chem. 54, 1070-1083
5. Byers, M., Smith, S. J., and Miflin, B. J. (1983) J. Sci. Food Agric. 34, 447-462
6. Autran, J.-C., Lew, E. J.-L., Nimmo, C. C., and Kasarda, D. D. (1979) Nature 282, 527-529
7. Shewry, P. R., Autran, J.-C., Nimmo, C. C., Lew, E. J.-L., and Kasarda, D. D. (1980) Nature 286, 520-522
8. Kasarda, D. D., Autran, J.-C., Lew, E. J.-L., Nimmo, C. C., and Shewry, P. R. (1983) Biochim. Biophys. Acta 747, 138-150
9. Payne, P. I., Holt, L. M., Lawrence, G. J., and Law, C. N. (1982) Qual. Plant. Plant Foods Hum. Nutr. 31, 229-241
10. Kasarda, D. D., Okita, T. W., Bernardin, J. E., Baecker, P. A., Nimmo, C. C., Lew, E. J.-L., Dietler, M. D., and Greene, F. C. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 4712-4716
11. Rafalski, J. A., Scheets, K., Metzler, M., Peterson, D. M., Hedgcoth, C., and Söll, D. G. (1984) EMBO J. 3, 1409-1415
12. Jackson, E. A., Holt, L. M., and Payne, P. I. (1983) Theor. Appl. Genet. 66, 29-37
13. Shewry, P. R., Miflin, B. J., Lew, E. J.-L., and Kasarda, D. D. (1983) J. Exp. Bot. 34, 1403-1410
14. Jones, C. W., and Kafatos, F. C. (1982) J. Mol. Biol. 19, 87-103
15. Beltz, G. A., Jacobs, K. A., Eickbush, T. H., Cherbas, P. T., and Kafatos, F. C. (1983) Methods Enzymol. 100, 266-285
16. Farabaugh, P. J., Schmeissner, U., Hofer, M., and Miller, J. H. (1978) J. Mol. Biol. 126, 847-857
17. Pribnow, D., Sigurdson, D. C., Gold, L., Singer, B. S., Napoli, C., Brosius, J., Dull, T. J., and Noller, H. F. (1981) J. Mol. Biol. 149, 337-376
18. Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C., and Proudfoot, N. J. (1980) Cell 21, 653-668
19. Bartels, D., and Thompson, R. D. (1983) Nucleic Acids Res. 11, 2961-2977
20. Bietz, J. A., and Wall, J. S. (1980) Cereal Chem. 57, 415-720
21. Miflin, B. J., Forde, B. G., Kreis, M., Rahman, S., Forde, J., and Shewry, P. R. (1984) Philos. Trans. R. Soc. Lond. B Biol. Sci. 304, 333-339
22. Shewry, P. R., Miflin, B. J., and Kasarda, D. D. (1984) Philos. Trans. R. Soc. Lond. B Biol. Sci. 304, 297-308
23. Kasarda, D. D., Bernardin, J. E., and Nimmo, C. (1976) Adv. Cereal Sci. Technol. 158-236
24. Okita, T. W., and Greene, F. C. (1982) Plant Physiol. 69, 834-839
25. Kislev, N., and Rubenstein, I. (1980) Plant Physiol. 66, 1140-1143
26. Heidecker, G., and Messing, J. (1983) Nucleic Acids Res. 11, 4891-4906
27. Messing, J., Gronenborn, B., Müller-Hill, B., and Hofschneider, P. H. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 3642-3646
28. Maxam, A., and Gilbert, W. Methods Enzymol. 65, 499-560
29. Poncz, M., Solowiejczyk, D., Ballantine, M., Schwartz, E., and Surrey, S. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 4298-4302
30. Messing, J., and Vieira, J. (1982) Gene (Amst.) 19, 269-276
31. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
32. Dretzen, G., Bellard, M., Sassone-Corsi, P., and Chambon, P. (1981) Anal. Biochem. 112, 295-298
33. Ricciardi, R. P., Miller, J. S., and Roberts, B. E. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 4927-4930
34. Geraghty, D., Peifer, M. A., Rubenstein, I., and Messing, J. (1981) Nucleic Acids Res. 19, 5163-5174

SUPPLEMENTARY MATERIAL TO: EVOLUTION AND HETEROGENEITY OF α-/β-TYPE AND γ-TYPE GLIADIN DNA SEQUENCES

Thomas W. Okita, Valerie Cheesbrough, and Christopher D. Reeves

EXPERIMENTAL PROCEDURES

Preparation of Wheat Nucleic Acid. Winter wheat (cv. Cheyenne) heads were collected 14-18 days after flowering, immersed into liquid nitrogen, and stored at -80 °C. Total RNA and poly(A)+ RNA fractions were obtained by guanidine-HCl extraction and oligo(dT)-cellulose chromatography, respectively, as described earlier (24). Poly(A)+ RNA was further fractionated by denaturing sucrose density centrifugation (24), and RNA species sedimenting between 10S and 30S were pooled and ethanol precipitated. Wheat nuclear DNA was purified from leaf or 14-18 day old developing seed tissue by the method described by Kislev and Rubenstein (25).

Construction of cDNA Library. Poly(A)+ RNA, purified by sedimentation, was used as the initial template for cDNA synthesis. The cDNA recombinant clones were obtained by employing the procedure of Heidecker and Messing (26) using pUC9 as the cloning vector. After second-strand cDNA synthesis, Escherichia coli 71-18 (27) competent cells were transformed with the cDNA plasmids and plated on enriched media containing 50 µg/ml ampicillin, 40 µg/ml 5-bromo-4-chloro-3-indolyl-β-D-galactoside, and 10 µM isopropylthio-β-galactoside.

Screening Methods. Colony hybridization and hybrid-selected translation were performed as described previously (3).

DNA Sequence Analysis. The cDNA insert was removed from the pUC9 cloning vector by double digestion of recombinant plasmids with BamHI and HindIII, restriction enzyme sites that are absent from any gliadin DNA sequence so far analyzed. The 3'-termini were then end-labeled with [α-32P]dATP (HindIII site) or [α-32P]dGTP (BamHI site) using the DNA polymerase I large fragment. The terminal sequences were then determined using the procedure of Maxam and Gilbert (28). The DNA sequences of the internal portions of the cDNA inserts were determined by subcloning either restriction DNA fragments or successive deletion fragments (29) into M13mp8 or M13mp9 (30) and carrying out the chain-termination reactions (31).

Blot Hybridization. Selected plasmids carrying α-/β-type gliadin cDNA inserts were applied to nitrocellulose using the method of Beltz et al. (15). To make hybridization probes, cDNA inserts were excised from plasmid DNA as described above, resolved by agarose gel electrophoresis, and recovered on DEAE paper as described by Dretzen et al. (32). The cDNA was then uniformly labeled by nick translation (10' cpm/µg DNA). Hybridization was carried out in heat-sealable bags using the procedure of Beltz et al. (15), except that 10% dextran sulfate was added to the hybridization buffer. The melting temperature (Tm) of the self- and cross-hybrids was determined by measuring the amount of probe released at increasing temperature (15).

Restriction Mapping. Inserts were excised with HindIII and BamHI and labeled at the HindIII site by 3'-end labeling with [α-32P]dATP using the DNA polymerase I large fragment. Single and double restriction enzyme digests were analyzed by electrophoresis on 1.5% agarose or 5% acrylamide gels.
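The thermal elution measurement described under Blot Hybridization can be sketched numerically. The following is a minimal illustration, not the authors' procedure: it assumes that Tm is taken as the temperature at which half of the total bound probe has been released, and it interpolates linearly between wash steps. The function name `melting_temperature` and all cpm values are hypothetical.

```python
def melting_temperature(temps_c, cpm_released):
    """Estimate Tm from a thermal elution profile.

    temps_c      -- wash temperatures in increasing order (deg C)
    cpm_released -- probe counts released at each wash step
    Assumption: Tm is the temperature at which 50% of the total
    released probe has eluted, found by linear interpolation.
    """
    total = sum(cpm_released)
    if total <= 0:
        raise ValueError("no probe was released")
    half = total / 2.0
    cumulative = 0.0
    prev_temp, prev_cum = None, 0.0
    for temp, cpm in zip(temps_c, cpm_released):
        cumulative += cpm
        if cumulative >= half:
            if prev_temp is None:
                return float(temp)  # half already released at the first step
            frac = (half - prev_cum) / (cumulative - prev_cum)
            return prev_temp + frac * (temp - prev_temp)
        prev_temp, prev_cum = temp, cumulative
    return float(temps_c[-1])


# Hypothetical elution data for a self-hybrid and a cross-hybrid.
temps = [45, 50, 55, 60, 65]
self_hybrid = [100, 200, 400, 200, 100]
cross_hybrid = [200, 400, 200, 100, 100]

tm_self = melting_temperature(temps, self_hybrid)
tm_cross = melting_temperature(temps, cross_hybrid)
delta_tm = tm_self - tm_cross  # depression of Tm for the cross-hybrid
print(tm_self, tm_cross, delta_tm)
```

With real data, the cpm released at each step would come from scintillation counting of the wash fractions, and the difference between the self-hybrid and cross-hybrid values corresponds to the ΔTm discussed in the main text.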

RESULTS

Isolation of Gliadin cDNA Clones. Employment of the cDNA cloning procedure of Heidecker and Messing (26) resulted in the isolation of a large number of clones (>5 x 10⁴/µg cDNA) representing the major mRNA transcripts of developing wheat seeds at mid-maturation. More importantly, about 10% of the clones had inserts of 1000 bp or larger. Although cDNA clones for the α-/β- and γ-gliadins had already been isolated in our laboratory, several more were independently identified by hybrid-selected translation (33) and used to screen more than 1000 clones obtained by the Heidecker-Messing cloning technique. The cDNA library was screened with several different α-/β- and γ-gliadin probes to ensure the isolation of all sequences representative of these gene subfamilies. DNA from clones that yielded weak to strong autoradiographic signals after colony hybridization was then analyzed by agarose gel electrophoresis. Thirty-six α-/β-type clones contained cDNA inserts of 1100 bp or larger. Since the estimated size of the inserts was similar to the size of α-/β-gliadin mRNAs (10), these clones were judged to contain the complete coding sequence for the α-/β-gliadin polypeptides. Twenty of these clones were randomly selected and further analyzed as discussed below. Similarly, ten γ-type clones containing inserts of 1100 bp or larger were selected.

Characterization of Distinct Gliadin Clones by Restriction Endonuclease Analysis. The 20 α-/β-type cDNA clones were initially characterized by the presence and number of specific restriction endonuclease sites. Based on cleavage by three of the restriction enzymes, PstI, HpaII, and TaqI, these clones could be arranged into five classes as shown in Table I. Class I clones (the most abundant class) usually contained two PstI, one HpaII, and three TaqI sites, although two clones had only a single PstI site. Class II could be distinguished from Class I by the presence of a single TaqI site and the absence of an HpaII recognition sequence (Table I). Similar analysis distinguishes classes III, IV, and V from the others.

Fig. 1. Restriction endonuclease site maps of α-/β-gliadin cDNA clones. Five sequences were selected as representing distinct classes on the basis of preliminary restriction fragment patterns (Table I). The inserts were mapped by single and double enzyme digests and by labeling the HindIII or BamHI termini by fill-in synthesis and locating the terminal fragment by autoradiography. The 5' to 3' orientation as well as verification of the maps was determined by DNA sequencing.
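The logic of mapping by single and double digests can be illustrated with a short sketch: given assumed cut positions for two enzymes on a linear insert, it lists the fragment sizes expected from each single digest and from the double digest. This is an illustration only; the insert length, enzyme labels, and cut positions are hypothetical and are not taken from Fig. 1.

```python
def fragment_sizes(insert_length, cut_positions):
    """Return fragment sizes (bp), largest first, for a linear insert
    cut at the given positions (bp from the 5' end)."""
    cuts = sorted(set(cut_positions))
    for pos in cuts:
        if not 0 < pos < insert_length:
            raise ValueError(f"cut position {pos} lies outside the insert")
    bounds = [0] + cuts + [insert_length]
    sizes = [end - start for start, end in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)


# Hypothetical 1100-bp insert with assumed sites for two enzymes.
INSERT_LENGTH = 1100
enzyme_a_sites = [300]       # hypothetical single site
enzyme_b_sites = [450, 900]  # hypothetical pair of sites

single_a = fragment_sizes(INSERT_LENGTH, enzyme_a_sites)
single_b = fragment_sizes(INSERT_LENGTH, enzyme_b_sites)
double_ab = fragment_sizes(INSERT_LENGTH, enzyme_a_sites + enzyme_b_sites)

print("enzyme A alone:", single_a)
print("enzyme B alone:", single_b)
print("A + B double digest:", double_ab)
```

In practice the reasoning runs in the opposite direction: candidate site positions are accepted only if the predicted single- and double-digest patterns agree with the fragment sizes observed on the gel.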

TABLE I. Minimum Number of Restriction Enzyme Sites in GliA cDNA Sequences. Column headings: Class, Clone, PstI, HpaII, BglI, TaqI, HinfI. [Table body not recoverable from the extracted text; legible clone entries include 212, 216, and 1072 (class II), 201 and 1235, and 42 (class V).]

γ-Gliadin clones representing each of the three hybridization classes were also analyzed by restriction endonuclease site mapping (Fig. 2). The maps for three of the sequences (B48, B3-12, and B11-33) were verified by sequencing, establishing the correct 5'-to-3' orientation as well as the location of the 3' untranslated region (hatched areas, Fig. 2). The three remaining restriction maps were aligned on one of the AluI sites, which is located in a highly conserved region of these genes. They were also oriented with the TaqI sites on the right, as these are located at the 3' end of the B11-33, B3-12, and B48 sequences. Assuming this orientation is correct, all the sequences share the common AluI site as well as another AluI site about 400 bases to the 5' side and a TaqI site (or sites) about 180 bases to the 3' side.

Clones representing these five different classes were randomly selected and restriction maps were constructed. The presence and distribution of restriction enzyme sites within the insertions are markedly distinct for each of the clones representing the five classes (Fig. 1). These results indicate that the molecular size and net charge polymorphism of gliadin polypeptides is accounted for by the many distinct, but related, gene sequences.
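The classification by restriction site number can be mimicked computationally once sequences are available. The sketch below counts PstI (CTGCAG), HpaII (CCGG), and TaqI (TCGA) recognition sequences in a DNA string and applies only the two criteria stated in the text for classes I and II. The toy sequences and the helper names (`count_sites`, `site_profile`, `tentative_class`) are hypothetical; the authors worked from gel fragment patterns, not from this procedure.

```python
# Recognition sequences of the three enzymes used for the initial
# classification; all three are palindromic, so one strand suffices.
RECOGNITION_SITES = {"PstI": "CTGCAG", "HpaII": "CCGG", "TaqI": "TCGA"}


def count_sites(sequence, site):
    """Count occurrences of a recognition sequence, allowing overlaps."""
    seq = sequence.upper()
    count, start = 0, 0
    while True:
        idx = seq.find(site, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def site_profile(sequence):
    """Number of sites for each enzyme in RECOGNITION_SITES."""
    return {enzyme: count_sites(sequence, site)
            for enzyme, site in RECOGNITION_SITES.items()}


def tentative_class(profile):
    """Apply only the class I / class II criteria quoted in the text."""
    if profile["HpaII"] == 1 and profile["TaqI"] == 3:
        return "I"
    if profile["HpaII"] == 0 and profile["TaqI"] == 1:
        return "II"
    return "other"


# Hypothetical toy sequences, not gliadin cDNA.
toy_class_i = "AACTGCAGTTCCGGAATCGATTCTGCAGGGTCGAAATCGA"
toy_class_ii = "AACTGCAGTTAATCGATTCTGCAGGG"

profile_i = site_profile(toy_class_i)
profile_ii = site_profile(toy_class_ii)
print(profile_i, tentative_class(profile_i))
print(profile_ii, tentative_class(profile_ii))
```

Classes III, IV, and V are left as "other" here because the text does not spell out their site counts.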

Fig. 2. Restriction endonuclease site maps of γ-gliadin sequences. Clones were selected that represented each of the hybridization classes as indicated on the left (see text). Inserts were excised from the plasmids using HindIII and BamHI, and mapped using single and double digests with the enzymes shown in the key (DdeI was not tested with clone B7-46). The orientation was determined by DNA sequencing analysis of the top three clones and alignment of certain sites on the remaining clones to these sequenced clones. The hatched area indicates the 3' untranslated region on the sequenced clones. [Figure image not reproduced; enzymes legible in the key include AluI, AvaI, DdeI, HhaI, KpnI, and TaqI.]

Heterogeneity of Wheat Gliadin Genes It ie interesting to note that sequences vhich cross-hybridized to other r-gliadin clones at moderate ( 8 1 0 - 4 8 ) or even high ( 8 7 - 4 6 ) stringency gave significantly different restriction maps. Although the map shown for 8 7 - 4 6 is similar to 04-19 and 8 4 8 vith respect to A l u I . HhaI, and TaqI sites. no A v a I . BglII. or KpnI sites vere found (DdeI sites Vere not tested for in B7-46). 810-48 had no Avai, Bglll. HhaI, or KpnI sites and the sites for the other enzymes are quite different from those for the other 7-gliadins.

Analysis of α-/β-Type Gliadins by DNA Sequencing. The DNA sequences of A26, A212, A735, and A1235 were determined as described under "Experimental Procedures." The DNA and amino acid sequences of these clones are shown in Figs. 3 and 4, respectively, together with those of A42, which were reported earlier (10). All of the clones except A26 contained the total coding and 3'-untranslated sequences as well as variable portions of the 5'-untranslated region. A26 was missing a terminal end of the 3'-untranslated region but contained a part of the 5'-untranslated region and the total coding sequence for an α-/β-type gliadin polypeptide. A comparison of the nucleotide sequences of the five clones shows that they are closely related, although a number of mutations distinguish them. In addition to numerous single-base substitutions, several gaps had to be introduced to obtain maximum homology among the various cDNA clones. Ignoring regions containing gaps, the sequences were highly homologous, differing by about 6.6-7.8% when compared to A42. The point mutations were distributed fairly randomly throughout the coding and untranslated sequences.
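The pairwise divergence figure quoted above is a count of mismatched positions over aligned positions, with gapped columns left out. As a hedged illustration only, the sketch below computes that quantity for two pre-aligned strings. The function name, the use of "-" as the gap character, and the toy fragments are assumptions made here; the paper does not describe its counting procedure in this section.

```python
def percent_divergence(ref, other, gap="-"):
    """Percent of mismatched positions between two aligned sequences.

    Columns in which either sequence carries a gap are skipped, so only
    single-base substitutions contribute to the result.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a == gap or b == gap:
            continue  # ignore regions containing gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


# Hypothetical aligned fragments invented for illustration.
ref_toy = "CAACAGCCATATCCGCAG---CAA"
other_toy = "CAACAACCATTTCCG---CAGCAA"
toy_divergence = percent_divergence(ref_toy, other_toy)
```

In the toy pair, 18 columns are free of gaps and 2 of them differ, so the function returns 2/18 expressed as a percentage.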

The main cause for the divergence of α-/β-gliadin sequences is the apparent duplication or deletion of DNA segments. These apparent duplication/deletion events take place in regions displaying repeat units, i.e. peptide domains II, III, and V (Figs. 3 and 4). These apparent DNA segmental mutations (12), ranging from 3 bp to 21 bp, are evident in the coding region of domain II, a region displaying a tandem repeat of 42 bp, while segmental differences of 12-21 bp and 51-75 bp are noted in the polyglutamine repeats of domains III and V. The apparent duplication of a DNA segment (triplet codons 284-290) in A26 results in a shift in the reading frame and premature termination (Figs. 3 and 4).


Fig. 3. Comparison of the nucleotide sequences of α-/β-gliadin cDNA clones. The coding region is grouped into triplet codons which are numbered starting from the initiator codon. Non-coding regions are grouped into 10 base pair lengths and are not numbered. The complete sequence of A42 is given with base substitutions indicated for the other sequences. Dash lines represent gaps introduced to maximize homology. The asterisks indicate the start of the poly G-C region, i.e. the 5' end of the cloned sequence. Consensus polyadenylation signal sequences are underlined in A42. The arrow indicates the 36 bp tandem repeat observed in domain II.

Fig. 4. Comparison of the derived amino acid sequences of α-/β-gliadins. The amino acid sequences were derived from the nucleotide sequences (Fig. 3); dashes represent gaps introduced to maximize homology. The borders between the six peptide domains of the α-/β-gliadin primary structure are shown. Domain I, signal peptide; II, region displaying a tandem repeat of a conserved dodecapeptide (arrows); III, polyglutamine stretch; IV, low proline, high histidine; V, polyglutamine stretch; and VI, high proline.

Of the 167 simple base substitutions observed, 81 (44%) result in an amino acid replacement. The bulk of these amino acid changes are neutral with respect to charge and conformational coefficients. The consensus amino acid sequence derived from the five α-/β-type cDNA sequences agrees with the determined amino acid sequence of α-gliadin proteins (10) with only a few apparent exceptions. Table II summarizes codon usage of the α-/β-type gliadin subfamily as well as a γ-type clone, B11-33. Codon usage is decidedly nonrandom for many of the amino acids, especially proline and glutamine. The triplet codons CCA and CAA are used at least twice as frequently as the other iso-coding triplets.

TABLE II
Codon usage in α-/β- and γ-type gliadin cDNAs

[Table II body: codon counts for clones B11-33, A42, A26, A212, A735, and A1235. The column alignment did not survive extraction, so the individual counts are not reproduced here.]
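A codon usage table of the kind summarized in Table II is a tally of in-frame triplets, grouped by the amino acid they encode. The sketch below shows one way to produce such a tally and to express each codon as a share of its synonymous family. It is a minimal illustration: the function names and the toy coding sequence are hypothetical, and only the glutamine and proline families of the standard genetic code are spelled out.

```python
from collections import Counter


def codon_usage(coding_seq):
    """Count in-frame triplets in a coding sequence that starts at codon 1."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon, if any
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def synonymous_share(counts, codons):
    """Fraction of a synonymous codon family contributed by each member."""
    total = sum(counts[c] for c in codons)
    return {c: (counts[c] / total if total else 0.0) for c in codons}


# Synonymous families from the standard genetic code.
GLN = ("CAA", "CAG")
PRO = ("CCA", "CCC", "CCG", "CCT")

# Hypothetical coding sequence invented for illustration.
toy_cds = "ATGCAACAACAGCCACCACCGCAACCATAA"
toy_counts = codon_usage(toy_cds)
toy_gln = synonymous_share(toy_counts, GLN)
toy_pro = synonymous_share(toy_counts, PRO)
```

A codon whose share is well above the even split for its family (one half for glutamine, one quarter for proline) is being used preferentially, which is the pattern the text reports for CAA and CCA.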

Fig. 5. Sequences of clones 3-12 and 11-33 at the 5' end containing the tandem repeat. Clone 11-33 is numbered with 1 as the first position of the initiator codon (panel A); clone 3-12 was truncated and is numbered from the 5' end (panel B). Panel C depicts the consensus nucleotide sequence of a tandem repeat observed in B48 (3). Solid double-headed arrows indicate complete or partial repeat units of 42 bp found in both clones. Dotted arrows indicate a short tandem repeat found only in clone 3-12. The boxed regions indicate a large perfect direct repeat.

Analysis of γ-Type Gliadin cDNA Clones by DNA Sequencing. With the DNA sequence of B48 already known (3), we sequenced two cDNAs belonging to hybridization class B-1 (B3-12 and B11-33) in order to determine the modes of divergence in this subfamily. For clarity, the sequences in the tandem repeat region are shown separately for B11-33 (Fig. 5A) and B3-12 (Fig. 5B), and the remaining 3' sequences of both these cDNAs are directly compared together with B48 (Fig. 6). Clone B11-33 was found to contain the complete coding sequence and a portion of the 5'-untranslated region of a putative γ-type mRNA, whereas B3-12 was missing the information for a portion of the N-terminal coding region. The comparison clearly indicates strong conservation in the C-terminal region, but divergence in the regions that display repeating DNA segments.

Fig. 6. Comparison of γ-gliadin cDNA sequences 3' to the tandem repeat region. Base substitutions differing from the B11-33 sequence are shown for B3-12 and B48. Dashes are gaps introduced to maximize homology. Putative polyadenylation signals are underlined.

In B11-33 the conserved DNA segment (or portions of it) is repeated five times starting at base position 135 (Fig. 5A). This segment encodes a 14-amino acid peptide that is rich in glutamine and proline residues and contains a single residue each of phenylalanine, leucine, and serine. A repeat identical in sequence has been described for a γ-type gliadin cDNA by Bartels and Thompson (19). In contrast, the conserved octapeptide (Fig. 5C) observed for B48 (3) is quite different. The B3-12 sequence shares the 42-bp repeat element of B11-33 plus additional repeat units of varying sizes (Fig. 5B). Starting with base 4 and ending with base 343, a 111-bp segment (Fig. 5B, enclosed box) is duplicated at positions 4 through 114 and 232 through 343; these large repeats are separated by a region containing smaller repeats (Fig. 5B, dotted arrows). Furthermore, the latter portion of the 111-bp repeat contains the 42-bp repeat of B11-33.
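Repeat structures like those described above can be looked for in two simple ways: by listing every k-mer that occurs more than once (direct repeats), and by counting back-to-back copies of a known unit (tandem repeats). The sketch below illustrates both for perfect repeats only. The function names and the 9-base toy unit are hypothetical, and the partial or diverged repeat units discussed in the text would need an approximate-matching method that is not shown here.

```python
from collections import defaultdict


def direct_repeats(seq, k):
    """Map each k-mer occurring more than once to its 1-based start positions."""
    positions = defaultdict(list)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def tandem_copies(seq, unit, start):
    """Count perfect back-to-back copies of unit beginning at 1-based start."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq, unit = seq.upper(), unit.upper()
    i = start - 1
    copies = 0
    while seq.startswith(unit, i):
        copies += 1
        i += len(unit)
    return copies


# Hypothetical sequence: three perfect copies of a 9-base unit with flanks.
toy_unit = "CCACAGCAA"
toy_seq = "TT" + toy_unit * 3 + "GG"
toy_repeats = direct_repeats(toy_seq, len(toy_unit))
toy_copies = tandem_copies(toy_seq, toy_unit, 3)
```

For a duplicated segment separated by other sequence, as with the boxed 111-bp repeat, `direct_repeats` reports two well-separated start positions, whereas a tandem array shows starts spaced exactly one unit length apart.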

In contrast to the divergence of DNA sequences at the 5'-ends, the remaining DNA sequences of B11-33 and B3-12 are identical with the exception of a single base substitution at bp 958 (Fig. 6). However, consistent with the restriction enzyme site mapping and hybridization studies, the 3' sequences of B48 are less homologous to B11-33 and B3-12 at the DNA sequence level. A number of gaps in the coding region, varying in size from 3 to 39 bp, have been introduced to obtain maximum homology between B48 and B11-33/B3-12. These gaps, as well as the simple base substitutions, are clustered: the sequence of B48 diverges widely from that of B11-33 between residues 666 and 826 (42% homology) but is 88-93% related elsewhere. In B3-12 the site of poly(A) addition lies 29 bp from the start of the AATAAA sequence (bp 1048), which is the typical arrangement seen for eukaryotic mRNAs. In contrast, the poly(A) site of B11-33 lies 403 bp further downstream (Fig. 6). However, a variant signal sequence (AATAA), first detected in a zein clone (34), is present at bp 1095 and apparently acts as the functional polyadenylation signal.
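The spacing between a polyadenylation signal and the poly(A) addition site can be read directly off a sequence once both positions are known. The following sketch is a hedged illustration of that measurement. The function name, the convention of measuring from the first base of the signal, and the toy 3' end are assumptions made here rather than details taken from the paper.

```python
def signal_to_site_distances(seq, polya_site, signal="AATAAA"):
    """Distances in bp from each upstream signal copy to the poly(A) site.

    polya_site is the 1-based position of the poly(A) addition site; each
    distance is measured from the first base of a signal occurrence.
    """
    seq = seq.upper()
    signal = signal.upper()
    distances = []
    start = seq.find(signal)
    # str.find returns ascending indices, so stop at the first hit past the site.
    while start != -1 and start + 1 <= polya_site:
        distances.append(polya_site - (start + 1))
        start = seq.find(signal, start + 1)
    return distances


# Hypothetical 3' end: signal at position 5, last templated base at position 34.
toy_3prime = "GCGC" + "AATAAA" + "GT" * 12 + "AAAAAAAA"
toy_distances = signal_to_site_distances(toy_3prime, 34)
```

Passing a different `signal` argument, for example a variant hexamer, lets the same function test whether an alternative signal lies at a more typical distance from the observed poly(A) site.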