0022-1767/92/1485-1532$02.00/0 THE JOURNAL OF IMMUNOLOGY Copyright © 1992 by The American Association of Immunologists

Vol. 148, 1532-1546, No. 5, March 1, 1992. Printed in U.S.A.

DIFFERENT FEATURES OF THE MHC CLASS I HETERODIMER HAVE EVOLVED AT DIFFERENT RATES

Chicken B-F and β2-Microglobulin Sequences Reveal Invariant Surface Residues

JIM KAUFMAN, ROLF ANDERSEN, DAVID AVILA, JAN ENGBERG, JOHN LAMBRIS, JAN SALOMONSEN, KAREN WELINDER, AND KARSTEN SKJØDT

Chicken β2-microglobulin (β2m) and class I (B-F19α chain) cDNA clones were isolated and the sequences compared to those of B-F Ag isolated from chicken E. These clones represent the major expressed class I molecules on E, with B-F α size variants evidently due to alternative use of small exons in the cytoplasmic region. The cDNA sequences were compared to turkey β2m, the apparent allele B-F12α and other vertebrate homologs, using the 2.6 Å structure of the human HLA-A2 molecule as a model. Both chicken α1 and α2 domains resemble mammalian classical class I molecules and the MHC-encoded nonclassical molecules more than CD1 or the class I-like FcR. In contrast, the chicken α3 domain is equally homologous to all α3 domains, to β2m and to class II β2 domains. For each pair of extracellular domains (α1 vs α2, α3 vs β2m), the level of sequence homology between mammalian and avian molecules is quite different. This suggests that the structurally homologous domains have been under different selective pressures during evolution. There is a very strong overall G + C bias in α3 and β2m, leading to a change in amino acid composition in B-F compared to class I molecules from other taxa. Many of the surface residues are quite diverged, particularly in α3 and β2m. There are fewer changes in intra- and interdomain contact sites. Some residues with important functions are invariant, including seven residues that bind the ends of the peptide, two residues that bind CD8, and three residues that are phosphorylated. The positions of the allelic residues are conserved. There are other patches of invariant residues on α1, α2, and β2m; these might bind TCR or other molecules involved in class I function.

Received for publication March 4, 1991. Accepted for publication December 16, 1991. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

This work was supported by the Danish Research Council and by the Carlsberg Foundation. The Basel Institute for Immunology was founded and is supported by F. Hoffmann-La Roche.

Address correspondence and reprint requests to Dr. Jim Kaufman, Basel Institute for Immunology, Grenzacherstrasse 487, Basel, Switzerland CH-4005.

Current address: Dr. Rolf Andersen, Carlsberg Laboratory, Department of Physiology, Gamle Carlsberg Vej 10, Valby, Denmark DK-2500.

Current address: Dr. Karsten Skjødt, University of Odense, Institute of Medical Microbiology, Campusvej 55, Odense, Denmark DK-5230.

The mammalian MHC contains many kinds of genes, of which certain class I and class II genes are the hallmarks, being responsible for graft rejection and a variety of other T lymphocyte recognition phenomena. These so-called classical (or transplantation) MHC class I and class II molecules are highly polymorphic cell surface heterodimers that bind peptides generally derived from proteolytic processing by the cell, and in turn are bound by clonally diverse αβ TCR and the coreceptors CD4 and CD8. The class I and class II molecules have related structures, and are probably derived from an ancestral class II β chain-like homodimer. The classical class I heterodimer consists of the relatively nonpolymorphic β2m (a roughly 12-kDa polypeptide encoded by a single gene outside the MHC) in noncovalent association with the polymorphic α-chain (a large transmembrane glycoprotein encoded by a member of the class I multigene family in the MHC) (1-5).

All the known class II molecules are involved in antigen presentation to T lymphocytes. By contrast, the class I molecules are much more plastic in evolution. Besides the classical class I molecules, there are many so-called nonclassical class I molecules that are much less polymorphic (if at all), have quite restricted tissue distributions (and may even be secreted) and need not be encoded by genes located in the MHC. Some of these are quite closely related to the classical class I molecules (mouse Qa Ag and human HLA-E, F, and G Ag), others are less similar (mouse Tla and M Ag), and others are quite diverged (human and mouse CD1 molecules, intestinal FcR of neonatal rats, human α2-Zn glycoprotein and a cytomegalovirus protein). The nonclassical heterodimers all include β2m, although some α-chains are only distantly related to classical class I α-chains. The intestinal FcR transports maternal antibody to the neonate, some Qa, Tla, M, and CD1 molecules may present peptide Ag to some T cells, and the cytomegalovirus protein is involved in infectivity; otherwise the biologically relevant functions of the nonclassical molecules are unknown (1, 3-5).

The structural features responsible for various functional aspects of classical class I function have been located on 3.5 and 2.6 Å resolution three-dimensional structures of the human HLA-A2 and -Aw68 molecules (5-9). After the exon encoding the signal sequence, the next two exons of the class I α-chain gene encode two domains (α1 and α2) with similar tertiary structures, which together form a platform of eight antiparallel β-strands supporting two roughly parallel α-helices. The highly polymorphic residues, as well as some invariant residues, are located in and around the groove formed by the two α-helices and the β-strands. Apparently the antigenic peptide binds in the groove, whereas the TCR binds the peptide and the tops of the α-helices. The two membrane-proximal domains, β2m and α3 (encoded by the next exon of the α-chain), are similar to Ig C regions with two β-sheets of three and four antiparallel β-strands. The single β-sheet of α1/α2 sits above α3/β2m, but interacts mostly with β2m. Some residues in the α3 domain are implicated in the binding of the CD8 molecule to the classical class I heterodimer (10-12). The next three to four α-chain exons encode the connecting peptide, the hydrophobic transmembrane region, and the cytoplasmic tail. A tyrosine and some serines encoded by different cytoplasmic exons are phosphorylated in vivo or in vitro (13, 14).

The study of MHC molecules in species other than human and mouse can give insight into which residues are critical for function, which other structural features are constrained by selection, the importance of coevolving regions and molecules, and the evolutionary history of MHC molecules in general. The best characterized nonmammalian MHC is the chicken B complex, which is located on a microchromosome of 8 Mbp, and which encodes class I (B-F) α-chains and class II (B-L) β-chains, as well as many other kinds of proteins (15, 16).

In this report, we compare the sequences of a chicken class I α-chain allele and chicken β2m with homologs from other species, to place the chicken α-chain in the multigene family of class I molecules and to understand the evolutionary forces of selection on this heterodimer. From studies with crossreactive antibodies (17-19) and general principles of protein evolution (20, 21), we expected to find the intra- and interdomain contact residues to be more conserved than the surface residues, except for those regions that interact with another molecule (such as peptide, TCR, or CD8). We wanted to look for other conserved surface epitopes that might be involved in previously unrecognized functions. We were also interested to see whether the same positions are polymorphic in chicken and mammals. Finally, antibody cross-reaction experiments show that mammal and amphibian MHC molecules are apparently more closely related to each other than to bird and reptile MHC molecules (18, 19), so we wished to know whether the B-F molecules are derived from classical or nonclassical class I ancestors.

Abbreviations used in this paper: β2m, β2-microglobulin; aa, amino acid; nt, nucleotide; and the single letter code for amino acids: A, alanine; C, cysteine; D, aspartic acid; E, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; N, asparagine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; Y, tyrosine.

MATERIALS AND METHODS

Protein chemistry. The α-chains from B-F13 and B-F15 molecules were purified from GB-1 and H.B15 chicken E membranes and then the amino-terminal sequences used for Figures 1 and 2 were determined essentially as described (22, 23). The peptides CGKKGKGYNIAP (exon 6) and CDREGGSSSSST (exon 7) were synthesized using an Applied Biosystems 430A peptide synthesizer (24) (Applied Biosystems, Foster City, CA) starting with 0.5 mmol of benzhydryl amine resin. Deprotection and cleavage of the peptide from the solid support was effected by treatment with anhydrous liquid hydrogen fluoride (25). The crude peptides were verified by sequence analysis, coupled (26) through the sulfhydryl group to BSA (Sigma, St. Louis, MO) using m-maleimidobenzoyl-N-hydroxysuccinimide ester (Pierce Chemical Co., Rockford, IL), and dialyzed against PBS. Rabbits were immunized once a wk for 1 mo (0.5 mg conjugate emulsified in CFA once and in IFA three times) and then bled. Antisera to these peptides, to B-F α chain (serum 966) and to chicken β2m (serum 996) were used to detect purified B-F15 by Western blot in Figure 3 as described (27), except that iodinated donkey anti-rabbit Ig (Amersham Corp., Arlington Heights, IL) was used as the second reagent.

Molecular biology. cDNA expression libraries made in λgt11 from oligo-dT selected RNA isolated from bone marrow cells of anemic H.B19 chickens were screened for expression by rabbit antisera to purified B-F15α, the fusion proteins tested for the ability to select antibodies to authentic B-F α chain, and the clone F3 picked for further analysis (27). An eightfold degenerate oligonucleotide (5'-ARAAYTCNGGRTCCCA-3') corresponding to the last five amino acids of chicken β2m (WDPEF) (28) was used to screen the same libraries. Four positive clones were isolated from 2 × 10⁵ recombinants, of which one (pRA2) contained part of the β2m cDNA. Fourteen positive clones were obtained from screening another 2 × 10⁵ recombinants with the pRA2 insert, of which the longest one was selected for further analysis (pRA5). Bluescript KS+ (Stratagene, La Jolla, CA) or pUC18 subclones of the B-F clone F3 and the β2m clone pRA5 were sequenced by dideoxy chain termination using Sequenase 2.0 (U.S. Biochemicals, Cleveland, OH) or by chemical cleavage. The nucleotide sequences of F10 and pRA5 have been assigned the GenBank nucleotide database accession numbers M84766 and M84767.

Linear sequence comparisons. The nt and derived aa sequences of F10 and pRA5 were aligned with other sequences by inspection (Figs. 1 and 2). In many cases the alignments were unambiguous. In those with particularly low similarity, we attempted to find the highest number of identities with fewest gaps while maintaining the positions of certain critical residues, but it should be borne in mind that these may not be the absolutely best fits. In Figure 2, 18 class I α-chain-like sequences, 8 β2m sequences, and a portion of a class II β-chain sequence are compared; the articles describing these sequences are cited in the figure legend. These comparisons are representative of all of the sequences compared (also derived from the articles referenced in the legend to Fig. 2), which include the translated aa sequence from the chicken B-F12α cDNA clone F10; the translated aa sequences from 108 classical class I genomic and cDNA clones from 7 mammalian species (77 human HLA-A, B, and C sequences, 12 mouse H-2K, D and L sequences, 12 cotton-top tamarin SaOe sequences, 2 rabbit RLA sequences, 2 bovine BoLA sequences, 2 swine SLA sequences, and 1 Syrian hamster SyHam sequence); the translated aa sequences from 7 reptile cDNA clones (1 complete lizard LC clone, 4 incomplete lizard clones, and 2 incomplete snake clones), 1 frog XLA cDNA clone and 1 carp TLA cDNA clone; the translated aa sequences from 26 nonclassical sequences from man, mouse and rat (HLA-E, F, and G from human, 12 Qa molecules, 7 Tla molecules, and 1 M molecule from mouse, 6 CD1 molecules from mouse and human, 1 FcR from rat); the aa sequences of β2m proteins isolated from chicken, turkey, mouse (2 alleles), rat, cow, rabbit, and guinea pig; and the translated aa sequence from a human β2m gene.

The residue numbers given are for the chicken sequences, whereas residue numbers in parentheses refer to the human HLA-A2 and β2m sequences. Throughout the analysis, the percentages of nt and aa sequence identity were used as a measure of sequence relatedness. The conservation of change for nonidentical residues (in Figs. 4 and 5; in the text) was assessed by two criteria: chemical relatedness (e.g., hydrophobicity, charge, polarity, and size) and observed likelihood of replacement in homologous proteins (mutation probability matrix) (29). These criteria are flawed by the fact that many aa sidechains have several different chemical properties and relatedness thus depends on the precise chemical environment in the protein (20, 21).

Comparisons involving three-dimensional structure. The coordinates of the refined 2.6 Å structure of HLA-A2 (Brookhaven entry 3HLA) (9) were used to calculate residues containing an atom within 4 Å of another residue. A surface dot representation of the molecule (Fig. 4) was displayed on an Evans and Sutherland PS390 graphics terminal using the program FRODO (30). The CPK representations (Figs. 6 and 7) were displayed on a Macintosh IIci computer using the program MacImdad (Molecular Applications Group, Stanford, CA) and were manipulated using the program Adobe Photoshop (Adobe Systems, Mountain View, CA). These representations are flawed by the fact that mainchain atoms (which are invariant in

Figure 1. The nt and derived aa sequence of B-F α chain and β2m cDNA clones. Stop codons for the expressed proteins are indicated with stars and the last nt of each exon (37) (J. Kaufman, unpublished observations) is indicated with a filled circle. A, The B-F19α chain clone F3, with the polyadenylation site underlined. The N-terminal sequences of the B-F13 and B-F15 protein are indicated above, with question marks indicating ambiguous residues and blanks indicating residues not determined. The nt and derived aa sequences of the B-F12α chain clone F10 (15) are indicated below, with dashes indicating identities and blanks indicating gaps. B, The chicken β2m clone pRA5, with oligonucleotide used for screening.

[Sequence panels not legible in this copy. Panel A region labels: signal sequence; α1 domain (exon 2); α2 domain (exon 3); α3 domain (exon 4); CP/TM regions (exon 5); cytoplasmic regions (exon 6, exon 7); 3'UT (exon 8).]
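The percentage identity used throughout as the measure of sequence relatedness, together with the figure convention of dashes for identities and blanks for gaps, can be illustrated with a minimal Python sketch. The function name `percent_identity`, the toy sequences, and the choice to skip gapped columns when counting are assumptions made for illustration; the paper does not state how gapped positions entered its percentages.

```python
def percent_identity(ref: str, other: str, gap: str = " ", same: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    `other` may use the figure convention: `same` marks a residue identical
    to `ref`, and `gap` marks an alignment gap. Columns with a gap in either
    sequence are skipped (an assumption, not taken from the paper).
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for r, o in zip(ref, other):
        if r == gap or o == gap:
            continue  # gapped column: not counted
        compared += 1
        if o == same or o == r:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical 10-column alignment: one substitution, one gap.
reference = "GPHSLRYFYT"
aligned = "----M--- -"
print(round(percent_identity(reference, aligned), 1))
```

In this toy case nine columns are compared and eight are identical. A different gap policy (for example counting each gap as a mismatch) would give a lower value, so the policy should be fixed before comparing percentages across domains.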


Figure 2. Comparison of the aa sequences of B-F19α chain and chicken β2m with related avian, mammalian, reptilian, amphibian, and fish molecules, with A, α1 and α2; B, α3 and β2m; C, transmembrane and cytoplasmic regions. Dashes indicate identities; blanks indicate gaps introduced to maximize alignments; X(85) in mouse β2m indicates allelic A and D. Numbers above the line indicate chicken sequence numbers; those in parentheses indicate HLA-A2 sequence numbers. Also shown to the right of each line is the percentage aa identity with the B-F19α or chicken β2m


EVOLUTION OF CHICKEN CLASS I HETERODIMER

linear sequence and may be invariant in space) are colored according to whether the sidechain is invariant or not.
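The contact calculation described under Comparisons involving three-dimensional structure (residues containing an atom within 4 Å of another residue) can be sketched as below. This is a minimal illustration, not the procedure actually run on the 3HLA coordinates: the function name `residue_contacts`, the input layout of `(residue_id, (x, y, z))` tuples, the inclusive cutoff, and the toy coordinates are all assumptions.

```python
from itertools import combinations
from math import dist


def residue_contacts(atoms, cutoff=4.0):
    """Return the set of residue pairs that have any two atoms within `cutoff` Å.

    `atoms` is a list of (residue_id, (x, y, z)) tuples. Atoms belonging to
    the same residue are never compared with each other.
    """
    contacts = set()
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(atoms, 2):
        if res_a != res_b and dist(xyz_a, xyz_b) <= cutoff:
            contacts.add(tuple(sorted((res_a, res_b))))
    return contacts


# Hypothetical coordinates in Å, not taken from any structure.
toy_atoms = [
    ("A1", (0.0, 0.0, 0.0)),
    ("A1", (1.5, 0.0, 0.0)),
    ("B7", (4.5, 0.0, 0.0)),   # 3.0 Å from the second A1 atom
    ("C9", (10.0, 0.0, 0.0)),  # 5.5 Å from B7, beyond the cutoff
]
print(residue_contacts(toy_atoms))
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single class I heterodimer; a spatial grid or k-d tree would be the usual replacement for larger structures.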

RESULTS AND DISCUSSION

B-F19α and β2m sequences represent major expressed class I proteins on chicken E. The cDNA clones for chicken β2m (pRA5) and B-F19α (F3) both encode mature proteins with general structural features that are very similar to mammalian class I molecules (Figs. 1 and 2) including the peptide/TCR binding α1 and α2 domains (exons 2 and 3), Ig C region-like α3 and β2m domains (exon 4 and the β2m exons I-III), the connecting peptide, hydrophobic transmembrane region followed by basic residues (exon 5), and a cytoplasmic tail (apparently composed of three exons similar to mammalian exons 6-8). These structural features include the invariant cysteines at 99(101) and 161(164) in α2, 199(203) and 255(259) in α3, 24(25) and 79(80) in β2m, other features of Ig-like domains (31) with the L38(39) diagnostic of β2m, and the kinase sites Y313(320), S323(332) and S326(335) in the cytoplasmic tail (13, 14). (The residue numbers given are for the chicken sequences, whereas residue numbers in parentheses refer to the human HLA-A2 and β2m sequences.) There are four small deletions and an insertion in the chicken sequences, but these are found in the bends or breaks, do not have sidechain interactions (data not shown) and thus probably do not affect the overall structural assignments made below.

The sequence of F3 is also very homologous to B-F12α cDNA clone F10, but includes additional stretches of 33 nt encoding a cytoplasmic exon (see below) and 127 nt of 3'UT. Both clones have an obvious polyadenylation site about 145 nt downstream of the stop codon, but F3 has the 127 nt instead of the poly A tail found in F10, probably reflecting the presence of another polyadenylation site further downstream (as in some mammalian class I transcripts, e.g., Refs. 32 and 33).

There is one site for potential N-linked glycosylation found at N85(86) as in all mammalian classical and most nonclassical class I molecules. Another is found at N37(37) at the end of the β3 strand in the α1 domain, a position thus far only utilized in Tla T3b (34). The sidechain of N37(37) points downward from the β-sheet of α1/α2 (Fig. 2A) and is exposed to the solvent (between A40(40) and T10(10) in Fig. 6C). There is no N-linked glycosylation site in the β2m sequence. In fact, there is one complex and one high mannose glycan in B-F α chains, and none in β2m, as assessed by glycosidase digestion (23).

These clones almost certainly represent the major expressed B-F19 protein recognized by the mAb F21-2 directed to class I α-chains and the mAb F21-21 directed to β2m (23). The deduced aa sequence of pRA5 (Fig. 1B) is identical to the aa sequence of chicken β2m protein


sequences, and the number of aa with intrinsically G + C rich codons (calculated as the sum of the number of glycines, alanines, tryptophans, prolines, and two-thirds of the number of arginines). The symbols above the chicken sequences indicate structural position and orientation of the residues in the β strands (β strands overlined, ===; pointing down for α1/α2 or in for α3/β2m, .; pointing up for α1/α2 or out for α3/β2m, t) and the α helix (α-helices overlined, +++; pointing down toward the β-sheet, .; pointing into the groove, *; pointing up from the groove, fi; not clear, -) as designated (6, 9) or determined by inspection. The orientation only takes into account the mainchain and first sidechain bonds, without implying the location of the end of the sidechain. The letters above structural symbols indicate whether the position is considered nearly invariant (I or i; see explanation below) or polymorphic (P, polymorphic among human or mouse alleles). Indicated above the HLA-A2 and human β2m sequences are residues involved in presumed contacts with the peptide (c), involved in interdomain contacts based on the HLA-A2 structure (1, α1 contact; 2, α2 contact; 3, α3 contact; b, β2m contact), residues affecting CD8 binding by mutagenesis experiments (8, CD8 contact), and residues phosphorylated in vivo or in vitro (outline of P).

Class I sequences are for chicken B-F (B-F12) (15), human HLA (HLA-A2) (41), monkey So (cotton-top tamarin So-47) (64), mouse H-2 (H-2Kd) (65), rabbit RLA (RLA-1) (66), cow BL (BL3-6) (32), swine SLA (SLA-PD1) (67), Syrian hamster (33), lizard LC (LC-1) (60), frog XLA (CL13) (61), carp TLA (TLAa-1) (62), human E (HLA-E) (68), mouse Qa (Q8) (69), mouse Tla (T13') (34), mouse M (Mb1) (70), human CD1 (CD1a) (71), and rat FcR (FcR p51) (72). β2m sequences are turkey (28), human (73), mouse (74), bovine (75), rabbit (76), guinea pig (77), and rat (78). Class II β-chain sequence is human DQ from Raji cells (DR3.6) (79).

For the determination of invariant residues, sequences from the references above and from compilations (5, 80) were used. The number of times that each of the following residues is found in sequences from each species (2 chicken, 77 human (2 HLA-C alleles not complete in α1), 12 monkey, 12 mouse (1 K sequence not complete in α2), 2 rabbit, 2 bovine, 2 swine and 1 Syrian hamster class I sequences; 2 chicken and 6 mammalian β2m sequences) is added to give the total (of 110 presumed classical class I sequences, 8 β2m sequences). The designations that follow (I or i) are used in Figures 2 and 5 and are the basis of Figures 4, 6, and 7. The I (invariant) residues occur in at least 50% of the sequences from every species and overall at least 95% of the sequences; most of these residues are in nearly every sequence. The i (nearly invariant) residues must occur in at least 80% of the sequences, but may be absent in one species; many of these are in nearly every sequence. Those residues without sums and designations occur in all of the sequences and are designated I in Figures 2 and 5.

The α1 domain: H3(3), 2+75+12+12+2+2+2+1=108, I; R6(6), 2+74+12+12+2+2+1+1=106, I; Y7(7), 2+75+12+12+2+2+2+1=108, I; T10(10), 2+75+12+12+2+2+2+0=107, i; P15(15), 2+75+12+12+2+2+2+1=108, I; G18(18), 2+76+10+11+1+2+2+0=104, i; P20(20), 2+76+12+12+2+2+1+1=108, I; V25(25), 2+76+12+12+2+2+2+1=109, I; G26(26), 2+76+12+12+2+2+2+1=109, I; Y27(27); V28(28); D29(29); F33(33); V34(34), 1+77+12+12+2+2+2+1=109, I; S38(38), 2+76+12+12+2+2+1+1=109, I; A40(40), 2+77+11+12+2+2+2+0=108, i; R42(44); P45(47), 2+77+12+12+1+2+2+1=109, I; R46(48), 2+77+12+9+2+2+2+1=107, I; W49(51), 2+76+12+12+2+2+2+1=109, I; Y58(59), 2+76+12+12+2+2+2+1=109, I; W59(60); T63(64); Q71(72), 2+77+12+12+1+2+2+1=109, I; R74(75), 2+77+12+12+2+2+1+1=109, I; L77(78); Y84(85); N85(86); Q86(87).

The α2 domain: G89(91), 2+77+12+12+2+2+2+0=109, i; S90(92), 2+77+11+10+2+2+2+1=108, I; H91(93); T92(94), 2+70+12+12+2+1+2+0=101, i; Q94(96); M96(98), 2+77+11+10+2+2+2+1=107, I; G98(100); C99(101); D104(106), 1+75+12+12+2+2+2+1=107, I; G109(112); Y110(113), 2+54+12+12+2+0+2+1=85, i; Q112(115); A114(117), 2+77+12+12+2+0+1+1=107, i; Y115(118); D116(119), 2+77+12+11+2+2+2+1=109, I; G117(120); D119(122); I121(124), 1+77+10+12+2+2+2+1=107, I; A122(125), 2+77+10+12+2+2+2+1=108, I; T131(134); A132(135), 2+77+12+11+2+2+2+1=109, I; A133(136), 2+77+11+12+2+1+2+1=108, I; A137(140), 2+77+12+12+2+2+2+0=109, i; T140(143), 2+77+12+12+1+2+2+1=109, I; R142(145), 2+63+12+7+2+2+2+1=91, i; K143(146), 2+77+11+12+2+2+2+1=109, I; W144(147), 2+74+11+11+1+2+2+1=104, I; E145(148), 2+77+10+12+2+2+2+0=107, i; A150(153), 1+77+12+12+2+2+2+1=109, I; E151(154), 2+75+12+12+2+2+2+1=108, I; Y156(159); L157(160); E158(161), 2+75+11+11+2+2+0+1=105, i; C161(164); V162(165); E163(166), 2+75+12+11+2+2+2+0=106, i; W164(167), 2+70+12+11+2+2+1+1=101, i; L165(168), 2+76+12+11+2+2+2+1=108, I; R166(169), 2+77+1+6+2+2+1+1=93, i; R167(170); Y168(171), 2+69+11+11+2+2+2+1=100, i; E170(173), 2+75+12+4+2+2+1+1=100, i; G172(175), 2+77+12+11+2+2+2+0=108, i; K173(176), 2+77+12+0+2+2+2+1=98, i; L176(179); R178(181), 2+77+12+9+2+2+2+1=107, I.

The α3 domain: P182(185); V186(189), 2+52+11+12+2+2+2+1=84, i; T196(200), 2+77+11+12+2+2+2+1=109, I; L197(201); C199(203); A201(205), 2+76+12+12+2+2+2+1=109, I; G203(207), 2+59+12+12+2+2+2+1=92, i; F204(208); Y205(209); P206(210); I209(213); W213(217); G217(221), 2+77+12+12+2+2+2+0=109, i; Q222(226); D223(227), 2+77+12+11+2+2+2+1=109, I; P231(235); G233(237); D234(238), 2+77+9+12+2+2+2+1=107, I; G235(239), 2+42+12+12+2+2+2+1=75, i; T236(240); W240(244); G248(252); Y253(257); C255(259); V257(261), 2+75+12+12+2+2+2+1=108, I; H259(263); L262(266); P265(269), 2+77+11+12+2+2+2+1=109, I; W270(274).

β2m: P4(5); Q7(8); V8(9); Y9(10); S10(11); R11(12); P13(14); G17(18); N20(21); L22(23), 2+5=7, I; N23(24); C24(25); F29(30); H30(31); P31(32); P32(33), 2+5=7, I; I34(35); L38(39), 2+5=7, I; K40(41); G42(43); S51(52); D52(53); S54(55); F55(56); D58(59); W59(60); TS60(61), 1+5=6, I; F61(62); L64(65); HY66(67), 1+5=6, I; F69(70); Y70(71); P71(72); Y77(78); C79(81); V81(83); H83(84); T85(86), 2+5=7, I; L86(87), 2+5=7, I; P89(90); W94(95); D95(96).
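The I and i designations defined in this legend amount to a small decision rule over per-species counts, which the following Python sketch illustrates for the class I tallies. It is one reading of the legend's wording under stated assumptions: the names `TOTALS` and `classify` are hypothetical, incomplete sequences are ignored, and "may be absent in one species" is taken to mean that at most one species has a count of zero.

```python
# Class I sequences per species, in the order used by the sums in the legend:
# chicken, human, monkey, mouse, rabbit, bovine, swine, Syrian hamster.
TOTALS = (2, 77, 12, 12, 2, 2, 2, 1)


def classify(counts, totals=TOTALS):
    """Return "I", "i", or None for one residue position.

    I: present in >= 50% of the sequences of every species and >= 95% overall.
    i: present in >= 80% of the sequences overall, absent from at most one
       species (this reading of the legend is an assumption).
    """
    if len(counts) != len(totals):
        raise ValueError("one count per species is required")
    overall = sum(counts) / sum(totals)
    if overall >= 0.95 and all(c / t >= 0.5 for c, t in zip(counts, totals)):
        return "I"
    absent_species = sum(1 for c in counts if c == 0)
    if overall >= 0.80 and absent_species <= 1:
        return "i"
    return None


# H3(3): 2+75+12+12+2+2+2+1 = 108 of 110, found in every species.
print(classify((2, 75, 12, 12, 2, 2, 2, 1)))
# T10(10): 2+75+12+12+2+2+2+0 = 107 of 110, missing from the hamster sequence.
print(classify((2, 75, 12, 12, 2, 2, 2, 0)))
```

The two worked positions reproduce the designations given above for H3(3) and T10(10): the single hamster sequence lacking the residue is enough to move T10(10) from I to i even though it is present in over 97% of sequences overall.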


(28). The deduced aa sequence of F3 is identical with the N-terminal 20 aa of the B-F15α protein and has one difference (the known polymorphic position 9) from the N-terminal 20 aa of the B-F13α protein (Fig. 1A). It is therefore unlikely that F3 represents the B cell-specific class I molecule recognized by the mAb CB3 (35). The relationship between F3 and the two reported B-F2 molecules detected using alloantisera (36) is unclear. The 42-kDa B-F2 molecules (found in mature erythroid cells, thymocytes, and REV-transformed lymphoid cells with nonrearranged Ig loci) and the 45-kDa B-F2 molecules (found in immature erythroid cells, bursal cells, and REV-transformed lymphoid cells with rearranged Ig loci) may be products of different genes or the product of a single gene with different biosynthetic (e.g., carbohydrate) modifications.

Alternate usage of cytoplasmic region exons gives rise to multiple B-Fα glycoproteins. Although the major β2m and B-Fα proteins isolated from E have the same sequences as the chicken cDNA clones isolated, every chicken haplotype gives rise to multiple B-Fα bands of 40 to 45 kDa, as illustrated for E B-F15 (Fig. 3, lane 4). These protein size variants have identical N-terminal sequences and apparently differ only in the cytoplasmic C-termini, based on proteolysis, IEF, and peptide mapping (23). Inasmuch as these multiple B-Fα chains are all found on E, they presumably do not correspond to the two sizes of B-F2 alloantigens, which have different tissue distributions (36).

Figure 3. Western blot of purified B-F15 visualized with antisera to the cytoplasmic peptides DREGGSSSSST (D318-T328, exon 7, lane 1), the peptide GKKGKGYNIAP (G307-P317, exon 6, lane 2), chicken β2m (lane 3), and B-Fα chain (lane 4).

The connecting peptide, transmembrane, and cytoplasmic regions of F3 (B-F19α) and F10 (B-F12α) are virtually identical, except for an extra stretch of 11 aa in the middle of the cytoplasmic region of B-F19α (Figs. 1A and 2C). The 11 aa stretch after the transmembrane region (G307-P317) has some homology with the mammalian exon 6, including the invariant Y313(320) that is phosphorylated by src kinase in vitro (13). The next 11 aa stretch (D318 to T328), which is present in F3 but not in F10, has some homology with the mammalian exon 7, including the invariant S323(332) and S326(335) that are phosphorylated in vivo (14). Antibodies to the B-Fα chain recognize three bands (the top one being a doublet) by Western blot in the B-F15 preparation shown in Figure 3. Antisera to the peptides G307-P317 (exon 6) and D318-T328 (exon 7) show that the top doublet contains both exons 6 and 7, the middle band contains only exon 6, and the bottom band contains neither. This could be due to alternative splicing, since both exons are found in a B-F12α gene (37). Alternatively, these multiple B-Fα chains could be products of two or more nearly identical genes with different cytoplasmic exons, since there are a number of class I α genes in chicken genomic DNA (15, 27).

Classical class I molecules with such alternative exon usage have not been widely demonstrated in mammals. However, mouse H-2K" and Dd genes encode alternatively spliced mRNA species that lack exon 7 and encode corresponding class I molecules that are not phosphorylated in vivo (38, 39). In addition, the effect of deleting exon 6, exon 7, or both from the HLA-A2 gene on constitutive endocytosis in a leukemic T cell line has been investigated. A consensus region in exon 7 has been suggested as the signal for endocytosis (40), but it is absent in B-F19. It will be very interesting to know whether the B-Fα variants have different properties (e.g., in turnover, recycling or Ag presentation), because they possess different intracellular kinase sites that are maintained in evolution.

Chicken class I clones represent classical rather than nonclassical MHC class I molecules. Inasmuch as studies with cross-reactive antibodies indicate that the MHC molecules of mammals are more similar to amphibians than to chickens (18, 19), we wanted to know whether the chicken B-F cDNA were derived from some unusual offshoot of the class I multigene family (for instance, like the mammalian nonclassical class I molecules). The first two domains of the B-F19α molecule are most homologous to the classical transplantation class I molecules (Fig. 2A). They are just as similar to certain nonclassical molecules (mouse Q Ag and human HLA-E, F, and G Ag), less similar to others (mouse Tla and M molecules), and much less similar to others (CD1 and intestinal FcR). The chicken molecules are not significantly more homologous to any particular classical class I isotype from any particular mammalian species. Thus, the chicken B-F cDNA are part of the major evolutionary line of class I molecules.

The α3 domain and β2m are the most conserved portions of the class I molecule within and between mammalian species. Similarly, α3 of B-F19 and B-F12 have 98% aa identity, with one change in each β-sheet. Also, there is evidence for only one allelic residue in chicken β2m (SG75(76)) (28). Chicken and turkey β2m have 93% aa identity, with all seven differences in and between the E and F strands, at the end of β2m that is not in contact with α1/α2 (Fig. 2B).

In contrast, chicken α3 has only 31 to 36% aa identity with the various mammalian class I sequences, making it the least homologous of the extracellular domains. Unlike α1/α2, α3 is just as homologous to the classical class I molecules as to the nonclassical molecules, chicken β2m, or the class II β2 domain. Similarly, there


EVOLUTION OF CHICKEN CLASS I HETERODIMER

is much less homology between avian and mammalian β2m than within each group (Fig. 2B). Many of these identities are at positions identified as crucial for the folding of general Ig C domains (31), with the surfaces mostly diverged (see discussion below).

There is marked difference in evolution of domains. The α1 and α2 domains are structurally homologous and presumably evolved from a common ancestor. Both domains are involved in binding the peptide Ag and the TCR, so we expected them to be under the same selective pressure and to evolve at the same rate. In fact, in every comparison between the chicken and a mammalian class I sequence, α1 was much less similar than α2 (11-42% aa identity in α1, 23-58% in α2, see Fig. 2A). This difference is not overwhelmingly apparent at the nt level (e.g., the α1 and α2 exons have around 62 and 65% nt identity, respectively, between F3 and the HLA-A2 gene) (41). Thus, these two domains are apparently under different selective forces at the protein level. One possibility is that the TCR contacts primarily the α2 domain, leading to conservation by coevolution. In fact, this suggestion has already been made, based on the inhibition of cytolytic assays by certain monomorphic mAb apparently directed to the α2 helix (32). However, large blocks of identities between chickens and mammals are found throughout the α2 sequence (Fig. 2A). Other possible explanations include structural constraints (for instance, due to the disulfide bridge) or other, unknown, functions.

We also expected that the B-F α3 domain and chicken β2m might diverge at the same rate, but this was not so (Fig. 2B). Chicken and turkey β2m have between 45 to 53% aa identity with mammalian sequences, with 35 residues invariant among all species. The α3 domain has 31 to 36% aa identity with mammalian class I sequences (with some 25 practically invariant residues). The invariant residues in α3 and β2m are mostly involved in intradomain contacts (31) (see Fig. 4), interdomain contacts (many in blocks in the linear sequence, Fig. 2B, see Figs. 4, 6, and 7), and in invariant surface epitopes (discussed below, see Figs. 6 and 7). β2m makes more interdomain contacts than α3 and also apparently has additional biologic functions (as an activating or chemotactic factor) (42-44), which may explain the difference in the rate of evolution of these two domains.

Remarkable codon bias in α3 and β2m. The F3 α3 exon and chicken β2m have 97% and 96% G + C in the third codon position, respectively, and similar levels of G + C in degenerate first and second positions (Fig. 1). This is much higher than usual for the third position (e.g., 68% for 39 other chicken genes) (45). Other regions of F3 are much lower: α1 (86%), α2 (69%), and transmembrane plus cytoplasmic regions (81%), with 84% overall. By comparison, mammalian β2m and class I genes have about 60 and 80% G + C in the third position, respectively (45). For instance, the HLA-A2 gene has α1 (86%), α2 (87%), α3 (75%), and transmembrane plus cytoplasmic regions (68%), with 80% overall (41). Not only are the wobble bases nearly entirely G + C, but the actual amino acid compositions of α3 and β2m are shifted slightly to residues that are intrinsically G + C rich, in comparison to the mammalian sequences (Fig. 2B). Such changes, e.g., to glycine, explain the apparent loss of some contact sites in the chicken class I molecule compared to mammals (see below). There is no such aa bias in α1 or α2 (Fig. 2A). One possible explanation is that these chicken genes lie in G + C rich Reverse Giemsa-staining (R) chromosomal bands, and the high G + C content in the gene reflects the overall high G + C content of these regions (46). We inspected the sequences of the published B-L β


Figure 4. Surface dot representations of residues in β2m that are invariant in avians and mammals (a) or invariant, conserved and radically changed between avians and mammals (b). The dots were placed at one van der Waals radius for each atom in the context of all other atoms, colored for the designation of the entire residue. Most residues were designated nearly invariant (blue), conserved (by both chemical properties and natural replacement frequencies, green), radically changed (by either chemical properties or natural replacement frequencies, yellow) by comparison to the sequences in Figure 2. Large arrowheads indicate regions of interdomain contact; long arrow indicates band of presumed intradomain contacts in β2m; short arrow indicates location of peptide-binding site. Domains are indicated: α1, 1; α2, 2; α3, 3; β2m.
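The third-position G + C percentages quoted in the codon-bias discussion can be reproduced from any in-frame coding sequence. The following is a minimal sketch, assuming the input is a complete reading frame written as a plain string; the function name `gc3_percent` and the nine-base toy sequence are illustrative only and are not taken from the F3 or β2m clones.

```python
def gc3_percent(cds: str) -> float:
    """Percent G + C at the third position of each codon.

    Assumes `cds` is an in-frame coding sequence (length divisible by 3).
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 nt")
    third_bases = seq[2::3]  # every third base, starting at codon position 3
    gc_count = sum(base in "GC" for base in third_bases)
    return 100.0 * gc_count / len(third_bases)


# Hypothetical toy input: codons GCC, GCG, GCA have third bases C, G, A.
toy_cds = "GCCGCGGCA"
toy_gc3 = gc3_percent(toy_cds)
```

Applied exon by exon, the same function would give per-domain values of the kind compared in the text.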


Figure 5. Contact residues between α1/α2, α3, and β2m based on the HLA-A2 structure at 2.6 Å resolution (9) and the amino acid sequence data in Figure 2. Each pair of residues with atoms within 4 Å of each other is listed. Numbers indicate the chicken sequence location; numbers in parentheses indicate the HLA-A2 sequence location; I, 1 indicate invariant residues (from Fig. 2); P indicates polymorphic residues (from Fig. 2); C and D indicate conserved and diverged residues based on size, charge, and natural replacement frequency (29); L indicates residues that have diverged through loss or diminution of the sidechains in the chicken molecule. Each pair of residues interacts through the atomic contacts (sc, sidechain; mc, mainchain) listed in the middle column, which are invariant (I, 1), are conserved or probably conserved (C or C?), or are lost or probably lost (L or L?).

[Figure 5 table: pairs of contact residues between α1/α2, α3, and β2m, each with its residue designations (I, C, D, L) and atomic contacts (sc:sc, sc:mc, mc:sc); the column layout did not survive extraction.]

genes (47, 48), which are located close to the class I genes in the MHC (15), and found that they also have high G + C contents in the third position of the Ig-like β2 domain (97%) compared to the β1 domain (81%) and the transmembrane plus cytoplasmic regions (88%). The polymorphism of the B-F α1, B-F α2, and B-L β1 exons might account for their lower G + C content, but in fact the polymorphic positions do not have a lower G + C bias than the other positions in α1 and α2. In fact, the G + C content of each exon is inversely related to the overall homology with mammalian genes (see Fig. 2), but we have not been able to devise an explanation that relates these phenomena. We have suggested that the codon usage could be due to the presence of these genes on microchromosomes (49).

Locations of polymorphic and invariant residues in peptide/TCR binding site are same in chicken and mammals. A typical feature of mammalian classical class I molecules is that the polymorphic residues are mostly found in the peptide/TCR binding site of α1/α2. In fact, nt differences between alleles in mammals are predominantly replacement substitutions in α1/α2 and silent substitutions in α3/β2m, indicating strong selective pressures for polymorphism in the peptide binding regions and against changes in the rest of the molecule (50). Most of the nt differences between the B-F19α (F3) and B-F12α (F10) cDNA clones are in α1 and α2 (25 nt changes/264 nt in α1 and 22/273 in α2, but only 3/273 in α3, 3/132 in the rest of the translated region, and 5/298 in the 3'UT region) (Fig. 1A). Virtually all of these nt changes lead to aa substitutions (20/25 in α1, 19/22 in α2, 2/3 in α3, 1/3 in the rest).

Of the 26 allelic chicken residues in α1/α2, 17 are contact residues for the peptide (15 are polymorphic in at least one human or mouse isotype) and 4 others point up from the α-helices in positions to be TCR contact residues (2 are polymorphic in man or mouse and 2 are not) (Fig. 2A). The other five allelic positions are nearly invariant among mammalian classical class I sequences. It should be noted that it is not clear whether B-F19 and B-F12 genes are actually alleles or isotypes.

Ten invariant residues are contact sites with the peptide, presumably binding mainchain atoms in the peptide (Fig. 2A). In fact, it was recently found that seven of these residues located at the ends of the peptide binding site (Y7(7), Y58(59), Y156(159), Y168(171), T140(143), K143(146), and W144(147)) form hydrogen bonds with the N-terminus, the C-terminus, or the penultimate carbonyl of nonameric peptides bound to HLA-B27 (51). (Another hydrogen-bonding residue, R83-Y(84), is an invariant tyrosine in mammalian classical class I molecules, but is replaced by arginine in chickens and in the comparable position in class II α-chains.) Other invariant residues (such as Q71(72), R74(75), R142-H(145), R166(170), and E151(154)) are located on the two α-helices outside of the peptide binding site; they may be involved in contacts with invariant residues in the TCR.

Many intradomain contact residues are invariant or conserved. We expected that residues with important roles in the structure of individual domains would change more slowly than residues without contacts. Our naive approach was to analyze residues with the appropriate location and orientation, based on the regularities of secondary structure, to pack side chains in the hydrophobic center of a domain. For example (Fig. 2), there are some 31 residues in α1/α2 pointing down from the α-helices and up from the relevant locations in the β-strands, with 16 invariant or nearly so, 11 conserved, and 4 divergent positions. Similarly, of some 40 residues in α3 and β2m that are located in the β-strands with sidechains oriented toward the interior of Ig-like domains, 19 positions are invariant, and some 10 are conserved. In addition, many of the residues in the turns were conserved or invariant. Many of the invariant residues are clearly packed into the center of the Ig-like domains (blue in the center of β2m, Fig. 4A).

However, inspection of the HLA-A2 structure shows that these precise numbers are meaningless. We found that most of the residues in HLA-A2 have some solvent accessible portions, including residues that we expected


Figure 6. CPK molecular models of HLA-A2 highlighting the invariant residues between chicken and mammalian classical class I molecules. a, classic Bjorkman view down the peptide-binding site (Refs. 6 and 7); b, 90° righthand rotation about the vertical axis from a; c, 120° rotation from a; d, 180° rotation from a; e, 270° rotation from a; f, view from above showing the peptide/TCR binding site. Each residue that is invariant or nearly invariant (considering 110 presumed classical class I sequences described in the legend to Figure 2) is colored white and outlined in black. Every other residue is colored according to domain location (α1, red; α2, yellow; α3, green; β2m, gray). Isolated invariant residues are indicated with numbers (α1: 1, H3(3); 2, G18(18); 3, S38(38); 4, A40(40); α2: 5, D104(106); α3: 6, T196(200); 7, G217(221); 8, G248(252); 9, Y253(257); β2m: 10, P4(5); 11, G32(33); 12, Y77(78)/W94(95); 13, P89(90); 14, D95(96)). Clusters of invariant residues are indicated with letters (A patch on α1: T63(64), R42(44), W59(60), P45(47), R46(48); QR patch on α1: Q71(72), R74(75); R patch on α1: Q94(96), A114(117), Y115(118), W144(147), K143(146), R142/H(145), E145(148); M patch on α2: A150(153), W144(147), E151(154); L patch on α1, α2 and α3: Y156(159), Y7(7), Y27(27), F33(33), Y58(59), Y168(171), W164(167), E158(161), V162(165), E163(166), R166(169), R167(170), E170(173), K173(176), G172(175), L176(179), and W49(51); C patch on α1, α2 and β2m: N85(86), Y84(85), Q86(87), G89(91), S90(92), H91(93), T92(94), A114(117), Y115(118), D116(119) on α1 and α2, H30(31), P31(32), P32/S(33) on β2m; B patch on α3: Y205(209), P206(210), H259(263), I209(213), L262(266), P265(269), P182(185); G patch on α3: V257(261), C199(203), C255(259), W213(217), V186/M(189), L197(201), Y253(257); QD patch on α3: Q222(226), D223(227); N patch on β2m: G17(18), K40(41), F69(70), T70(71), P71(72), L22(23), Y77(78), W94(95); S patch on β2m: L86(87), T85(86), Q88(89), I34(35), H83(84), S51(52); T patch on β2m: P13(14), N20(21), F69(70), and P71(72); not visible is the U patch on the underside of β2m: W94(95), D95(96), Q7(8), L22(23)). Clusters of invariant residues involved in interdomain contacts between α1/α2, α3, and β2m are indicated with arrowheads; actual contact residues from Figure 5 are indicated with a small dot colored according to domain location. Each invariant residue involved in a highly conserved salt-bridge (9) (Fig. 2) is marked with a black X.
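The residue labels used in these legends and in the text pair a chicken position with the corresponding HLA-A2 position in parentheses, for example Y156(159) or R83-Y(84). A minimal parsing sketch follows. It assumes that a second letter after "/" or "-" gives the residue on the mammalian side and that an absent second letter means the residue is the same in both; that reading, along with the names `ResidueLabel` and `parse_label`, is an assumption made for illustration rather than a rule stated by the authors.

```python
import re
from typing import NamedTuple


class ResidueLabel(NamedTuple):
    chicken_aa: str   # one-letter residue in the chicken sequence
    chicken_pos: int  # chicken sequence position
    human_aa: str     # residue at the HLA-A2 position (assumed same if not given)
    human_pos: int    # HLA-A2 sequence position (the number in parentheses)


# One residue letter, a number, an optional "/X" or "-X", then "(number)".
_LABEL = re.compile(r"^([A-Z])(\d+)(?:[/-]([A-Z]))?\((\d+)\)$")


def parse_label(label: str) -> ResidueLabel:
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    aa, pos, other_aa, human_pos = match.groups()
    return ResidueLabel(aa, int(pos), other_aa or aa, int(human_pos))


offset = parse_label("Y156(159)").human_pos - parse_label("Y156(159)").chicken_pos
```

The difference between the two positions (3 for Y156(159)) reflects the alignment offset between the chicken and HLA-A2 sequences at that point.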


to be buried in the hydrophobic interior of the domain (data not shown, Figs. 6 and 7). We found that most of the residues in HLA-A2 are involved in some intradomain contacts, with only 15 to 20% of the residues in the various domains of HLA-A2 (a third of these being glycines) having no sidechain atoms involved in contacts as assessed by 4 Å proximity. The number of intradomain sidechain contacts varies depending both on size and location of the sidechain (data not shown). Thus, we have no unambiguous and simple criteria for which residues have essential intradomain contacts. In a different approach, Williams and Barclay (31) have suggested certain residues are invariant in many different Ig-like sequences because they have important structural roles; all of these residues are conserved or invariant (Fig. 2B).

Although many of the residues that we presumed were important for intradomain contacts are indeed conserved, this is clearly not an absolute requirement. The fact that radical changes in sidechain need not have major effects on the intradomain contacts is illustrated by two extreme examples: T239-K(243) in α3 and R63LI(64) in β2m. The four methylene groups of K(243) in strand E stretch across the hydrophobic interior of the domain, with the basic amino group thrusting out into the solvent between the strands of the opposite β-sheet. Interestingly, this K(243) was predicted to form a salt bridge with the E(232) in strand D (52), but in fact G228-V(231) and G229-E(232) are in a β-bulge outward. The residue R63-L1(64) is located in strand E at the edge of the four-strand face and the sidechain in HLA-A2 initially points inward, but then turns parallel to the edge of the sheet to hang out in the solvent.
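The 4 Å proximity criterion used to assess contacts can be sketched as a simple pairwise distance test. The example below is only an illustration under stated assumptions: atoms are represented as (residue number, coordinates) pairs, the coordinates are made-up values rather than HLA-A2 data, and the function name `residue_contacts` is hypothetical. It does not reproduce the software or the exact procedure used for the analysis.

```python
from itertools import product
from math import dist
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Atom = Tuple[int, Coord]  # (residue number, xyz coordinates in angstroms)


def residue_contacts(
    atoms_a: Sequence[Atom],
    atoms_b: Sequence[Atom],
    cutoff: float = 4.0,
) -> List[Tuple[int, int]]:
    """Residue pairs having at least one atom pair within `cutoff` angstroms."""
    pairs = set()
    for (res_a, xyz_a), (res_b, xyz_b) in product(atoms_a, atoms_b):
        if dist(xyz_a, xyz_b) <= cutoff:
            pairs.add((res_a, res_b))
    return sorted(pairs)


# Hypothetical coordinates for two groups of atoms (not real structure data).
group_a = [(10, (0.0, 0.0, 0.0)), (11, (10.0, 0.0, 0.0))]
group_b = [(50, (0.0, 3.0, 0.0)), (51, (0.0, 0.0, 8.0))]
found = residue_contacts(group_a, group_b)
```

Only residues 10 and 50 lie within 4 Å of each other in this toy input. Distinguishing sidechain from mainchain contacts would additionally require an atom name for each atom.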


Figure 7. Silhouette of HLA-A2 based on the CPK models in Figure 6. The domains are colored (α1, red; α2, yellow; α3, green; β2m, blue), except for the invariant areas, which are gray. The black spots represent water molecules in the original structure. To facilitate identification in the CPK models, the invariant residues are indicated in approximately the correct relative positions using the human amino acid and number (except for a few residues in b, c, and e, which are identified in the silhouette by letters, and outside the silhouette with both letters and numbers).

Interdomain contact sites are mostly invariant or conserved. We expected the residues involved in interdomain contact sites to evolve more slowly than those residues without contacts. We determined the residue and atomic contacts between the domains of HLA-A2 as defined by 4 Å proximity (data not shown). Saper et al. (9) have reported the contact residues and some of the atomic contacts; the two results differ only slightly in detail, and therefore we utilize their identifications in Figures 2 and 5. The residues involved in interdomain contacts include some pointing downward (and upward) from the β-strands and bends of α1/α2, some pointing upward from bends in α3 and β2m, and some pointing outward (and inward) from the 4-strand β-sheets of α3 and β2m. These can be seen in part as blocks of invariant or conserved residues in the linear sequence (Fig. 2) and as invariant residues at the interfaces of α1/α2, α3, and β2m in the three-dimensional models (blue in Fig. 4 and white with colored dots in Fig. 6).

Saper et al. (9) identified 63 residues involved in interdomain contacts, of which 33 are invariant (or nearly so) between chicken and mammalian classical class I sequences, 15 are conserved (as assessed by size and chemical nature, as well as by natural replacement frequencies), 4 are lost in chicken (by deletion or change to glycine), and 11 are diverged or polymorphic. Of the 62 contact pairs formed by these residues, 46% (29/62 residue pairs) of the atomic contacts are invariant between chicken and mammalian class I molecules and 24% (15/62 pairs) are lost or probably lost (Fig. 5). In our analysis, we found similar percentages both for residue contacts and for atomic contacts. Some 59 contact residues formed 54 pairs involving 135 atomic contacts, with 18 invariant pairs (40 atomic contacts), 11 pairs (29 atomic contacts) in which the contact site was totally or partially maintained because of mainchain atomic contacts, 6 pairs (17 atomic contacts) in which the contact was totally lost due to deletion or sidechain diminution, and 7 other pairs (13 or more atomic contacts) which were partially or probably lost (data not shown). Many lost contacts were due to glycines in chicken α3 and β2m.

As originally pointed out by Bjorkman et al. (6), β2m makes the most residue contacts (29 with α1/α2 and 24 with α3), with only 10 residue contacts between α1/α2 and α3. Of the contacts between β2m and α1/α2, 59% (17/29 residue contacts) are invariant and only 10% (3/29 residue contacts) are lost by side chain diminution, whereas there are 38% (9/24) invariant and 30% (8/24) lost β2m to α3 residue contacts, and 30% (3/10) invariant and 40% (4/10) lost α1/α2 to α3 residue contacts (Fig. 5). This suggests that the β2m contacts to α1/α2 are the most important interdomain contacts in the heterodimer.

We expected to see changes on one side of a contact site balanced by a compensatory change on the other side, but there was none that we could recognize, with


the exception of a few salt bonds. Saper et al. (9) have identified 25 residue pairs involved in salt bridges. Of the four interdomain contacts (1 between α1 and β2m, 3 between α3 and β2m), two are conserved and two are lost. Of the 21 intradomain salt bonds, 7 are composed of invariant or highly conserved residues (2 within α1, 4 within α2, and 1 within α3), whereas most of the other contacts are lost or probably lost. Thus the salt bonds appear in general to be less conserved than other kinds of contacts; nevertheless, some of these salt bonds are conserved in virtually every class I molecule known (see discussion below).

There are patches of invariant residues on surface, including some residues of CD8 binding site. We expected that the surface residues would be diverged between chicken and mammalian classical class I molecules, except for contact sites with other molecules. By inspection of surface dot models and CPK models, we found that the surface residues are roughly equally divided between radically diverged residues, conserved residues, and invariant (or nearly invariant) residues (Figs. 4, 6, and 7; and data not shown). The number of invariant surface residues reflects the overall homology of each domain. There are some 14 isolated invariant residues and at least 14 clusters of invariant residues (A, B, C, G, L, M, N, QD, QR, R, S, T, U, and the interdomain contact sites), some as large as an antibody epitope (20 × 30 Å) (53).

As discussed above, many of the interdomain contact residues are invariant between chicken and mammalian classical class I molecules. Although some of these contact residues are nearly completely buried in the heterodimer, most of them are partially exposed on the surface. These exposed contact residues are visible as a ring of invariant residues around the interfaces between α1/α2, α3, and β2m (arrowheads indicating blue residues in Fig. 4, arrowheads indicating white residues marked with colored dots in Fig. 6), with some residues that are not contact residues on the fringes. Some of these invariant patches may not have significance beyond the interdomain contacts (those that are not lettered in Fig. 6). For instance (Figs. 6A and 7A), the α3 residues R178(181) and D234(238) and the β2m residues R11(12), F61(62), and L64(65) are involved in interdomain contacts, whereas the adjacent α3 residues G203/S(207) and T236(240) and β2m residue HY66/Y(67) might simply have a role in orienting the actual interdomain contact residues. But other clusters of invariant residues contain both residues involved in contacts and residues far enough away that it is unlikely that they are involved in the interdomain contacts (including the A and C patches in α1/α2, the B patch in α3, and the S patch in β2m). For example (Figs. 6B and 7B, cluster A), R42(44) and W59(60) are separated by P45(47) from the contact residues R46(48) and Y27(27) in α2 and D52(53) in β2m.

Other invariant surface residues may also have simple structural roles. Virtually all of the isolated invariant residues (white residues numbered 1 to 14 in Fig. 6) are located in bends. Most are residues often used in turn structures (like glycine, alanine, proline, aspartic acid) (54); some of the others are also clearly involved in important turn structures (e.g., the H3(3) in a salt bridge with D29(29); Figs. 6A, 6E, 7A, and 7E). In some patches of invariant residues, some or all residues may also be involved in important intradomain contacts, particularly the S, T, and U patches in β2m, the B patch in α3, and parts of the C, L, and R patches in α1/α2. As mentioned above, some of these residues are involved in highly conserved salt bridges (white residues marked with a small black X in Fig. 6). The residues of the G patch are intradomain contacts that are exposed only because β-strand G is absent in the papain-cleaved HLA molecules crystallized for structure determination; in fact, the surface of α3 is mostly diverged.

There are eight patches that we think may have interesting functional roles (Figs. 6 and 7): one small patch on the side of α3 (QD), one larger patch on the end of β2m (N), three small patches outside the peptide-binding site on the tops of α1 (A and QR) and α2 (M), and three large patches located in, around, and below the peptide-binding site at either end (containing residues from α2 (R), from α1 and α2 (L), and from α1, α2, and α3 (C)). As described below, three of these patches have already been implicated in important functions: QD for CD8 binding and some of the residues in R and L for binding the ends of the peptide.

The QD patch is a protuberant epitope on α3, composed of residues Q222(226) and D223(227), which in mammals interact with the coreceptor CD8. Although a CD8 homolog has been described in chickens (55), none of the other residues that has been previously described as important for CD8 binding in mouse or man are conserved (Fig. 2B) (10-12). For instance, both chicken class I molecules have V241(245), found only in HLA-Aw68 that does not bind human CD8, instead of the otherwise invariant A(245) (11). However, the nearby residues T196(200) and Y253(257) have not yet been examined for effects on CD8 binding. In addition, residues from the C patch on α1/α2 and the S patch on β2m might also be contacts with CD8.

The N patch is a concave epitope on β2m, composed of eight residues on the very end of the β-barrel, with some residues from the A/B, C/D, and E/F bends and some strand residues nominally buried between the two sheets. The location of the N patch is intriguing; it could possibly interact with molecules involved in the assembly of the class I heterodimer (4, 56), with molecules in the membrane to tip the class I molecule (9, 12), with receptors involved in chemotaxis (42-44), or with other receptors in the membrane (57-59).

The R patch forms a U-shaped group from residues Q94(96), A114(117), Y115(118), and W144(147) on the floor of the peptide-binding site (presumably covered by the peptide) to K143(146) closing the right end of the groove to R142/H(145), E145(148), and W144(147) outside of the peptide-binding site. These last two residues are near the M patch (A150(153), W144(147), and E151(154)) on the outside, but not above, the α-helix of α2. As mentioned above, K143(146) and W144(147) are involved in positioning the end of the peptide. E145(148) is involved in a highly conserved salt-bond along the helix with K141(144) (a basic residue in all classical class I molecules). The residues Q94(96), A114(117), and (in our analysis but not that of Saper et al. (9)) Y115(118) are β2m contact residues that are invariant in virtually all class I molecules (Figs. 2A and 5).

The large L patch is composed mainly of residues from the α2 domain, and includes Y156(159), Y7(7), Y27(27),


Some residues that are conserved between chicken and mammalian class I molecules are conserved in other taxa. To understand the place of the chicken B-F molecule in the evolution of class I heterodimers, the comparisons of chickens and mammals described above must be extended to a range of other species. Recently, class I α chain-like sequences from lizard, snake, frog (the anuran amphibian Xenopus laevis), and carp (the teleost fish Cyprinus carpio) have been reported (60-62). The relationships of these cDNA clones to presumed expressed cell surface proteins, whether classical or nonclassical in a functional or structural sense, have not yet been established. (In fact, the only clear example of a nonclassical class I molecule outside of mammals is the chicken CB-3 Ag (35), which has not been sequenced.) Moreover, any comparisons of all available class I sequences will be quite biased by the differences in sample size for each taxon. Despite these limitations, three interesting points emerge from the comparison of sequences in Figure 2.

The first point is that certain mammalian nonclassical class I molecules (HLA-E, F, and G Ag; Qa, Tla, and M Ag) are clearly more related than others (CD1 and FcR) to mammalian classical class I molecules. The mammalian classical and the less diverged nonclassical molecules are all linked to the MHC (unlike CD1 and presumably FcR); these MHC-encoded mammalian class I molecules can be treated as a group. For example, the number of practically invariant residues does not differ whether classical or all MHC-encoded mammalian sequences are compared (35 α1, 39 α2, and 56 α3 positions with 10 or 11 identities out of the 11 mammalian MHC-encoded sequences in Fig. 2). Also, the invariant residues in the peptide/TCR and CD8 binding sites are virtually all conserved, arguing that the MHC-encoded nonclassical class I molecules bind peptides and are recognized by TCR and CD8. In contrast, the identities shared between CD1 and FcR are mostly shared with all class I molecules (presumably serving structural roles) or are unique, whereas the functionally important residues are mostly diverged (Fig. 2). The CD1 and FcR genes may have appeared relatively recently and diverged from classical class I genes quickly (perhaps due to intense selection for nonclassical functions). However, it seems more likely that they diverged from classical class I molecules earlier (maybe much earlier!) than the divergence of mammals.

The second point is that the crosswise comparisons of all sequences are consistent with the presumed phylogeny. The chicken and lizard sequences are clearly more similar to each other than to the MHC-encoded mammalian class I sequences in terms of total identities (chicken and lizard: 39 α1, 49 α2, 39 α3; chicken and mammals: 18 α1, 29 α2, 27 α3; lizard and mammals: 23 α1, 25 α2, 27 α3), presumably reflecting an earlier (or faster) divergence of mammals from the lineage leading to lizards and birds. The residues that are shared between carp and Xenopus fall into three categories. About a third are shared with lizard, chicken, and MHC-encoded mammalian sequences (4/12 α1, 5/21 α2, 13/34 α3) and presumably represent ancestral residues that are under such intense selection that they cannot easily change. About one-third are unique to carp and Xenopus (1/12 α1, 9/21 α2, and 12/34 α3) and presumably represent ancestral residues that diverged somewhere in the evolution of reptiles, birds, and amphibians. Of the remaining residues, most are shared with lizard, chicken, or both (6/12 α1, 7/21 α2, 7/34 α3).

Parenthetically, it is unclear whether the Xenopus and carp sequences represent expressed classical or nonclassical molecules. They share more aa identities with MHC-encoded class I molecules than CD1 or FcR. However, just as in CD1 and the FcR, many of the residues with important functional roles in classical class I molecules (and virtually all of the invariant surface residues dis-

F33(33), Y58(59), Y168(171), and W164(167) withinthe groove (presumably covered by peptide), E l 58(161), V162(165), E163(166),and R166(169) on the outside of thea2 helix, and R167(170),E170(173),K173(176), G172(175), L176(179). and W49(51)below the endof the helices of a1 and a2.A s discussed above,the particularly well conserved residues Y 156(159). Y7(7), Y58(59). Y 168(171).and W164( 167) are all of part pocket A in the peptide-binding site (9,51).W164(167) andY58(59) close the end of the groove, whereas the four tyrosines are involved in positioning the end of the peptide. F33(33) is a contact residue betweenthe p3 strand and thehelix of a1, E163( 166)and R166(169) form a salt bridge along the a2helix found in mammalian classical, chicken, and lizard class I molecules, and G 172(175)evidently breaks the a2helix in virtually all class I molecules. L176(179) and thevirtually invariant Y27(27) are contact residues with the a3 domain and p2m, respectively. The reasons for the conservation of E158(161), V162(165), R167(170), E170(173), K173(176),and W49(51) are obscure. The latter fourmight be important a l / a 2 contact residues; K173(176) is in fact a glycosylated asparagine in mouse molecules. In addition to certain residues in the R and L patches, there are residues outside of the peptide-binding site in the M, QR, and A patches that could be involved in contacts with TCR. The residues Q71(72)and R74(75) of the two residue QR patch, T63(64) and R42(44) of the five residue A patch, and E 151(154) of the threeresidue M patch are all clearly accessible from the top (Fig. 6F). In the A string, R42(44) forms a highly conserved salt bridge along the a1helix with D60(61), which is a n acidic residueinvirtually every class I molecule, whereas R46(48)is a contact residue withp2m. N85(86) that is The C patch contains the invariant located in the loop between the a1 and a2domains and is glycosylated in virtually all class I molecules examined. 
Below N85(86) are residues from a l , a2, and prim. Some of these residues are involved in contacts between these three domains (H91(93) andD l 16(119) in a highly conserved salt bridge holding the ends of the p l and p2 strands of the a2 domain together; T92(94). A1 14(117), of a2 with P2m;H30(31)and P32/S(33) of and D l 16(1 19) p2mwith al/a2). Other residuesare involved in tightturn structures (e.g., G89(91) of a2 and P31(32) of P2m).The residues Y84(85), Q86(87),S90(92),and particularly Y 1 15( 118) are invariant in many classical and nonclassical class I molecules, but the reasons are obscure. This patch is in a position to bind CD8, along with D58(59) of &rn and residues on the underside of a l / a 2 (such a s D119(122), T131(134), A132(135), A133(136), YllO(113). andR6(6)).

1544

EVOLUTION OF CHICKEN CLASS I HETERODIMER

cussed in theprevious section) havediverged in Xenopus to take part in assembly of the class I heterodimer (4, and carp class I molecules. The Xenopus and carp se- 56)).A second unexplained point, the strikingdifference quences may represent classical class I molecules with in homology of a1 and a2 between mammals and chickmany interesting differences from the lizard, chicken, ens, may reflect different roles of these two domains in and mammalian class I molecules due to the long diver- such interactions.Alternatively, structural requirements gence time. But most likely they are not the classical of the protein, RNA, or DNA may be responsible for the class I molecules recognized by T cells in these animals. invariant residues and/orthe differential evolution of the The thirdpoint is that thedomains diverge at different domains. There aresimilar considerations for the a3 domain and rates and to different final limits. Although a3 is the most Conserved between different mammalian classical P2m. Of all the a3 residues identified as affecting CD8 class I molecules or between the two chicken class I binding by mutagenesis, the two invariant (and protubermolecules, it drops to a low percentage of identical resi- ant) residues are almost certainlythe main contacts; this dues between mammals, chickens, lizard, Xenopus, and must be tested for other nearby invariant residues. The carp (with small differences that may represent the di- large invariant surface epitopes on p2m could interact vergence time, as discussed above). In contrast, al and with the membrane, the CD8 coreceptor, molecules ina2 are the least homologous within a taxon, but most volved in assemblyof the classI heterodimer, or receptors homologous between chicken and either mammals(35- involved with pre-T cell chemotaxis. Again, the striking 42% al, 47-58% a2) or lizards (43% a l , 54% ( ~ 2 How). 
difference inhomology of a3 and p2m between mammals ever, a1 and a2 have the same low similarity between and chickens may reflect different roles of these two chickens (as well as lizard and mammals) andXenopus domains in such interactions, in interdomain contacts, (31% a1, 30% a2) and even less between chickens (as or some other undefined constraint on structure. well as lizard and mammals) and carp (24% a1, 20% ( ~ 2 ) . A third observation, the high G + C bias of the Ig-like We interpret thisto meanthat there are more structural exons (class I a3 and p2m, class I1 p2) compared to the constraints andfewer functional constraints on a3 than nearby non-Ig-like exons [class I a1 and a2, class I1 pl). on a1 and a2, so that thenumber of identical residues in is important for two reasons.First,such codon bias a3 rapidly drops to the structural limit (around 30%), presumably affects the apparent evolutionary distance whereas the number of identical residues in a1 and a2 ofMHC molecules between chickens and other animals drops slowly due to functional constraints but reaches a [particularly reptiles, amphibians, and fish with 50%G lower structural limit (10-20%). The highly invariant C, see sequencesin Refs. 60-62). The distances based residues shared between all classI molecules (as well as on both nt and aa sequences would be affected, because p2m and class I1 82 domain) nearly all have clear struc- the wobble bases arenearly all G+ C in a3 and p2m, and tural roles in turns, disulfide bridges or in packing the the amino acids used are also biased toward those that intradomain spaces (Fig. 2). use G + C rich codons. Second, the origin of the G + C richness is a mystery. The very high G + C content is not restricted to the MHC, because the p2m gene is located CONCLUSIONS outside of the MHC on another chromosome (J.Kaufman Birds and mammals last shared a common ancestor and M. Dominguez-Steglich, unpublished observations). 
some 250 to 300 million yr ago (63).This work is the first We have suggested that A + T rich chromosomal regions toanalyze the molecular evolution of a n entire MHC corresponding toGiemsa staining-bands were selectively heterodimer over such long spans of time. Some common deleted during the evolution of microchromosomes, and sense expectations have been partially fulfilled: the lo- that this caused those genes located on microchromocations of polymorphic residues are the same, the highly somes to become especially G + C rich, by the same (unknown) mechanismsthat areresponsible for the G + invariant residues that bind peptide remain invariant, the contact residues within and between domains are C richness in R-bands (49). This model suggests that a mostly highly conserved or invariant, some residues of macroevolutionary event [appearance of microchromoprotein evolution. the CD8 binding site remain invariant although the rest somes) can have important effects on The finalunexpected result was thedifferential use of of the surfaceresidues have mostly diverged, the kinase sites in the cytoplasmic tail are invariant, the domains cytoplasmic regions withdifferent conserved sites of that areconserved within andbetween mammalian spe- phosphorylation. The Y313(320) in exon 6 andthe cies are conserved within the avian species examined, S323(332) and S326(335) in exon 7 are among the few cytoplasmic regions of all mamand those that are polymorphic in mammals are poly- invariant residues in the morphic in chickens. Thus, certain portions of chicken malian classical class I molecules and thechicken class and mammalian class I molecules may be interchangea- I molecules. The conservation of these kinase sites over ble (e.g., chicken Pam should bind to mammalian class I such long periods of evolutionary time argues for their importance. The chicken class I molecules that bear both chains and vice versa). 
However, this analysis has revealed intriguing points requiringfurther clarification, sites of phosphorylation, the Y313(320)site only, or neiwhich are independent of the fact that we do not yet ther sitemay have different behaviors(e.g., in recycling, know the true detailed three-dimensional structure of antigen presentation, turnover, etc.). The functional significance of the invariant features chicken class I heterodimer. One important resultof these analyses is the identifi- can in partbe assessed by mutagenesis of the mammacation of invariant surface features. For instance, the lian genes. However, we need to analyze the MHC molestriking patches of invariant residues on the surface of cules of many other species to begin to understand the role of accident and selection in theevolutionary history a l / a 2 may reflect contacts with invariant residues in the TCR, CD8, or other molecules [such as those postulated of these molecules.
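Two quantities recur in the comparisons above: the percentage of identical residues between aligned domains, and the G + C content at wobble (third codon) positions. As a minimal sketch of how such values could be computed, assuming sequences that are already aligned to equal length and a coding sequence that starts in frame, the following Python functions may be useful. The function names, the convention of skipping gapped positions, and the toy sequences are illustrative assumptions; they are not the procedure or the data used in this work.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of aligned positions that carry identical residues.

    Positions where either sequence has a gap are skipped. This is one
    common convention and is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * identical / compared


def gc3_fraction(coding_seq: str) -> float:
    """Fraction of codons whose third (wobble) base is G or C.

    The sequence is assumed to start in frame; an incomplete trailing
    codon is ignored.
    """
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3  # length covered by whole codons
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    third_bases = seq[2:usable:3]  # bases at positions 3, 6, 9, ...
    return sum(base in "GC" for base in third_bases) / len(third_bases)


if __name__ == "__main__":
    # Hypothetical toy inputs, not real class I or beta2m sequences.
    print(percent_identity("ACDE", "ACDF"))       # 75.0
    print(round(gc3_fraction("GCCGCGAAA"), 2))    # 0.67
```

Applied per domain (for example to aligned α1, α2, and α3 segments separately), values of this kind would correspond to the percent identities quoted in the text, although the exact numbers depend on the alignment and on how gaps are treated.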


Acknowledgments. We thank Dario Grossberger and Josef Schwager for invaluable advice and training in molecular biology, Hans-Ruedi Kiefer for oligonucleotides, Pamela Bjorkman, David Banner, Fritz Winkler, Bert Teminck, and Jerome Aarden for invaluable help with the molecular graphics, Francois Guillemot and Charles Auffray for useful discussions, Hans Spalinger and Bea Pfeiffer for photographic work, and Klaus Karjalainen, Pamela Bjorkman, David Banner, and Richard Scheuermann for critical reading. We especially thank Fritz Melchers and the Basel Institute for Immunology for support of K.S. during repeated visits.

REFERENCES

1. Klein, J. 1987. Natural History of the Major Histocompatibility Complex. Wiley and Sons, New York.
2. Davis, M., and P. Bjorkman. 1988. T-cell antigen receptor genes and T-cell recognition. Nature 334:395.
3. Guillemot, F., C. Auffray, H. Orr, and J. Strominger. 1988. MHC antigen genes. In Molecular Immunology. B. Hames and D. Glover, eds. Oxford IRL Press, Oxford, New York, p. 81.
4. Kaufman, J., K. Skjoedt, and J. Salomonsen. 1990. The MHC molecules of nonmammalian vertebrates. Immunol. Rev. 113:83.
5. Bjorkman, P., and P. Parham. 1990. Structure, function, and diversity of class I major histocompatibility complex molecules. Annu. Rev. Biochem. 59:253.
6. Bjorkman, P., M. Saper, B. Samraoui, W. Bennet, J. Strominger, and D. Wiley. 1987. Structure of the human class I histocompatibility antigen, HLA-A2. Nature 329:506.
7. Bjorkman, P., M. Saper, B. Samraoui, W. Bennet, J. Strominger, and D. Wiley. 1987. The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329:512.
8. Garrett, T., M. Saper, P. Bjorkman, J. Strominger, and D. Wiley. 1989. Specificity pockets for the side chains of peptide antigens in HLA-Aw68. Nature 342:692.
9. Saper, M., P. Bjorkman, and D. Wiley. 1991. Refined structure of the human histocompatibility antigen HLA-A2 at 2.6 Å resolution. J. Mol. Biol. 219:277.
10. Potter, T., T. Rajan, R. Dick, and J. Bluestone. 1989. Substitution at residue 227 of H-2 class I molecules abrogates recognition by CD8-dependent, but not CD8-independent, cytotoxic T lymphocytes. Nature 337:73.
11. Salter, R., A. Norment, B. Chen, C. Clayberger, A. Krensky, D. Littman, and P. Parham. 1989. Polymorphism in the α3 domain of HLA-A molecules affects binding to CD8. Nature 338:345.
12. Salter, R., R. Benjamin, P. Wesley, S. Buxton, T. Garret, C. Clayberger, A. Krensky, A. Norment, D. Littman, and P. Parham. 1990. A binding site for the T-cell co-receptor CD8 on the α3 domain of HLA-A2. Nature 345:41.
13. Guild, B., R. Erikson, and J. Strominger. 1983. HLA-A2 and HLA-B7 antigens are phosphorylated in vitro by Rous sarcoma virus kinase (pp60v-src) at a tyrosine residue encoded in a highly conserved exon on the intracellular domain. Proc. Natl. Acad. Sci. USA 80:2894.
14. Guild, B., and J. Strominger. 1984. Human and murine class I MHC antigens share conserved serine 335, the site of HLA phosphorylation in vivo. J. Biol. Chem. 259:9235.
15. Guillemot, F., A. Billault, O. Pourquie, G. Behar, A.-M. Chausse, R. Zoorob, G. Kreibich, and C. Auffray. 1988. A molecular map of the chicken major histocompatibility complex: the class II β genes are closely-linked to the class I genes and the nucleolar organizer. EMBO J. 7:2775.
16. Guillemot, F., J. Kaufman, K. Skjoedt, and C. Auffray. 1989. The major histocompatibility complex in the chicken. Trends Genet. 5:300.
17. Benjamin, D., J. Berzofsky, I. East, F. Gurd, C. Hannum, S. Leach, E. Margoliash, J. Michael, A. Miller, E. Prager, M. Reichlin, E. Sercarz, S. Smith-Gill, P. Todd, and A. Wilson. 1984. The antigenic structure of proteins: a reappraisal. Annu. Rev. Immunol. 2:67.
18. Kaufman, J., K. Skjoedt, J. Salomonsen, M. Simonsen, L. Du Pasquier, R. Parisot, and P. Riegert. 1990. MHC-like molecules in some nonmammalian vertebrates can be detected by some cross-reactive xenoantisera. J. Immunol. 144:2258.
19. Kaufman, J., S. Ferrone, M. Flajnik, M. Kilb, H. Völk, and R. Parisot. 1990. MHC-like molecules in some nonmammalian vertebrates can be detected by some cross-reactive monoclonal antibodies. J. Immunol. 144:2273.
20. Chothia, C. 1984. Principles that determine the structure of proteins. Annu. Rev. Biochem. 53:537.
21. Creighton, T. 1983. Proteins. Freeman and Company, New York.
22. Salomonsen, J., K. Skjødt, M. Crone, and M. Simonsen. 1987. The chicken erythrocyte-specific MHC antigen. Characterization and purification of the B-G antigen by monoclonal antibodies. Immunogenetics 25:373.
23. Møller, L., J. Kaufman, S. Verland, J. Salomonsen, D. Avila, J. Lambris, and K. Skjødt. 1991. Variations in the cytoplasmic region evidently account for chicken MHC class I (B-F) heterogeneity. Immunogenetics 34:110.
24. Kent, S., and I. Clark-Lewis. 1985. In Synthetic Peptides in Biology and Medicine. K. Alitalo, P. Partanen, and A. Vaheri, eds. Elsevier, Amsterdam, p. 29.
25. Tam, J., W. Heath, and R. Merrifield. 1983. SN2 deprotection of synthetic peptides with a low concentration of HF in dimethyl sulfide: evidence and application in peptide synthesis. J. Am. Chem. Soc. 105:6442.
26. Lui, F., M. Zinnecker, T. Hamaoka, and D. Katz. 1979. New procedures for preparation and isolation of conjugates of proteins and a synthetic copolymer of D-amino acids and immunochemical characterization of such conjugates. Biochemistry 18:690.
27. Kaufman, J., J. Salomonsen, and K. Skjødt. 1989. B-G cDNA clones have multiple small repeats and hybridize to both chicken MHC regions. Immunogenetics 30:440.
28. Welinder, K., H. Jespersen, J. Walther-Rasmussen, and K. Skjødt. 1991. Amino acid sequences and structures of chicken and turkey beta2-microglobulin. Mol. Immunol. 28:177.
29. Dayhoff, M., W. Barker, and L. Hunt. 1983. Establishing homologies in protein sequences. Methods Enzymol. 91:524.
30. Jones, T. 1978. J. Appl. Crystallogr. 11:268.
31. Williams, A., and A. N. Barclay. 1988. The immunoglobulin superfamily-domains for cell surface recognition. Annu. Rev. Immunol. 6:381.
32. Ennis, P., A. Jackson, and P. Parham. 1988. Molecular cloning of bovine class I MHC cDNA. J. Immunol. 141:642.
33. McGuire, K., W. Duncan, and P. Tucker. 1986. Structure of a class I gene from Syrian hamster. J. Immunol. 137:366.
34. Pontarotti, P., H. Mashimo, R. Zeff, D. Fisher, L. Hood, A. Mellor, R. Flavell, and S. Nathenson. 1986. Conservation and diversity in the class I genes of the major histocompatibility complex: sequence analysis of a Tlab gene and comparison with a Tlac gene. Proc. Natl. Acad. Sci. USA 83:1782.
35. Pickel, J., C. Chen, and M. Cooper. 1990. An avian B-lymphocyte protein associated with β2-microglobulin. Immunogenetics 32:1.
36. Kline, K., W. Briles, M. Bacon, and B. Sanders. 1988. Characterization of different B-F (MHC class I) molecules in the chicken. J. Hered. 79:239.
37. Kroemer, G., R. Zoorob, and C. Auffray. 1990. Structure and expression of a chicken MHC class I gene. Immunogenetics 31:405.
38. Rogers, M., D. Siwarski, E. Shacter, W. Maloy, E. Lillehoj, and J. Coligan. 1986. Three distinct H-2Ks molecules differing at the carboxy terminus are expressed on a tumor from SJL/J mice. J. Immunol. 137:3006.
39. McCluskey, J., L. Boyd, W. Maloy, J. Coligan, and D. Margulies. 1986. Alternative processing of H-2Dd pre-mRNAs results in membrane expression of differentially phosphorylated protein products. EMBO J. 5:2477.
40. Vega, M., and J. Strominger. 1989. Constitutive endocytosis of HLA class I antigens requires a specific portion of the intracytoplasmic tail that shares structural features with other endocytosed molecules. Proc. Natl. Acad. Sci. USA 86:2688.
41. Koller, B., and H. Orr. 1985. Cloning and complete sequence of an HLA-A2 gene: analysis of two HLA-A alleles at the nucleotide level. J. Immunol. 134:2727.
42. Brinckerhoff, C., I. Mitchell, M. Karmilowicz, B. Kluve-Beckerman, and M. Benson. 1989. Autocrine induction of collagenase by serum amyloid A-like and β2-microglobulin-like proteins. Science 243:655.
43. Dargemont, C., D. Dunon, M. Deugnier, M. Denoyelle, J. Girault, F. Lederer, K. Le, F. Godeau, J. Thiery, and B. Imhof. 1989. Thymotaxin, a chemotactic protein, is identical to β2-microglobulin. Science 246:803.
44. Dunon, D., J. Kaufman, J. Salomonsen, K. Skjødt, O. Vainio, J.-P. Thiery, and B. Imhof. 1990. T cell precursor migration towards β2-microglobulin is involved in thymus colonization of chicken embryos. EMBO J. 9:3315.
45. Aota, S.-I., and T. Ikemura. 1986. Diversity in G + C content at the third position of codons in vertebrate genes and its cause. Nucleic Acids Res. 14:6345.
46. Bernardi, G. 1989. The isochore organization of the human genome. Annu. Rev. Genet. 23:637.
47. Bourlet, Y., G. Behar, F. Guillemot, N. Frechin, A. Billault, A.-M. Chausse, R. Zoorob, and C. Auffray. 1988. Isolation of chicken major histocompatibility complex class II (B-L) β chain sequences: comparison with mammalian β chain and expression in lymphoid organs. EMBO J. 7:1031.
48. Xu, Y., J. Pitcovski, L. Peterson, C. Auffray, Y. Bourlet, B. Gerndt, A. Nordskog, S. Lamont, and C. Warner. 1989. Isolation and characterization of three class II MHC genomic clones from the chicken. J. Immunol. 142:2122.
49. Kaufman, J., J. Salomonsen, P. Riegert, and K. Skjødt. 1991. Using chicken class I sequences to understand how xenoantibodies cross-react with MHC-like molecules in nonmammalian vertebrates. Am. Zool. 31:570.
50. Hughes, A., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167.
51. Madden, D., J. Gorga, J. Strominger, and D. Wiley. 1991. The structure of HLA-B27 reveals nonamer self-peptides in an extended conformation. Nature 353:321.
52. Travers, P., T. Blundell, M. Sternberg, and W. Bodmer. 1984. Structural and evolutionary analysis of HLA-D region products. Nature 310:235.
53. Amit, A., S. Mariuzza, S. Phillips, and R. Poljak. 1986. Three-dimensional structure of an antigen-antibody complex at 2.8 Å resolution. Science 233:747.
54. Milner-White, J., and R. Poet. 1987. Loops, bulges, turns and hairpins in proteins. Trends Biochem. Sci. 12:189.
55. Chan, M., C. Chen, L. Ager, and M. Cooper. 1988. Identification of the avian homologues of mammalian CD4 and CD8 antigens. J. Immunol. 140:2133.
56. Degen, E., and D. Williams. 1991. Participation of a novel 88 kD protein in the biogenesis of murine class I histocompatibility molecules. J. Cell Biol. 112:1099.
57. Edidin, M. 1983. MHC antigens and nonimmune functions. Immunol. Today 4:269.
58. Kittur, D., Y. Shimizu, R. DeMars, and M. Edidin. 1987. Insulin binding to human B lymphocytes is a function of HLA haplotype. Proc. Natl. Acad. Sci. USA 84:1351.
59. Due, C., M. Simonsen, and L. Olsson. 1986. The major histocompatibility complex class I heavy chain as a structural subunit of the human cell membrane insulin receptor: implications for the range of biological functions of histocompatibility antigens. Proc. Natl. Acad. Sci. USA 83:5007.
60. Grossberger, D., and P. Parham. 1992. Reptilian class I major histocompatibility genes reveal conserved elements in class I structure. Immunogenetics. In press.
61. Flajnik, M., C. Canel, J. Kramer, and M. Kasahara. 1991. Evolution of the major histocompatibility complex: molecular cloning of major histocompatibility complex class I from the amphibian Xenopus. Proc. Natl. Acad. Sci. USA 88:537.
62. Hashimoto, K., T. Nakanishi, and Y. Kurosawa. 1990. Isolation of carp genes encoding major histocompatibility complex antigens. Proc. Natl. Acad. Sci. USA 87:6863.
63. Young, J. 1981. The Life of the Vertebrates. Clarendon Press, Oxford.
64. Watkins, D., Z. Chen, A. Hughes, M. Evans, T. Tedder, and N. Letvin. 1990. Evolution of the MHC class I genes of a New World primate from ancestral homologues of human non-classical genes. Nature 346:60.
65. Kvist, S., L. Roberts, and B. Dobberstein. 1983. Mouse histocompatibility genes: structure and organization of a Kd gene. EMBO J. 2:245.
66. Tykocinski, M., P. Marche, E. Max, and T. Kindt. 1984. Rabbit class I MHC genes: cDNA clones define full-length transcripts of an expressed gene and a putative pseudogene. J. Immunol. 133:2261.
67. Satz, M., L.-C. Wang, D. Singer, and S. Rudikoff. 1985. Structure and expression of two porcine genomic clones encoding class I MHC antigens. J. Immunol. 135:2167.
68. Koller, B., D. Geraghty, Y. Shimizu, R. DeMars, and H. Orr. 1988. HLA-E. A novel HLA class I gene expressed in resting T lymphocytes. J. Immunol. 141:897.
69. Devlin, J., E. Weiss, M. Paulson, and R. Flavell. 1985. Duplicated gene pairs and alleles of class I genes in the Qa2 region of the murine major histocompatibility complex: a comparison. EMBO J. 4:3203.
70. Singer, D., J. Hare, H. Golding, L. Flaherty, and S. Rudikoff. 1988. Characterization of a new subfamily of class I genes in the H-2 complex of the mouse. Immunogenetics 28:13.
71. Balk, S., P. Bleicher, and C. Terhorst. 1989. Isolation and characterization of a cDNA and gene coding for a fourth CD1 molecule. Proc. Natl. Acad. Sci. USA 86:252.
72. Simister, N., and K. Mostov. 1989. An Fc receptor structurally related to MHC class I antigens. Nature 337:184.
73. Gussow, D., R. Rein, I. Ginjaar, F. Hochstenbach, G. Seeman, A. Kottman, and H. Ploegh. 1987. The human β2-microglobulin gene. J. Immunol. 139:3132.
74. Gates, F., J. Coligan, and T. Kindt. 1981. Complete amino acid sequence of murine β2-microglobulin: structural evidence for strain-related polymorphism. Proc. Natl. Acad. Sci. USA 78:554.
75. Groves, M., and R. Greenberg. 1982. Complete amino acid sequence of bovine β2-microglobulin. J. Biol. Chem. 257:2619.
76. Gates, F., J. Coligan, and T. Kindt. 1979. Complete amino acid sequence of rabbit β2-microglobulin. Biochemistry 18:2267.
77. Wolfe, P., and J. Cebra. 1980. The primary structure of guinea pig β2-microglobulin. Mol. Immunol. 17:1493.
78. Loegdberg, L. 1982. PhD thesis. University of Lund, Sweden.
79. Larhammar, D., L. Schenning, K. Gustafsson, K. Wiman, L. Claesson, L. Rask, and P. Peterson. 1982. Complete amino acid sequence of an HLA-DR antigen-like β chain as predicted from the nucleotide sequence: similarities with immunoglobulins and HLA-A, -B, and -C antigens. Proc. Natl. Acad. Sci. USA 79:3687.
80. Watts, S., C. Wheeler, R. Morse, and R. Goodenow. 1989. Amino acid comparison of the class I antigens of mouse major histocompatibility complex. Immunogenetics 30:390.