Interphotoreceptor Retinoid-binding Protein

Vol. 264, No. 2, Issue of January 15, pp. 1115-1123, 1989 Printed in U.S.A.

THE JOURNAL OF BIOLOGICAL CHEMISTRY

Interphotoreceptor Retinoid-binding Protein
GENE CHARACTERIZATION, PROTEIN REPEAT STRUCTURE, AND ITS EVOLUTION*

(Received for publication, August 9, 1988)

Diane E. Borst‡, T. Michael Redmond‡, John E. Elser§, Matthew A. Gonda§, B. Wiggert‡, Gerald J. Chader‡, and John M. Nickerson‡

From the ‡Laboratory of Retinal Cell and Molecular Biology, National Eye Institute, National Institutes of Health, Bethesda, Maryland 20892 and the §Laboratory of Cell and Molecular Structure, Program Resources, Inc., National Cancer Institute, Frederick Cancer Research Facility, Frederick, Maryland 21701

The gene for bovine interphotoreceptor retinoid-binding protein (IRBP) has been cloned, and its nucleotide sequence has been determined. The IRBP gene is about 11.6 kilobase pairs (kb) and contains four exons and three introns. It is transcribed into a large mRNA of approximately 6.4 kb and translated into a large protein of 146,000 daltons. To prove the identity of the genomic clone, we determined the protein sequence of several tryptic and cyanogen bromide fragments of purified bovine IRBP protein and localized them in the protein predicted from its nucleotide sequence. There is a 4-fold repeat structure in the protein sequence with 30-40% sequence identity and many conservative substitutions between any two of the four protein repeats. The third and fourth repeats are the most similar pair. All three of the introns in the IRBP gene fall in the fourth protein repeat. Two of the exons, the first and the fourth, are large, 3173 and 2447 bases, respectively. The introns are each about 1.5-2.2 kb long. The human IRBP gene has a sequence that is similar to one of the introns from the bovine gene. The unexpected gene structure and protein repeat structure in the bovine gene lead us to propose a model for the evolution of the IRBP gene.

Interphotoreceptor retinoid-binding protein (IRBP¹ or 7 S protein) (1-3) is a large glycolipoprotein found in the interphotoreceptor matrix (IPM) between the neural retina and retinal pigment epithelium (4). It is capable of binding retinoids (1-3), as well as fatty acids (5), with retinol bound in a light-dependent manner (6). It is the only retinoid-binding protein in the IPM in monkeys, where it constitutes about 70% of the soluble protein (7). Based on these characteristics, it has been proposed that IRBP plays a role in retinoid transport between retinal photoreceptors and pigment epithelial cells (8). The protein can induce experimental autoimmune uveitis (EAU) in rats (9) and monkeys (10) and may be involved in human uveitic conditions as well (11). The expression of IRBP shows tissue preference, occurring mainly in retinal photoreceptor cells and in pinealocytes (12). The IRBP mRNA is large, varying from 4.4 to 7.4 kb in retinas of different animals. In most species, only one message is found, although a second, less abundant band is detected in some species (13). Partial cDNA sequences of bovine and human IRBP have been published (14-16), as well as its chromosomal location in human and dog (16, 17). There is only one copy of the gene per haploid genome (18).

IRBP is a large protein, about 140 kDa, in almost all vertebrate classes examined. The single exception appears to be in bony fish (teleosts) where it is only about one-half this size (19). Bovine IRBP binds two retinoids per polypeptide (20) and has approximately four noncovalently attached fatty acids in the purified protein (5). Its shape, as indicated by electron microscopy and hydrodynamic analysis, is an elongated rod with a flexible region in the middle (21, 22). These properties, and the fact that we have found some evidence for internal similarities in the amino acid sequence (23, 33), suggest that specific repeated structures are present in the protein that result from at least one internal duplication within the IRBP gene.

The determination of the nucleotide sequence of the gene is necessary for the study of the evolution of the protein, its structure and function, and for the analysis of the elements that control its expression in normal and abnormal states. We have obtained cDNAs (14) and recently cloned full length representations of the bovine IRBP gene (18). Here, we report the sequence of the entire gene and some of its flanking sequences and present a model for the evolution of this gene.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J04441.

¹ The abbreviations used are: IRBP, interphotoreceptor retinoid-binding protein; IPM, interphotoreceptor matrix; kb, kilobase pair; bp, base pair; PTH, phenylthiohydantoin; PIPES, 1,4-piperazinediethanesulfonic acid.

EXPERIMENTAL PROCEDURES²

RESULTS AND DISCUSSION

Fig. 1 shows the clone λgIRBP7 that we have analyzed by DNA sequence determination. The gene for IRBP and several kilobases of 5'- and 3'-flanking sequences have been cloned (18). Two (λgIRBP7 and λgIRBP8) of the several clones contain full length copies of the protein-encoding portion of the gene (Fig. 1). The IRBP gene does not contain any high or middle repetitive sequences, as Southern blots of bovine genomic DNAs digested with several restriction enzymes and probed with λgIRBP7 show only a small number of discrete bands (18). This analysis also suggests that there is only one gene per haploid genome (18). The gene is compact, only 11.6 kb, considering the large size of the protein (140 kDa) and large 6.4-kb mRNA (13, 14). The restriction map shows several restriction sites. The gene contains four exons (Fig. 1) of 3173, 191, 143, and 2447 bp as will be demonstrated below. The fourth exon consists mainly of 3'-untranslated sequence. All three introns (2230, 1961, and 1491 bp in length) fall in the region encoding the fourth protein repeat. Based on the protein structure shown in later figures, the locations of the four protein repeats in the gene are also indicated.

² Portions of this paper (including “Experimental Procedures,” Tables I-IV, and Figs. 2-7 and 10) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.

[Figure 1: restriction maps of clones λgIRBP7, λgIRBP8, λgIRBP9, and λgIRBP4, with the gene structure (exons 1-4, introns A-C) drawn beneath on a scale of 0-20 kilobases.]

FIG. 1. Restriction map, gene structure, and protein repeats of the genomic clone λgIRBP7. E = EcoRI, K = KpnI, S = SalI. The approximate positions of the four exons and three introns and the four protein repeats are indicated.

We determined the sequence of λgIRBP7 (reported here) and parts of the other gene clones including 264 bp of λgIRBP8 and 135 bp of λgIRBP (see Ref. 18). These latter sequences are identical with that of λgIRBP7. An assembly of a total of 250 gel readings from a shotgun sonication library of subfragments of λgIRBP7, when compiled, gives the entire sequence of the IRBP gene (Fig. 2). There are 133 gel readings on the positive strand and 117 gel readings on the negative strand with no gaps on either strand. A total of about 59,000 bases was accumulated in the course of this 12-kb sequencing project.

Fig. 3 shows the complete nucleotide sequence of the bovine IRBP gene and several important structural features including the putative Cap site, polyadenylation signal, and polyadenylation addition site, and the boundaries of introns and exons. Based on the DNA sequence, the gene itself is 11,636 bases long, and the mRNA is 5,954 bases long excluding the poly(A) tail. Twenty-one bases downstream of the polyadenylation site is a sequence, TTGTTATCTTTT, which resembles a consensus sequence, TT(G/A)NNNTTTTTT, associated with the ends of eukaryotic genes (34). The consensus sequence may play a role as a putative regulatory signal and may have a role in 3'-end formation or in termination or cleavage at the 3'-end of the gene. This consensus sequence is found 5 to 20 bases downstream of the polyadenylation site in 50% of eukaryotic genes.
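
Two of the quantitative statements above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it sums the quoted exon and intron lengths to see whether they reproduce the stated mRNA and gene lengths, and it counts the positions at which TTGTTATCTTTT departs from the consensus TT(G/A)NNNTTTTTT. The variable and function names are hypothetical, and the consensus is rewritten with the standard IUPAC codes R (A or G) and N (any base) as an assumption of convenience.

```python
# Exon and intron lengths in base pairs, in gene order, as quoted in the text.
EXON_LENGTHS = [3173, 191, 143, 2447]
INTRON_LENGTHS = [2230, 1961, 1491]

# The mRNA (without poly(A) tail) is the sum of the exons; the gene adds the introns.
mrna_length = sum(EXON_LENGTHS)
gene_length = mrna_length + sum(INTRON_LENGTHS)

# IUPAC nucleotide codes needed for the consensus TT(G/A)NNNTTTTTT,
# written here as TTRNNNTTTTTT (R = A or G, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}
CONSENSUS = "TTRNNNTTTTTT"
OBSERVED = "TTGTTATCTTTT"


def mismatch_positions(consensus, sequence):
    """Return 1-based positions where sequence departs from the consensus."""
    if len(consensus) != len(sequence):
        raise ValueError("consensus and sequence must have the same length")
    return [
        index
        for index, (code, base) in enumerate(zip(consensus, sequence), start=1)
        if base not in IUPAC[code]
    ]


print(mrna_length)                               # 5954
print(gene_length)                               # 11636
print(mismatch_positions(CONSENSUS, OBSERVED))   # [8]
```

Under these assumptions the four exons sum to the stated 5,954-base mRNA, exons plus introns give the stated 11,636-base gene, and the observed 3' sequence differs from the consensus at a single position (a C in place of a T at position 8), which fits the description that it "resembles" the consensus.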
The AATAAA motif at position 11,613 is the first polyadenylation signal encountered in the 3'-untranslated region of the IRBP mRNA. Near the 3'-end of the IRBP mRNA (gene positions 11,395 to 11,410) is an (AC) sequence that is repeated eight times. A possible polymorphism may be present at these positions, as another, otherwise homologous, cDNA clone contains not 8 but 14 repeated (AC) units at this position.³ Similar sequences of alternating purines and pyrimidines are known to be involved in enhancement of selected viral promoters and may convert to a Z-DNA conformation under certain conditions (35).

We have sequenced a clone, λgIRBP16, obtained from a human chromosome 10 library (chosen on the basis of the localization of the IRBP gene). The human IRBP gene sequence is very similar to the bovine cDNA and gene sequences (Fig. 4). The similarity in the exons is about 83%; the introns are about 57% identical. The human gene exon sequence is identical with a partial human cDNA sequence reported by Liou et al. (16). In this case, the sequences match for 128 nucleotides and then diverge at a splice acceptor site (36) found only in the gene sequence (Table II). Two possible lariat signal sequences (37) are just upstream from the putative splice site (Table III). The splice acceptor site in the human gene (Table II) is located in an identical position in the bovine gene. At the other end of the human 2.0-kb HindIII fragment, there is a sequence similarity to the bovine second intron, with sequence identity of 57%. The splice donor and acceptor sequences of the bovine and human genes agree closely with the consensus sequences for splice junctions (bottom of Table III). They all contain the invariant bases GT and AG at the beginning and end of introns. Thus, this gene fragment, 2.0 kb in size, contains parts of one exon and one intron of the human IRBP gene.

Primer extension and S1 nuclease protection experiments were performed to map the beginning of transcription. The results are shown in Fig. 5. In the primer extension experiment, a set of 3 oligonucleotide primers was used to define the putative Cap site of the mRNA. All 3 primers (617, 619, and 1574, see Table I) gave consistent lengths, 617 and 1574 mapping the beginning of the IRBP mRNA at position +1 within 1 base.
The extended product from 619 maps to within a few bases of position +1, the precise size being difficult to determine because of its long length, about 261 and 265 bases. This is shown in lanes 1, 2, and 11, respectively. The lines indicate the lengths of the extended products based on sequencing ladder shown in lanes 3 through 6 for oligonucleotides 1574 and 617. For oligonucleotide 619, the sequencing ladder is shown in lanes 12 through 15. Compensating for the length of the primers and their positions relative to the putative Cap site, they all indicate the same position for the Cap site. Negative controls are described in the figure legend. None of the controls generated similar extended products (lanes 7-10). The corresponding S1/mung bean nuclease protection experiment also mapped the beginning of the mRNA to gene position +1. Lane 5 shows the labeled oligonucleotide prior to digestion, lane 2 shows the unprotected oligonucleotide subjected to S1 nuclease digestion, and lane 1 shows that retinal RNA protects 26 and 27 bases of the oligonucleotide from digestion by S1 nuclease. The same experiment was performed with mung bean nuclease, and the same result was obtained, shown in lanes 5, 4, and 3, respectively. Experiments using the two enzymes thus give essentially the same result with mung bean and S1 nuclease protecting a 26-base fragment of oligonucleotide. These enzymes bracket the same site, i.e. Cap site, identified by the primer extension experiment. The sequencing ladders (A, G, C, T in lanes 8 through 11) and an oligo(dT) ladder afforded accurate sizing of the protection reactions.

³ J.-S. Si and J. M. Nickerson, unpublished observation.

The protein sequence for bovine IRBP deduced from the gene sequence is given in Fig. 6. The sequence beginning with the authentic N terminus of the mature protein isolated from the IPM is shown. We also determined the protein sequence of 36 tryptic and cyanogen bromide fragments, as well as the N-terminal end of the purified IRBP protein. We have localized all of these fragments in the predicted protein sequence of the gene (see highlighted sequences in Fig. 6). The deduced protein sequence also contains a putative signal sequence of 17 amino acid residues and a putative pro-peptide, i.e. a short 5-amino acid sequence positioned between the signal sequence and the authentic N terminus of the secreted extracellular IRBP. These are not shown here and are more fully discussed elsewhere (18). By conventional amino acid sequencing, we determined 622 amino acid residues corresponding to 48% of the entire 1264-amino acid sequence of the mature protein. In each case, the amino acid residue matched that deduced from the nucleotide code.

Dot matrix analysis (Fig. 7) demonstrates a repeat structure in the nucleic acid sequence, albeit weak, which indicates a 4-fold repeat. In contrast to the nucleotide sequence, there is a strong 4-fold repeat structure in the protein sequence as shown by dot matrix analysis (Fig. 8). Each repeat is about 300 amino acids long. The boundaries between repeats 1 and 2 are at amino acid positions 301-302, between repeats 2 and 3, amino acids 609-610, and between repeats 3 and 4, amino acids 910-911. The final repeat ends at approximately position 1209. There is a short C-terminal extension of 55 amino acids. Table IV shows that there is about 30-40% sequence identity (lower left) between any two of the four protein repeats; the homology scores (38, 39) (upper right) also indicate the same relationships. There are many conservative substitutions as well. The third and fourth repeats (38.8% identical) show the greatest similarity (see Table IV). The precise alignments of the 4 protein repeats are shown in Fig. 9. There are 42 residues per repeat that are identical among all 4 repeats, and 62 residues per repeat that are identical in 3 of the 4 repeats. Some of these residues are clustered, suggesting evolutionary constraints that could imply an important function for these areas. Possible constraints might include retinoid- or fatty acid-binding sites or close interaction with membrane receptors or IPM proteins, etc. Several of the conserved sequences near the ends of the 4 repeats coincide with hydrophobic sequences of the protein. These residues may be located in hydrophobic clefts, or grooves, or might be buried in the core of the protein.

FIG. 8. Dot matrix analysis of the bovine IRBP protein sequence compared to itself. Evidence for a 4-fold repeat is indicated by four diagonal lines along any column. The mutation data matrix (39) allowed for conservative and semiconservative substitutions to be accepted. The Align program was used with a window of 25 and minimum match of 25.

There are 5 proline residues at positions 690 and 694. These may be involved with the flexible bend found by electron microscopy in IRBP (22).

Sequence analysis allows for the identification of a number of putative sites of covalent modification of the IRBP protein. Five sites in the protein sequence match the N-linked glycosylation site consensus sequence, NX(T/S). Three of the putative glycosylation sites are located in equivalent positions (shifted by not more than 1 amino acid residue) within their respective repeats (see Fig. 9). The locations of these glycosylation sites correlate with hydrophilic areas of the sequence, especially the 3 glycosylation sites that align among the repeats (at positions 183, 491, and 1092 in the protein sequence). These areas of the protein sequence could be on the surface of the protein. Edman degradation of one tryptic and two CNBr fragments (TRP7, CB105, and CB58) supports the presence of carbohydrate on those fragments. The putative glycosylated Asn residues in these peptides could not be detected in their respective cycles, suggesting that they had been derivatized with carbohydrate.

There are 18 potential Ser/Thr phosphorylation sites (RXY(Z)(S/T)). Three of these sites align from one repeat to the next, about 180 and 197 amino acids into the repeat (marked by dots on Fig. 9). Many of these are likely to be found on the surface of the protein rather than in the interior based on hydrophilicity analysis (Fig. 10) discussed later. Wiggert et al. (40) have demonstrated the phosphorylation of IRBP by endogenous kinases of the IPM. Lectin-binding characteristics of IRBP appear to be markedly altered after phosphorylation.

Fig. 10 shows the hydrophobicity profiles of the repeat structure of IRBP. In general, the hydrophobicity at any point in one of the repeats is similar to that of the other three, although there are exceptions.

FIG. 9. Alignment of the four repeats within the bovine IRBP protein. Cross-hatches indicate that the residues are identical in all four repeats, and shades indicate that 3 of the 4 residues are identical. Putative glycosylation sites are in solid boxes and flagged with CHO. Possible phosphorylation sites are marked with dashed boxes. The solid dots mark three phosphorylation sites that align among repeats.

Another aspect of the repeat structure is reflected in the induction of experimental autoimmune uveitis by IRBP. In addition to the intact protein, three CNBr peptide fragments of IRBP, CB58, CB*47 (a subfragment of CB72), and CB71 also induce disease (33). Of these, CB*47 and CB71 are homologous and are derived from the end of the first protein repeat (amino acid position 228 to 274) and the end of the fourth protein repeat (amino acid positions 1135 to 1205), respectively.
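
The repeat boundaries and residue positions quoted above can be related to one another with a small coordinate conversion. The following is a minimal sketch, not part of the original analysis: it takes the stated boundaries (301-302, 609-610, 910-911, with the fourth repeat ending at approximately 1209) at face value, and the names `REPEATS` and `locate` are hypothetical. Offsets computed this way ignore the gaps introduced in the Fig. 9 alignment, so they indicate only roughly equivalent positions.

```python
from bisect import bisect_right

# Repeat boundaries as stated in the text (1-based, inclusive, mature protein).
# The end of repeat 4 is described only as "approximately" 1209.
REPEATS = [(1, 301), (302, 609), (610, 910), (911, 1209)]
PROTEIN_LENGTH = 1264


def locate(position):
    """Map a residue position to (repeat_number, offset_within_repeat).

    Residues beyond the fourth repeat are returned as (None, offset into
    the C-terminal extension).
    """
    if not 1 <= position <= PROTEIN_LENGTH:
        raise ValueError("position outside the mature protein")
    starts = [start for start, _ in REPEATS]
    index = bisect_right(starts, position) - 1
    start, end = REPEATS[index]
    if position > end:
        return None, position - end
    return index + 1, position - start + 1


repeat_lengths = [end - start + 1 for start, end in REPEATS]
c_terminal_extension = PROTEIN_LENGTH - REPEATS[-1][1]

print(repeat_lengths)        # [301, 308, 301, 299]
print(c_terminal_extension)  # 55

# The three aligned glycosylation sites, then the CB*47 and CB71 endpoints.
for residue in (183, 491, 1092, 228, 274, 1135, 1205):
    print(residue, locate(residue))
```

With these assumed boundaries, the three aligned glycosylation sites fall at offsets 183, 190, and 182 in repeats 1, 2, and 4, and CB*47 and CB71 cover offsets 228-274 of repeat 1 and 225-295 of repeat 4. That is consistent with the text's description of both peptides as coming from the ends of their repeats, and the 55-residue remainder matches the stated C-terminal extension.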

Homology Considerations

Two protein families contain members that bind retinoids and fatty acids. The first is the serum retinol-binding protein family, whose members are all secretory proteins; it includes at least 10 members (41). One member of this family, purpurin, is found in the interphotoreceptor matrix of the chicken (42). Common sequences have been found in these proteins that might be involved in the binding of retinoids or other ligands (43, 44). The second family consists of cytosolic proteins and includes at least 8 members (45-47). All these proteins have significant similarities in the N-terminal halves of the polypeptide chain, suggesting a common evolutionary ancestor. IRBP is not similar to other retinoid-binding proteins in primary amino acid sequence, size, repeat structure, retinoid-binding specificity, concentration, or tissue location. Also, searches of GenBank and NBRF databases, versions 55.0 and 16.0, respectively, do not identify any genes or proteins with significant similarities to IRBP or its gene sequence.

human and bovine IRBP protein sequences appear to be structurally similar with 80-90% identity in the repeat sequences and a similar line-up, i.e. repeat 4 of the bovine protein is most homologous to repeat 4 of the human protein but much less similar to repeats 1, 2, or 3. The only intron position yet determined (Fig. 4) is located identically in human and bovine genes at the splice acceptor site. The distribution of introns in “protein mosaics” which were invented even more recently (