cDNA Cloning of PG-M, a Large Chondroitin Sulfate Proteoglycan

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1993 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 268, No. 19, Issue of July 5, pp. 14461-14469, 1993. Printed in U.S.A.

cDNA Cloning of PG-M, a Large Chondroitin Sulfate Proteoglycan Expressed during Chondrogenesis in Chick Limb Buds: ALTERNATIVE SPLICED MULTIFORMS OF PG-M AND THEIR RELATIONSHIPS TO VERSICAN*

(Received for publication, January 28, 1993, and in revised form, March 23, 1993)

Tamayuki Shinomura, Yoshihiro Nishida, Kazuo Ito, and Koji Kimata‡
From the Institute for Molecular Science of Medicine, Aichi Medical University, Nagakute, Aichi 480-11, Japan

* This work was supported partly by grants-in-aid from the Ministry of Education, Culture, and Science, the Special Coordination Funds for Promoting Science and Technology from the Science and Technology Agency, and a special research fund from Seikagaku Corporation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) D13542.
‡ To whom correspondence should be addressed. Fax: 561-63-3532.

We have isolated cDNA clones encoding the core protein of PG-M, a large chondroitin sulfate proteoglycan that has been shown to be expressed in the prechondrogenic condensation area of the developing chick limb buds (Shinomura, T., Jensen, K. L., Yamagata, M., Kimata, K., and Solursh, M. (1990) Anat. Embryol. 181, 227-233). The amino acid sequence deduced from the cDNA analysis revealed the presence of a hyaluronic acid binding domain at the amino-terminal side and two epidermal growth factor-like domains, a lectin-like domain, and a complement regulatory protein-like domain at the carboxyl-terminal side. These domains show an extremely high homology to corresponding domains of a human fibroblast large chondroitin sulfate proteoglycan, versican. Such evolutionally conserved structures in the PG-M core protein might be involved in important biological functions of this molecule. On the other hand, the chondroitin sulfate attachment domain at the middle region of the PG-M core protein shows no significant amino acid sequence homology to the corresponding domain of the versican core protein. Further, the chondroitin sulfate attachment domain of the PG-M core protein is about 100 kDa larger than that of the versican core protein. The finding of alternatively spliced forms of the PG-M core protein suggests that versican might be one of the multiple forms of PG-M.

Skeletal development in the vertebrate limb begins with the onset of condensation of undifferentiated mesenchymal cells. The process is considered to be essential for subsequent chondrogenesis and a key step in determining the formation of correct skeletal patterns (1). Since the process includes cell division, adhesion, migration, and differentiation (2), it must occur through a regulated interplay of these cellular activities, which are influenced greatly by environmental factors surrounding the cells. The observations that some of the extracellular matrix molecules are expressed with a strict spatiotemporal regulation during prechondrogenic condensation in the limb buds (3) and can influence chondrogenesis in vitro (4, 5) suggest that the extracellular matrix plays important regulatory roles in mesenchymal condensation as an environmental factor.

PG-M, a large chondroitin sulfate proteoglycan, is a molecule that has been shown to be one of the major extracellular molecules of the condensation area and to disappear with the development of cartilage (6, 7). The disappearance is inversely correlated to a dramatic increase of aggrecan, a cartilage-characteristic large chondroitin sulfate proteoglycan, during maturation of cartilage (8). Such a spatiotemporal expression of PG-M in the limb bud indicates the transient formation of a specific extracellular matrix necessary for the precartilage condensation process. PG-M has been shown to interact with hyaluronic acid, fibronectin, and type I collagen (9), which are distributed in the whole limb bud. Therefore, these results strongly suggest that PG-M may play essential roles in the regulation of the cellular activities by the extracellular matrix during skeletal cartilage development.

In the present study, we describe the cDNA sequence that encodes the entire core protein of PG-M. The deduced amino acid sequence revealed the presence of specific domain structures with a high homology to the corresponding domains in versican, a fibroblast proteoglycan (10). Furthermore, cDNA analysis revealed the presence of alternatively spliced forms of the PG-M core protein. In these molecules, the chondroitin sulfate-attaching region was truncated. With regard to these composite findings, the implications of the structure and the function of PG-M in the condensation process are discussed.

MATERIALS AND METHODS

Preparation of Antibodies against PG-M Core Protein. A rabbit was immunized as described previously (11). Briefly, polyacrylamide gel slices containing the 550-kDa protein-enriched core fraction from PG-M (see Ref. 6) were homogenized in phosphate-buffered saline (145 mM NaCl, 2.8 mM NaH2PO4, 7.2 mM Na2HPO4, pH 7.2) and mixed with an equal volume of complete Freund's adjuvant. This mixture was injected subcutaneously into a rabbit. The obtained antiserum was subjected to affinity chromatography on a column of Sepharose 4B gel coupled with intact PG-M molecules. The bound antibody was eluted with 3 M KSCN in phosphate-buffered saline and then dialyzed against phosphate-buffered saline. This antibody preparation will be referred to as anti-PG-M antibody. Antisera against special regions of the PG-M core protein were also prepared.

A 20-mer synthetic peptide (DKMFERDFRWTDGSPLQYEN) derived from the lectin-like domain (amino acid residues 3395-3414) was used as an immunogen. After conjugation with hemocyanin, the peptide was used for immunization. This antibody is referred to as anti-lectin-like domain antibody. A rabbit antibody against the alternatively spliced domain (amino acid residues 549-1223) was also prepared. A cDNA fragment encoding this region was inserted into the protein A gene fusion vector, pRIT-2T (Pharmacia LKB Biotechnology Inc.). The fusion protein produced in transfected Escherichia coli N4830-1 was purified from a cell lysate using IgG Sepharose 6 Fast Flow and used for immunization. The obtained antibody was designated anti-splicing domain antibody.
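Expressing the spliced-domain fragment as a protein A fusion requires the insert to be read in the same frame as the PG-M open reading frame. The following is a minimal Python sketch of in-frame translation with the standard genetic code, given as an illustration rather than the authors' procedure; the name `translate` is hypothetical, and the test fragment CAG ACT GTT AGG TAT CCA ATT is a short stretch of the Fig. 3 sequence whose deduced residues appear there as Q T V R Y P I.

```python
# Standard genetic code: the 64 codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA starting at offset `frame` (0, 1, or 2); '*' marks a stop codon."""
    seq = dna.upper().replace(" ", "")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))  # 'X' for non-ACGT codons
    return "".join(protein)


if __name__ == "__main__":
    fragment = "CAG ACT GTT AGG TAT CCA ATT"
    print(translate(fragment))           # QTVRYPI
    print(translate(fragment, frame=1))  # RLLGIQ
```

Shifting the same fragment by one nucleotide gives an unrelated peptide, which is the kind of product an out-of-frame fusion would yield.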


cDNA Analysis of PG-M

Tissue Labeling and Immunoprecipitation Analysis of PG-M Core Protein. Limb buds were dissected from chick embryos at stage 22-23 (determined according to Hamburger and Hamilton (12)) and cultured with either [35S]sulfate or a 14C-amino acid mixture (Du Pont-New England Nuclear) in Ham's F-12 medium supplemented with 10% fetal calf serum and 0.005% ascorbic acid under 5% CO2, 95% air for 5 h at 37 °C. The labeled tissues were then lysed by boiling at 100 °C for 10 min in 2% SDS, 50 mM HEPES, pH 8.0, 10 mM EDTA, 1 mM phenylmethanesulfonyl fluoride, 0.36 mM pepstatin. After removing the insoluble material by centrifugation, the tissue lysate was subjected to immunoprecipitation as described previously (13). After digestion with chondroitinase ABC in the presence of protease inhibitors, the precipitates were dissolved in 2% SDS, 0.065 M Tris-HCl, pH 6.8, 10% glycerol, 0.13 M dithiothreitol and then analyzed by electrophoresis on a 5% SDS-polyacrylamide gel. The radioactivity was detected by fluorography using Enlightning (Du Pont-New England Nuclear).

Preparation and Screening of a λgt11 cDNA Library. Poly(A)+ RNA was isolated from stage 22-23 chick limb buds by a guanidine isothiocyanate method (14) followed by oligo(dT)-cellulose affinity chromatography. An oligo(dT)-primed cDNA library was constructed in λgt11 using a cDNA synthesis kit (Pharmacia) as described previously (15). The library comprised 4.8 × 10⁷ independent recombinants when transfected to E. coli Y1090. A total of 5 × 10⁵ independent clones was initially screened with antibody to PG-M according to the methods described by Young and Davis (16). Positive plaques were detected by visualizing the bound antibodies using peroxidase-conjugated protein A (E-Y Laboratories Inc., San Mateo, CA). After the first cDNA clones, λMa and λMb, were isolated by immunoscreening, subsequent clones were obtained by plaque hybridization using 32P-labeled cDNA probes.
Labeling of cDNA probes was done by a random priming method (17).

DNA Sequencing and Analysis. The inserts from λgt11 cDNA clones were purified and subcloned into the pGEM-3Zf(-) plasmid vector (Promega Corp., Madison, WI). The nucleotide sequence of the isolated cDNA was determined by the dideoxy chain termination method (18) using oligonucleotide primers synthesized based on the preceding sequences. DNA sequences were determined for both strands at all positions. The obtained DNA sequences were compiled and analyzed using DNASIS computer programs (Hitachi Software Engineering Co., Yokohama, Japan). The deduced amino acid sequence was compared with other protein sequences in the data base compiled by the National Biomedical Research Foundation in December 1991.

Northern Blot Analysis. RNA was extracted from stage 22-23 chick limb buds and from cultured 10-day chick embryo fibroblasts. About 10 µg of total RNA was electrophoresed in a denaturing formaldehyde-agarose gel (0.6%) and transferred to a Hybond N+ membrane (Amersham International plc) with a vacuum blotter, VacuGene XL (Pharmacia), under the alkali blotting conditions recommended by the manufacturer. A 2-kb¹ cDNA fragment corresponding to nucleotide positions 6439-8406 in Fig. 3 was radiolabeled with 32P and used as probe B. Prehybridization and hybridization were carried out at 42 °C in the presence of 50% formamide and 10% dextran sulfate for 24 h as described previously (19). The membrane was then washed with 0.1× SSC (15 mM NaCl, 1.5 mM sodium citrate) containing 0.1% SDS at 65 °C and exposed to x-ray film, Fuji-RX (Fuji Photo Film Co., Japan). In another experiment, the hybridized 32P probe was removed from the membrane by boiling in 0.1% SDS. The membrane was then rehybridized with probe A, which was derived from the splicing domain (nucleotide positions 2276-3810 in Fig. 3). Hybridization and washing conditions were the same as described above.
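The probe coordinates above are cDNA nucleotide positions, whereas the antibody immunogens are defined by amino acid residues. The following is a minimal Python sketch for converting between the two numbering systems. It assumes, from the lengths reported in the Results, that nucleotide 1 is the 5' end of the cDNA, that the 144-nucleotide 5'-untranslated region is followed by the initiator ATG at nucleotide 145, and that the 10,686-nucleotide open reading frame encodes residues 1-3,562; the names `residue_at` and `codon_span` are hypothetical.

```python
# Assumed cDNA layout, taken from the lengths reported in the Results:
# 5'-UTR (144 nt) + open reading frame (10,686 nt) + 3'-UTR (1,477 nt).
UTR5_LEN = 144
ORF_LEN = 10_686
UTR3_LEN = 1_477
CDNA_LEN = 12_307
ORF_START = UTR5_LEN + 1  # assumption: the A of the initiator ATG is nucleotide 145


def residue_at(nt: int) -> int:
    """Return the residue number whose codon contains cDNA nucleotide `nt`."""
    if not ORF_START <= nt < ORF_START + ORF_LEN:
        raise ValueError(f"nucleotide {nt} lies outside the open reading frame")
    return (nt - ORF_START) // 3 + 1


def codon_span(residue: int) -> tuple[int, int]:
    """Return the first and last cDNA nucleotides of the codon for `residue`."""
    first = ORF_START + 3 * (residue - 1)
    return first, first + 2


if __name__ == "__main__":
    # The reported segment lengths add up to the full cDNA,
    # and the open reading frame is a whole number of codons.
    assert UTR5_LEN + ORF_LEN + UTR3_LEN == CDNA_LEN
    assert ORF_LEN % 3 == 0 and ORF_LEN // 3 == 3562

    probes = {"probe A": (2276, 3810), "probe B": (6439, 8406), "Southern probe": (4441, 4976)}
    for name, (start, end) in probes.items():
        print(f"{name}: nt {start}-{end} ({end - start + 1} bp) "
              f"-> residues {residue_at(start)}-{residue_at(end)}")

    # cDNA segment encoding the alternatively spliced domain (residues 549-1223)
    print(codon_span(549)[0], codon_span(1223)[1])  # 1789 3813
```

Under these assumptions, probe A maps to residues 711-1222, inside the alternatively spliced domain (residues 549-1223) used to raise the anti-splicing domain antibody, and that domain is encoded by a 2,025-nucleotide (675-codon) segment.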
RNA sizes in the Northern blot analysis were determined using an RNA ladder (Bethesda Research Laboratories).

Immunoblot Analysis. Partially purified PG-M was digested with chondroitinase ABC in the presence of protease inhibitors as described previously (20) and then subjected to 5% SDS-polyacrylamide gel electrophoresis. Proteins in the gel were then blotted onto a nitrocellulose sheet as described by Towbin et al. (21). The membrane was stained with the various antibodies described above.

Polymerase Chain Reaction (PCR) Analysis. To identify the alternatively spliced forms of PG-M cDNA, three different cDNA libraries were analyzed by PCR: a stage 22-23 chick embryo limb bud cDNA library, a 12-day chick embryo epiphyseal cartilage cDNA library ( X ), and a 10-day chick embryo fibroblast cDNA library (22). The reaction was performed in a model PJ 2000 DNA thermal cycler (Cetus Co., Emeryville, CA) using a GeneAmp DNA amplification reagent kit (Takara Shuzo Co., Osaka, Japan) for 30 cycles of 95 °C for 1 min, 54 °C for 2 min, and 72 °C for 3 min, followed by a final incubation at 72 °C for 15 min. The template DNA was prepared from 1 X 10' independent phage clones. The positions of the specific primers used for PCR analysis are indicated as a, b, c, and d in Fig. 4. Each primer position was represented by an outer primer and an inner primer: a, TGGAGGTGTTCGTTACCCTGCC and GGTGGAGGTTTACTTGGGGTGA; b, AATTTCCTGGAGTAGCGGTCCC and ACTCGGAGCAGCAGTTACTGCA; c, CTGCCGTAACAGAAGCAGGAAC and GCCGATCCTTCTGTTTGCTGCG; d, GCTGAGACTGCTACTGACCAAG and GGAACCACCACAAGGGAAGAAG. The first PCR was done with the outer primers in the combinations a and b, c and d, or a and d. The products were then used as templates for the second PCR, which was done with the inner primers in the same combinations. The final products were analyzed by agarose gel electrophoresis.

Genomic Southern Blot Analysis. Chick genomic DNA was obtained from Clontech (Palo Alto, CA). Two micrograms of the DNA were digested with BglII, EcoRI, HindII, KpnI, PvuII, SmaI, or XbaI. The digests were electrophoresed in a 0.7% agarose gel and then transferred to a Hybond N+ membrane with a vacuum blotter as described above for Northern blot analysis. The membrane was hybridized with a 32P-labeled probe derived from a 536-bp polynucleotide near the splicing region (nucleotide positions 4441-4976 in Fig. 3). Prehybridization and hybridization were done at 42 °C in the presence of 50% formamide for 24 h, followed by washes and autoradiography as described above.

¹ The abbreviations used are: kb, kilobase pair(s); PCR, polymerase chain reaction; bp, base pair(s); CEF, chick embryo fibroblast.

[Figure 1: fluorogram of lanes 1-3 showing the PG-M core protein bands relative to the gel origin and molecular mass markers]

FIG. 1. SDS-polyacrylamide gel electrophoresis of immunoprecipitates of PG-M from chick limb bud. Chick limb buds at stage 22-23 were metabolically labeled with a 14C-amino acid mixture (lanes 1 and 2) or with [35S]sulfate (lane 3). The crude tissue lysates were then subjected to immunoprecipitation with anti-PG-M antibody. After treatment with (lanes 2 and 3) or without (lane 1) chondroitinase ABC, the precipitates were separated on a 5% SDS-polyacrylamide gel. Arrows indicate positions of marker proteins: myosin (205 kDa), β-galactosidase (116 kDa), and bovine serum albumin (80 kDa).

RESULTS

Characterization of Antibodies to PG-M. The specificity of the anti-PG-M antibody was confirmed as follows. Tissue lysates were prepared from chick limb buds that had been incubated with 14C-amino acids or [35S]sulfate and subjected to immunoprecipitation (Fig. 1). The 14C-labeled immunoprecipitate gave two core protein bands after chondroitinase ABC digestion. These two bands corresponded to the 550- and 500-kDa core proteins of PG-M described previously (6). The result indicates that the antibody specifically recognized the PG-M core proteins in the limb bud extracts. The presence of the [35S]sulfate label in the core protein preparations is probably due to the sulfated disaccharide unit next to the linkage region, which is resistant to digestion with chondroitinase ABC. Therefore, the differential labeling of the core proteins with [35S]sulfate may indicate that chondroitin sulfate chains were more abundant in the 550-kDa core protein than in the 500-kDa one (compare lanes 2 and 3 in Fig. 1). This will be discussed later.

[Figure 2: isolated cDNA clones aligned on a kilobase scale, above a schematic of the PG-M core protein on an amino acid scale showing the HABR, SPLICE, X1, X2, EGF, LECTIN, and CRP domains]

FIG. 2. Alignment of isolated cDNA clones (top) and schematic structure (bottom) of PG-M core protein. Isolated cDNA clones coding for the PG-M core protein are aligned. The scale for kilobase pair length is indicated at the top. Coding and noncoding regions are represented by thick and thin lines, respectively, and EcoRI restriction sites are indicated by arrowheads. The 5' region of clone Mk indicated by the dotted line was different from the overlapping clones. This nonmatching segment is derived from an intron sequence detected in a premature mRNA (see “Materials and Methods”). The gross structural features of the predicted PG-M core protein are also represented schematically. The number of amino acids is indicated on the bottom scale. Organization of various domains is indicated by boxes as follows: hyaluronic acid binding (HABR), alternative splicing (SPLICE), epidermal growth factor-like (EGF), lectin-like (LECTIN), and complement regulatory protein-like (CRP) domains. Details of the X1 and X2 domains are described under “Materials and Methods.”

Isolation and Characterization of cDNA Clones for PG-M. We first isolated two cDNA clones, Ma and Mb, from 2.5 × 10⁶ independent clones of a λgt11 expression library of stage 22-23 chick limb bud; these clones were reactive to the above antibodies and had inserts of 3.2 and 3.1 kilobases, respectively. Their restriction maps and partial cDNA sequences revealed that Mb was completely included within Ma. We isolated flanking sequences in both the 5' and 3' directions from the initial clones by plaque hybridization screening and aligned them by restriction mapping and hybridization. Although one clone, Mk, showed a discrepancy in its 5'-terminal region when aligned (for details, see below), a single series of consecutively connected sequences could be constructed as shown in Fig. 2. The nucleotide sequence and its deduced amino acid sequence are shown in Fig. 3. The cDNA is 12,307 nucleotides long and contains a short 5'-untranslated region of 144 nucleotides followed by a single, long, open reading frame of 10,686 nucleotides which codes 3,562 amino acid residues

corresponding to a protein with a molecular weight of 388,062. The 3'-untranslated region of 1,477 base pairs includes a polyadenylation signal, AATAAA (23), but does not extend to a poly(A) tail. The initiator methionine is followed by a putative signal peptide composed of 26 amino acid residues. The nucleotide sequence upstream from the translational initiation site and the amino acid sequence around the possible signal peptidase cleavage site are in good agreement with Kozak's rule (24) and von Heijne's (-3, -1) rule (25), respectively. The putative protein sequence contains a total of 61 Ser-Gly and Gly-Ser sequences, which are presumed to be substituted with chondroitin sulfate chains. Although many of these sequences show little similarity to the consensus glycosaminoglycan attachment sequence, acidic-acidic-Xaa-Ser-Gly-Xaa-Gly (26), they retain acidic amino acids around the putative acceptor serine residues. In addition, 23 possible glycosylation sites for N-linked oligosaccharides (27) are present. Both types of glycosylation sites are distributed almost uniformly along the core protein. There are 35 cysteine residues in the core protein. They are located exclusively in the amino- and carboxyl-terminal portions, except for 2 cysteines in the central part of the core protein. A computer-assisted homology search revealed that both the amino- and carboxyl-terminal portions of chick PG-M showed high homology to those of the human fibroblast proteoglycan versican (Table I) (10). However, the central parts of the two proteoglycan core proteins have no significant similarity except for two small regions, the X1 and X2 domains shown in Fig. 2.

Alternative Splicing Events. Although the core proteins of chick PG-M and human versican show regions of high ho-


[Figure 3: nucleotide and deduced amino acid sequences of the PG-M core protein cDNA; sequence deposited as GenBank/EMBL accession number D13542]

FIG. 3. Nucleic acid and predicted amino acid sequences of PG-M core protein. A putative signal peptide is underlined with a dashed line. Ser-Gly and Gly-Ser sequences are underlined with thin lines. Potential N-linked glycosylation sites are indicated by asterisks. The polyadenylation signal is underlined with a thick line.
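The Fig. 3 annotations (Ser-Gly/Gly-Ser dipeptides as candidate chondroitin sulfate attachment sites, and potential N-linked glycosylation sites) amount to simple pattern searches over the deduced sequence. The following is a minimal Python sketch of such scans. It assumes the common N-X-S/T sequon rule (X not proline) for N-glycosylation and counts overlapping Ser-Gly-Ser runs as two dipeptides; the paper does not state its exact criteria, and the function names are hypothetical. Because the full sequence is not reproduced in this text, the example uses the 20-mer lectin-like domain immunogen from Materials and Methods.

```python
import re


def sg_gs_sites(protein: str) -> list[int]:
    """1-based positions of every Ser-Gly or Gly-Ser dipeptide (first residue of the pair)."""
    # Zero-width lookahead so that overlapping pairs, e.g. 'SGS', yield both SG and GS.
    return [m.start() + 1 for m in re.finditer(r"(?=(SG|GS))", protein)]


def n_glyc_sequons(protein: str) -> list[int]:
    """1-based positions of Asn in N-X-S/T sequons, assuming X may be any residue except Pro."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein)]


def gag_consensus_sites(protein: str) -> list[int]:
    """1-based starts of acidic-acidic-Xaa-Ser-Gly-Xaa-Gly, taking Asp/Glu as the acidic residues."""
    return [m.start() + 1 for m in re.finditer(r"(?=[DE][DE].SG.G)", protein)]


if __name__ == "__main__":
    peptide = "DKMFERDFRWTDGSPLQYEN"  # immunogen, residues 3395-3414 of the core protein
    print(sg_gs_sites(peptide))          # [13]
    print(n_glyc_sequons(peptide))       # []
    print(gag_consensus_sites(peptide))  # []
```

Taking the peptide numbering at face value, its single Gly-Ser pair (peptide position 13) would sit at residues 3407-3408 of the core protein. Running the same scans over the complete deduced sequence (accession D13542) is one way to compare with the 61 Ser-Gly/Gly-Ser sequences and 23 potential N-glycosylation sites reported in the Results, although the counts depend on the criteria chosen.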


E

V

T

N

GGT ACC TGC C Y G T A U TGT ATAGAT ax T C I D G TICAM TAC TTT GCC A F Y K Y U C GAC TACCAG TGG w H D Y a GCT GGA G M GAC TGT S A G E D C CCT G T I GTA G M U T GCA P V V E N A C M G G G M T G G A AGA TGG ~ G N RG n M T TCC TCAAM CAC TAC H 1 N * S S K CTA T U CTT GTAGGA GTA

*

_

C

L

S

D

CTAT U CGT GGT TCA TTT P Y R TGC CTC U T A U TTT T L C N T F CAC CGA CGT ACC TGG GAT R R T W H D ATT GGC CTC M T GAC M G I G L N D K TGG CAT GTT GTT ATA ATA V V I 1 n H M G ACC TTTGGG M G ATG K T F G K M GAT ATG CCTMA ATT A U D M P K I T CAT CGC TGG ATC AGG ACG H R U I R T TTA TGA M T TGT TTT G M

F

8784

N

~

L

I

G

T

9984 TAC ATCACATGT TGT I C T C CTG TGT CTA CCA 10104 L C L P ACG GCTGAG AGA 10224 T I E R ATG TTTGAG CGT 10344 ~

~~~

~

~~~

U F E R GAG U T GGG U G E AM

N

G

10464

Q

CCT CGT TAT10584 K P R Y TGC ATGM T CCG 10704 C M N P 10824 TGG CAG GAC TCA W

ACT

P

ACG

D

S

GAC

TCC 10944

R

GGT ATT CAC M T GCA TTT TGC AM M T ATT GTG TTT ATC AGA AM TTT M G AM GTA TAT AGA GGT AM GTG CTT ATG UiG ATA AM AGA M C CAT TTC CAG CCT ATA ATG ATC ATT AGT 11064 TTT CTA TAT GCC ATC ACT GTG GAC CAT TTT ATG ATA CTT ACC AGC CTT TTG C M TAT TTA M C AGC TGA TTT TAG CAC C M TGA AM TGT AM T M GAT GAT TTT M T GTT GTT T M TTG 11184 TTT TTT GTT TTT AM ATC CTG TAT ATA AM TGA AM GTC ACA CTA AGT TTG GGC ATA TTT ATG GTT AGA ATG GCC T U GAG GTC TTC GGT CAT TTA TGC U T TGT TTT TAT GTT TAC CTA 11304 GGC TGG AM TGG TTG G M T M CTT TAC TGA CTG AM TTT GCT ATT M C AGG TGA AGA T M A U C M M G TTA U T GAT TTT TTT TTT AM TGC AM CTG TGC U G TTA TCA CTC M G GTA 11424 ATT TAGAGC ACA T U GTG GGG ATA TAG CATG M CTA GGG ACT GCA ATG TGG M C TTT TTT TCT GCT CTC CAT AGC TCC CTC ATC ATT CGT GAG AM 11544 CAG TGT GTG TCC ATG TGAAMTGC GGA CTG U T GGA GGA CTT GTA TAT GAC M G GAG GGC CTGT M AM TCC M G TGC T M TAC ATT M C TTT ATCM C CAG GGC U C CCT TTT TGA M G AM G M AM CGA AMAM G M GM 11664 MAAM TGA CTT TCA C M CAT T U CTA CAA GGT CTA TAT TIC AGG TCT CTG CAT TCT AM ATG GAG ATA GACCTA GCG TTT GGG G A TCA ATT TTT GTA AM M C AM C M M CAA T AM 11784 AGT AM T U TTT TGT ATA TAT M C CAT TTT TTA ATC TTT T TTA U AGT M T G M TGT TTG TGT ATGM T GCT GCA GCT GTGM G TAC ATAU T AM TGA AGTMG CCA TIC TGA TTT M T 11904 TTA TTG GAT GTT ATT TTA CCC C MATG AGA AGA TTA AM GAG CTC GCCAGA GCA GTA TTA GCC CCT GAG CTC U C TTT TGGM C UCGT T TTT CCT TGG AGA ACA M A TTC CTG 12024 CTA M C AGG ATC ATT TGT TTG CGG TTAGGGGGT TGA GAT CTG CGT GAC TTG M G CGG M T CAG GAT CTT GGC TTT CTAG M AGA AGT GTG TGC ACA M G AGA TTA TAG GTT GTA ATT GGT 12144 Gu GTG TGTGCG TAG GAG TTT TTC TGT TAT GAT TTC TTA T CCC U GTG TGG TGA CAG CTG ACC TGG ATG TGT ACA GTA G M AGT GGT MGTA T GTT TGAM G GAC CTC TCC ACA T M AGG 12264 ACT GTG TCT GAT TAT TGG TTCG M GTT CCC TGC ACA ACT GCTG 12301

-

FIG.3"continued

mology as described above, their sizes are completely different. The PG-M core protein is 123 kDa larger than the versican core protein (10). This discrepancy might depend partly on the difference in the animal species from which the proteoglycans were derived. However, a detailed comparison of the amino acid sequences led us to expect that core proteins of various sizes are generated by alternative splicing. In Fig. 2, the region between the X1 and X2 domains shows very low homology between PG-M and versican, yet the corresponding lengths are very similar. On the other hand, the region between the X1 domain and the hyaluronic acid binding domain differs completely in size between the two molecules, and the sequences show no similarity. This difference in size is almost the same as the difference between the intact core proteins. Therefore, this region might be alternatively spliced to generate a versican-like core protein. To examine this possibility, various cDNA libraries were analyzed by PCR to detect alternatively spliced forms of PG-M cDNA. For this purpose, we used the following combinations of oligonucleotide primers flanking the putative splice sites: a and b, c and d, or a and d (Fig. 4A). Although the reactions yielded 1.2- and 1.4-kb products from the limb bud and CEF cDNA libraries, the expected 4.1-kb product was not amplified from either library (Fig. 4B). However, an unexpected 1.3-kb product was amplified from the CEF library with primers a and d (Fig. 4B, lane 6). The nucleotide sequence of this product indicated the presence of an alternatively spliced form of PG-M that lacks a 2,781-bp sequence (nucleotide positions 1598-4378 in Fig. 3), shown as PG-M (V1) in Fig. 4C. As described before, clone Mk contains a unique sequence that is not consistent with PG-M cDNA. Analysis of the nucleotide sequence revealed the consensus acceptor splice sequence shown in Fig. 4C and suggested that Mk was derived from premature mRNA. These sequence data show that the position of the acceptor splice site in Mk is consistent with the alternative splice site shown in Fig. 4C.

FIG. 4. Generation of PG-M core protein isoforms by alternative splicing. Panel A, positions of the four primers used for PCR analysis are indicated by arrowheads as a, b, c, and d. Panel B, three different cDNA libraries, a 12-day chick embryonic cartilage cDNA library (lanes 1, 4, and 7), a stage 22-23 chick limb bud cDNA library (lanes 2, 5, and 8), and a 10-day CEF cDNA library (lanes 3, 6, and 9), were analyzed by PCR using the following primer combinations: primers a and b (lanes 1-3), primers a and d (lanes 4-6), and primers c and d (lanes 7-9). The PCR products were then sized by agarose gel electrophoresis. DNA size markers are shown in lane 10, and their sizes are indicated on the right. Panel C, the 1.3-kb PCR product from the CEF cDNA library (lane 6) was sequenced, and part of the sequence is presented as PG-M (V1), together with the corresponding part of the PG-M cDNA sequence and the clone Mk sequence at an intron/exon boundary. The consensus splice acceptor sequence is indicated by bold letters, and the intron sequence is shown by small letters.

The presence of differently spliced forms of the mRNA encoding the PG-M core protein was further supported by Northern blot analysis. Three mRNA species of 10, 13, and >20 kb were detected in total RNA preparations from chick limb bud and CEF by hybridization with cDNA probe B, which encodes a sequence common to PG-M and PG-M (V1) (Fig. 5, lanes 3 and 4). However, the 10-kb mRNA did not hybridize to cDNA probe A, which encodes the splicing domain (Fig. 5, lanes 1 and 2).

Characterization of PG-M Core Protein. The absence of the 2,781-bp sequence due to alternative splicing introduces an in-frame deletion of 927 amino acids in the central part of the core protein (Fig. 4C). We therefore examined whether two forms of translational products could be detected in cultured chick embryonic fibroblasts. For this purpose, proteoglycans were purified from the culture medium as described previously (9). After chondroitinase ABC digestion, the core proteins were analyzed by immunoblotting with the three types of antibodies described under "Materials and Methods." The antibody specific to the alternative splicing domain stained a single large core protein band corresponding to PG-M (Fig. 6, lane 1), whereas the antibodies to the lectin-like domain and to the whole PG-M core protein stained two protein bands corresponding to PG-M and PG-M (V1) (Fig. 6, lanes 2 and 3).

Southern Blot Analysis. Southern blot analysis of chick genomic DNA was performed to determine whether PG-M and PG-M (V1) are derived from the same genomic locus. Hybridization was performed with a 536-bp cDNA probe whose sequence is present in both PG-M and PG-M (V1). As shown in Fig. 7, a single band was detected in each of the seven digests, indicating that PG-M and PG-M (V1) are encoded by the same genomic locus.

TABLE I
Nucleotide and amino acid sequence similarities between each domain in chick PG-M and human versican
The degree of homology of each pair is expressed as the percentage of identical residues, calculated with the DNASIS program (Hitachi Co., Japan). The name and characteristics of each domain are described under "Materials and Methods." HABR, hyaluronic acid binding region; EGF, epidermal growth factor; CRP, complement regulatory protein-like.

Domain    Position in PG-M    Homology (%) between PG-M and versican
                              Nucleotide sequence    Amino acid sequence
HABR-A    250-585             73                     68
HABR-B    586-888             77                     82
HABR-B'   889-1176            71                     69
X1        4558-4908           61                     44
X2        8680-8886           54                     38
EGF-1     9901-10014          72                     68
EGF-2     10015-10128         78                     84
Lectin    10156-10515         85                     96
CRP       10516-10698         87                     93

DISCUSSION

The present study showed that the mRNA encoding the PG-M core protein contains an open reading frame of 10,686 nucleotides for a protein of Mr 388,062. The deduced amino acid sequence of the PG-M core protein showed a link protein-like sequence at the amino-terminal end and two epidermal growth factor-like, a C-type lectin-like, and a complement regulatory protein-like sequence at the carboxyl-terminal end. Although similar sequences were detected in chick cartilage proteoglycan (8), computer-assisted homology analysis (Table I) revealed that the amino-terminal and carboxyl-terminal portions of the PG-M core protein show extremely high homology to the corresponding portions of the human embryonic fibroblast proteoglycan versican (10). Therefore, PG-M is likely to be a chick equivalent of human versican. However, the homologous regions occupy only about one-fifth of the PG-M core protein, and the remaining central region shows little similarity to human versican or to any other protein.

Immunological analyses of the PG-M core protein revealed two core proteins of different sizes in extracts of chick limb buds and chick embryonic fibroblasts. This was further supported by PCR and Northern blot analyses, which showed that two different mRNA species corresponding to the two core proteins were present. However, Southern blot analysis of chick genomic DNA revealed a single band in each restriction digest, suggesting that the PG-M gene is present as a single copy in the chick genome. Thus, the two PG-M molecules are likely to be generated by alternative splicing.

The two PG-M core proteins differ in size by about 100 kDa, so their molecular properties are probably very different. The alternatively spliced domain is located at the center of the core protein. The central part of the core protein, including this region, contains a number of Ser-Gly and Gly-Ser sequences that are presumed to be substituted with chondroitin sulfate chains. The presence of chondroitin sulfate chains in the splicing domain was also suggested by antibody reactivity: the antibody specific to the alternatively spliced domain reacts preferentially with the core protein obtained after chondroitinase ABC digestion compared with intact PG-M (data not shown). However, almost all of the chondroitin sulfate attachment sequences in the PG-M core protein show little similarity to the putative consensus glycosaminoglycan attachment sequence, Ser-Gly-Xaa-Gly (26). Besides the PG-M core protein described here, complete amino acid sequences are available for three other chick chondroitin/dermatan sulfate proteoglycans: type IX collagen proteoglycan (28), PG-Lb proteoglycan (15), and chick aggrecan (8). In these proteoglycans, too, the Ser-Gly-Xaa-Gly sequence is not common. Therefore, the substrate specificity of xylosyltransferases might differ between avian and mammalian species. Immunoprecipitation analysis of 14C-labeled PG-M indicated that two PG-M core proteins were produced in embryonic chick limb buds (Fig. 1, lane 2). However, only the larger one was predominantly detected after digestion of [35S]sulfate-labeled PG-M with chondroitinase ABC (Fig. 1, lane 3). This result strongly suggests that the chondroitin sulfate chains are located preferentially in the alternatively spliced domain. Since PG-M has been shown to inhibit cell adhesion by virtue of its bound chondroitin sulfate chains (30), the chondroitin sulfate content of a molecule probably reflects its biological activity. Therefore, regulation of the alternative splicing of PG-M core protein mRNA might be closely related to the regulation of cell-matrix interactions.

The presence of a link protein-like sequence is consistent with the hyaluronic acid binding activity of purified PG-M reported previously (6). Hyaluronic acid is another major extracellular matrix molecule in embryonic limb buds and is thought to be involved in prechondrogenic mesenchymal condensation (29). Therefore, binding of PG-M to hyaluronic acid might also strongly influence the organization of the extracellular matrix during chondrogenesis.

The carboxyl-terminal portion of the PG-M core protein contains epidermal growth factor-like, lectin-like, and complement regulatory protein-like domains. Of these, the lectin-like domain shows the highest homology to the corresponding region of human versican: 119 of its 124 amino acids are identical. The occurrence of domains with such evolutionarily conserved primary sequences suggests that the carboxyl-terminal region of PG-M has important biological functions. For example, the corresponding domain of cartilage proteoglycan has been shown to bind galactose and fucose residues significantly (31). In addition, the same set of three domain elements has been found in the binding regions of various cell adhesion molecules (32). PG-M might therefore have opposing activities: inhibition of cell adhesion through its chondroitin sulfate chains on the one hand, and mediation of cell adhesion through the carboxyl-terminal portion of its core protein on the other. Thus, PG-M may function in tissue differentiation and morphogenesis by regulating cell-substrate interactions both negatively and positively.
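Several figures quoted in the Results and Discussion follow from simple arithmetic. The deletion spanning nucleotides 1598-4378 is 2,781 bp, which is divisible by 3 and so removes 927 codons without shifting the downstream reading frame. A PCR product whose primers flank the deletion shrinks from the expected 4.1 kb to about 1.3 kb. The lost mass comes to roughly 100 kDa. The Python sketch below reproduces these checks, together with a percent-identity calculation of the kind summarized in Table I. The helper names `splice_variant_summary` and `percent_identity` are hypothetical. The mass estimate assumes that the 10,686-nt open reading frame includes the stop codon and that the deleted segment has the same average residue mass as the whole core protein (Mr 388,062). It ignores the signal peptide and all glycosylation. The identity convention (gaps never match, denominator is the alignment length) is one common choice, not necessarily the one used by DNASIS.

```python
def splice_variant_summary(del_start: int, del_end: int, orf_nt: int,
                           protein_mr: float, expected_pcr_kb: float) -> dict:
    """Back-of-envelope checks for an internal cDNA deletion.

    del_start and del_end are inclusive nucleotide positions of the deleted
    segment; orf_nt is assumed to include the stop codon.
    """
    deleted_nt = del_end - del_start + 1
    in_frame = deleted_nt % 3 == 0          # downstream reading frame preserved
    deleted_aa = deleted_nt // 3
    residues = orf_nt // 3 - 1              # codons minus the stop codon (assumption)
    avg_residue_mass = protein_mr / residues
    mass_loss_kda = deleted_aa * avg_residue_mass / 1000
    # Any PCR product whose primers flank the deletion is shorter by deleted_nt.
    shortened_pcr_kb = expected_pcr_kb - deleted_nt / 1000
    return {
        "deleted_nt": deleted_nt,
        "in_frame": in_frame,
        "deleted_aa": deleted_aa,
        "mass_loss_kda": mass_loss_kda,
        "shortened_pcr_kb": shortened_pcr_kb,
    }


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Identical positions are counted over the full alignment length, and a
    gap ('-') never counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    s = splice_variant_summary(1598, 4378, orf_nt=10686,
                               protein_mr=388062, expected_pcr_kb=4.1)
    print(s["deleted_nt"], s["in_frame"], s["deleted_aa"])  # 2781 True 927
    print(round(s["mass_loss_kda"], 1))                     # 101.0
    print(round(s["shortened_pcr_kb"], 2))                  # 1.32
    # Lectin-like domain: 119 of 124 residues identical (96% in Table I)
    print(round(100 * 119 / 124))                           # 96
```

The roughly 101-kDa estimate agrees with the observed difference of about 100 kDa between the two core proteins. The predicted 1.32-kb product matches the 1.3-kb band amplified from the CEF library with primers a and d.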

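The legend to Fig. 3 marks Ser-Gly and Gly-Ser sequences and potential N-linked glycosylation sites, and the Discussion compares the putative chondroitin sulfate attachment sites with the Ser-Gly-Xaa-Gly consensus. The Python sketch below shows one way to locate such positions by simple pattern scanning. It is an illustration only, not the procedure used by the authors. The function `find_motifs` and the short test peptide are hypothetical. The N-glycosylation rule used here (Asn-Xaa-Ser/Thr, where Xaa is not Pro) is the commonly used sequon definition and is assumed, not taken from this section.

```python
import re

# Zero-width lookahead patterns, so overlapping hits are all reported.
MOTIFS = {
    "SG": r"(?=SG)",            # Ser-Gly dipeptide
    "GS": r"(?=GS)",            # Gly-Ser dipeptide
    "SGXG": r"(?=SG.G)",        # Ser-Gly-Xaa-Gly consensus GAG attachment sequence
    "NXS/T": r"(?=N[^P][ST])",  # Asn-Xaa-Ser/Thr, Xaa not Pro (assumed sequon rule)
}


def find_motifs(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif in a one-letter protein sequence."""
    seq = seq.upper()
    return {
        name: [m.start() + 1 for m in re.finditer(pattern, seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    peptide = "MSGSGAGNKTSGEANPSA"  # hypothetical toy peptide, not PG-M
    print(find_motifs(peptide))
    # {'SG': [2, 4, 11], 'GS': [3], 'SGXG': [2, 4], 'NXS/T': [8]}
```

The Asn-Pro-Ser at position 15 of the toy peptide is skipped because proline follows the asparagine. Such a scan lists candidate sites only. Whether a site actually carries a chain must be shown experimentally, as the chondroitinase and antibody data do for the splicing domain.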

FIG. 5. Northern blot analysis of the mRNA encoding PG-M core protein. Total RNA from stage 22-23 chick limb buds (lanes 2 and 4) and 10-day chick embryo fibroblasts (lanes 1 and 3) was separated on a denaturing formaldehyde-agarose gel. After transfer to a Hybond N+ membrane, the bound RNA was hybridized with probe A (lanes 1 and 2) or probe B (lanes 3 and 4). The location of each probe is shown at the bottom. Sizes of the RNA species used for calibration are indicated on the left in kilobases.


FIG. 6. Immunoblot analysis of PG-M core proteins. Partially purified PG-M from the culture medium of 10-day chick embryo fibroblasts was digested with chondroitinase ABC in the presence of protease inhibitors. The resulting core proteins were electrophoresed on a 5% SDS-polyacrylamide gel and transferred to a nitrocellulose membrane. PG-M core proteins were stained with anti-splicing domain antibody (lane 1), anti-lectin-like domain antibody (lane 2), and anti-PG-M antibody (lane 3). Protein molecular mass standards are indicated on the left: myosin (200 kDa), β-galactosidase (116 kDa), and phosphorylase b (97 kDa); BPB, bromphenol blue.


FIG. 7. Southern blot analysis of chick genomic DNA encoding PG-M core protein. Chick genomic DNA was digested with the seven restriction enzymes indicated at the top and then separated on a 0.7% agarose gel. After transfer to a Hybond N+ membrane, the bound DNA was hybridized to a 536-bp cDNA fragment whose sequence is present in both PG-M and PG-M (V1) cDNAs. Sizes of the DNA species used for calibration are indicated on the right in kilobases.

Acknowledgments. We are grateful to Y. Hori (Seikagaku Corporation) for peptide synthesis, Drs. K. M. Yamada and M. Obara for the 10-day chick embryo fibroblast cDNA library, and Dr. M. Tanzer for a critical reading of this manuscript.

REFERENCES
1. Thorogood, P. V., and Hinchliffe, J. R. (1975) J. Embryol. Exp. Morphol. 33, 581-606
2. Shinomura, T., and Kimata, K. (1990) Dev. Growth & Differ. 32, 243-248
3. Solursh, M. (1991) in Developmental Patterning of the Vertebrate Limb (Hinchliffe, J. R., Hurle, J. M., and Summerbell, D., eds) pp. 171-176, Plenum Press, New York
4. Swalla, B. J., and Solursh, M. (1984) Differentiation 26, 42-48
5. San Antonio, J. D., Winston, B. M., and Tuan, R. S. (1987) Dev. Biol. 123, 17-24
6. Kimata, K., Oike, Y., Tani, K., Shinomura, T., Yamagata, M., Uritani, M., and Suzuki, S. (1986) J. Biol. Chem. 261, 13517-13525
7. Shinomura, T., Jensen, K. L., Yamagata, M., Kimata, K., and Solursh, M. (1990) Anat. Embryol. 181, 227-233
8. Chandrasekaran, L., and Tanzer, M. L. (1992) Biochem. J. 288, 903-910
9. Yamagata, M., Yamada, K. M., Yoneda, M., Suzuki, S., and Kimata, K. (1986) J. Biol. Chem. 261, 13526-13535
10. Zimmermann, D. R., and Ruoslahti, E. (1989) EMBO J. 8, 2975-2981
11. Shinomura, T., Kimata, K., Oike, Y., Noro, A., Hirose, N., Tanabe, K., and Suzuki, S. (1983) J. Biol. Chem. 258, 9314-9322
12. Hamburger, V., and Hamilton, H. L. (1951) J. Morphol. 88, 49-92
13. Kimura, J. H., Shinomura, T., and Thonar, E. J.-M. A. (1987) Methods Enzymol. 144, 372-393
14. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J., and Rutter, W. J. (1979) Biochemistry 18, 5294-5299
15. Shinomura, T., and Kimata, K. (1992) J. Biol. Chem. 267, 1265-1270
16. Young, R. A., and Davis, R. W. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 1194-1198
17. Feinberg, A. P., and Vogelstein, B. (1983) Anal. Biochem. 132, 6-13
18. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
19. Thomas, P. S. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 5201-5205
20. Oike, Y., Kimata, K., Shinomura, T., Nakazawa, K., and Suzuki, S. (1980) Biochem. J. 191, 193-207
21. Towbin, H., Staehelin, T., and Gordon, J. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 4350-4354
22. Obara, M., Kang, M. S., and Yamada, K. M. (1988) Cell 53, 649-657
23. Proudfoot, N. J., and Brownlee, G. G. (1976) Nature 263, 211-214
24. Kozak, M. (1987) Nucleic Acids Res. 15, 8125-8148
25. Von Heijne, G. (1984) J. Mol. Biol. 173, 243-251
26. Bourdon, M. A., Krusius, T., Campbell, S., Schwartz, N. B., and Ruoslahti, E. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 3194-3198
27. Bause, E. (1983) Biochem. J. 209, 331-336


28. Huber, S., Winterhalter, K. H., and Vaughan, L. (1988) J. Biol. Chem. 263, 752-756
29. Toole, B., Banerjee, S., Turner, R., Munaim, S., and Knudson, C. (1991) in Developmental Patterning of the Vertebrate Limb (Hinchliffe, J. R., Hurle, J. M., and Summerbell, D., eds) pp. 215-223, Plenum Press, New York
30. Yamagata, M., Suzuki, S., Akiyama, S. K., Yamada, K. M., and Kimata, K. (1989) J. Biol. Chem. 264, 8012-8018
31. Halberg, D. F., Proulx, G., Doege, K., Yamada, Y., and Drickamer, K. (1988) J. Biol. Chem. 263, 9486-9490
32. Rosen, S. D., Imai, Y., Singer, M. S., and Huang, K. (1992) Trends Glycosci. 4, 1-13