Russian Journal of Genetics, Vol. 38, No. 8, 2002, pp. 908–917. Translated from Genetika, Vol. 38, No. 8, 2002, pp. 1078–1089. Original Russian Text Copyright © 2002 by Bogdanov, Dadashev, Grishaeva.

Comparative Genomics and Proteomics of Drosophila, Brenner’s Nematode, and Arabidopsis: Identification of Functionally Similar Genes and Proteins of Meiotic Chromosome Synapsis

Yu. F. Bogdanov, S. Ya. Dadashev, and T. M. Grishaeva

Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991 Russia; fax: (095)132-89-62; e-mail: [email protected]

Received February 6, 2002

Abstract: The published principles of computer analysis of genomes and protein sets in taxonomically distant eukaryotes are reviewed. The authors developed a search strategy for identifying, in the genomes of such organisms, genes and proteins that are nonhomologous in primary structure but have similar functions in cells dividing by meiosis. This strategy, based on the combined principles of genomics, proteomics, and morphometric analysis of subcellular structures, was applied to a computer search for genes encoding the proteins of synaptonemal complexes in the genomes of Drosophila melanogaster, the nematode Caenorhabditis elegans, and the plant Arabidopsis thaliana. The proteins found proved to be functionally similar to their counterparts in the yeast Saccharomyces cerevisiae (protein Zip1p) and in mammals (protein SCP1).

By the end of 2001, the genomes of four model eukaryotic organisms, as well as the human genome, were completely sequenced. The first eukaryotic organism whose genome was completely sequenced was the unicellular budding yeast Saccharomyces cerevisiae [1]. It was followed by the nematode Caenorhabditis elegans [2], the fly Drosophila melanogaster [3, 4], and the “botanical Drosophila,” the cruciferous plant Arabidopsis thaliana [5]. These advances provided abundant material for comparative analysis of genomes and protein sets (proteomes). Specialized fields of knowledge developed that had all the characteristics of an established science, i.e., specific problems and methods of investigation. Specific branches of these sciences, comparative genomics and comparative proteomics, appeared [6, 7]. These two areas of research are closely related because information on the complete DNA structure of genes serves as the basis for computer-aided prediction of protein primary structure. The reverse task, gene identification on the basis of protein primary structure, can also be solved. Special computer programs such as BLASTP [6, 7] were developed for these purposes. Comparison of the genomes and proteomes of the organisms listed above is obviously important in the context of the general theory of evolution.

SOME GENERAL PRINCIPLES OF COMPARATIVE GENOMICS AND PROTEOMICS

At the early stage of development of comparative genomics and comparative proteomics, researchers

have reasonably set a simple task: to identify similar genes and proteins in organisms of different evolutionary complexity. The concepts of orthology and paralogy for genes and proteins have been formulated. Orthologs are genes and proteins of different species that fulfill a similar function and are closely homologous in primary structure, which theoretically indicates their common ancestry. In comparative analysis, orthologs are usually grouped into clusters and families of genes or proteins. These clusters form branching phylogenetic trees. According to evolutionary theory, orthologs are conserved genes and proteins. In chromosomes, orthologous genes from different gene families exhibit synteny, i.e., linear linkage with other genes assigned to other families of conserved genes, a phenomenon of apparent physiological significance. These syntenic groups are conserved within large taxa up to the taxonomic type. In other words, the genome is organized into blocks, which may be an important factor of evolution. In contrast to orthologs, paralogs are nucleotide and amino acid sequences that differ in function and probably diverged from a common ancestor through duplications and other rearrangements.

Comparative analysis of the completely sequenced genomes and proteomes of the first three eukaryotic organisms (yeast, nematode, and Drosophila) raised the question of how many proteins are shared by these three species. Analysis of the proteins predicted by computer software showed that about 35% of Drosophila genes have putative orthologs in the nematode genome [7]. The search was based on homology of the primary structure, on the condition that the primary structure of the entire protein molecule was at least 80% homologous with that of the entire predicted protein of the other organism. In that case, the proteins were considered orthologs. The estimate of 35% is a minimum because some known orthologs were omitted owing to the molecule size requirements. In particular, these are homeodomain proteins, which show little similarity outside the homeodomains. The number of orthologous protein pairs did not appreciably decrease when the similarity requirements were made more stringent. This suggests that the researchers actually identified orthologous proteins with a common function. About 23% of Drosophila proteins have annotated orthologs among the proteins of the two other organisms, yeast and nematode [7]. The functions of the common proteins are most likely the same in cells of all eukaryotes.

1022-7954/02/3808-0908$27.00 © 2002 MAIK “Nauka /Interperiodica”

Table 1. Conserved amino acid sequences in proteins of yeast and nematode

Probability of        Groups of sequences                                  Open reading frames (ORFs)
random homology, P    total    one member      more than one member       yeast          nematode
                               per organism    per organism               (n = 6217)     (n = 19099)
1 × 10⁻¹⁰⁰            236      157             79                         330 (5.3%)     370 (1.9%)
1 × 10⁻⁵⁰             552      322             230                        888 (14.3%)    1094 (5.7%)
1 × 10⁻¹⁰             1171     611             560                        2497 (40%)     3653 (19%)

Note: The results were obtained by reciprocal comparison of each yeast ORF with all nematode ORFs and vice versa using the BLASTP procedure. The numbers of coincidences were combined at the given probability P according to [6].

Simultaneous comparison of proteins in the three organisms yields a complex pattern, which is difficult to analyze without the experience of simpler cases. Therefore, in what follows, we briefly review earlier data on comparative analysis of the genomes of two organisms: the unicellular yeast and the simply organized multicellular nematode C. elegans. The body of this worm consists of as few as 950 cells, of which approximately 700 are somatic and the remaining ones belong to the reproductive system. The complete protein sets of S. cerevisiae and C. elegans were compared by a joint team of researchers from Stanford and Boston Universities and from the National Center for Biotechnology Information (NCBI) in Bethesda (United States). In total, 6217 and 19 099 open reading frames (ORFs) were studied in yeast and nematode, respectively; the information is available in the Saccharomyces Database at http://genome-www.stanford.edu/Saccharomyces/help/ worm/W-Y comparison.html. This study produced unexpected and interesting results.
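The reciprocal comparison summarized in Table 1 amounts to linking yeast and nematode ORFs whose BLASTP match is at least as significant as a chosen threshold P and then counting the resulting groups. The sketch below illustrates that bookkeeping only. It is not the procedure of [6]: single-linkage grouping is an assumption made here, and the ORF identifiers, P values, and the names `group_by_similarity` and `summarize` are hypothetical.

```python
from collections import defaultdict


def group_by_similarity(hits, p_threshold):
    """Single-linkage grouping (an assumption): ORFs joined by any hit
    with P <= p_threshold end up in the same group."""
    parent = {}

    def find(orf):
        # Union-find lookup with path halving.
        parent.setdefault(orf, orf)
        while parent[orf] != orf:
            parent[orf] = parent[parent[orf]]
            orf = parent[orf]
        return orf

    for orf_a, orf_b, p_value in hits:
        if p_value <= p_threshold:
            root_a, root_b = find(orf_a), find(orf_b)
            parent[root_a] = root_b

    groups = defaultdict(set)
    for orf in list(parent):
        groups[find(orf)].add(orf)
    return list(groups.values())


def summarize(groups):
    """Count groups as in the 'total / one member / more than one member' columns."""
    one_each = sum(
        1
        for g in groups
        if sum(o[0] == "yeast" for o in g) == 1
        and sum(o[0] == "worm" for o in g) == 1
    )
    return {
        "total": len(groups),
        "one_per_organism": one_each,
        "more_than_one": len(groups) - one_each,
    }


# Hypothetical hits: (yeast ORF, worm ORF, probability of random homology).
TOY_HITS = [
    (("yeast", "Y1"), ("worm", "W1"), 1e-120),
    (("yeast", "Y2"), ("worm", "W2"), 1e-60),
    (("yeast", "Y2"), ("worm", "W3"), 1e-55),
    (("yeast", "Y3"), ("worm", "W4"), 1e-12),
]

for threshold in (1e-100, 1e-50, 1e-10):
    print(threshold, summarize(group_by_similarity(TOY_HITS, threshold)))
```

As in Table 1, relaxing the threshold in this toy example increases the total number of groups, and groups with more than one member per organism appear only at the less stringent thresholds.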
In the organisms examined, 57% of the orthologous proteins were unique: one protein in each organism [7]. In addition, a set of highly conserved proteins in each of these phylogenetically distant organisms was encoded by a fraction of the ORFs: 40 and 20% in yeast and nematode, respectively (Table 1). These proteins comprise the first comparison group. They are involved in core cellular processes that are similar in these organisms: intermediate metabolism, DNA and RNA metabolism, protein folding, and intracellular transport of molecules and their degradation [6]. Analysis of the orthologous protein groups provided another result, which was more expected by the authors [6]. The regulatory proteins and signaling pathways proved to be far more complex in the multicellular than in the unicellular organism. A comparison of the protein clusters in yeast and nematode showed that, although there were common proteins involved in regulation and signal transduction, their number was increased in the multicellular organism because of a gradual change of domains resulting from their shuffling [7]. That observation is easily explained by the fact that multicellular organization requires cell specialization, export and import of cellular proteins, and development of intercellular communications. By comparing different proteins in yeast and nematode, the authors predicted some trends in evolution from uni- to multicellular organisms, such as (1) evolution of novel regulatory and signaling domains, (2) evolution in the mode of protein assembly from preexisting protein domains, and (3) expansion of some families of protein domains through a series of duplications. Another type of protein comparison group in yeast and nematode comprised 560 protein groups (Table 1) with more than two proteins per group, i.e., with more than one member in at least one of the organisms examined (not necessarily in the nematode, which in general is characterized by a larger number of proteins). These were such proteins as RNA polymerase and C-subunits of the DNA replication factor. The latter form clusters containing 12 proteins.
However, on the whole, at the significance level of 1 × 10⁻⁵⁰, which corresponds to 80% homology in the primary structure, the number of proteins involved in DNA and RNA metabolism is only 10% higher in the nematode than in yeast, which, in view of the universality of major molecular biological phenomena (genetic code, replication, etc.), was an expected result. In S. cerevisiae and C. elegans, the orthologs participating in DNA and RNA metabolism comprise 18% of all proteins that are at least 80% homologous (Table 2). In yeast and nematode, the number of proteins with a similar function depends strongly on the type of this function. As for structural cell proteins (e.g., those of the cytoskeleton), on average 2.1 orthologous proteins in nematode cells correspond to each protein in a yeast cell. Note that we mean the number of distinct proteins rather than their mass. In total, the orthologous cytoskeleton proteins comprise only 5% of all proteins that are 80% homologous in the two organisms, whereas the proteins with unknown function comprise 8% of all proteins at the same level of homology (Table 2). The overall physical mass of cytoskeleton proteins is greater than that of the proteins regulating DNA and RNA metabolism. However, this topic is beyond the scope of proteomics and is rather a subject of cell biology.

Table 2. Percentage of families of the conserved proteins (orthologs) with at least 80% homology of the primary structure in yeast S. cerevisiae and nematode C. elegans at the similarity threshold P = 1 × 10⁻⁵⁰ (based on data by Chervitz et al. [6])

Function                           Percent of all        Ratio of the protein number
                                   conserved proteins    in nematode to that in yeast
Intermediate metabolism            28                    1.0
DNA and RNA metabolism             18                    1.1
Protein folding and degradation    13                    1.2
Transport and secretion            11                    1.5
Signal transduction                11                    1.2
Unidentified proteins              8                     1.1
Ribosome proteins                  6                     0.6
Cytoskeleton                       5                     2.1

The above quantitative and qualitative analysis at the level of entire genome comparison confirmed the evidence obtained previously by other methods, i.e., the fact that approximately the same number of orthologous proteins is involved in the core molecular genetic processes in phylogenetically distant organisms. In addition, it was found that the threefold difference in the total number of proteins with similar functions in yeast and nematode cannot be attributed to the variability of the protein clusters, although substantial variability has actually been detected. The point is that both yeast and nematode have species-specific proteins. In particular, sets of such proteins associated with intracellular signal transduction (humoral, ionic) were found [7]. Such conclusions are not new: they were drawn with respect to individual cell functions long before the advent of the age of genomics. Thus, analysis of the genomes showed that comparable numbers of structurally similar proteins mediate the major biochemical processes in the unicellular S. cerevisiae and the multicellular C. elegans. Hence, the conclusions inferred from protein analysis in one organism (the first organism subjected to such an analysis was yeast) can be applied to another organism in which orthologs were detected from the primary structure. Comparative analysis of the proteins of core metabolism in yeast and nematode provided results that agree with these views.

Along with identification of orthologs from homology of the primary structure, structural orthologs can also be identified as proteins with common domains and similar domain organization. In contrast to the first approach, which is based on evolutionary considerations, the latter is a structural–functional approach based on the reasoning that structurally similar proteins have the same functions in different organisms. Proteins assembled from several domains often contain at least two readily identifiable domains that are similar or homologous in different organisms. Thus, a domain of the eukaryotic protein kinase (access no. IPR 000719) is shared by 199 and 388 proteins of S. cerevisiae and C. elegans, respectively; the zinc finger domain of type C2H2 (access no. IPR 001304) is encountered 47 and 138 times in different proteins of these organisms, respectively. Comparative computer analysis of these proteins successfully employed such databases as InterPro (http://www.ebi.ac.uk/interpro) and NCBI CDART (http://www.ncbi.nlm.nih.gov/Structure/lexington/ lexington.cgi? cmd=rps). Analysis by InterPro supplemented by “manual” analysis revealed 1400 protein families and individual domains, of which 984, 1133, and 1177 were identified in yeast, nematode, and Drosophila, respectively. Note that 744 families or domains were shared by these three organisms [7]. Hundreds of proteins in yeast, nematode, and Drosophila have similar domain organization and participate in cytoskeleton formation, regulation of the cell cycle, cell adhesion, and other intra- and intercellular structures and processes.

What is the practical value of this study? The most impressive finding, especially for the government and science-funding organizations, was that among 289 genes for human diseases that had already been studied and were available for rapid analysis, 177 and 150 genes had orthologs in D. melanogaster and C. elegans, respectively [7]. Thus, many problems, such as the molecular biology of these genes, their phenotypic expression, and their interaction with other genes, can be studied in more accessible and inexpensive models: D. melanogaster, the nematode, and even yeast [8]. The first example of this kind was the experiments conducted as early as 1985, when a mammalian RAS signaling protein was shown to substitute for its ortholog in RAS-deficient yeast strains [7]. As far back as 1926, Timofeeff-Ressovsky and Vogt substantiated the applicability of genetic, and especially phenogenetic, results obtained in Drosophila to understanding the genetic mechanisms of human diseases [9]. This idea was repeatedly discussed by Muller, a classic of Drosophila genetics [10]. The results of comparative analyses of genomes and proteomes in the model organisms, published in 2000, initiated a comprehensive cataloging of the evidence that allows comparative genomics to be used in solving medical and genetic problems [6, 7].


FUNCTIONALLY SIMILAR BUT NONHOMOLOGOUS GENES AND PROTEINS IN PHYLOGENETICALLY DISTANT EUKARYOTES

At the present stage of development of the biological sciences concerned with the genome and the cell, a new problem can be formulated at the boundary between genomics and proteomics, on the one hand, and cell biology, on the other. The point is that dividing cells of phylogenetically distant organisms contain intracellular structures with similar functions but with partially or completely different ultrastructure. These are the kinetochores of chromosomes, cell centers (animal centrioles, spindle pole bodies in fungi, and amorphous cell centers of plants), and the synaptonemal complexes, which play the key role in synapsis and genetic recombination of homologous chromosomes during meiosis. All these structures determine chromosome behavior during meiosis: segregation of chromosomes to the poles of a dividing cell (kinetochores and cell centers) or formation and fixation of pairs of homologous chromosomes (synaptonemal complexes). In different kingdoms of eukaryotes, these organelles differ in ultrastructure, suggesting that in phylogenetically distant taxa they might be assembled from different structural proteins. Among the latter, orthologous proteins may be either absent, which would be an unsurprising outcome, or present, which would be additional evidence of the unity of the organic world. We faced this problem in 2000, when analyzing and comparing meiosis-specific genes in D. melanogaster, S. cerevisiae, and other organisms [11, 12]. We were interested in genes encoding the proteins of the synaptonemal complex (SC). In most eukaryotes, the SC superstructure forms only during meiotic prophase I and plays an important role in meiosis [13].
Mutations of genes responsible for SC formation lead to a partial or complete inability of chromosomes to form pairs of homologs prior to or during crossing over, as well as to such phenomena as reduced frequency or complete elimination of crossing over, absence of chiasmata, and disturbed segregation of homologous chromosomes during meiosis I. As a consequence, either aneuploidy or sterility of germ cells is observed. The scheme of SC formation is the same in all eukaryotes [13]. The meiosis-specific SC consists of two protein axes, one along each homologous chromosome (lateral SC elements), which are positioned in parallel and connected by numerous transverse protein filaments (Fig. 1). The space between the lateral elements is referred to as the central space. The chromatin fibrils are bound to the lateral elements. Thus, the SC serves as a frame that temporarily holds homologous chromosomes in a strict order so that homologous loci lie opposite each other. The central space of the SC is occupied by recombination nodules, which are conglomerates of enzymes necessary for DNA recombination [13].

Fig. 1. Scheme of the synaptonemal complex. LE, lateral elements; CS, central space; CE, central element; TF, transverse filaments involved in CS formation; Chr, chromatin loops; and RN, recombination nodules.

Ascomycetes, nematodes, insects, higher plants, mammals, and organisms of other taxa share the general structure of the SC, but its details vary among these taxa. The structural SC proteins (those of the lateral elements and transverse filaments) have been studied only in four mammalian species [14–17] and in the yeast S. cerevisiae [18, 19] (see [20] and [21] for review). In yeast and mammals, these functionally similar proteins showed no homology in their primary structure. This raises the question of how functionally similar cells, namely the cells entering prophase of meiosis, create functionally similar subcellular structures from different proteins in phylogenetically remote organisms. What are the traits of the SC-forming proteins that are decisive for SC assembly according to the same scheme in different types and even kingdoms of eukaryotes? We assumed that the proteins of the transverse filaments in the central SC space are the simplest subjects for such comparative analysis. In yeast and mammals, a single protein of about 900 amino acid residues constitutes the transverse filaments. This protein consists of three domains: a central one with an extended coiled coil and two terminal globular domains incapable of forming a coiled coil [18]. Combined immunocytochemical analysis and electron microscopy showed that two pairs of molecules of these proteins form the transverse filaments of the SC [22] (Fig. 2). In each pair, the molecules lie in parallel and are similarly oriented in the central space of the SC;

they are perpendicular to the lateral elements and form a tandem. The protein molecules are oriented head-to-head (N-terminus to N-terminus). Their C-termini can bind DNA and are embedded in the lateral elements of the SC. The width of the central SC space depends on the size of the central rod-shaped domain. Deletions in the central domain in allelic mutants of the yeast ZIP1 gene were found to reduce the width of the SC central space, bringing the lateral elements closer together [18, 19].

Based on the literature data, we performed correlation analysis for two traits: the width of the SC central space and the size of the protein molecules forming the transverse SC filaments. A high correlation (r = 0.85; P < 0.001) was found between the width of the central SC space and the length of the entire SCP1 and Zip1p protein molecules in mammals and yeast, respectively (Fig. 3). The correlation between the length of the central coiled-coil protein domains and the width of the SC central space was even higher (r = 0.90; P < 0.001). Such a high and significant correlation coefficient suggests a functionally important dependence, i.e., the width of the SC central space directly depends on the length of the central (linear in shape) domain of the proteins SCP1 and Zip1p constituting the transverse filaments [23]. These findings allowed us to undertake a computer search for a D. melanogaster protein similar to the yeast Zip1p and mammalian SCP1 proteins. We assumed that the putative protein of D. melanogaster would also be assembled from three domains and that its central domain would contain a coiled coil whose length corresponds to half of the SC central space in D. melanogaster. Naturally, we primarily analyzed those proteins that are specifically expressed in meiosis.

Fig. 2. Scheme of the central space of the synaptonemal complex (SC). Designations: dotted line, borders of chromatid axes within SC lateral elements; open rectangles, the terminal (globular) domains of the protein constituting the transverse filaments; gray rectangles, central (coiled-coil) domains of the transverse-filament protein; small black rectangles, hypothesized protein–protein contacts between the terminal domains. Other symbols as in Fig. 1.

Fig. 3. Dependence of the width of the SC central space (CS width, nm) on the size of the protein molecule involved in the formation of this space (protein molecule size, a.a.). CS, central space of SC; a.a., number of amino acid residues in the protein. Black circles indicate rat, mouse, and human SCP1 proteins and various Zip1p proteins from the yeast S. cerevisiae (both the wild-type protein and mutants with internal deletions and duplications of different length). The sloping straight line is the regression line for the above traits. Dotted lines indicate the 95% confidence interval of the regression line.

Computer Identification of the SC Gene and Protein in D. melanogaster

The computer-aided search for orthologs of known proteins is based on a comparison of their primary structure, even if data on the secondary structure (domain organization) are used. Thus, the computer procedure SGD Worm–Yeast Protein Comparison (http://genome-www.stanford.edu/Saccharomyces/ worm/) is based on a comparison of amino acid sequences with known consensus sequences of protein families and protein domains. However, structural and functional analogs in phylogenetically distant organisms cannot be identified in this way, because the role of these proteins in the formation of subcellular structures in meiotic cells is not taken into account. Therefore, we employed another approach, in which the results obtained in silico (from computer databases) are used in combination with those of morphological and genetic experiments. Such analysis is still poorly automated and is based on “manual” comparison of the secondary structures (for example, the length of the coiled-coil region) of the proteins of interest.

Among more than 80 genes controlling meiosis in D. melanogaster [12], only one, c(3)G (crossover suppressor on 3 of Gowen), located in the 89A2–5 region of the cytological chromosome map [24], is undoubtedly involved specifically in SC formation [25]. However, neither the mechanism of its effect nor the relevant gene product had been identified until 2001 [26]. An old hypothesis suggested that c(3)G may encode one of the SC components [25]. We aimed to analyze in D. melanogaster the virtual products of genes from a 3R chromosome region that overlaps the c(3)G locus with a wide margin. For this purpose, we studied all genes localized within the 88E–89B region of the Bridges cytological map. In particular, we decided to compare the domain organization and secondary structure of the predicted Drosophila proteins with those of the known SC proteins in yeast and mammals. Seventy-eight annotated genes from region 88E6–89B2, which spans about 250 kb (according to the NCBI database), were analyzed in silico. The protein products of these genes had not been previously identified. The virtual protein product of only one of these genes proved to be similar to the SCP1 and Zip1p proteins in size and domain organization. That was the product of the Drosophila gene CG17604 (NCBI nomenclature). The gene product consists of three domains: two terminal domains and a central rod-shaped domain with a coiled-coil structure. The CG17604 gene is located on chromosome 3R at position 36250–36253 kb according to the NCBI molecular map. The virtual product of this gene contains 744 amino acids, and its central domain (495 amino acids according to the verified data) can form a coiled-coil structure similar to that of the central domains in the Zip1p and SCP1 proteins. The width of the SC central space in D. melanogaster is 109 nm [27]. When these values were plotted (Fig. 4), the resulting point fell within the 95% confidence interval of the regression line for the following two traits: the size of the coiled-coil region of the protein molecules and the width of the SC central space in mammals and yeast. In D. melanogaster, the coiled-coil region of the protein encoded by the CG17604 gene proved to correspond to half the width of the SC central space. The same was revealed upon comparison of the SC structure in yeast and mammals.

As judged from the primary structure [26], the predicted CG17604 gene product is homologous to various proteins, many of which can form coiled coils. In particular, it is homologous to the NUF1 protein of the spindle pole body (S. cerevisiae), to the myosin heavy chain (D. melanogaster, C. elegans, A. thaliana, Homo sapiens, Dugesia japonica), and to the laminar proteins (Mus musculus). The known SC proteins (yeast Zip1p [19], rat and mouse SCP1 [14, 16]) were also homologous to the above proteins. Interestingly, the amino acid sequence of the CG17604 gene product was homologous to neither SCP1 nor Zip1p, just as no homology was found between Zip1p and the proteins of SC filaments in mammals [19]. Thus, although the nucleotide sequence of the deduced gene CG17604 of D. melanogaster was not homologous to those of the ZIP1 and SCP1 genes, the three genes were similar in that they were partially homologous to the genes encoding the above large family of structural proteins.

Fig. 4. Dependence of the width of the SC central space (CS width, nm) on the size of the coiled-coil part of the protein molecule involved in the formation of SC transverse filaments (coiled-coil size, a.a.). Designations as in Fig. 3. Coordinates of the open circle, open oval, and rectangle correspond to the widths of the SC central space and lengths of the coiled coil of the CG17604 protein of D. melanogaster, Z81586 protein of C. elegans, and AAD10695 protein of A. thaliana, respectively; coordinates of the hatched oval correspond to the width of the SC central space and half-length of the coiled coil of the Q11102 protein of C. elegans.

In addition, we analyzed the protein products of D. melanogaster genes from nine regions carrying other meiotic mutations: mei-9, mei-41, mei-217, mei-218, mei-P14, mei-P22, mei-P26, mei-w68, and mei-S282. These protein products were annotated by Celera Genomics. None of the 200 annotated gene products of the above regions could form a coiled coil

with length necessary to ensure formation of transverse SC filaments in Drosophila. These results were additionally verified by estimating the isoelectric points (pI) of the known SC proteins from yeast and mammals and of the CG17604 gene product. Both entire molecules and individual domains were analyzed with the ProtParam computer program (Table 3). Except for the pI of the N-terminal domain, all parameters of the CG17604 protein were similar to those of the Zip1p and SCP1/SYCP1 proteins. The C-terminal domains of these proteins carry a net positive charge (basic character), are known to be embedded in the lateral SC elements, and are therefore in contact with DNA, whereas the N-terminal domains, which carry a net negative charge (acidic character), are directed into the SC central space, where they overlap and form the SC central element [16, 19]. In the Drosophila CG17604 protein, the positive charge of the N-terminal domain probably accounts for the specific morphology of the SC central element, with its clearly defined striated ultrastructure and relatively large width (32 nm) [27].

Thus, we have detected a product of the CG17604 gene that is similar in most parameters to the known proteins of SC transverse filaments in other organisms. According to the FlyBase data, the CG17604 gene is located in the 89A7-8 region, whereas the c(3)G gene, which is proposed to be identical to the CG17604 gene, was localized to the 89A2 region (Fig. 5). This discrepancy is probably caused by objective difficulties in mapping that region of the third D. melanogaster chromosome. This conclusion is further supported by a relative shift of gene positions in this region between the molecular and cytogenetic NCBI maps (Fig. 6). We suggest, therefore, that it is the CG17604 gene that encodes the protein of the transverse filaments of the central space of the D. melanogaster synaptonemal complex and that this gene is most likely the known c(3)G gene.

This suggestion can be tested experimentally. Independently of us and at the same time, Hawley and coworkers analyzed the molecular organization of both the c(3)G gene and its product [28, 29]. In the Proceedings of the 17th European Drosophila Research Conference (September 1–5, 2001), these authors reported that the c(3)G gene encodes a protein similar in structure to the Zip1p and SCP1 proteins [28]. In that study, the protein product of the c(3)G gene and monoclonal antibodies against this protein were obtained. As shown by immunofluorescence microscopy, these antibodies stained the central part of bivalents in prophase in D. melanogaster oocytes [29]. Thus, the location of antibodies against the protein product of the c(3)G gene coincided with the proposed location of the CG17604 protein, which supports our suggestion that the c(3)G and CG17604 genes are identical.

Table 3. Parameters of the protein molecules identified experimentally [14–19] and predicted (this study) that are involved in the formation of SC transverse filaments in organisms with completely sequenced genomes

| Biological species, their normal and mutant proteins | Entire length of the molecule (aa) | Length of the coiled-coil region (aa) | Width of SC CS (nm) | pI, N-terminal domain | pI, central domain | pI, C-terminal domain | pI, entire molecule |
|---|---|---|---|---|---|---|---|
| M. musculus SCP1 | 993 | 713 | 100 | 5.9 | 5.3 | 9.7 | 5.8 |
| H. sapiens SCP1 | 973 | 677 | 100 | 5.0 | 5.4 | 9.7 | 5.7 |
| R. norvegicus SCP1 | 946 | 717 | 100 | 4.2 | 5.3 | 9.8 | 5.6 |
| S. cerevisiae Zip1 | 875 | 632 | 115 | 4.8 | 6.1 | 10.1 | 6.4 |
| Zip1-m2* | 583 | 285 | 63 | – | – | – | – |
| Zip1-mc1* | 484 | 170 | 49 | – | – | – | – |
| Zip1-mc2* | 776 | 634 | 101 | – | – | – | – |
| Zip1-n1* | 732 | 578 | 118 | – | – | – | – |
| Zip1-nm1* | 767 | 512 | 118 | – | – | – | – |
| Zip1-2XH2** | 1012 | 799 | 153 | – | – | – | – |
| Zip1-3XH2** | 1280 | 1067 | 189 | – | – | – | – |
| D. melanogaster CG17604 | 744 | 495 | 109 | 10.0 | 4.9 | 9.7 | 5.9 |
| A. thaliana AAD10695 | 921 | 476 | 100–120 | 5.3 | 5.4 | 9.0 | 5.6 |
| C. elegans Q11102 | 1132 | 938 | 70–85 | 11.9 | 5.1 | 11.0 | 5.5 |
| C. elegans T26844 | 1083 | 536 | 70–85 | 8.1 | 6.9 | 5.3 | 6.6 |
| C. elegans T27907 | 772 | 460 | 70–85 | 5.2 | 5.9 | 5.7 | 5.1 |
| C. elegans Z81586 | 484 | 460 | 70–85 | 4.9 | 9.5 | 10.0 | 9.4 |
| C. elegans WP:CE17456 | 213 | 50 | 70–85 | – | – | – | – |

Note: CS, central space of SC; pI, isoelectric point of a protein (domain); *, yeast Zip1p proteins with deletions; **, the same proteins with duplications.
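The pI values in Table 3 were obtained with ProtParam. As a rough illustration of what such an estimate involves, the following Python sketch finds the pH at which the net charge of a sequence, computed from the Henderson-Hasselbalch equation, is zero, and applies this separately to the N-terminal, central, and C-terminal parts of a molecule. The pKa values, the function names (`net_charge`, `isoelectric_point`, `domain_pis`), the domain boundaries, and the toy sequence are all assumptions made for this example. They are not ProtParam's parameters or real protein data, so the numbers returned will not reproduce Table 3.

```python
# Illustrative pKa values (an assumption; ProtParam uses its own table).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERMINUS = 8.6
PKA_C_TERMINUS = 3.6


def net_charge(sequence, ph):
    """Approximate net charge of a peptide at the given pH."""
    seq = sequence.upper()
    # Positively charged groups: fraction protonated at this pH.
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    for residue, pka in PKA_POSITIVE.items():
        positive += seq.count(residue) / (1.0 + 10 ** (ph - pka))
    # Negatively charged groups: fraction deprotonated at this pH.
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue, pka in PKA_NEGATIVE.items():
        negative += seq.count(residue) / (1.0 + 10 ** (pka - ph))
    return positive - negative


def isoelectric_point(sequence, tolerance=0.001):
    """Find the pH of zero net charge by bisection on the interval 0..14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        # Net charge falls as pH rises, so a positive charge means pI is higher.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return round((low + high) / 2, 2)


def domain_pis(sequence, n_end, c_start):
    """pI of the N-terminal, central, and C-terminal parts and of the whole.

    n_end and c_start are hypothetical domain boundaries (0-based indices).
    """
    return {
        "N-terminal": isoelectric_point(sequence[:n_end]),
        "central": isoelectric_point(sequence[n_end:c_start]),
        "C-terminal": isoelectric_point(sequence[c_start:]),
        "entire": isoelectric_point(sequence),
    }


if __name__ == "__main__":
    # Toy sequence, not a real protein: acidic start, neutral middle, basic end.
    toy = "DE" * 10 + "AL" * 20 + "KR" * 10
    for part, value in domain_pis(toy, n_end=20, c_start=60).items():
        print(f"{part}: pI = {value}")
```

For a real protein, the two boundaries would have to come from a separate prediction of the coiled-coil region; the sketch only shows how a per-domain pI comparison of the kind summarized in Table 3 can be set up.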

[Figure 5: cytological map of region 89A–89B of chromosome 3. Labeled genes: spno, Aldox-1, Po, ost, rec, c(3)G, tbi, CG5404, CG5614, CG18505, spn-E, glob1, ND23, CG4224, CG4560, CG14877, CG14875, CG4225, CG17604.]

Fig. 5. Genetic surrounding of the CG17604 gene according to the FlyBase data. Positions of the annotated genes revealed by computer-aided methods (including the CG17604 gene) and positions of some genes determined by genetic methods (including the c(3)G gene) are indicated on a cytological map of Drosophila chromosome 3 in region 89A–89B.
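The charge criterion applied in Table 3 and in the searches described below can be written out explicitly. The Python sketch below sorts several proteins by the pI of their terminal domains into the two arrangements discussed for C. elegans later in the text: an acidic N-terminal and a basic C-terminal domain (model 1, two molecules meeting head-to-head), or two basic terminal domains (model 2, one long molecule spanning the central space). The pI values are those listed in Table 3, and the threshold pI > 8 for a basic domain follows the text. The cutoff pI < 7 for an acidic domain and the names `TERMINAL_PI`, `charge_pattern`, and `classify_all` are assumptions made for illustration only.

```python
BASIC_PI = 8.0   # from the text: a basic domain has pI > 8
ACIDIC_PI = 7.0  # assumed cutoff for an acidic domain (not stated in the text)

# (pI of N-terminal domain, pI of C-terminal domain), as listed in Table 3.
TERMINAL_PI = {
    "M. musculus SCP1": (5.9, 9.7),
    "S. cerevisiae Zip1": (4.8, 10.1),
    "C. elegans Q11102": (11.9, 11.0),
    "C. elegans T26844": (8.1, 5.3),
    "C. elegans T27907": (5.2, 5.7),
    "C. elegans Z81586": (4.9, 10.0),
}


def charge_pattern(n_terminal_pi, c_terminal_pi):
    """Assign a terminal-domain charge pattern to one protein."""
    n_basic = n_terminal_pi > BASIC_PI
    c_basic = c_terminal_pi > BASIC_PI
    if n_basic and c_basic:
        return "model 2"  # both terminal domains basic
    if n_terminal_pi < ACIDIC_PI and c_basic:
        return "model 1"  # acidic N-terminus, basic C-terminus
    return "neither"


def classify_all(table=TERMINAL_PI):
    """Apply charge_pattern to every protein in the table."""
    return {name: charge_pattern(*pis) for name, pis in table.items()}


if __name__ == "__main__":
    for name, pattern in classify_all().items():
        print(f"{name}: {pattern}")
```

With these values, Q11102 is the only candidate with two basic terminal domains and Z81586 is the only C. elegans candidate with the SCP1/Zip1p pattern. The sketch covers the charge criterion only; molecule size and coiled-coil length are separate criteria and are not checked here.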

A Computer Search for Genes and Proteins of Synapsis in A. thaliana

The search for the specific sequence of the c(3)G gene in Drosophila was limited to a relatively small region and, therefore, only a small number of putative genes (78) had to be analyzed. In the other organisms with completely sequenced genomes, detection of genes functionally similar to SCP1 and ZIP1 was complicated by the lack of mutations that disturb formation of the synaptonemal complex. In these organisms, the region of search was expanded to the entire genome. To reduce the number of proteins examined, one more stage of analysis was necessary, namely, a search for proteins with the domain structure typical of the SCP1 and Zip1p proteins. We used the CDART (NCBI) software program (http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps) to assess the secondary structure of the analyzed proteins and to reveal those having a domain organization similar to that of either Zip1p or SCP1 and a size adequate for the formation of SC transverse filaments in A. thaliana. To our knowledge, there are no published data on the width of the SC central space in A. thaliana. Nevertheless, analysis of the literature revealed an important feature of the general SC organization in most phylogenetically distant organisms, including higher plants: the width of the central space is at most 90–120 nm. Based on these considerations, we detected in the A. thaliana proteome a virtual protein, AAD10695, which was similar to the Zip1p and SCP1 proteins in size and domain organization (Table 3, Fig. 4). As mentioned above, an important feature of the known proteins of SC transverse filaments is the basic (pI > 8) character of their C-terminal domain. This was also characteristic of the AAD10695 protein of A. thaliana. Thus, as in D. melanogaster, only one putative functional analog of the mammalian SCP1 and yeast Zip1p was revealed in A. thaliana, which is in agreement with the major evidence of comparative proteomics concerning highly specialized proteins.

[Figure 6: 1, cytological map of region 88F–89B; 2, gene positions on the molecular map; 3, gene positions on the cytogenetic map. Genes and their positions on the cytological map: Surf4, 88F3-88F4; Act88F, 88F7-88F7; ENL/AF9, 88E-88F; spno, 89A1-89A1; l(3)rN346, 89A1-89A2; c(3)G, 89A2-89A2; l(3)89Ac, 89A2-89A5; l(3)89Aa, 89A2-89A5; Scp2, 89A-89A; T-A7.1, 89B1-89B1; Mat89Bb, 89B-89B; l(3)S079311, 89B-89B; l(3)neo46, 89B1-89B6; dkn, 89B-89B.]

Fig. 6. Cytogenetic and molecular maps of the 88F–89B region in D. melanogaster chromosome 3, where the c(3)G gene is localized according to NCBI data. The known genes and their positions are indicated on the cytological map: 1, cytological map; 2, gene positions on the molecular map; 3, gene positions on the cytogenetic map. Thin lines connect the sites of gene location on the two maps. Discrepancy between the molecular and cytogenetic maps for the studied region appears as a shift of the relative gene positions on the two maps.

Specific Features of SC Genes and Proteins in Nematode C. elegans

In C. elegans, several putative proteins annotated as orthologs of the Zip1p protein were identified in silico prior to our study. For instance, according to the WormBase data (http://www.wormbase.org/), the WP:CE17456 protein, a product of the syp2 gene, is a component of the SC central space in C. elegans [30]. We analyzed the amino acid sequence of this protein using the ProtParam software program (http://www.expasy.ch/tools/protparam.html). The results suggested that the WP:CE17456 protein is unable to form a coiled-coil structure of a length adequate (Table 3) for the formation of SC transverse filaments. We believe, therefore, that this protein does not constitute the SC transverse filaments. According to the data of Proteome Inc. (http://www.proteome.com/index.html), the protein Z81586 of C. elegans is also an analog of SCP1. Indeed, Z81586 may be a functional analog of the latter, judging from the molecule size, domain organization, the size of the coiled-coil central domain, and physicochemical properties. However, the SC morphology in C. elegans differs from that in yeast, mammals, and Drosophila. The distinctive feature of the SC in C. elegans is the lack of a clearly defined central element. In other organisms, the central element appears as an electron-dense band in the central space lying parallel to the lateral SC elements [31]. In some electron micrographs of ultrathin sections of the C. elegans SC, the transverse filaments did not cross any central element on their way from one lateral element of the SC to the other [32], which suggests a different type of molecular organization of the central space in C. elegans. In other organisms, two molecules directed towards each other are involved in the formation of the transverse filaments; they interact by their N-termini to form a central element (model 1). In contrast, in C. elegans, a single long molecule probably extends from one lateral SC element to the other (model 2). If this is the case, both terminal domains of the protein should be basic. Schmekel and Daneholt proposed this mode of organization of the SC central space in the beetle Blaps cribrosa [33].

In view of the above, we undertook our own search for analogs of SCP1 and Zip1p among the proteins of C. elegans. We selected proteins corresponding to both models: model 1, with two relatively short proteins positioned head-to-head, and model 2, with only one long protein (Table 3). In the genome of C. elegans, we found three genes encoding such proteins. All these proteins (Q11102, T26844, and T27907) contained coiled coils of adequate size. For the Q11102 protein, which fits model 2, half the length of the coiled coil was taken, whereas the full-length coiled coils were taken for the two remaining proteins. Only the Q11102 protein met the model 2 requirement that both terminal domains of a long protein have a basic isoelectric point (pI). On the other hand, the only protein that fits model 1 (Fig. 4) in all parameters was the Z81586 protein. To decide between the two proteins, we used the WormBase software to analyze the genetic surrounding of the two candidate genes encoding proteins Z81586 and Q11102, namely, the genetic map regions –4 ± 1 of chromosome 1 and –10 ± 3.5 of the X chromosome, respectively. None of the known genes responsible for normal meiosis was detected in these regions. Thus, we are still unable to decide between the two proteins. However, the uncommon morphology of the SC central space in C. elegans suggests that the Q11102 protein is the functional analog of the SCP1 and Zip1p proteins and that the transverse SC filaments in this nematode are formed by long molecules of this protein, each of which spans the entire central space, in contrast to yeast and mammals (in the latter, the SC central space is formed by two protein molecules directed towards each other).

CONCLUDING REMARKS

Total genome sequencing in the three model organisms, yeast, nematode, and Arabidopsis, as well as the development of special computer programs, allowed researchers not only to reveal the open reading frames of previously unknown genes but also to predict the structural and functional parameters of the corresponding protein products. The two basic methods of automated search for orthologs have limitations and can be used only for preliminary selection (except in some obvious cases). We have proposed an additional method for accurate identification of protein analogs in phylogenetically distant organisms. It consists in selecting from computer databases the proteins whose secondary structure corresponds to the spatial ultrastructural parameters of cell organelles. This approach proved to be productive when applied to the key protein of the
synaptonemal complex in the three taxonomically distant organisms. The adequacy of this method was supported by the experiments on detection of this protein in Drosophila. It remains to be seen whether this approach is productive when applied to other proteins constituting the intracellular structures of other organisms. Of interest is the task of establishing the mode of formation of synaptonemal complexes and other cell structures assembled from proteins that have no homology in primary structure but are structurally analogous in different kingdoms of eukaryotes [34].

ACKNOWLEDGMENTS

This work was supported by the Russian Foundation for Basic Research (project nos. 99-04-48182 and 02-04-48761).

REFERENCES

1. Goffeau, A., Barrell, B.G., Bussey, H., et al., Life with 6000 Genes, Science, 1996, vol. 274, no. 5287, pp. 563–567.
2. The C. elegans Sequencing Consortium, Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology, Science, 1998, vol. 282, pp. 2012–2020.
3. Adams, M.D., Celniker, S.E., Holt, R.A., et al., The Genome Sequence of Drosophila melanogaster, Science, 2000, vol. 287, pp. 2185–2195.
4. Myers, E.W., Sutton, G.G., Delcher, A.T., et al., A Whole-Genome Assembly of Drosophila, Science, 2000, vol. 287, pp. 2196–2204.
5. The Arabidopsis Genome Initiative, Analysis of the Genome Sequence of the Flowering Plant Arabidopsis thaliana, Nature, 2000, vol. 408, pp. 796–815.
6. Chervitz, S.A., Aravind, L., Sherlock, G., et al., Comparison of the Complete Protein Sets of Worm and Yeast: Orthology and Divergence, Science, 1998, vol. 282, pp. 2022–2028.
7. Rubin, G.M., Yandell, M.D., Wortman, J.R., et al., Comparative Genomics of the Eukaryotes, Science, 2000, vol. 287, pp. 2204–2215.
8. Kornberg, T.B. and Krasnow, M.A., The Drosophila Genome Sequence: Implications for Biology and Medicine, Science, 2000, vol. 287, pp. 2218–2220.
9. Timofeeff-Ressovsky, N.W. and Vogt, O., Über idiosomatische Variationsgruppen und ihre Bedeutung für die Klassifikation der Krankheiten, Naturwiss., 1926, vol. 14, nos. 50–51, pp. 1188–1190.
10. Muller, H.J., Studies in Genetics, Bloomington: Indiana Univ. Press, 1962.
11. Bogdanov, Yu.F., The Molecular Concept of Meiosis Withstands a Test: Results of the Fourth European Conference on Meiosis, Genetika (Moscow), 2000, vol. 36, no. 4, pp. 585–590.
12. Grishaeva, T.M. and Bogdanov, Yu.F., The Genetic Control of Meiosis in Drosophila, Genetika (Moscow), 2000, vol. 36, no. 10, pp. 1301–1321.
13. Zickler, D. and Kleckner, N., Meiotic Chromosomes: Integrating Structure and Function, Annu. Rev. Genet., 1999, vol. 33, pp. 603–754.
14. Meuwissen, R.L.J., Offenberg, H.H., Dietrich, A.J.J., et al., A Coiled-Coil Related Protein Specific for the Synapsed Regions of Meiotic Prophase Chromosomes, EMBO J., 1992, vol. 11, pp. 5091–5100.
15. Dobson, M.J., Pearlman, R.E., Karaiskakis, A., et al., Synaptonemal Complex Proteins: Occurrence, Epitope Mapping and Chromosome Disjunction, J. Cell. Sci., 1994, vol. 107, pp. 2749–2760.
16. Liu, J.G., Yuan, L., Brundell, E., et al., Localization of the N-Terminus of SCP1 to the Central Element of the Synaptonemal Complex and Evidence for Direct Interactions between the N-Termini of SCP1 Molecules Organized Head-to-Head, Exp. Cell Res., 1996, vol. 226, pp. 11–19.
17. Meuwissen, R.L.J., Meerts, I., Hoovers, J.M.N., et al., Human Synaptonemal Complex Protein 1 (SCP1): Isolation and Characterization of the cDNA and Chromosomal Localization of the Gene, Genomics, 1997, vol. 39, pp. 377–384.
18. Tung, K.S. and Roeder, G.S., Meiotic Chromosome Morphology and Behavior in zip1 Mutants of Saccharomyces cerevisiae, Genetics, 1998, vol. 149, pp. 817–832.
19. Dong, H. and Roeder, G.S., Organization of the Yeast Zip1 Protein within the Central Region of the Synaptonemal Complex, J. Cell Biol., 2000, vol. 148, no. 3, pp. 417–426.
20. Heyting, C., Synaptonemal Complex: Structure and Function, Curr. Opin. Cell Biol., 1996, vol. 8, pp. 389–396.
21. Penkina, M.V., Karpova, O.I., and Bogdanov, Yu.F., Synaptonemal Complex Proteins: Specific Proteins of Meiotic Chromosomes, Mol. Biol. (Moscow), 2002, vol. 36, no. 1, pp. 1–11.
22. Schmekel, K., Meuwissen, R.L., Dietrich, A.J., et al., Organization of SCP1 Protein Molecules within Synaptonemal Complexes of the Rat, Exp. Cell Res., 1996, vol. 226, no. 1, pp. 20–30.
23. Bogdanov, Yu.F., Grishaeva, T.M., and Dadashev, S.Ya., The Drosophila melanogaster CG17604 Gene May Be a Possible Functional Homolog of the ZIP1 and SCP1 (SYCP1) Genes Coding for Synaptonemal Complex Proteins, Genetika (Moscow), 2002, vol. 38, no. 1, pp. 108–112.
24. Matsubayashi, H. and Yamamoto, M.-T., Dissection of Chromosome Region 89A of Drosophila melanogaster by Local Transposition of P Elements, Genes Genet. Syst., 1998, vol. 73, pp. 95–103.
25. Smith, P.A. and King, R.C., Genetic Control of Synaptonemal Complexes in Drosophila melanogaster, Genetics, 1968, vol. 60, pp. 335–351.
26. FlyBase Drosophila Database, http://flybase.bio.indiana.edu.
27. Carpenter, A.T.C., Electron Microscopy of Meiosis in Drosophila melanogaster Females. I. Structure, Arrangement and Temporal Change of the Synaptonemal Complex in Wild Type, Chromosoma, 1975, vol. 51, pp. 157–182.
28. Wayson, S.M., Page, S.L., Carey, B.W., et al., The Role of c(3)G in Promoting Meiotic Chromosome Synapsis and Exchange, The 17th European Drosophila Research Conference (1–5 September 2001), Edinburgh, 2001, p. 34.
29. Page, S.L. and Hawley, R.S., c(3)G Encodes a Drosophila Synaptonemal Complex Protein, Genes Dev., 2001, vol. 15, no. 23, pp. 3130–3143.
30. MacQueen, A.J., Colaiacovo, M.P., and Villeneuve, A.M., Molecules Underlying Meiotic Nuclear Reorganization and the Homolog Pairing Process, International C. elegans Meeting, 2001, p. 291.
31. Goldstein, P., The Synaptonemal Complex of Caenorhabditis elegans: Pachytene Karyotype Analysis of the dp1 Mutant and Disjunction Regulator Regions, Chromosoma, 1985, vol. 93, pp. 177–182.
32. Dernburg, A.F., McDonald, K., Moulder, G., et al., Meiotic Recombination in C. elegans Initiates by a Conserved Mechanism and Is Dispensable for Homologous Chromosome Synapsis, Cell (Cambridge, Mass.), 1998, vol. 94, pp. 387–398.
33. Schmekel, K. and Daneholt, B., The Central Region of the Synaptonemal Complex Revealed in Three Dimensions, Trends Cell Biol., 1995, vol. 5, pp. 239–242.
34. Bogdanov, Yu.F., Homologous Series of Meiotic Characters: Evolution and Conservation, Evolyutsiya, ekologiya, bioraznoobrazie. Materialy konferentsii pamyati Nikolaya Nikolaevicha Vorontsova, 1934–2000 (Evolution, Ecology, Biodiversity: Proc. Conf. in Memory of Nikolai Nikolaevich Vorontsov, 1934–2000), Krasilov, V.A., Ed., Moscow: Univ. Tsentr Dovuzovskogo Obrazovaniya, 2001, pp. 60–75.