Available online at www.pelagiaresearchlibrary.com

Pelagia Research Library European Journal of Experimental Biology, 2011, 1 (1):148-155

ISSN 2248-9215

An Evolutionary Account of GPI Anchored Proteins

Ashutosh Mani*, Swati Singh, Manish Dwivedi, Vijay Tripathi and Dwijendra K Gupta

Centre of Bioinformatics, University of Allahabad, Allahabad, (U.P.) India
______________________________________________________________________________

ABSTRACT

GPI anchors consist of three parts: the protein, the glycan and the phospholipid. GPI anchored proteins act as cell surface hydrolases, protozoal antigens, adhesion molecules and mammalian antigens, and are involved in other significant cellular functions such as dense packing of proteins on the cell surface, increased protein mobility on the cell surface, specific release from the cell surface, control of exit from the endoplasmic reticulum and toxin binding. Mutations in these proteins lead to Paroxysmal Nocturnal Haemoglobinuria and other disorders. This study combined comparative proteomics and phylogenetic approaches to address the cross-family evolution of GPI anchor proteins from 23 different species. The results revealed some previously unexplored specifics of the conserved domains of GPI anchored proteins across different taxa. The results also demonstrated that the extent of variation in GPI anchored proteins is inconsistent across hierarchical assemblages.

Keywords: GPI, Haemoglobinuria, Phylogeny, Hydrophobicity profile.
______________________________________________________________________________

INTRODUCTION

Glycosylphosphatidylinositol (GPI anchor) is a glycolipid that can be attached to the C-terminus of a protein during post-translational modification. It is composed of a hydrophobic phosphatidylinositol group linked through a carbohydrate-containing linker (glucosamine and mannose glycosidically bound to the inositol residue) to the C-terminal amino acid of a mature protein [1]. The two fatty acids within the hydrophobic phosphatidylinositol group anchor the protein to the cell membrane [2].
GPI proteins have been found in a wide variety of eukaryotes: mammals (45 in humans), chickens (10), fish, rays, sea urchin, fruit flies (5), silk moth, ticks, grasshopper, protozoa (trypanosomes, leishmania, paramecium), fungi, slime mold, unicellular green alga, mung bean and even herpes virus (simian surface glycoprotein), but not in bacteria, and oddly none have been reported from nematodes (out of 1208 proteins). GPI proteins are evenly split between enzymes and non-catalytic binding, recognition and transport proteins. This split correlates fairly strongly with whether internal tandem repeats are present (non-catalytic) or absent (catalytic) [3]. A GPI anchor unsurprisingly implies a signal peptide, but a signal peptide by no means implies a GPI anchor. There is a division between O- and N-glycosylation. The identified functions are clearly appropriate to the extra-cytoplasmic location; GPI proteins are over-represented in neurons. At least one human disease, paroxysmal nocturnal hemoglobinuria, results from defective GPI anchor addition to plasma membrane proteins [4].

MATERIALS AND METHODS

To find GPI anchor protein family members, we ran BLAST (Basic Local Alignment Search Tool) [5] using the blastp program against the protein database of the National Center for Biotechnology Information's Entrez system [6]. The amino acid sequence of the Homo sapiens GPI anchor protein gi|4504079|ref|NP_003792.1| was selected as the query. From the hits, 23 sequences from different species were selected for further study (Table 1). All sequences were retrieved in FASTA format.

The sequences were examined individually and aligned using CLUSTALW [7]. BioEdit version 7.0.9.0 [8] was used for manual editing and analysis of the sequences. The Kyte and Doolittle method [9] was used to plot the hydrophobicity profile. The Eisenberg method [10] was used to plot the hydrophobic moment profile with a window size of 13 residues (six residues on either side of the current residue) and a rotation angle θ = 100 degrees. For the conserved region search within the multiple aligned sequences, the minimum segment length was set to 10 residues, the maximum average entropy was set to 0.9 and gaps were limited to 10 per segment.

Multiple sequence alignment, phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 [11]. For pairwise and multiple alignments the gap open penalty was -7 and the gap extension penalty was -1. The BLOSUM weight matrix was used for substitution scoring [12, 13].
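The sliding-window profiles described in the methods can be sketched in a few lines of Python. This is a minimal illustration and not the BioEdit implementation: the function names are hypothetical, the Kyte-Doolittle scale is used for both profiles (the Eisenberg method is normally paired with its own consensus scale), the 13-residue window is applied to the mean profile as an assumption, and the moment is divided by the window size as a normalisation choice.

```python
import math

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windows(seq, size=13):
    """Yield (centre index, window) pairs; size 13 means six residues per side."""
    half = size // 2
    for i in range(half, len(seq) - half):
        yield i, seq[i - half:i + half + 1]


def mean_hydrophobicity(seq, size=13):
    """Mean Kyte-Doolittle score of each full window (window size is an assumption)."""
    return [(i, sum(KD[r] for r in w) / size) for i, w in windows(seq, size)]


def hydrophobic_moment(seq, size=13, angle_deg=100.0):
    """Hydrophobic moment per residue for each window.

    Residue n in the window is rotated by n * angle_deg; the moment is the
    length of the summed hydrophobicity vector, divided here by the window
    size (an assumed normalisation).
    """
    delta = math.radians(angle_deg)
    profile = []
    for i, w in windows(seq, size):
        s = sum(KD[r] * math.sin(n * delta) for n, r in enumerate(w))
        c = sum(KD[r] * math.cos(n * delta) for n, r in enumerate(w))
        profile.append((i, math.hypot(s, c) / size))
    return profile
```

Both functions expect an ungapped sequence of the 20 standard one-letter codes; gaps or ambiguity codes would raise a `KeyError` and need to be filtered first. Sequences shorter than the window return an empty profile.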
Hydrophilic gap penalties were used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions, where gaps are more common.

The multiple alignments of the GPI anchor protein sequences were used to create phylogenetic trees. The evolutionary history was inferred using the Neighbour-Joining method [14]. All characters were given equal weights. The bootstrap consensus tree inferred from 10000 replicates was taken to represent the evolutionary history of the taxa analyzed [15]. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates were collapsed. The percentages of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) are shown next to the branches. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method and are in units of the number of amino acid substitutions per site [16]. All positions containing gaps and missing data were eliminated from the dataset (complete deletion option). There were a total of 167 positions in the final dataset of GPI anchor proteins.

RESULTS AND DISCUSSION

Multiple Sequence Alignment

The multiple sequence alignment (MSA) of GPI anchor proteins comprised 806 positions, of which 604 were parsimony informative, 715 were variable, 72 were conserved and 96 were singleton sites. Statistical analysis of the aligned sequences showed that leucine, alanine, serine, valine, glycine, phenylalanine and isoleucine were the most frequent amino acids, with frequencies of 15.0, 9.03, 7.22, 7.03, 7.02, 5.14 and 5.14 percent respectively.
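The site categories and the Poisson-corrected distance mentioned above can be illustrated with a short Python sketch. It is not MEGA's code: the function names are hypothetical, and the site definitions (a parsimony-informative site has at least two residue types that each occur at least twice; a singleton site is variable but not parsimony informative) are the commonly used ones, assumed here to match MEGA 4. The distance uses d = -ln(1 - p), where p is the proportion of differing sites.

```python
import math
from collections import Counter

GAP = "-"


def complete_deletion(seqs):
    """Drop every alignment column in which any sequence has a gap."""
    cols = [col for col in zip(*seqs) if GAP not in col]
    if not cols:
        return ["" for _ in seqs]
    return ["".join(row) for row in zip(*cols)]


def classify_site(column):
    """Label one gap-free column as conserved, parsimony-informative or singleton.

    Variable sites are the union of the last two categories.
    """
    counts = Counter(column)
    if len(counts) <= 1:
        return "conserved"
    repeated = sum(1 for c in counts.values() if c >= 2)
    return "parsimony-informative" if repeated >= 2 else "singleton"


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p) for two gap-free aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)
```

A typical use is to call `complete_deletion` on the aligned sequences first, then classify each remaining column with `classify_site(col)` for `col in zip(*seqs)` and compute `poisson_distance` for each pair of sequences.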
Within conserved regions, asparagine, glutamine, tyrosine, glycine and arginine were the most frequent, with frequencies of 13.97, 13.59, 12.48, 10.61 and 9.31 percent respectively. Both Arabidopsis thaliana sequences show characteristic conserved regions of 13 and 23 amino acids at positions 145 to 157 and 339 to 362 of the MSA.

Conserved Domain Search

A conserved region search yielded four regions, at positions 170 to 192, 222 to 247, 273 to 290 and 301 to 322 (Figure 1). This conservation is supported by the minimal entropy observed at the corresponding positions in the first half of the MSA.

Entropy Plot

An entropy plot, i.e. a measure of the lack of information content and of the amount of variability, was generated for all aligned positions (Figure 2). The plot shows that entropy rarely reaches a value of two, with minimal entropy at several positions in the first half of the protein sequences.

Hydrophobicity Profile

The hydrophobicity profile shows that the mean hydrophobicity of the protein is below zero at most positions in all species; only occasionally does it become positive (Figures 3a and 3b). The profile shows that the regions corresponding to conserved positions contain a balanced mix of residues, with values staying close to zero. The regions after position 500, which belong to the less conserved part of the protein, show greater hydrophobicity.

Phylogeny

The phylogenetic trees were constructed using the Neighbour-Joining method (Figures 4 and 5). The trees place the organisms on nodes branched according to their GPI anchor proteins. Arabidopsis thaliana, being a plant species, appears on a branch totally diverged from the main tree, with a bootstrap support of 69 percent. The node for the fungal species Aspergillus fumigatus, Neosartorya fischeri, Candida albicans, Sclerotinia sclerotiorum, Saccharomyces cerevisiae and Schizosaccharomyces pombe is supported by a bootstrap value of 92. The node for arthropods (Drosophila melanogaster, Drosophila pseudoobscura and Tribolium castaneum) is supported by a bootstrap value of 89. The node for mammals is supported by a bootstrap value of 99.

Table 1: Protein sequences used for comparative genomics and evolutionary studies

Name of Organism              NCBI Access No.   Length (AA)
Homo sapiens                  gi|4504079|       621
Mus musculus                  gi|74267686|      621
Mus musculus(2)               gi|9453837|       621
Bos taurus                    gi|74267686|      617
Drosophila melanogaster       gi|24639992|      615
Ostreococcus tauri            gi|116059596|     616
Neosartorya fischeri          gi|119494946|     501
Schizosaccharomyces pombe     gi|22001633|      380
Saccharomyces cerevisiae      gi|190406124|     614
Xenopus laevis                gi|148234538|     615
Xenopus tropicalis            gi|62752867|      615
Tribolium castaneum           gi|91087137|      669
Tetraodon nigroviridis        gi|47220384|      645
Pan troglodytes               gi|114579535|     759
Rattus norvegicus             gi|51948452|      621
Macaca mulatta                gi|109087710|     561
Drosophila pseudoobscura      gi|125983108|     672
Aspergillus fumigatus Af293   gi|70992611|      622
Arabidopsis thaliana          gi|27311615|      699
Arabidopsis thaliana(2)       gi|30687160|      699
Candida albicans              gi|68491370|      567
Saccharomyces cerevisiae      gi|170573680|     614
Brugia malayi                 gi|170573680|     357
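The per-position entropy behind the entropy plot and the conserved region search can be sketched as follows. This is an illustration under stated assumptions rather than BioEdit's algorithm: entropy is computed as H = -Σ f ln f over the non-gap residues of each column (natural logarithm assumed), the search is a simple greedy scan, trailing columns above the threshold are trimmed as a design choice, and the limit of 10 gaps per segment is left out for brevity. All function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (natural log) of the non-gap residues in one column."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(c / n) * math.log(c / n) for c in Counter(residues).values())


def entropy_profile(seqs):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*seqs)]


def conserved_regions(entropies, min_len=10, max_avg=0.9):
    """Greedy scan for segments whose running average entropy stays <= max_avg.

    Returns 1-based inclusive (start, end) positions of segments that are at
    least min_len columns long.
    """
    regions = []
    n = len(entropies)
    start = 0
    while start < n:
        total = 0.0
        end = start
        # Extend the segment while its average entropy stays within the limit.
        while end < n:
            total += entropies[end]
            if total / (end - start + 1) > max_avg:
                break
            end += 1
        # Trim trailing columns that are individually above the threshold.
        while end > start and entropies[end - 1] > max_avg:
            end -= 1
        if end - start >= min_len:
            regions.append((start + 1, end))
            start = end
        else:
            start += 1
    return regions
```

Calling `conserved_regions(entropy_profile(seqs))` with the defaults mirrors the settings given in the methods (minimum segment length 10, maximum average entropy 0.9). A column with a single residue type scores 0, and a column with all 20 amino acids in equal proportion scores ln 20, about 3.0.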

Figure 1- Four conserved domains (a), (b), (c) and (d) found by the conserved domain search in the MSA


Figure 2- Entropy plot. X-axis shows the positions of MSA and the Y-axis shows entropy scores for individual positions in MSA.

Figure 3a- Colour code for different species in the hydrophobicity plot



Figure 3b- Hydrophobicity plot. X-axis demonstrates the positions of MSA and the Y-axis shows hydrophobicity scores for individual positions in MSA

Figure 4- Bootstrap consensus tree of GPI anchor proteins prepared by NJ method



Figure 5- Bootstrap original tree of GPI anchor proteins prepared by NJ method

CONCLUSION

This study presents the first comparative proteomic study and evolutionary analysis of the GPI anchor proteins based on molecular phylogeny across different families of organisms. Organisms belonging to lower hierarchical assemblages show less variation in the GPI anchor proteins than those belonging to higher ones. The protein has a below-zero average hydrophobicity index. The results give a clear picture of the evolutionary order of GPI anchor proteins. This study establishes an overall framework of information for the family of GPI anchor proteins, which may facilitate and stimulate the study of this gene family across all organisms.

Acknowledgements

AM and VT are thankful to UGC New Delhi for research fellowships. MD is thankful to DST, Govt. of India, for a research fellowship. This work has been supported by a DBT BTISNet grant to DKG.



______________________________________________________________________________

REFERENCES

[1] P. W. Janes, S. C. Ley, A. I. Magee, P. S. Kabouridis, Semin. Immunol., 2000, 12, 23-34.
[2] B. Eisenhaber, S. Maurer-Stroh, M. Novatchkova, G. Schneider, F. Eisenhaber, Enzymes and auxiliary factors for GPI lipid anchor biosynthesis and post-translational transfer to proteins, Wiley Periodicals, Inc., 2003, 25:367-385.
[3] S. Sabharanjak, Dev. Cell, 2002, 2:411-423.
[4] J. Takeda, T. Miyata, K. Kawagoe, Cell, 1993, 73:703-711.
[5] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, Nucleic Acids Res., 1997, 25:3389-3402.
[6] NCBI/Entrez at weblink http://ncbi.nlm.nih.gov/entrez
[7] D. Higgins, J. Thompson, T. Gibson, Nucleic Acids Res., 1994, 22:4673-4680.
[8] T. A. Hall, Nucleic Acids Symposium Series, 1999, 41:95-98.
[9] J. Kyte, R. F. Doolittle, J. Mol. Biol., 1982, 157:105.
[10] D. E. Eisenberg, M. Komaromy, R. Wall, J. Mol. Biol., 1984, 179(1):125-42.
[11] K. Tamura, J. Dudley, M. Nei, S. Kumar, Molecular Biology and Evolution, 2007, 24:1596-1599.
[12] S. F. Altschul, G. Gish, Methods Enzymol., 1996, 266:460-480.
[13] S. Henikoff, J. Henikoff, Proceedings of the National Academy of Sciences USA, 1992, 89.
[14] N. Saitou, M. Nei, Molecular Biology and Evolution, 1987, 4:406-425.
[15] J. Felsenstein, Evolution, 1985, 39:783-791.
[16] E. Zuckerkandl, L. Pauling, Academic Press, 1965, 97-166.