Evolutionary patterns of phosphorylated serines
© Kurmangaliyev et al; licensee BioMed Central Ltd. 2011
Received: 28 September 2010
Accepted: 9 February 2011
Published: 9 February 2011
Posttranslationally modified amino acids are chemically distinct types of amino acids and in terms of evolution they might behave differently from their non-modified counterparts. In order to check this possibility, we reconstructed the evolutionary history of phosphorylated serines in several groups of organisms. Comparisons of substitution vectors have revealed some significant differences in the evolution of modified and corresponding non-modified amino acids. In particular, phosphoserines are more frequently substituted to aspartate and glutamate, compared to non-phosphorylated serines.
This article was reviewed by Arcady Mushegian and Sandor Pongor.
Post-translational modifications play an important role in diversifying protein structure and function [1, 2]. Protein phosphorylation is one of the most important and widely distributed types of post-translational modification. In eukaryotes, reversible protein phosphorylation plays a key role in signal transduction and other processes [3, 4]. Recent advances in mass spectrometry have allowed large-scale identification of phosphorylation events. Analyses of these data have already revealed some specific structural and evolutionary features of phosphoserines. Phosphoserines tend to occur in intrinsically disordered regions [6–8] and in regions corresponding to alternatively spliced gene segments. Phosphorylated amino acids are more conserved than their non-phosphorylated counterparts [7, 10–12]. Some very old phosphorylation events may be common to organisms from Archaea to human.
Here we investigated another evolutionary aspect of protein modification sites. Since modified amino acids are chemically a distinct type of amino acid, in terms of evolution they might behave differently from their non-modified counterparts (on top of differing in their level of conservation). To analyse differences in the evolution of standard amino acids and their modified counterparts, we reconstructed the evolution of phosphorylated amino acids in three groups of organisms. Specifically, we studied phosphorylation of serine in the human, fruit fly and yeast proteomes.
Phosphorylation sites were downloaded from the PHOSIDA and PhosphoPep databases. For yeast and fruit fly we studied phosphoserines obtained in two high-throughput experiments each, by different groups of researchers [13–16]. For human we used datasets obtained in four different high-throughput experiments [17–20]. Phosphorylation is a highly dynamic process, and the overlap of phosphorylation events identified in different experiments from various cell lines and tissues is relatively small. Sites observed to be phosphorylated in more than one high-throughput experiment are likely modified in a more constitutive manner, or at least represent a more reliable dataset of phosphoserines.
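The selection of sites observed in more than one experiment can be illustrated with a minimal Python sketch. The function name `reliable_sites`, the representation of a site as a `(protein_id, position)` tuple, and the toy data are assumptions made for illustration; they are not taken from the pipeline used in this study.

```python
from collections import Counter

def reliable_sites(experiments, min_support=2):
    """Return sites reported by at least `min_support` experiments.

    `experiments` is a list of collections of (protein_id, position)
    tuples; this representation is a hypothetical one.
    """
    support = Counter()
    for sites in experiments:
        # Each experiment contributes at most one vote per site.
        support.update(set(sites))
    return {site for site, n in support.items() if n >= min_support}

# Hypothetical toy data: three experiments with partial overlap.
exp_a = {("P1", 15), ("P1", 42), ("P2", 7)}
exp_b = {("P1", 42), ("P3", 101)}
exp_c = {("P2", 7), ("P1", 42)}

print(sorted(reliable_sites([exp_a, exp_b, exp_c])))
```

Counting votes per site, instead of intersecting all experiments, keeps a site that is confirmed by any two experiments even if the remaining ones missed it.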
We analysed the evolution of modification sites and their non-modified counterparts separately among eight vertebrates (human Homo sapiens; chimpanzee Pan troglodytes; mouse Mus musculus; rat Rattus norvegicus; cow Bos taurus; dog Canis lupus familiaris; chicken Gallus gallus; and zebrafish Danio rerio), eleven fruit flies (Drosophila melanogaster; D. yakuba; D. erecta; D. sechellia; D. ananassae; D. pseudoobscura; D. persimilis; D. willistoni; D. mojavensis; D. virilis; D. grimshawi) and fifteen fungi (Saccharomyces cerevisiae; S. paradoxus; S. mikatae; S. bayanus; Candida glabrata; S. castellii; Kluyveromyces waltii; K. lactis; Ashbya gossypii; Debaryomyces hansenii; C. albicans; Yarrowia lipolytica; Aspergillus nidulans; Neurospora crassa; Schizosaccharomyces pombe). Orthologs of modified H. sapiens proteins were obtained from HomoloGene; for D. melanogaster, from FlyBase; and for S. cerevisiae, from FungalOrthogroups. Only orthologs with the highest identity to the modified protein were selected from each species. Multiple alignments were constructed using ClustalW.
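The rule of keeping only the ortholog with the highest identity in each species can be sketched as follows. This is an illustrative example only: the function name `best_orthologs`, the tuple layout, and the identity values are hypothetical, and how identity was computed in the study is not specified in the text.

```python
def best_orthologs(candidates):
    """Keep, for each species, the candidate with the highest identity.

    `candidates` is an iterable of (species, protein_id, identity)
    tuples, where `identity` is the percent identity to the modified
    protein (assumed to be precomputed elsewhere).
    """
    best = {}
    for species, protein_id, identity in candidates:
        # Replace the stored candidate only if this one is strictly better.
        if species not in best or identity > best[species][1]:
            best[species] = (protein_id, identity)
    return {species: pid for species, (pid, _) in best.items()}

# Hypothetical candidates for one modified human protein.
candidates = [
    ("mouse", "m1", 91.0),
    ("mouse", "m2", 64.5),
    ("chicken", "c1", 78.2),
]
print(best_orthologs(candidates))
```

With ties, this sketch keeps the first candidate encountered; a real pipeline would need an explicit tie-breaking rule.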
As mentioned above, the evolutionary features and frequencies of phosphoserines may depend on structural context. In particular, phosphoserines tend to occur within intrinsically disordered regions of proteins [6–8]. To take this into account, we analysed serines from disordered regions and ordered regions of phosphoproteins separately. Intrinsically disordered regions were predicted by PONDR VSL2.
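A minimal sketch of how serines might be partitioned once per-residue disorder scores are available is given below. The score list, the function name `split_serines_by_disorder`, and the cutoff of 0.5 are assumptions for illustration; the text does not state which threshold was applied to the predictor output.

```python
def split_serines_by_disorder(sequence, disorder_scores, threshold=0.5):
    """Return (disordered, ordered) lists of 1-based serine positions.

    `disorder_scores` holds one predicted score per residue; residues
    with score >= `threshold` are treated as disordered (assumed cutoff).
    """
    if len(sequence) != len(disorder_scores):
        raise ValueError("sequence and scores must have the same length")
    disordered, ordered = [], []
    for pos, (aa, score) in enumerate(zip(sequence, disorder_scores), start=1):
        if aa != "S":
            continue  # only serines are of interest here
        (disordered if score >= threshold else ordered).append(pos)
    return disordered, ordered

# Hypothetical toy protein with made-up disorder scores.
seq = "MSASKS"
scores = [0.9, 0.8, 0.2, 0.1, 0.3, 0.7]
print(split_serines_by_disorder(seq, scores))
```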
Table 1. Datasets of phosphorylated and non-phosphorylated serines
Row groups: initial sets of serine residues; serines with at least one substitution to other types of amino acids, within ordered regions; serines with at least one substitution to other types of amino acids, within disordered regions. Each group includes a sub-row for phosphoserines observed more than once.
The control sets consisted of non-modified serine residues from disordered regions of the same proteins. To measure the statistical significance of the difference between substitution vectors of modified and non-modified serines, we performed bootstrapping of control sets. To do this, we generated 10000 random control sets of non-phosphorylated serines. Each control set was of the same size as the corresponding phosphorylated set (generic sets and subsets of reliable phosphosites).
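The bootstrap procedure described above can be sketched in Python under explicit assumptions. Here each substitution event is represented by the one-letter code of the replacing amino acid, control sets are drawn with replacement, and significance is taken as the fraction of control sets whose frequency for a given target residue is at least the observed one. These choices, and the names `substitution_vector` and `bootstrap_p_value`, are illustrative; the text does not specify the exact test statistic.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRTVWY"  # the 19 possible replacements for serine

def substitution_vector(substitutions):
    """Fraction of serine substitutions going to each other amino acid."""
    counts = Counter(substitutions)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}

def bootstrap_p_value(phospho_subs, control_pool, target, n_sets=10000, seed=0):
    """Empirical one-sided p-value for the frequency of `target`.

    Draws `n_sets` random control sets (with replacement, an assumption)
    of the same size as the phosphoserine set and counts how often the
    control frequency reaches the observed one.
    """
    rng = random.Random(seed)
    observed = substitution_vector(phospho_subs)[target]
    hits = 0
    for _ in range(n_sets):
        sample = rng.choices(control_pool, k=len(phospho_subs))
        if substitution_vector(sample)[target] >= observed:
            hits += 1
    return hits / n_sets

# Hypothetical toy data: replacing residues observed at each kind of site.
phospho_subs = list("DDDEEATNGP")
control_pool = list("AAAATTTTNNNGGPPDEC")
p = bootstrap_p_value(phospho_subs, control_pool, "D", n_sets=1000)
print(f"empirical p-value for S->D enrichment: {p:.3f}")
```

Fixing the random seed makes the estimate reproducible; the resolution of the p-value is limited to 1/`n_sets`.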
Structural features of phosphoserines may not be limited to disorder of surrounding protein regions, and may include other specific properties such as secondary structure, solvent accessibility, etc. Therefore, to eliminate confounding effects as far as possible, we created additional control sets containing non-modified serines located in the same protein regions as the modification sites. Non-modified serines were collected within a maximal distance of 10, 11 and 9 amino acid residues from phosphoserines, for yeast, fruit fly and human respectively. Again, the size of the control sets was the same as the size of the respective phosphoserine sets.
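One possible reading of this neighbour-based control is to pair each phosphoserine with its nearest non-modified serine within the distance limit. The sketch below follows that reading; the function name `nearest_control_serines`, the 1-based position convention, and the toy sequence are hypothetical, and the exact pairing rule used in the study is not stated in the text.

```python
def nearest_control_serines(sequence, phospho_positions, max_distance):
    """Map each phosphoserine to its nearest non-modified serine.

    Positions are 1-based. A phosphoserine with no non-modified serine
    within `max_distance` residues maps to None. Ties are resolved in
    favour of the lower position (an arbitrary choice).
    """
    phospho = set(phospho_positions)
    candidates = [
        pos for pos, aa in enumerate(sequence, start=1)
        if aa == "S" and pos not in phospho
    ]
    result = {}
    for p in sorted(phospho):
        in_range = [c for c in candidates if abs(c - p) <= max_distance]
        result[p] = min(in_range, key=lambda c: (abs(c - p), c)) if in_range else None
    return result

# Hypothetical protein: serines at positions 2, 5 and 20; 5 and 20 are modified.
seq = "ASAAS" + "A" * 14 + "S"
print(nearest_control_serines(seq, [5, 20], max_distance=9))
```

In this sketch the same control serine may be paired with more than one phosphoserine; whether duplicates were allowed in the study is not specified.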
There are other considerable shifts of substitution rates common to all three taxa. In particular, phosphoserines are relatively rarely substituted to alanines and cysteines (Figure 2). However, in these cases, the control-set substitution vectors of non-phosphorylated serines located in the same regions as phosphoserines were also shifted in the same direction as phosphoserines (as compared to all non-phosphorylated serines). Hence, these shifts are likely related not to modifications, but to specific features of these regions.
The rates of substitutions to aspartate and glutamate in the additional control sets of nearest non-phosphorylated serines are also not shifted, with the exception of vertebrates, where they are also shifted toward higher values (but still to a much weaker extent than in the case of phosphoserines). Note that these control sets may be contaminated by phosphoserines. Indeed, phosphoserines tend to co-occur, forming clusters. Therefore the sets of nearest non-phosphorylated serines likely contain phosphoserines that have not yet been detected. Removing these phosphoserines would increase the significance of our observations.
The comparison with nearest non-phosphorylated serines takes into account the fact that phosphoserines tend to occur in intrinsically disordered regions. Methods used in large-scale phosphoproteomic experiments are based on selection of negatively charged peptides, which results in a bias towards enrichment of phosphopeptides with acidic residues [29, 30]. This fact, coupled with the fact that phosphoserines may shift positions within rapidly evolving disordered regions and with general problems of aligning such regions, could distort our analysis. But this would have the same influence on our control sets of non-modified serines from the same regions of proteins. Hence the observed differences between these controls and phosphoserines cannot be explained by such artifacts.
In addition to serine phosphorylation, we analysed the evolution of another abundant type of protein modification, lysine acetylation. Recently two large datasets of human acetylation sites became available [32, 33]. We observed some differences between substitution vectors of acetylated and non-acetylated lysines, but the results obtained for these two sets of acetyllysines were discordant (data not shown). As noted in one of these papers, the spectrum of acetylated proteins is different between these two datasets obtained from different tissues. We observed that less than 2% of sites are common to both datasets. It seems that the available acetyllysine data are not sufficient for meaningful analysis.
It should be taken into account that our substitution vectors are probably enriched with false-positive phosphosites. This results from our over-simplified assumption that a site is modified from the first appearance of the corresponding residue in the evolutionary record. Additionally, phosphoserines from large-scale experiments may be false-positive sites. There is evidence that many phosphorylation sites could be non-functional or non-specific, as sometimes functional targets of phosphorylation are not particular sites, but entire protein regions [31, 34, 35]. On the other hand, the control sets could contain not yet detected phosphoserines. These false positives and false negatives should blur the differences between the substitution vectors of modified and non-modified residues. Most likely, the real level of differences is higher than the one observed here.
Reviewer's Report 1
Reviewer 1: Arcady Mushegian - Stowers Institute, Kansas City, USA
The idea of comparing evolutionary substitution patterns of modified and non-modified residues in proteins is good, and the approach proposed by the authors, i.e., to reconstruct, using an ML model, the point at which the target of modification first emerged and then to see what it mutates to, is probably the only computational approach plausible at the moment.
I trust the authors that their implementation of this approach is technically sound, but, unfortunately, this is hard to ascertain from the submitted version of the manuscript, which reads as a preliminary draft devoid of the quantitative details. This has to change - please provide at least the following:
1. The collection of phosphorylated and acetylated sites: how many sites of each type in each organism are there?
A table with a description of the final datasets used for the construction of substitution vectors has been added to the revised version (Table 1).
2. The phosphorylation sites at least (also acetylated sites?) are said to occur more often in the intrinsically disordered regions. Taking the non-globular regions in the proteins (which can be identified, e.g., using Wootton and Federhen's SEG program) as a proxy for "intrinsic disorder", can it be said that the actual sample of modified residues that the authors were working with is indeed more commonly occurring in such regions? And how does this sit with the ability to align the proteins in these regions?
We predicted intrinsically disordered regions and recalculated substitution vectors separately for serine residues from disordered and ordered regions. Most of the phosphoserines from the initial datasets came from protein regions predicted to be disordered (Table 1). Problems with alignments of such regions are discussed in the revised version. Additional controls of non-modified serines from the same regions of proteins were introduced to address this problem.
3. The "control sets" of non-modified serines (more accurately, not-observed-to-be-modified serines): are these found in the disordered/non-globular regions to the same extent as the modified ones? If not, the controls may be biased with regard to amino acid composition and to the regions of the protein molecules (e.g., buried vs exposed) - test this directly please.
Indeed, the amino acid composition of disordered regions and regions with a regular structure differs strongly. As described in response to comment #2, in the revised version we considered both phosphorylated and non-phosphorylated serines from disordered and regular regions separately. Moreover, as discussed in the revised text, sets including only the closest non-modified serines provide an even better control for artifacts that could be caused by specifics of regions surrounding modification sites.
4. The trends that the authors discuss are interesting but weak - to what extent this may be explained by the small sample sizes? What was the statistical test for which the P-values are reported?
The initial datasets of serines were large enough, but only a fraction of the serines were substituted to other amino acids, as demonstrated by evolutionary reconstruction. The final datasets are described in Table 1.
To measure the statistical significance, we used bootstraps of control sets of non-modified serines. For all phosphoserines and, separately, for the subset of phosphoserines observed in more than one experiment, we generated 10000 random sets of non-modified serines of appropriate size. For additional controls using neighbouring sites, we compiled sets of nearest serines of the same size as the corresponding sets of phosphorylated serines.
5. In vertebrates, the "neighboring" serines from control set 2 seem to be faithfully following the trend towards change into D or E, with some separation from the control set 1. If this trend withstands the possible correction proposed in #2, perhaps this means that, in a "disordered" region that has several serines, any or all of them may be targets of phosphorylation. Perhaps then it would be interesting to sum the substitution vectors over the region that has several serines, at least one of which is phosphorylated (i.e., how likely is it that at least one serine in this region is substituted by amino acid X?)
Phosphoserines tend to cluster in the sequence. Thus, as discussed in the revised version, the control set consisting of nearest non-phosphorylated serines could be contaminated by false-negative phosphoserines, not yet detected in experiments. On the other hand, as the trend in the control set of nearest serines is weaker, averaging of the substitution vectors would simply dilute the observation.
Reviewer's Report 2
Reviewer 2: Sandor Pongor - International Centre for Genetic Engineering and Biotechnology, Trieste, Italy
There is mounting evidence in recent years that the study of post-translational modifications has important lessons for understanding diverse aspects of protein evolution. It has been noted among others that phosphorylated sites tend to occur in those segments of the proteins that are intrinsically disordered and/or correspond to alternative splice sites. Currently there are insufficient data on the conservation of modified sites. Kurmangaliyev and associates address this problem using carefully selected datasets and well-designed statistical analyses.
The authors conclude that there are significant differences in the evolution of modified and corresponding non-modified amino acids. In particular, phosphoserines are more frequently substituted to aspartate and glutamate, compared to non-phosphorylated serines. Similarly, acetyllysines are more rarely substituted to isoleucine and valine. These findings underline the importance of post-translational modifications when discussing the variation of residue conservation within sequence regions. The methodology is straightforward and sound and will be a useful template for future studies. The authors may want to add a few examples of situations where this approach can or cannot be used.
As discussed in the revised text, the analysis of a newly available dataset of human acetylation sites did not confirm our initial observations. This is likely due to low reproducibility of currently available datasets of acetyllysines (the overlap between two datasets is extremely small). This suggests that conclusions based on such analyses should be drawn carefully, on data obtained from different sources and for a variety of organisms. We have encountered a similar problem with phosphothreonines and phosphotyrosines, where the datasets were simply too small for reliable conclusions.
We are grateful to Dmitry Malko, Ekaterina Ermakova and Anna Lyubetskaya who shared their programs and data, and to Stefka Tyanova and Jürgen Cox for useful discussions. This study was partially supported by the state contract 2.740.11.0101, Russian Foundation of Basic Research (09-04-92745), and program "Molecular and Cellular Biology" of the Russian Academy of Sciences.
1. Mann M, Jensen ON: Proteomic analysis of post-translational modifications. Nat Biotechnol. 2003, 21: 255-261. 10.1038/nbt0303-255.
2. Seo J, Lee KJ: Post-translational Modifications and Their Biological Function: Proteomic Analysis and Systematic Approaches. Journal of Biochemistry and Molecular Biology. 2004, 37: 35-44.
3. Hunter T: Signaling-2000 and beyond. Cell. 2000, 100: 113-127. 10.1016/S0092-8674(00)81688-8.
4. Cohen P: The origins of protein phosphorylation. Nat Cell Biol. 2002, 4: E127-E130. 10.1038/ncb0502-e127.
5. Ptacek J, Snyder M: Charging it up: global analysis of protein phosphorylation. Trends Genet. 2006, 22: 545-554. 10.1016/j.tig.2006.08.005.
6. Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK: The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004, 32: 1037-1049. 10.1093/nar/gkh253.
7. Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M: PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol. 2007, 8: R250. 10.1186/gb-2007-8-11-r250.
8. Collins MO, Yu L, Campuzano I, Grant SG, Choudhary JS: Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder. Mol Cell Proteomics. 2008, 7: 1331-1348. 10.1074/mcp.M700564-MCP200.
9. Kurmangaliev EZh, Gel'fand MS: [Alternative splicing tends to involve phosphorylation sites]. Mol Biol (Mosk). 2009, 43: 572-574.
10. Macek B, Gnad F, Soufi B, Kumar C, Olsen JV, Mijakovic I, Mann M: Phosphoproteome analysis of E. coli reveals evolutionary conservation of bacterial Ser/Thr/Tyr phosphorylation. Mol Cell Proteomics. 2008, 7: 299-307.
11. Malik R, Nigg EA, Körner R: Comparative conservation analysis of the human mitotic phosphoproteome. Bioinformatics. 2008, 24: 1426-1432. 10.1093/bioinformatics/btn197.
12. Boekhorst J, van Breukelen B, Heck A, Snel B: Comparative phosphoproteomics reveals evolutionary and functional conservation of phosphorylation across eukaryotes. Genome Biol. 2008, 9: R144. 10.1186/gb-2008-9-10-r144.
13. Bodenmiller B, Campbell D, Gerrits B, Lam H, Jovanovic M, Picotti P, Schlapbach R, Aebersold R: PhosphoPep - a database of protein phosphorylation sites in model organisms. Nat Biotechnol. 2008, 26: 1339-1340. 10.1038/nbt1208-1339.
14. Bodenmiller B, Malmstrom J, Gerrits B, Campbell D, Lam H, Schmidt A, Rinner O, Mueller LN, Shannon PT, Pedrioli PG, Panse C, Lee HK, Schlapbach R, Aebersold R: PhosphoPep - a phosphoproteome resource for systems biology research in Drosophila Kc167 cells. Mol Syst Biol. 2007, 3: 139. 10.1038/msb4100182.
15. Hilger M, Bonaldi T, Gnad F, Mann M: Systems-wide analysis of a phosphatase knock-down by quantitative proteomics and phosphoproteomics. Mol Cell Proteomics. 2009, 8: 1908-1920. 10.1074/mcp.M800559-MCP200.
16. Gnad F, de Godoy LM, Cox J, Neuhauser N, Ren S, Olsen JV, Mann M: High-accuracy identification and bioinformatic analysis of in vivo protein phosphorylation sites in yeast. Proteomics. 2009, 9: 4642-4652. 10.1002/pmic.200900144.
17. Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M: Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell. 2006, 127: 635-648. 10.1016/j.cell.2006.09.026.
18. Daub H, Olsen JV, Bairlein M, Gnad F, Oppermann FS, Körner R, Greff Z, Kéri G, Stemmann O, Mann M: Kinase-selective enrichment enables quantitative phosphoproteomics of the kinome across the cell cycle. Mol Cell. 2008, 31: 438-448. 10.1016/j.molcel.2008.07.007.
19. Oppermann FS, Gnad F, Olsen JV, Hornberger R, Greff Z, Kéri G, Mann M, Daub H: Large-scale proteomics analysis of the human kinome. Mol Cell Proteomics. 2009, 8: 1751-1764. 10.1074/mcp.M800588-MCP200.
20. Olsen JV, Vermeulen M, Santamaria A, Kumar C, Miller ML, Jensen LJ, Gnad F, Cox J, Jensen TS, Nigg EA, Brunak S, Mann M: Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis. Sci Signal. 2010, 3: ra3. 10.1126/scisignal.2000475.
21. Wheeler Geer LY, Marchler-Bauer A, Geer RC, Han L, He J, He S, Liu C, Shi W, Bryant SH: The NCBI BioSystems database. Nucleic Acids Res. 2010, 38: D492-D496. 10.1093/nar/gkp858.
22. Tweedie S, Ashburner M, Falls K, Leyland P, McQuilton P, Marygold S, Millburn G, Osumi-Sutherland D, Schroeder A, Seal R, Zhang H, The FlyBase Consortium: FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res. 2009, 37: D555-D559. 10.1093/nar/gkn788.
23. Wapinski I, Pfeffer A, Friedman N, Regev A: Natural history and evolutionary principles of gene duplication in fungi. Nature. 2007, 449: 54-61. 10.1038/nature06107.
24. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.
25. Peng K, Radivojac P, Vucetic S, Dunker AK, Obradovic Z: Length-dependent prediction of protein intrinsic disorder. BMC Bioinformatics. 2006, 7: 208. 10.1186/1471-2105-7-208.
26. Tarrant MK, Cole PA: The chemical biology of protein phosphorylation. Annu Rev Biochem. 2009, 78: 797-825. 10.1146/annurev.biochem.78.070907.103047.
27. Song Q, Pallikkuth S, Bossuyt J, Bers DM, Robia SL: Phosphomimetic mutations enhance phospholemman oligomerization and modulate its interaction with the NA/K-ATPase. J Biol Chem. 2011.
28. Schweiger R, Linial M: Cooperativity within proximal phosphorylation sites is revealed from large-scale proteomics data. Biol Direct. 2010, 5: 6. 10.1186/1745-6150-5-6.
29. Ficarro SB, McCleland ML, Stukenberg PT, Burke DJ, Ross MM, Shabanowitz J, Hunt DF, White FM: Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Nat Biotechnol. 2002, 20: 301-305. 10.1038/nbt0302-301.
30. Mann M, Ong SE, Grønborg M, Steen H, Jensen ON, Pandey A: Analysis of protein phosphorylation using mass spectrometry: deciphering the phosphoproteome. Trends Biotechnol. 2002, 20: 261-268. 10.1016/S0167-7799(02)01944-3.
31. Holt LJ, Tuch BB, Villén J, Johnson AD, Gygi SP, Morgan DO: Global analysis of Cdk1 substrate phosphorylation sites provides insights into evolution. Science. 2009, 325: 1682-1686. 10.1126/science.1172867.
32. Zhao S, Xu W, Jiang W, Yu W, Lin Y, Zhang T, Yao J, Zhou L, Zeng Y, Li H, Li Y, Shi J, An W, Hancock SM, He F, Qin L, Chin J, Yang P, Chen X, Lei Q, Xiong Y, Guan K: Regulation of Cellular Metabolism by Protein Lysine Acetylation. Science. 2010, 327: 1000-1004. 10.1126/science.1179689.
33. Choudhary C, Kumar C, Gnad F, Nielsen ML, Rehman M, Walther TC, Olsen JV, Mann M: Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science. 2009, 325: 834-840. 10.1126/science.1175371.
34. Landry CR, Levy ED, Michnick SW: Weak functional constraints on phosphoproteomes. Trends Genet. 2009, 25: 193-197. 10.1016/j.tig.2009.03.003.
35. Tan CS, Jørgensen C, Linding R: Roles of "junk phosphorylation" in modulating biomolecular association of phosphorylated proteins?. Cell Cycle. 2010, 9: 1276-1280. 10.4161/cc.9.7.11066.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.