Evolutionary patterns of phosphorylated serines
Biology Direct volume 6, Article number: 8 (2011)
Posttranslationally modified amino acids are chemically distinct types of amino acids and in terms of evolution they might behave differently from their non-modified counterparts. In order to check this possibility, we reconstructed the evolutionary history of phosphorylated serines in several groups of organisms. Comparisons of substitution vectors have revealed some significant differences in the evolution of modified and corresponding non-modified amino acids. In particular, phosphoserines are more frequently substituted to aspartate and glutamate, compared to non-phosphorylated serines.
This article was reviewed by Arcady Mushegian and Sandor Pongor.
Post-translational modifications play an important role in diversifying protein structure and function [1, 2]. Protein phosphorylation is one of the most important and widely distributed types of post-translational modifications. In eukaryotes, reversible protein phosphorylation plays a key role in signal transduction and other processes [3, 4]. Recent advances in mass spectrometry have allowed large-scale identification of phosphorylation events. Analyses of these data have already revealed some specific structural and evolutionary features of phosphoserines. Phosphoserines tend to occur in intrinsically disordered regions [6–8] and regions corresponding to alternatively spliced gene segments. Phosphorylated amino acids are more conserved than their non-phosphorylated counterparts [7, 10–12]. Some very old phosphorylation events may be common to organisms from Archaea to human.
Here we investigated another evolutionary aspect of protein modification sites. Since modified amino acids are chemically a distinct type of amino acids, in terms of evolution they might behave differently from their non-modified counterparts (on top of the different level of conservation). To analyse differences in the evolution of standard amino acids and their modified counterparts, we reconstructed the evolution of phosphorylated amino acids in three groups of organisms. Specifically, we studied phosphorylation of serine in the human, fruit fly and yeast proteomes.
Phosphorylation sites were downloaded from the PHOSIDA and PhosphoPep databases. For yeast and fruit fly we studied phosphoserines obtained in two high-throughput experiments each, by different groups of researchers [13–16]. For human we used datasets obtained in four different high-throughput experiments [17–20]. Phosphorylation is a highly dynamic process, and the overlap of phosphorylation events identified in different experiments from various cell lines and tissues is relatively small. Sites observed to be phosphorylated in more than one high-throughput experiment are likely modified in a more constitutive manner, or at least represent a more reliable dataset of phosphoserines.
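The selection of sites seen in more than one experiment can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `reliable_sites`, the `(protein_id, position)` site representation and the toy data are all assumptions introduced here.

```python
from collections import Counter

def reliable_sites(experiments, min_support=2):
    """Return sites reported by at least `min_support` experiments.

    `experiments` is a list of sets; each set holds hypothetical
    (protein_id, position) tuples for one experiment's phosphoserines.
    """
    # Each experiment is a set, so a site is counted once per experiment.
    support = Counter(site for sites in experiments for site in sites)
    return {site for site, n in support.items() if n >= min_support}

# Toy data, invented for illustration only.
exp_a = {("P1", 10), ("P1", 25), ("P2", 7)}
exp_b = {("P1", 10), ("P3", 99)}
exp_c = {("P1", 10), ("P2", 7)}
reliable = reliable_sites([exp_a, exp_b, exp_c])
```

Counting support per site rather than intersecting all experiments keeps sites that were confirmed by any two experiments, which matches the "more than one experiment" criterion described above.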
We analysed the evolution of modification sites and their non-modified counterparts separately among eight vertebrates (human Homo sapiens; chimpanzee Pan troglodytes; mouse Mus musculus; rat Rattus norvegicus; cow Bos taurus; dog Canis lupus familiaris; chicken Gallus gallus; and zebrafish Danio rerio), eleven fruit flies (Drosophila melanogaster; D. yakuba; D. erecta; D. sechellia; D. ananassae; D. pseudoobscura; D. persimilis; D. willistoni; D. mojavensis; D. virilis; D. grimshawi) and fifteen fungi (Saccharomyces cerevisiae; S. paradoxus; S. mikatae; S. bayanus; Candida glabrata; S. castellii; Kluyveromyces waltii; K. lactis; Ashbya gossypii; Debaryomyces hansenii; C. albicans; Yarrowia lipolytica; Aspergillus nidulans; Neurospora crassa; Schizosaccharomyces pombe). Orthologs of modified H. sapiens proteins were obtained from HomoloGene; for D. melanogaster, from FlyBase; and for S. cerevisiae, from FungalOrthogroups. Only orthologs with the highest identity to the modified protein were selected from each species. Multiple alignments were constructed using ClustalW.
As mentioned above, the evolutionary features and frequencies of phosphoserines may depend on structural context. In particular, phosphoserines tend to occur within intrinsically disordered regions of proteins [6–8]. To take this into account, we analysed serines from disordered regions and ordered regions of phosphoproteins separately. Intrinsically disordered regions were predicted by PONDR VSL2.
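Splitting serines by predicted disorder can be sketched as below. The sketch assumes that a disorder predictor has already produced one score per residue and that residues scoring at or above 0.5 are treated as disordered; the threshold, the function name and the toy scores are assumptions made here, and no predictor is actually invoked.

```python
def split_serines_by_disorder(sequence, disorder_scores, threshold=0.5):
    """Partition serine positions (1-based) into disordered and ordered.

    `disorder_scores` holds one per-residue score from an external
    predictor; the 0.5 cutoff is an assumption of this sketch.
    """
    if len(sequence) != len(disorder_scores):
        raise ValueError("sequence and scores must have the same length")
    disordered, ordered = [], []
    for pos, (aa, score) in enumerate(zip(sequence, disorder_scores), start=1):
        if aa != "S":
            continue  # only serines are of interest here
        (disordered if score >= threshold else ordered).append(pos)
    return disordered, ordered

# Toy sequence and scores, invented for illustration only.
dis, ordr = split_serines_by_disorder("MSASKS", [0.9, 0.8, 0.7, 0.2, 0.1, 0.6])
```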
For each phosphorylated serine, we reconstructed the evolution of this site in the corresponding taxonomical group using a fast modification of the maximum likelihood algorithm (A. Goland, in preparation). Since we cannot reconstruct the moment in evolution when a residue became modified, we assumed that it coincides with the appearance of the oldest residue of the given type in a given tree (Figure 1). Then we calculated the number of substitutions of ancestral putative modification sites to other amino acids, and calculated the vectors of substitution frequencies.
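The counting step can be illustrated with a small sketch. It assumes that ancestral residues have already been reconstructed by some other tool and are given as a `state` mapping, and that the tree is given as a child-to-parent map; these inputs, the function names, and the choice to stop following a lineage once serine is lost are assumptions of this example, not a description of the authors' implementation.

```python
from collections import Counter, defaultdict

def oldest_serine_ancestor(leaf, parent, state):
    """Walk from the modified leaf toward the root while the residue stays serine."""
    node = leaf
    while node in parent and state[parent[node]] == "S":
        node = parent[node]
    return node

def count_substitutions(origin, parent, state):
    """Count S->X changes on branches below `origin`.

    A lineage is not followed further once serine has been replaced
    (an assumption of this sketch); alignment gaps ('-') are skipped.
    """
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    counts = Counter()
    stack = [origin]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if state[child] == "S":
                stack.append(child)
            elif state[child] != "-":
                counts[state[child]] += 1
    return counts

def substitution_vector(counts):
    """Normalise substitution counts to frequencies."""
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}

# Toy tree, invented for illustration: child -> parent, node -> residue.
parent = {"n1": "root", "out": "root", "n2": "n1",
          "mouse": "n1", "human": "n2", "chimp": "n2"}
state = {"root": "A", "out": "A", "n1": "S", "n2": "S",
         "human": "S", "chimp": "E", "mouse": "D"}
origin = oldest_serine_ancestor("human", parent, state)
vector = substitution_vector(count_substitutions(origin, parent, state))
```

In practice the counts from all sites in a dataset would be summed before normalising, so that the vector describes the whole set of phosphoserines rather than a single site.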
Only a fraction of phosphoserines from the initial datasets were aligned to other types of amino acids in our data, and a very small number of them occurred in ordered regions. Thus further analyses were performed only for serines from regions predicted to be intrinsically disordered. The final datasets of phosphorylated and non-phosphorylated serines included only sites that experienced at least one substitution to other types of amino acids and originated from disordered regions of phosphoproteins (Table 1). Some phosphorylation events were observed in more than one experiment, and this subset was also analyzed separately.
The control sets consisted of non-modified serine residues from disordered regions of the same proteins. To measure the statistical significance of the difference between substitution vectors of modified and non-modified serines we performed bootstrapping of control sets. To do that, we generated 10000 random control sets of non-phosphorylated serines. Each control set was of the same size as the corresponding phosphorylated set (generic sets and subsets of reliable phosphosites).
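A resampling test of this kind can be sketched as follows. The sketch makes several assumptions that are not stated in the text: each site is represented by the list of residues it was substituted to, the statistic is the fraction of substitutions to aspartate or glutamate, and control sites are drawn with replacement. The function names are hypothetical.

```python
import random

ACIDIC = {"D", "E"}

def acidic_fraction(sites):
    """Fraction of all substitutions in `sites` that go to D or E."""
    targets = [aa for site in sites for aa in site]
    return sum(aa in ACIDIC for aa in targets) / len(targets)

def bootstrap_p_value(phospho_sites, control_pool, n_boot=10000, seed=0):
    """Empirical P-value for the observed acidic fraction.

    Draws `n_boot` control sets of the same size as the phosphoserine
    set and counts how often a control set reaches the observed value.
    Sampling with replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    observed = acidic_fraction(phospho_sites)
    k = len(phospho_sites)
    hits = sum(
        acidic_fraction(rng.choices(control_pool, k=k)) >= observed
        for _ in range(n_boot)
    )
    return observed, hits / n_boot

# Toy data, invented for illustration: one list of target residues per site.
phospho = [["D"], ["E"], ["D", "A"]]
controls = [["A"], ["G"], ["T"], ["N"]]
observed, p_value = bootstrap_p_value(phospho, controls, n_boot=1000)
```

A returned value of zero only means that no resampled control set reached the observed statistic, so the P-value is below 1/`n_boot`.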
Structural features of phosphoserines may not be limited to disorder of surrounding protein regions, and may include other specific properties such as secondary structure, solvent accessibility, etc. Therefore, to eliminate confounding effects as far as possible, we created additional control sets containing non-modified serines located in the same protein regions as modification sites. Non-modified serines were collected within a maximal distance of 10, 11 and 9 amino acid residues from phosphoserines, for yeast, fruit fly and human respectively. Again, the size of the control sets was the same as the size of the respective phosphoserine sets.
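Picking a neighbouring control serine can be sketched as below. This is an illustrative reading of the procedure, assuming one nearest non-phosphorylated serine is chosen per phosphoserine and that ties are broken toward the lower position; both choices, and the function name, are assumptions of this example.

```python
def nearest_control_serine(sequence, phospho_positions, site, max_distance):
    """Return the nearest non-phosphorylated serine to `site`, or None.

    Positions are 1-based. Only serines within `max_distance` residues
    of `site` are considered; ties go to the lower position (assumed).
    """
    candidates = [
        pos
        for pos, aa in enumerate(sequence, start=1)
        if aa == "S"
        and pos not in phospho_positions
        and 0 < abs(pos - site) <= max_distance
    ]
    return min(candidates, key=lambda pos: (abs(pos - site), pos), default=None)

# Toy sequence, invented for illustration; serines at 2, 4, 6 and 10.
seq = "ASGSPSAAAS"
control = nearest_control_serine(seq, {4}, site=4, max_distance=3)
```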
Differences in the substitution vectors between phosphorylated and non-phosphorylated serines from disordered regions varied among different groups of organisms, but some trends were stable and significant (Figure 2). Rather unexpectedly, we did not observe any preference for substitution of phosphoserines to other amino acids that may be phosphorylated, that is, threonine and tyrosine. At the same time, phosphorylation converts serine into a negatively charged amino acid, and, as one can see in Figure 2, in all three datasets phosphoserines are more frequently substituted to aspartate and glutamate than non-phosphorylated serines. In both cases the substitution rates of phosphoserines are much higher than in all bootstraps of control sets (P-value << 10^-4). In the case of the more reliable subsets of phosphoserines observed in several experiments, the substitution rate to aspartate and glutamate is even higher, and also lies outside the interval of bootstraps, which in this case is wider because the sample size is smaller. Notably, artificial substitution of serine to aspartate or glutamate, called phosphomimetic mutation, is widely used to confirm phosphorylation of serine [26, 27].
There are other considerable shifts of substitution rates common to all three taxa. In particular, phosphoserines are relatively rarely substituted to alanines and cysteines (Figure 2). However, in these cases, the control-set substitution vectors of non-phosphorylated serines located in the same regions as phosphoserines were also shifted in the same direction as phosphoserines (as compared to all non-phosphorylated serines). Hence, these shifts are likely related not to modifications, but to specific features of these regions.
The rates of substitutions to aspartate and glutamate in the additional control sets of nearest non-phosphorylated serines are not shifted, with the exception of vertebrates, where they are also shifted toward higher values (but still to a much weaker extent than in the case of phosphoserines). Note that these control sets may be contaminated by phosphoserines. Indeed, phosphoserines tend to co-occur, forming clusters. Therefore the sets of nearest non-phosphorylated serines likely contain phosphoserines that have not yet been detected. Removing these phosphoserines would increase the significance of our observations.
The comparison with nearest non-phosphorylated serines takes into account the fact that phosphoserines tend to occur in intrinsically disordered regions. Methods used in large-scale phosphoproteomic experiments are based on selection of negatively charged peptides, which results in a bias towards enrichment of phosphopeptides with acidic residues [29, 30]. This bias, coupled with the fact that phosphoserines may shift positions within rapidly evolving disordered regions and with general problems of aligning such regions, could distort our analysis. However, these factors would have the same influence on our control sets of non-modified serines from the same regions of proteins. Hence the observed differences between these controls and phosphoserines cannot be explained by such artifacts.
In addition to serine phosphorylation, we analysed the evolution of another abundant type of protein modification, lysine acetylation. Recently two large datasets of human acetylation sites became available [32, 33]. We observed some differences between substitution vectors of acetylated and non-acetylated lysines, but the results obtained for these two sets of acetyllysines were discordant (data not shown). As noted in one of these papers, the spectrum of acetylated proteins is different between these two datasets obtained from different tissues. We observed that less than 2% of sites are common to both datasets. It seems that the available acetyllysine data are not sufficient for meaningful analysis.
It should be taken into account that our substitution vectors are probably enriched with false-positive phosphosites. This results from our over-simplified assumption that a site is modified from the first appearance of the corresponding residue in the evolutionary record. Additionally, phosphoserines from large-scale experiments may be false-positive sites. There is evidence that many phosphorylation sites could be non-functional or non-specific, as sometimes functional targets of phosphorylation are not particular sites, but entire protein regions [31, 34, 35]. On the other hand, the control sets could contain not yet detected phosphoserines. These false positives and false negatives should blur the differences between the substitution vectors of modified and non-modified residues. Most likely, the real level of differences is higher than the one observed here.
Reviewer's Report 1
Reviewer 1: Arcady Mushegian - Stowers Institute, Kansas City, USA
The idea of comparing evolutionary substitution patterns of modified and non-modified residues in proteins is good, and the approach proposed by the authors, i.e., to reconstruct, using an ML model, the point at which the target of modification first emerged and then to see what it mutates to, is probably the only computational approach plausible at the moment.
I trust the authors that their implementation of this approach is technically sound, but, unfortunately, this is hard to ascertain from the submitted version of the manuscript, which reads as a preliminary draft devoid of the quantitative details. This has to change - please provide at least the following:
1. The collection of phosphorylated and acetylated sites: how many sites of each type in each organism are there?
A table with a description of the final datasets used for the construction of substitution vectors has been added to the revised version (Table 1).
2. The phosphorylation sites at least (also acetylated sites?) are said to occur more often in the intrinsically disordered regions. Taking the non-globular regions in the proteins (which can be identified, e.g., using Wootton and Federhen's SEG program) as a proxy for "intrinsic disorder", can it be said that the actual sample of modified residues that the authors were working with is indeed more commonly occurring in such regions? And how does this sit with the ability to align the proteins in these regions?
We predicted intrinsically disordered regions and recalculated substitution vectors separately for serine residues from disordered and ordered regions. Most phosphoserines from the initial datasets came from protein regions predicted to be disordered (Table 1). Problems with alignments of such regions are discussed in the revised version. Additional controls of non-modified serines from the same regions of proteins were introduced to address this problem.
3. The "control sets" of non-modified serines (more accurately, not-observed-to-be-modified serines): are these found in the disordered/non-globular regions to the same extent as the modified ones? If not, the controls may be biased with regard to amino acid composition and to the regions of the protein molecules (e.g., buried vs exposed) - test this directly please.
Indeed, the amino acid composition of disordered regions and regions with a regular structure differs strongly. As described in response to comment #2, in the revised version we considered both phosphorylated and non-phosphorylated serines from disordered and regular regions separately. Moreover, as discussed in the revised text, sets including only closest non-modified serines provide an even better control for artifacts that could be caused by specifics of regions surrounding modification sites.
4. The trends that the authors discuss are interesting but weak - to what extent this may be explained by the small sample sizes? What was the statistical test for which the P-values are reported?
The initial datasets of serines were large enough, but only a fraction of them were substituted to other amino acids as demonstrated by evolutionary reconstruction. The final datasets are described in Table 1.
To measure the statistical significance, we used bootstraps of control sets of non-modified serines. For all phosphoserines and, separately, for the subset of phosphoserines observed in more than one experiment, we generated 10000 random sets of non-modified serines of appropriate size. For additional controls using neighbouring sites, we compiled sets of nearest serines of the same size as the corresponding sets of phosphorylated serines.
5. In vertebrates, the "neighboring" serines from control set 2 seem to be faithfully following the trend towards change into D or E, with some separation from the control set 1. If this trend withstands the possible correction proposed in #2, perhaps this means that, in a "disordered" region that has several serines, any or all of them may be targets of phosphorylation. Perhaps then it would be interesting to sum the substitution vectors over the region that has several serines, at least one of which is phosphorylated (i.e., how likely is it that at least one serine in this region is substituted by amino acid X?)
Phosphoserines tend to cluster in the sequence. Thus, as discussed in the revised version, the control set consisting of nearest non-phosphorylated serines could be contaminated by false-negative phosphoserines, not yet detected in experiments. On the other hand, as the trend in the control set of nearest serines is weaker, averaging of the substitution vectors would simply dilute the observation.
Reviewer's Report 2
Reviewer 2: Sandor Pongor - International Centre for Genetic Engineering and Biotechnology, Trieste, Italy
There is mounting evidence in recent years that the study of post-translational modifications has important lessons for understanding diverse aspects of protein evolution. It has been noted among others that phosphorylated sites tend to occur in those segments of the proteins that are intrinsically disordered and/or correspond to alternative splice sites. Currently there are insufficient data on the conservation of modified sites. Kurmangaliyev and associates address this problem using carefully selected datasets and well-designed statistical analyses.
The authors conclude that there are significant differences in the evolution of modified and corresponding non-modified amino acids. In particular, phosphoserines are more frequently substituted to aspartate and glutamate, compared to non-phosphorylated serines. Similarly, acetyllysines are more rarely substituted to isoleucine and valine. These findings underline the importance of post-translational modifications when discussing the variation of residue conservations within sequence regions. The methodology is straightforward and sound and will be a useful template for future studies. The authors may want to add a few examples of situations where this approach can or cannot be used.
As discussed in the revised text, the analysis of a newly available dataset of human acetylation sites did not confirm our initial observations. This is likely due to low reproducibility of currently available datasets of acetyllysines (the overlap between two datasets is extremely small). This suggests that conclusions based on such analyses should be drawn carefully, on data obtained from different sources and for a variety of organisms. We have encountered a similar problem with phosphothreonines and phosphotyrosines, where the datasets were simply too small for reliable conclusions.
Mann M, Jensen ON: Proteomic analysis of post-translational modifications. Nat Biotechnol. 2003, 21: 255-261. 10.1038/nbt0303-255.
Seo J, Lee KJ: Post-translational Modifications and Their Biological Function: Proteomic Analysis and Systematic Approaches. Journal of Biochemistry and Molecular Biology. 2004, 37: 35-44.
Hunter T: Signaling-2000 and beyond. Cell. 2000, 100: 113-127. 10.1016/S0092-8674(00)81688-8.
Cohen P: The origins of protein phosphorylation. Nat Cell Biol. 2002, 4: E127-E130. 10.1038/ncb0502-e127.
Ptacek J, Snyder M: Charging it up: global analysis of protein phosphorylation. Trends Genet. 2006, 22: 545-554. 10.1016/j.tig.2006.08.005.
Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK: The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004, 32: 1037-1049. 10.1093/nar/gkh253.
Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M: PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol. 2007, 8: R250-10.1186/gb-2007-8-11-r250.
Collins MO, Yu L, Campuzano I, Grant SG, Choudhary JS: Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder. Mol Cell Proteomics. 2008, 7: 1331-1348. 10.1074/mcp.M700564-MCP200.
Kurmangaliev EZh, Gel'fand MS: [Alternative splicing tends to involve phosphorylation sites]. Mol Biol (Mosk). 2009, 43: 572-574.
Macek B, Gnad F, Soufi B, Kumar C, Olsen JV, Mijakovic I, Mann M: Phosphoproteome analysis of E. coli reveals evolutionary conservation of bacterial Ser/Thr/Tyr phosphorylation. Mol Cell Proteomics. 2008, 7: 299-307.
Malik R, Nigg EA, Körner R: Comparative conservation analysis of the human mitotic phosphoproteome. Bioinformatics. 2008, 24: 1426-1432. 10.1093/bioinformatics/btn197.
Boekhorst J, van Breukelen B, Heck A, Snel B: Comparative phosphoproteomics reveals evolutionary and functional conservation of phosphorylation across eukaryotes. Genome Biol. 2008, 9: R144-10.1186/gb-2008-9-10-r144.
Bodenmiller B, Campbell D, Gerrits B, Lam H, Jovanovic M, Picotti P, Schlapbach R, Aebersold R: PhosphoPep - a database of protein phosphorylation sites in model organisms. Nat Biotechnol. 2008, 26: 1339-1340. 10.1038/nbt1208-1339.
Bodenmiller B, Malmstrom J, Gerrits B, Campbell D, Lam H, Schmidt A, Rinner O, Mueller LN, Shannon PT, Pedrioli PG, Panse C, Lee HK, Schlapbach R, Aebersold R: PhosphoPep - a phosphoproteome resource for systems biology research in Drosophila Kc167 cells. Mol Syst Biol. 2007, 3: 139-10.1038/msb4100182.
Hilger M, Bonaldi T, Gnad F, Mann M: Systems-wide analysis of a phosphatase knock-down by quantitative proteomics and phosphoproteomics. Mol Cell Proteomics. 2009, 8: 1908-1920. 10.1074/mcp.M800559-MCP200.
Gnad F, de Godoy LM, Cox J, Neuhauser N, Ren S, Olsen JV, Mann M: High-accuracy identification and bioinformatic analysis of in vivo protein phosphorylation sites in yeast. Proteomics. 2009, 9: 4642-4652. 10.1002/pmic.200900144.
Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M: Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell. 2006, 127: 635-648. 10.1016/j.cell.2006.09.026.
Daub H, Olsen JV, Bairlein M, Gnad F, Oppermann FS, Körner R, Greff Z, Kéri G, Stemmann O, Mann M: Kinase-selective enrichment enables quantitative phosphoproteomics of the kinome across the cell cycle. Mol Cell. 2008, 31: 438-448. 10.1016/j.molcel.2008.07.007.
Oppermann FS, Gnad F, Olsen JV, Hornberger R, Greff Z, Kéri G, Mann M, Daub H: Large-scale proteomics analysis of the human kinome. Mol Cell Proteomics. 2009, 8: 1751-1764. 10.1074/mcp.M800588-MCP200.
Olsen JV, Vermeulen M, Santamaria A, Kumar C, Miller ML, Jensen LJ, Gnad F, Cox J, Jensen TS, Nigg EA, Brunak S, Mann M: Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis. Sci Signal. 2010, 3: ra3-10.1126/scisignal.2000475.
Wheeler Geer LY, Marchler-Bauer A, Geer RC, Han L, He J, He S, Liu C, Shi W, Bryant SH: The NCBI BioSystems database. Nucleic Acids Res. 2010, 38: D492-D496. 10.1093/nar/gkp858.
Tweedie S, Ashburner M, Falls K, Leyland P, McQuilton P, Marygold S, Millburn G, Osumi-Sutherland D, Schroeder A, Seal R, Zhang H, The FlyBase Consortium: FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res. 2009, 37: D555-D559. 10.1093/nar/gkn788.
Wapinski I, Pfeffer A, Friedman N, Regev A: Natural history and evolutionary principles of gene duplication in fungi. Nature. 2007, 449: 54-61. 10.1038/nature06107.
Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.
Peng K, Radivojac P, Vucetic S, Dunker AK, Obradovic Z: Length-dependent prediction of protein intrinsic disorder. BMC Bioinformatics. 2006, 7: 208-10.1186/1471-2105-7-208.
Tarrant MK, Cole PA: The chemical biology of protein phosphorylation. Annu Rev Biochem. 2009, 78: 797-825. 10.1146/annurev.biochem.78.070907.103047.
Song Q, Pallikkuth S, Bossuyt J, Bers DM, Robia SL: Phosphomimetic mutations enhance phospholemman oligomerization and modulate its interaction with the Na/K-ATPase. J Biol Chem. 2011.
Schweiger R, Linial M: Cooperativity within proximal phosphorylation sites is revealed from large-scale proteomics data. Biol Direct. 2010, 5: 6-10.1186/1745-6150-5-6.
Ficarro SB, McCleland ML, Stukenberg PT, Burke DJ, Ross MM, Shabanowitz J, Hunt DF, White FM: Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Nat Biotechnol. 2002, 20: 301-305. 10.1038/nbt0302-301.
Mann M, Ong SE, Grønborg M, Steen H, Jensen ON, Pandey A: Analysis of protein phosphorylation using mass spectrometry: deciphering the phosphoproteome. Trends Biotechnol. 2002, 20: 261-268. 10.1016/S0167-7799(02)01944-3.
Holt LJ, Tuch BB, Villén J, Johnson AD, Gygi SP, Morgan DO: Global analysis of Cdk1 substrate phosphorylation sites provides insights into evolution. Science. 2009, 325: 1682-1686. 10.1126/science.1172867.
Zhao S, Xu W, Jiang W, Yu W, Lin Y, Zhang T, Yao J, Zhou L, Zeng Y, Li H, Li Y, Shi J, An W, Hancock SM, He F, Qin L, Chin J, Yang P, Chen X, Lei Q, Xiong Y, Guan K: Regulation of Cellular Metabolism by Protein Lysine Acetylation. Science. 2010, 327: 1000-1004. 10.1126/science.1179689.
Choudhary C, Kumar C, Gnad F, Nielsen ML, Rehman M, Walther TC, Olsen JV, Mann M: Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science. 2009, 325: 834-840. 10.1126/science.1175371.
Landry CR, Levy ED, Michnick SW: Weak functional constraints on phosphoproteomes. Trends Genet. 2009, 25: 193-197. 10.1016/j.tig.2009.03.003.
Tan CS, Jørgensen C, Linding R: Roles of "junk phosphorylation" in modulating biomolecular association of phosphorylated proteins?. Cell Cycle. 2010, 9: 1276-1280. 10.4161/cc.9.7.11066.
We are grateful to Dmitry Malko, Ekaterina Ermakova and Anna Lyubetskaya who shared their programs and data, and to Stefka Tyanova and Jürgen Cox for useful discussions. This study was partially supported by the state contract 2.740.11.0101, Russian Foundation of Basic Research (09-04-92745), and program "Molecular and Cellular Biology" of the Russian Academy of Sciences.
The authors declare that they have no competing interests.
YK and MG conceived the study. YK compiled the data. AG developed algorithms. YK and MG performed calculations. YK and MG analyzed the results and wrote the paper. All authors have approved the final version.
Kurmangaliyev, Y.Z., Goland, A. & Gelfand, M.S. Evolutionary patterns of phosphorylated serines. Biol Direct 6, 8 (2011). https://doi.org/10.1186/1745-6150-6-8