Extensive sequence turnover of the signal peptides of members of the GDF/BMP family: exploring their evolutionary landscape
© Veitia and Caburet. 2009
Received: 22 May 2009
Accepted: 16 July 2009
Published: 16 July 2009
We show that the predicted signal peptide (SP) sequences of the secreted factors GDF9, BMP15 and AMH are well conserved in mammals but dramatic divergence is noticed for more distant orthologs. Interestingly, bioinformatic predictions show that the divergent protein segments do encode SPs. Thus, such SPs have undergone extensive sequence turnover with full preservation of functionality. This can be explained by a pervasive accumulation of neutral and compensatory mutations. An exploration of the potential evolutionary landscape of some SPs is presented. Some of these signal sequences highlight an apparent paradox: they are encoded, by definition, by orthologous DNA segments but they are, given their striking divergence, examples of what can be called functional convergence.
This article was reviewed by Fyodor Kondrashov and Eugene V. Koonin.
A typical signal peptide (SP) contains a hydrophobic alpha-helical region called the h-region. This hydrophobic segment is generally shorter (approximately 7–15 residues) than required for a transmembrane helix. The h-region is close to the N-terminus of the protein but is generally preceded by a slightly positively charged n-region of variable length (1–12 residues). The cleavage site for the signal peptidase lies between the h-region and a c-region, a stretch of 3–8 rather polar and uncharged amino acids [1, 2].
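The region lengths given above can be turned into a rough screening rule. The following is only a minimal sketch under stated assumptions: the hydrophobic alphabet, the 30-residue search window, the function names and the toy sequence are all illustrative choices, not taken from the tools used in this work, and a dedicated predictor should be preferred for any real analysis.

```python
# Assumption: a simple hydrophobic alphabet; real predictors use trained models.
HYDROPHOBIC = set("AVLIMFWC")

def find_h_region(seq, window=30):
    """Return (start, end) of the longest hydrophobic run in the first `window` residues.

    `end` is exclusive. Returns (0, 0) if no hydrophobic residue is found.
    """
    best = (0, 0)
    start = None
    # The trailing "-" is a sentinel that closes a run reaching the end of the window.
    for i, aa in enumerate(seq[:window] + "-"):
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
        else:
            if start is not None and i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

def looks_like_sp(seq):
    """Crude check of the n-region and h-region criteria described in the text."""
    start, end = find_h_region(seq)
    h_len = end - start
    n_region = seq[:start]
    # Net charge of the n-region: K/R count as +1, D/E as -1.
    n_charge = sum(aa in "KR" for aa in n_region) - sum(aa in "DE" for aa in n_region)
    return 7 <= h_len <= 15 and 1 <= len(n_region) <= 12 and n_charge > 0

# Hypothetical toy sequence, not a real protein.
toy = "MKRLLLLLAVLLAASQAQD"
print(find_h_region(toy), looks_like_sp(toy))
```

The sketch ignores the c-region and the cleavage site, so it can only flag candidates, not delimit an SP.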
TGFβ superfamily members are secreted proteins that play important roles in developmental and physiological processes in mammals and other organisms. They are classified into the TGFβ/Nodal/Activin group and the BMP/GDF group [3–7]. In our analysis, we will focus on the SPs of some members of the BMP/GDF group, namely BMP15, GDF9 and AMH. BMP/GDF factors are synthesized as inactive precursors (pre-proproteins). They comprise an N-terminal SP, a propeptide and a mature region located at the C-terminal part of the protein. Thus, production of the mature bioactive polypeptide requires extensive post-translational processing: SP removal, dimerisation and further cleavage [8, 9].
Table 1. Percentage of protein sequence identity for proteins of the BMP/GDF family, between human and mouse and between human and chicken, for the total protein and the relevant signal peptide.
H. sapiens versus M. musculus % sequence identity
H. sapiens versus G. gallus % sequence identity
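For readers who wish to compute comparable figures, percentage identity between two aligned sequences can be sketched as follows. This is an illustration only: the convention used here (identical residues divided by the number of gap-free columns) is an assumption, since several denominators are in common use, and the function name and example strings are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percentage of identical residues over the gap-free columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    # Keep only the columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

# Hypothetical aligned fragments: four identities over five gap-free columns.
print(percent_identity("MAL-SV", "MALTSI"))
```

Applying the same function once to the full-length alignment and once to the SP columns alone gives the two kinds of values the table contrasts.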
Wright (1964) and Kimura (1990) have defined compensatory mutations as those masking the deleterious effect of another mutation or as mutations that are independently deleterious but neutral when combined [12, 13]. However, this criterion can be relaxed to include non-deleterious mutations that compensate for the effects of potentially deleterious ones. The most obvious cases of canonical compensatory mutations are provided by alterations affecting the secondary helical structures of tRNA and rRNA molecules, whose effects are counterbalanced by changes restoring base pairing [14, 15]. Compensatory mutations in the context of proteins and cis-regulatory sequences are also well known.
The analysis of the SP of some members of the TGFβ superfamily is very instructive to understand the evolution of neutral and compensatory mutations in protein coding regions. Although it is a difficult exercise to predict the sequence of events leading to the emergence of several compensated mutations, we will explore this issue with the particularly interesting case of the SP of GDF9 in mammals.
Figure 2B shows the simplest example of a compensatory mutation: in Papio anubis, the SP of GDF9 differs from the consensus sequence at only two positions. The first difference involves a C in the consensus and a Y in the sequence from Papio anubis. The 'mutation' C12Y (in the context of the consensus) is predicted to be deleterious when present alone, as it drives the SP activity down to 51% of that of the consensus. The second mutation, I21V, has a slightly positive effect when alone, yielding 113% of the activity of the consensus sequence, and is able to compensate for the negative impact of the first one, as the two mutations together provide 94% of the consensus activity.
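The size of this compensation can be made explicit with a short calculation. The sketch below uses only the three percentages quoted above; the multiplicative null model (independent effects multiply) is our assumption for illustration and is not part of the original analysis.

```python
# Relative SP activities quoted in the text (consensus = 1.00).
single_c12y = 0.51      # C12Y alone
single_i21v = 1.13      # I21V alone
double_observed = 0.94  # C12Y + I21V together

# Assumption (not from the article): if the two effects were independent,
# the activity of the double mutant would be the product of the single effects.
double_expected = single_c12y * single_i21v

# Positive excess means the pair performs better than independence predicts.
excess = double_observed - double_expected

print(f"expected without interaction: {double_expected:.2f}")
print(f"observed: {double_observed:.2f}, excess: {excess:+.2f}")
```

Under this assumption the double mutant would be expected to retain only about 58% of the consensus activity, so the predicted 94% reflects an interaction between the two sites rather than a simple sum of their effects.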
Given that compensated mutations can be disadvantageous separately and are unlikely to appear concertedly, one puzzling aspect is the sequence of steps leading to their appearance. Theoretical studies of compensatory mutations related to RNA secondary structure show that "almost all bases [of the RNA molecule] can be substituted sequentially without ever changing the shape [phenotype] of the molecule". This is linked to the existence of neutral mutational networks that open the possibility of changing the genotype while preserving the phenotype [18, 19]. In the context of the murine SP of GDF9, the simplest scenario would predict that the first change to have appeared in the ancestral murine sequence is I21S (mutation 5), which does not alter SP activity (and might even increase it), and that the other mutations might have appeared later in whatever order, because they are always compensated by S21.
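The scenario just described can be checked by brute force on a toy model. The sketch below is a deliberately simplified illustration: the activity rule (any combination lacking I21S is deleterious, any combination containing it is neutral), the 0.9 threshold and the placeholder labels m1, m2, m3, m4 and m6 are assumptions standing in for the predicted activities, not the values obtained in our analysis.

```python
from itertools import permutations

# Placeholder labels; only "I21S" (mutation 5) is named in the text.
MUTATIONS = ["m1", "m2", "m3", "m4", "I21S", "m6"]

def toy_activity(state):
    """Hypothetical rule: the consensus and any state carrying I21S are fully active."""
    if not state or "I21S" in state:
        return 1.0
    return 0.5  # uncompensated intermediate, assumed deleterious

def viable_orders(mutations, activity, threshold=0.9):
    """Return the mutation orders whose every intermediate stays above `threshold`."""
    viable = []
    for order in permutations(mutations):
        state = set()
        ok = True
        for m in order:
            state.add(m)
            if activity(state) < threshold:
                ok = False
                break
        if ok:
            viable.append(order)
    return viable

orders = viable_orders(MUTATIONS, toy_activity)
print(len(orders), "viable orders out of 720")
```

Under this toy rule, every viable order starts with I21S and the remaining five mutations follow in any of their 5! = 120 arrangements, which mirrors the verbal argument above. Replacing `toy_activity` with predicted SP activities would turn the same loop into a search for attainable paths.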
Rapid sequence turnover also bears important practical consequences. For instance, we have recently detected a BMP15 mutation leading to the potentially damaging variant S5R in a patient with severe ovarian dysfunction. This mutation lies within the SP of BMP15, and very recently, Rossetti et al. found that it significantly decreases the activity/amount of the secreted protein. This is in agreement with our in silico analyses using Phobius and SignalPep, which predicted a quantitative alteration of SP processing. In order to further assess the potential deleterious effect of this amino acid change, we used the SIFT software, which uses protein sequence conservation data and the physicochemical properties of amino acids to calculate the probability that an amino acid substitution is deleterious. Ser5 in BMP15 is conserved in vertebrates ranging from the zebrafish to mammals. However, the divergent chicken sequence does have an obviously compensated arginine at position 5. Thus, depending on the inclusion or exclusion of the chicken sequence in the alignment, the mutation p.S5R is predicted to be either very pathogenic or not pathogenic at all.
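The sensitivity of such predictions to the composition of the alignment can be illustrated with a deliberately crude stand-in. The sketch below is not the SIFT algorithm: it merely asks whether the variant residue is observed at the position in any retained species. The species list is an assumption chosen to match the statement above (serine from zebrafish to mammals, arginine in chicken), and the function name is hypothetical.

```python
def residue_observed(column, residue, exclude=()):
    """Return True if `residue` occurs in the alignment column among retained species."""
    observed = {aa for species, aa in column.items() if species not in exclude}
    return residue in observed

# Position 5 of BMP15 as described in the text; the choice of species is illustrative.
position_5 = {"human": "S", "mouse": "S", "zebrafish": "S", "chicken": "R"}

with_chicken = residue_observed(position_5, "R")
without_chicken = residue_observed(position_5, "R", exclude=("chicken",))
print(with_chicken, without_chicken)
```

With the chicken sequence included, arginine appears as a naturally occurring residue at this position; without it, the column is invariant and any substitution looks damaging. Conservation-based scores inherit the same dependence in a more graded form.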
It is known that sex and reproduction-related genes, as is the case of BMP15, GDF9 and AMH [23, 24], can have increased evolutionary rates. This might explain, at least in part, the divergence of the SPs observed here. For some of these SPs, sequence turnover is so extensive that the original protein segments have been almost entirely replaced by new sequences that fully retain an SP function. This can be explained by a substantial accumulation of neutral and compensatory mutations over evolutionary time. Thus, sequences encoding SPs deriving from common ancestral sequences (which is obvious from the underlying gene structures and homology outside the SP regions) can be highly divergent at present: they are orthologous by definition but their corresponding encoded peptides are by definition functionally convergent.
Fyodor Kondrashov, Centre for Genomic Regulation, Barcelona, Spain
Review of Veitia and Caburet, titled "Extensive turnover of the signal peptides of some members of the GDF/BMP family: whatever happened to these sequences?"
This is a beautiful story of the compensatory nature of signal peptide evolution and one of the few attempts out there to actually define the nature of fitness ridges in protein space. Figure 2C is a wonderful depiction of one of the most important questions in macroevolution – how different genotypes are connected in fitness space. The use of a consensus sequence as a rough representative of the ancestral state is one of my favorite ideas and, of course, has its biases, but they should be relatively small if used in a correct phylogenetic setting as has been done here. More works should replicate what Veitia and Caburet have done here.
1) I think that the title of the paper does not do justice to the content. I think that the most wonderful result is not the extensive turnover of sequence (we know this must be possible because we see many different highly divergent orthologs) but in the fact that the authors can reproduce the fitness landscape of an entire functional unit.
Authors' response: We agree with the referee and we have changed the title accordingly.
2) The compensation of a deleterious allele by a neutral variant is known as a Dobzhansky-Muller incompatibility, and has been described at the molecular level as compensation of disease mutations.
Authors' response: Dobzhansky-Muller incompatibility has indeed been described in eukaryotic hybrids, and the molecular basis of this incompatibility is thought to be due to mismatches in macromolecular complexes and cellular networks (failure of intergenic compensation). Nevertheless, we failed to link this to our findings, which imply intragenic compensation.
3) Overall the methodology is clear, but I think that it would be good to have a formal methods section that describes the approaches more generally. In particular, it is not clear to me why the specific 6 sites shown in Figure 2C are shown (or if these are the differences between consensus and mouse, why the mouse and not the horse?). In general, would it be possible to devise a scheme similar to that in Figure 2C that takes into account more species, and a greater number of sites and states in those sites? I realize that the number of possible combinations increases exponentially with the number of sites, but perhaps the authors could focus on actually defining the path that evolution may have traversed in these species: removing combinations that probably have not yet been observed in evolution may help simplify the schematics. I believe that these fitness ridges are the most exciting part of this research and expanding on this would give us wonderful insights into evolution.
Authors' response: Although we would be glad to answer positively to such an enthusiastic request, we failed to devise a proper way to draw a diagram that would take into account all the listed possibilities (more species, more sites, several states for one site). Nevertheless, we expanded this part to include methodological explanations on the general approach and to display additional examples, both simple (Figure 2A) and more sophisticated (heptagonal diagram, Figure 4).
4) The question at the very end of the discussion I think is too simple. Surely, homology (orthology) is defined by common ancestry and not by the practical limitations of being able to identify it.
Authors' response: We agree with the referee. However, our point here was not that we were not able to recognize orthology, which could be properly done. Instead, we wanted to highlight the apparent paradox posed by a subset of the signal peptides analyzed. Indeed, some of them are completely divergent in spite of being encoded by orthologous DNA fragments, yet they are strong SPs. In our opinion, they are clear examples of functional convergence.
Eugene V. Koonin, National Center for Biotechnology Information, NIH, USA.
In this very interesting brief paper, Veitia and Caburet examine the conservation and divergence of the signal peptides in three sets of growth factors of the BMP/GDF family. This analysis reveals an anomalous pattern of evolution whereby the signal peptide sequences are highly conserved within mammals but are extremely divergent in non-mammal vertebrates, despite conservation of the salient properties of signal peptide. They then hypothesize that this extreme divergence involves compensatory mutations and apply a straightforward but clever approach to demonstrate the existence of such compensation by measuring the change in the quality of signal peptide prediction in different mutants. The effect of compensation indeed comes out loud and clear. This is an interesting observation of a rather general appeal (despite the fact that the analyzed data set is quite small) because rarely can compensatory amino acid replacement be demonstrated in such an explicit manner.
Although the work is interesting and certainly worth publishing, I think there are areas where considerable improvement is possible, and then, I am also confused about one of the conclusions. First potential improvements, then the confusion.
1) The signal peptides are short, so the number of mutations involved is rather small. Thus, at least in some cases, a complete enumeration of all sequences of replacements should be possible, and then, the neutral network and the attainable and prohibited paths in them can be presented explicitly. I think this would greatly increase the value of the work.
Authors' response: Indeed, this is what we now propose in our Figures 2, 3 and 4, where we provide two examples of exhaustive enumeration of the possible combinations of mutations. Figure 4 shows an almost complete neutral network, whereas the diagram in Figure 3C displays neutral, deleterious and even slightly 'hypermorphic' mutations. Possible paths for evolution of this sequence can be deduced, starting from any single mutation and following the various lines inward, avoiding deleterious combinations. The central combination contains all mutations (i.e. it is the present state of the sequence in relevant species). We think that 7 mutations and their various combinations is the maximum that can be displayed clearly in a polygonal diagram.
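To give a sense of the scale of such an enumeration, the sketch below lists every combination of seven mutations and the single-step moves available from a given combination. It is an illustration only: the labels m1 to m7 and the function names are placeholders, and no activity values are attached to the states.

```python
from itertools import combinations

def all_combinations(mutations):
    """Return every subset of `mutations`, from the empty set (consensus) to the full set."""
    return [
        frozenset(c)
        for k in range(len(mutations) + 1)
        for c in combinations(mutations, k)
    ]

def single_step_neighbours(state, mutations):
    """Return the states reachable from `state` by adding exactly one more mutation."""
    return [state | {m} for m in mutations if m not in state]

# Placeholder labels for seven mutations.
muts = [f"m{i}" for i in range(1, 8)]
states = all_combinations(muts)
print(len(states), "combinations;", len(single_step_neighbours(frozenset(), muts)), "first steps")
```

Seven sites with two states each give 2^7 = 128 combinations, and each additional site doubles this number, which is why a polygonal diagram quickly becomes unreadable beyond this size.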
2) I realize that this is a brief Discovery Note, and yet, I think it would be most helpful to include some background, that is, what is the characteristic level of conservation divergence of signal peptides within the same phylogenetic range. That would allow the reader to better assess the novelty of the results presented in the paper.
Authors' response: We have included a table (Table 1) displaying the % of sequence identity for the total protein and the relevant signal peptide for the BMP/GDF family, in human/mouse and human/chicken alignments. There are several possibilities, ranging from strong conservation in the three species including the signal peptide (BMP4), passing through cases of overall conservation excluding the signal peptide (GDF8), to cases of rather low conservation in the three species (BMP15).
3) My confusion: I do not understand why the authors talk about convergence in this case, and even use exclamation points to emphasize this conclusion. As far as I know, convergence implies that similar sequences (the same amino acid residues) evolve independently in different lineages, from dissimilar ancestors. The sequences in question may well be orthologs but the essence of convergence is that they start from distant points and get closer to each other in the course of evolution (cf. the famous case of monkey lysozymes studied by A. C. Wilson and coworkers). Is this what was observed here? I did not get this impression but, if this is the case, it should be made much more transparent.
Authors' response: The referee is right. Indeed, we should have insisted more on the fact that we were dealing with functional convergence for highly divergent sequences stemming from a common ancestral one.
BMP: Bone Morphogenetic Protein
SMAD: Sma- and Mad-related (mothers against decapentaplegic) proteins
TGFβ: Transforming Growth Factor, Beta
R.A.V and S.C are supported by the ARC (Association pour la Recherche contre le Cancer), the University Paris Diderot-Paris7, the CNRS (Centre National de la Recherche Scientifique) and the Inserm (Institut National de la Santé et de la Recherche Médicale). R.A.V is a member of and supported by the IUF (Institut Universitaire de France).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.