Reviewer's report 1
Sandor Pongor, International Centre for Genetic Engineering and Biotechnology, Padriciano 99, I-34012, Trieste, Italy
Detecting positive selection at the DNA sequence level is of substantial interest in view of the role it may play in pathogenic events. Glazko and coworkers show, using three statistical tests developed for the purpose, that the mutational hotspots of the TP53 gene evolve by positive selection. In view of the general interest of the topic I sought the advice of Dr. Lawrence Banks who is a biologist working on p53 mutations. We both felt that the calculations are thoroughly planned and the results support the main message of the paper.
The presentation of the paper could, however, be improved, particularly given the wide audience of Biology Direct. The most important criticism is that p53 mutations can lead to inactivation and/or to gain-of-function (GOF) changes, and these two groups may need to be analyzed separately. Currently, the reader may not see clearly whether positive selection was found only for GOF-type mutations or also for inactivating mutations.
Author response: We agree that it should be stated with full clarity that both gain-of-function and loss-of-function mutations in tumor suppressors, in particular, p53, are important. Therefore such changes have been made in several places in the manuscript; in particular, see the last paragraph in the Results and Discussion section.
Minor points:
In the Background section it might be useful to add 1) a brief description of mutation types found in p53 as well as their biological roles; and 2) a paragraph describing the mathematical approaches to detecting positive selection. These sections may help the reader in understanding what has been done and what is being accomplished in this work.
Author response: We decided not to expand the Background section because both issues are already addressed there, even if briefly, and the reader interested in methods for detecting positive selection is referred to Ref. [13].
It may be useful to carry out the statistics on subgroups (GOF or inactivation).
Author response: Some statistics on this point are available in Ref. [16] (Table 1). However, it has to be realized that, although we detect a statistical excess of non-synonymous over synonymous substitutions, the tests described here, by themselves, do not allow us to assign an individual substitution to the gain-of-function or loss-of-function category. Again, an attempt to address this issue is given in Ref. [16], but the number of mutations for which the distinction could be made is quite small. We make comments to that effect in the revised discussion.
In order to show the strength of the statistical methods presented here it might be useful to consider tests similar to those described by Jianzhi Zhang (Mol. Biol. Evol. 21(7): 1332–1339, 2004).
Author response: The statistical analysis presented here was done within a very different conceptual framework from that described by Zhang (maximum likelihood models). The present tests employed the Bonferroni correction for multiple testing and, accordingly, were highly conservative.
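For readers unfamiliar with the Bonferroni correction mentioned in this response, a minimal sketch follows. It is not the authors' code: the function name `bonferroni`, the significance level of 0.05, and the example p-values are assumptions chosen purely for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply the Bonferroni correction to a list of p-values.

    Returns the adjusted p-values (each multiplied by the number of
    tests and capped at 1) and a list of booleans telling whether each
    test stays significant at the family-wise level alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    rejected = [p <= alpha / m for p in p_values]
    return adjusted, rejected


# Hypothetical p-values, for illustration only (not taken from the paper).
example_p_values = [0.01, 0.02, 0.04]
adjusted, rejected = bonferroni(example_p_values)
```

With three tests, each p-value is compared against 0.05 / 3 rather than 0.05, which is why the correction is described as conservative: a result must be considerably stronger to remain significant.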
Reviewer's report 2
Christopher Lee, Department of Chemistry, University of California-Los Angeles, Los Angeles, CA, USA
This paper extends the authors' previous work indicating evidence of positive selection in p53 "hotspot" mutations, to show that non-synonymous mutations show a significantly greater tendency to cluster (in "hotspots") than do synonymous mutations, even when some mutational biases are taken into account. This work addresses an important biomedical question, and provides an advance, albeit incremental. I do have some questions which might benefit from further analysis by the authors:
1. Both in the abstract and introduction, the authors emphasize the importance of taking into consideration the effect of "nucleotide context" on mutational bias, as a motivation for this study. However, as I understand it, this study takes into consideration nucleotide composition (i.e. frequency of single nucleotides), not nucleotide context (e.g. frequency of nucleotide triplets, to consider the effect of one adjacent nucleotide on either side of the nucleotide under study). Since nucleotide context can have large effects on mutation rate (e.g. CpG effects), this is an important issue. For the very reasons that the authors articulated in their Introduction, many readers will expect direct tests of whether nucleotide context affects the authors' results.
The difficulty, of course, is that it is harder to match nucleotide context (e.g. triplet frequencies, 64 different numbers) than nucleotide composition (just 4 numbers). The NSMC procedure would probably not be able to construct samples with matching triplet frequencies, without some modifications. One possible solution would be to include ALL sites (including unmutated sites, instead of just sites where mutations were observed) in the analysis. First, generate a random sample of synonymous sites (a specific number of sites, with a specific triplet profile, and a specific number of observed mutations).
Now generate a random sample of non-synonymous sites of the same size, with the same triplet profile. Finally, generate equal-sized random samples of mutations from each set of sites, and analyze the number of "hotspots" as in the NSMC method. Including non-mutated sites in this sampling process should make it possible to match the triplet profiles between the syn vs. non-syn samples, and I don't see a reason why non-mutated sites should be excluded.
If such analysis is practical I think it could greatly strengthen the paper, by directly addressing the question of nucleotide context. At any rate, the existing analysis in the manuscript should be clearly described as testing "nucleotide composition" not "nucleotide context", and the difference between these should be emphasized. The authors should point out that even if composition is controlled for, nucleotide context could have large effects on mutation rate, so the current results should be interpreted with some caution.
Author response: The CpG effects have been accounted for in the NSMC test; to emphasize this, we mention this control in the revised abstract. However, the currently available data on somatic mutations are insufficient to examine other, subtler effects of the nucleotide context. As for including non-mutated sites, we were concerned that this approach could lead to an uncontrollable increase in the error rate due to the different and unknown intrinsic mutation rates of different sites.
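The distinction raised in point 1 between nucleotide composition and nucleotide context, and the triplet-matching step of the sampling scheme the reviewer proposes, can be sketched as follows. This is an illustrative sketch only, not the NSMC procedure used in the manuscript; the function names, the toy sequences, and the representation of a site as a `(site_id, triplet)` pair are all assumptions.

```python
import random
from collections import Counter, defaultdict


def composition_profile(seq):
    """Single-nucleotide counts (4 numbers for DNA)."""
    return Counter(seq)


def context_profile(seq):
    """Counts of triplets centred on each interior position (up to 64 numbers)."""
    return Counter(seq[i - 1:i + 2] for i in range(1, len(seq) - 1))


def matched_sample(target_contexts, candidate_sites, rng):
    """Draw candidate sites, without replacement, so that their triplet
    contexts reproduce the multiset `target_contexts`.

    candidate_sites: list of (site_id, triplet) pairs (hypothetical layout).
    Returns a list of site ids, or None when some triplet is too rare
    among the candidates for a matching sample to exist.
    """
    by_context = defaultdict(list)
    for site_id, triplet in candidate_sites:
        by_context[triplet].append(site_id)
    chosen = []
    for triplet, needed in Counter(target_contexts).items():
        pool = by_context.get(triplet, [])
        if len(pool) < needed:
            return None
        chosen.extend(rng.sample(pool, needed))
    return chosen


# Two toy sequences with identical composition but different context.
seq_a = "ACGTACGT"
seq_b = "AACCGGTT"
same_composition = composition_profile(seq_a) == composition_profile(seq_b)
same_context = context_profile(seq_a) == context_profile(seq_b)
```

The two toy sequences contain each nucleotide twice, so a composition-matched comparison would treat them as equivalent, yet their triplet profiles differ. The `None` return of `matched_sample` reflects the practical difficulty the reviewer notes: matching 64 triplet frequencies may be impossible when the pool of candidate sites is small.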
2. The NSMC analysis, while conceptually simple, needs to be described in more detail, in the Methods section. Currently, there is only an outline of NSMC, presented in the Results section, which leaves out many details (e.g. sampling with replacement or without replacement? I assume the latter), such that one could not replicate the calculation with any confidence that equivalent results would be obtained from the same input data.
Author response: Indeed, sampling without replacement was employed, and this is mentioned in the revised legend to Figure 1. Otherwise, however, we felt that the description of the test was sufficient for reproduction.
3. The manuscript frequently uses the term "positive selection", in a way that sometimes seems like a catch-all name for any significant divergence from the purely "mutational" process represented by synonymous sites. This may confuse readers who think of positive selection in terms of the very specific meaning Ka/Ks > 1, since that is not what this paper shows. Instead, the NNH>NSH "more hotspots" criterion gets at a somewhat different issue, namely the clustering of observed mutations at certain sites ("hotspots").
First, it should be noted that such clustering could be produced without Ka/Ks > 1. For example, if most codons had Ka/Ks = 0.1, and a few sites had Ka/Ks = 1, this also could give rise to more "hotspots" compared with the synonymous sample (where no variability in selection occurs from site to site). Indeed, even if Ka/Ks = 1 everywhere, the fact that there are typically twice as many non-synonymous mutations as synonymous mutations at each codon could in principle give NNH>NSH. I think the authors should address this issue in the manuscript, either by providing control tests showing that their results cannot be explained by such models, and/or by mentioning such issues in the Discussion.
Second, the authors may want to replace a number of occurrences of the phrase "positive selection" with something more precise for their results, e.g. "selection for non-synonymous mutations at specific sites (hot spots), relative to their less frequent occurrence at other non-synonymous sites or at synonymous sites"; or just "evidence of selective pressure at hotspots". When the authors really want to use the phrase "positive selection", it would be useful to cite direct evidence that Ka/Ks > 1 for at least a subset of the sites.
Author response: We already know that, at least in the case of somatic mutations of p53, Ka/Ks >> 1 (Table 1 and Ref. [14]), which implies positive selection in the traditional sense. In this paper, we addressed the specific issue of the origin of hotspots using different tests, within the "selection vs. mutation" framework. We believe that the NSB test adequately tests the hypothesis that "...Ka/Ks > 1 for at least a subset of the sites".
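The reviewer's first scenario in point 3 (most codons with Ka/Ks = 0.1 and a few with Ka/Ks = 1) can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not an analysis from the manuscript: the number of sites (100), the number of sampled mutations (50), the hotspot threshold (3 or more mutations at a site), and the model of mutations falling on sites in proportion to their relative rates are all hypothetical.

```python
from math import comb


def prob_at_least(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k))


def expected_hotspots(relative_rates, n_mutations, threshold):
    """Expected number of sites receiving at least `threshold` of
    `n_mutations` mutations, when each mutation lands on a site with
    probability proportional to that site's relative rate.

    Each site's count is marginally binomial, and expectations add
    across sites.
    """
    total = sum(relative_rates)
    return sum(
        prob_at_least(n_mutations, rate / total, threshold)
        for rate in relative_rates
    )


# Hypothetical rate profiles: no site exceeds the neutral rate of 1.0.
syn_rates = [1.0] * 100                   # uniform, as for synonymous sites
nonsyn_rates = [0.1] * 90 + [1.0] * 10    # mostly constrained, a few neutral

syn_expected = expected_hotspots(syn_rates, n_mutations=50, threshold=3)
nonsyn_expected = expected_hotspots(nonsyn_rates, n_mutations=50, threshold=3)
```

Under these assumptions the heterogeneous profile yields the larger expected number of hotspots for equal-sized mutation samples, even though no site has a relative rate above 1. This is the clustering-without-Ka/Ks > 1 effect the reviewer describes; it says nothing about whether that effect explains the p53 data.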
4. Since I'm not in the p53 field, it's unclear to me how cancer researchers can make use of the specific data presented in this paper. Perhaps the authors could add some further discussion of this to the paper.
Author response: The last paragraph of the revised Results and Discussion section addresses this issue.
Reviewer's report 3
Mikhail Blagosklonny, Cancer Center, Ordway Research Institute, Albany, NY, USA
This study has demonstrated a selective advantage for hot spot p53 mutants compared with rare mutants. This has a biological meaning. p53 proteins form tetramers. Mutant p53 can either inactivate wt p53 or complement mutant p53, depending on the particular mutation. Also, mutant p53 interacts with p63 and p73, thus modulating their functions. Similarly, the distinction between tumor suppressors and oncogenes might be blurred for p63 and p73; see: Mills AA. p63: oncogene or tumor suppressor? Curr Opin Genet Dev. 2005 Dec 13; in press.
Author response: Unfortunately, large collections of mutations are unavailable for either p63 or p73.