Reviewer 1: Dr Mikhail Gelfand,
Institute for Information Transmission Problems, RAS, Bolshoi Karetny per. 19, Moscow 127994, Russia and Faculty of Bioengineering and Bioinformatics, Moscow State University, Vorobievy Gory 1-73, Moscow 119992, Russia. gelfand@iitp.ru
The authors present a model explaining the following observation: while the use of the UGA stop codon depends on G-content, the UAG frequency is almost constant in genomes with highly diverse G-content. While I see no problems with the observations and the model, I have some editorial comments and questions.
The authors state several times – starting with the very first sentence of the abstract – that the usage of stop codons has not been rigorously studied. This is not correct. In the 90’s, several papers considered the usage of stop codons and its dependence on the local context, including tandem stops and tetranucleotides involving stop-codons. I think these papers should be mentioned.
Author response: Indeed, the term “usage” in this context is not very precise. We acknowledge that there have been studies of stop codon usage in the local context, that is to say, studies showing that some stop codons have a preferred local context. However, in this manuscript we discuss only the evolution and genomic frequencies of the three different stop codons, which to our knowledge have not been rigorously considered previously. We cite some of the relevant literature and use the word “frequency”, which we believe is not as ambiguous as “usage” in this context.
How were the 11 studied genome pairs selected?
Author response: We selected all genome triplets with 0.03 < KS < 0.22 that were available in the ATGC database. We now report this in the Methods section.
Is the G/A content the same in the 3rd codon position in all codon pairs? If not, why is this a good parameter?
Author response: There are three pairs of two-fold degenerate codon families: AAG/A, GAG/A, CAG/A. G-content at the third position of every pair is indeed highly correlated with overall G-content (see the figure below).
Dependency between G content in the third position of two-fold degenerate codon families and overall G content for AAG/A (blue), GAG/A (red), CAG/A (green).
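As a minimal sketch of the quantity plotted in that figure, the snippet below computes the fraction of G-ending codons within each of the three families alongside the overall G content. The function names and the toy coding sequence are hypothetical and serve only as illustration; the authors' actual pipeline is not described in this exchange.

```python
from collections import Counter

# The three two-fold degenerate A/G-ending families named in the response,
# keyed by their first two nucleotides.
FAMILIES = {"AA": "AAG/A", "GA": "GAG/A", "CA": "CAG/A"}


def third_position_g_fraction(cds):
    """Return, per family, the fraction of codons ending in G rather than A."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    fractions = {}
    for prefix, label in FAMILIES.items():
        g = counts[prefix + "G"]
        a = counts[prefix + "A"]
        fractions[label] = g / (g + a) if (g + a) else None
    return fractions


def overall_g_content(seq):
    """Return the fraction of G over all positions of the sequence."""
    seq = seq.upper()
    return seq.count("G") / len(seq)


# Toy in-frame sequence (illustrative only, not real data).
toy_cds = "ATGAAGAAAGAGGAACAGCAGCAATAA"
fractions = third_position_g_fraction(toy_cds)
g_content = overall_g_content(toy_cds)
```

Applied to every genome in a data set, the per-family fractions can then be plotted against overall G content, one series per family.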
And in any case, what are the reasons to suspect that the selection regime in the amino-acid-encoding codons is the same as in the stops (the former may depend on concentrations of tRNAs and the codon-anticodon interactions; the latter, on interactions with the release factors)? What about the A/G choice in the four-fold codon families?
Author response: Indeed, we have created the model based on this assumption because it allowed us to reduce the number of parameters and make the system of equations solvable. However, we can also show that this assumption does not affect our main result that the TAG codon is selectively disadvantageous. Specifically, from system of equations (4) it follows that exp(S2) = fTGA/fTAG. Thus, we can solve for the selective impact of TAG (S2) solely based on the frequencies of TAG and TGA without making the assumption that the selective regime is the same in stop and amino acid codons. Since S2 is positive for almost the entire range of G content it follows that the TAG codon provides a selective disadvantage relative to the TGA codon. Unfortunately, we cannot estimate S2 by comparing the frequencies of TAG and TAA codons because we cannot independently estimate the component of fTAA from (4). We now present the new estimate of S2 in Figure 5 and the main text.
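The relation exp(S2) = fTGA/fTAG stated in this response can be illustrated with a short sketch. Because both frequencies share the same denominator (the total number of stop codons), raw counts are enough. The function name and the counts below are hypothetical, chosen only to show the arithmetic.

```python
import math


def s2_from_stop_counts(n_tga, n_tag):
    """Estimate S2 = ln(f_TGA / f_TAG); the genome-wide total cancels out."""
    if n_tga <= 0 or n_tag <= 0:
        raise ValueError("both stop codons must be observed at least once")
    return math.log(n_tga / n_tag)


# Hypothetical stop codon counts for one genome (illustrative only).
counts = {"TAA": 2500, "TGA": 1200, "TAG": 300}
s2 = s2_from_stop_counts(counts["TGA"], counts["TAG"])
```

A positive value corresponds to TGA being more frequent than TAG, which is the situation the response describes for almost the entire range of G content.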
The reasoning on page 6 is not clearly presented, and misprints add to the confusion. How is the formula S2 = ln ((fG(1-fTAG))/fTAG) used? Do I understand correctly that the next formula, S2 = ln (3.6fG + 0.4), results from a fit to observations (comparison of genome pairs)? I think this should be explained more explicitly.
Author response: Yes, this is what we mean, and we rewrote this section to hopefully make this clearer.
By the way, the two formulas for S2, theoretical and observed ones, yield a dependence between fG and fTAG – does it hold?
Author response: Yes, there is a slight dependence as can be seen from Figure 1.
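The dependence the referee asks about can be written out by equating the two expressions for S2 exactly as they are quoted in the comment above. This is only an algebraic consequence of those quoted forms, which the referee notes may contain misprints and which the authors say they have since rewritten, so it should be read as a sketch and not as the manuscript's final result.

```latex
\begin{align*}
\ln\frac{f_G\,(1 - f_{TAG})}{f_{TAG}} &= \ln\left(3.6\,f_G + 0.4\right) \\
f_G\,(1 - f_{TAG}) &= f_{TAG}\,\left(3.6\,f_G + 0.4\right) \\
f_{TAG} &= \frac{f_G}{4.6\,f_G + 0.4}
\end{align*}
```

Under these quoted forms, fTAG rises only from about 0.12 at fG = 0.1 to about 0.18 at fG = 0.4, which agrees with the description of the dependence as slight.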
Finally, reference to equation (5) in the preceding paragraph should be about equation (4), and the sentence “S2 has a clear G-content dependence is well approximated…” probably should be “S2 has a clear G-content dependence that is well approximated…”.
Author response: If the referee means the sentence “Thus, selection on G-content, S1, affects only G-content itself and does not change the form of the relationship between G frequency and stop codon usage as is evident from expressions (4).”, then we mean that in the system of equations (4) G-content (f(taa,tga,tag)) does not depend on S1. The other typo is corrected.
Polarization of substitutions using parsimony may be dangerous if there is selection towards a specific, preferred nucleotide: in some cases two parallel nonpreferred-to-preferred substitutions may occur, and they will be interpreted as a single preferred-to-nonpreferred substitution, hence skewing the substitution statistics.
Author response: This is true; however, these data have been obtained for a number of species with different GC-content and low sequence divergence. Therefore, we believe that it is unlikely that the use of parsimony has produced a systematic error of substantial effect that jeopardizes our conclusions.
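The referee's concern can be made concrete with a minimal sketch of outgroup-based parsimony at a single aligned site. The function and the three-taxon layout (two sister genomes plus one outgroup) are assumptions made for illustration and are not taken from the authors' Methods.

```python
def polarize(sister_a, sister_b, outgroup):
    """Infer (ancestral, derived, lineage) by parsimony, or None if uninformative."""
    if sister_a == sister_b:
        return None  # no difference between the sister genomes
    if outgroup == sister_a:
        return (sister_a, sister_b, "b")  # change placed on lineage b
    if outgroup == sister_b:
        return (sister_b, sister_a, "a")  # change placed on lineage a
    return None  # outgroup matches neither state: cannot polarize


# Hypothetical true history: every lineage starts with the nonpreferred state "A",
# then sister a and the outgroup independently change A -> G (preferred).
observed = {"sister_a": "G", "sister_b": "A", "outgroup": "G"}
inferred = polarize(observed["sister_a"], observed["sister_b"], observed["outgroup"])
```

Here parsimony reports a single G-to-A change on lineage b, although the assumed true history contains two parallel A-to-G changes. With low divergence between the compared genomes, such parallel changes at the same site should be rare, which is the basis of the authors' reply.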
Reviewer 2: Dr. Arcady Mushegian,
Stowers Institute for Medical Research, Kansas City, Missouri, United States of America and Department of Microbiology, Kansas University Medical Center, Kansas City, Kansas, United States of America. arm@stowers.org
The manuscript by Povolotskaya et al. puts forward a simple model of nucleotide substitutions in the stop codons in bacteria, and tests it against the genome-wide data. One of the main conclusions is that TAG may be globally suboptimal, with each of the remaining two codons turning out more fit under different values of GC content.
One biological explanation of these data may be in the phenomenon of overlapping ORFs in bacterial operons. TAG is the only codon that does not accommodate a minimal overlap, whereas TAA can give one kind of stop-start codon overlap (TAATG) and TGA even two kinds (ATGA and TGATG). Perhaps if the authors restricted their sample to the termination codons in the last (or only) genes in operons, they would see much less difference between fitness of those two and TAG?
Author response: The idea that the observed pattern of stop codon frequency in bacterial genomes is explained by gene overlap has occurred to us as well. However, we observe the same relationship between G-content and stop codon frequency in overlapping and non-overlapping genes. We now report these data in a new figure that is Additional file 2 Figure S2 in the new version of the manuscript. We have considered only tail-to-tail overlaps due to a much higher certainty of stop codon annotation compared to the uncertainty in the annotation of many start codons.
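The referee's count of minimal stop-start overlaps can be checked by enumerating every way a stop codon and the ATG start codon can share one or two nucleotides. The sketch below assumes ATG as the only start codon, as in the referee's examples; the function names are hypothetical.

```python
STOPS = ("TAA", "TGA", "TAG")
START = "ATG"


def overlaps(first, second):
    """Return merged strings where a proper suffix of `first` equals a prefix of `second`."""
    merged = []
    for k in range(1, min(len(first), len(second))):
        if first[-k:] == second[:k]:
            merged.append(first + second[k:])
    return merged


def stop_start_overlaps(stop):
    """List overlaps with the stop first (e.g. TGATG) or the start first (e.g. ATGA)."""
    return sorted(overlaps(stop, START) + overlaps(START, stop))


table = {stop: stop_start_overlaps(stop) for stop in STOPS}
```

The resulting table gives one overlap for TAA (TAATG), two for TGA (ATGA and TGATG) and none for TAG, matching the counts in the referee's comment.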
Reviewer 3: Dr. Shamil Sunyaev,
Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, 77 Ave. Louis Pasteur, Boston MA 02115, USA. ssunyaev@rics.bwh.harvard.edu
This manuscript presents an analysis of stop codon usage in bacterial species.
The authors report that the TAG codon is unpreferred in most bacterial species and that its frequency does not depend on GC content. They suggest the presence of weak selection against the TAG codon due to an unknown mechanism. One potential mechanism may involve dependency of the efficiency of one of the release factors on GC content. I find the results of great interest. I only have two minor technical comments.
1) The analysis is based on Bulmer equations, which hold only if evolution is mutation limited. It would be great to briefly discuss applicability of this model to a wide variety of bacterial species.
Author response: Bulmer’s model assumes that the fate of a new mutation is decided independently of other mutations, that is to say that generally only one mutation is segregating in the population at the same time. This is certainly true if we consider only mutations in stop codons. In most bacterial genomes there are 2–5 thousand protein coding genes making it rather unlikely that more than one stop codon polymorphism is segregating at the same time.
2) Approximation of selection coefficient against TAG codon as a sum of contributions due to selection against GC content (S1) and selection against this specific codon (S2) ignores the S1*S2 term. It is OK if both selective forces are assumed to be small. It would be great if this assumption would be spelled out.
Author response: The referee is absolutely correct; we assume that both of the selective forces are small. We have added an explicit statement to this effect in the text.