Reviewer 1: David Ardell, The Linnaeus Centre for Bioinformatics, Uppsala University
With this manuscript Novozhilov et al. bravely enter the contentious field of modelling the evolution of the genetic code. Novozhilov et al. have contributed some original approaches, concepts and techniques to the field in this work. Although the details of the method are omitted, they convincingly use linear programming to address the question of which amino acid cost matrix would minimize the cost of mistranslation assuming a fixed pattern of translational misreading error. Having solved this problem they then apply their solution to estimate which codon assignments are most deleterious in the standard code. Their conclusion that the placement of arginine is exceedingly maladaptive echoes several earlier uncited works, although they have reached this conclusion by original means. The uncited earlier works include Tolstrup et al. (1994) JMB 243:816, who showed that an artificial neural network trained to learn the standard code segregates amino acids in its internal representation in groups that perfectly correspond to a measure of their hydrophilicity, with the exception of arginine. Tolstrup et al. themselves cite earlier work indicating the misfitting assignment of arginine, including Swanson (1984) Bull. Math. Biol. 46:187, Taylor and Coates, and T.H. Jukes (1973) Nature 246:22, who discussed evidence that arginine was a late addition to the genetic code. Finally, arginine is the only amino acid for which strong evidence has been presented for a stereochemical association with its codons (Knight and Landweber (2000) RNA 6(4):499).
Authors' response: We appreciate Ardell pointing out this earlier work concerning arginine's position; it is now cited in the revision.
I also appreciated the application of principal components analysis on representations of genetic codes based on the "codon distances" of amino acids. This is a nice way to measure genetic code similarity in the face of the large equivalence classes of genetic codes under the cost metric that they used (for instance, swapping the two purines or the two pyrimidines in either the first or second codon positions or both, while holding the amino acid assignments fixed, will yield codes with the same cost).
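For concreteness, the representation praised here can be sketched as follows: a minimal illustration in which each code is encoded as the vector of pairwise amino-acid "codon distances" and a sample of such vectors is projected onto its first two principal components. The distance definition (minimal base differences between codon blocks) and all names below are illustrative assumptions, not the authors' exact construction.

```python
# Hypothetical sketch (not the authors' code): represent each genetic code
# as the vector of pairwise amino-acid "codon distances" and project a
# sample of codes onto the first two principal components. All codes are
# assumed to encode the same set of amino acids, so the vectors have equal
# length.
import itertools
import numpy as np

def codon_distance_vector(code):
    """code: dict mapping 64 codons (strings) -> amino acids (strings)."""
    aas = sorted(set(code.values()))
    blocks = {a: [c for c, aa in code.items() if aa == a] for a in aas}
    vec = []
    for a, b in itertools.combinations(aas, 2):
        # one possible "codon distance": fewest base differences between
        # any codon of a and any codon of b
        d = min(sum(x != y for x, y in zip(c1, c2))
                for c1 in blocks[a] for c2 in blocks[b])
        vec.append(d)
    return np.array(vec, dtype=float)

def pca_2d(code_vectors):
    """Coordinates of each code in the plane of the first two PCs."""
    X = np.array(code_vectors, dtype=float)
    X -= X.mean(axis=0)                      # center each feature
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T
```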
Finally, in terms of incremental improvements to the field, the authors promote an original model for the space of possible genetic codes, and an original mechanism of genetic code change. Their conclusion that the genetic code is a partially optimized random code is appealing and not controversial to me, although I am quite sure it will continue to be controversial to (and perhaps ignored by) others.
However, I take issue with some assumptions and lines of reasoning in this work, which I now outline in decreasing order of relevance to its overall impact:
1. Have the authors meaningfully analyzed a fitness landscape and plausible evolutionary trajectories of genetic codes? This would require 1) an adequate measure of fitness and 2) a plausible mechanism of evolutionary change. The authors repeatedly confuse the distinction between "cost" and "fitness" throughout this paper, despite pointing out the distinction themselves at one point in the paper. They also rightly conclude that a true treatment of fitness requires consideration of the population of genes that the genetic code is translating. The important point they neglect is that these genes will also influence how genetic codes can change. Because of this evolutionary constraint that genes place on genetic codes, the likelihood of a swap of amino acids or anticodons between two alloacceptor tRNAs is virtually nil, especially after translation evolved to be accurate enough that 20 amino acids could be translated consistently (a precondition necessary for such a swap, as modelled, to even be meaningful). Perhaps that is why we see no evidence of such radical variation in genetic codes on Earth today. The fact that fitness is not adequately measured in this work, and that the way codes change is misrepresented, leaves the basis of their conclusions in doubt. To this I may add that the ruggedness of the cost landscape that they describe is an inevitable consequence, at least in part, of the aforementioned symmetries in the cost metric that they used (leading to equivalence classes of codes, as mentioned previously).
On the other hand, the authors' postulated mechanism of swaps of amino acids between pairs of codon blocks is adequate to show that the standard code is sub-optimal, although this has been shown before by others.
Authors' response: There are two distinct points in this comment. One is the alleged confusion between costs and fitness. On this count, we plead not guilty. The cost is defined unequivocally, and the inverse relationship between this cost and fitness is explained right after this definition. In the rest of the text, we speak of reduced cost in the more technical sense and of increased fitness when it comes to a more biologically oriented discussion. We believe this creates clarity rather than confusion.
The second point is that the model of code evolution might not be realistic. Here, we plead guilty. The model is deliberately oversimplified to allow straightforward conclusions on the relationships between the standard code and various random codes, and we emphasize this in the revised Discussion.
2. Is the block structure of the standard genetic code inevitable? There are two components to the block structure: the number of codons assigned to each amino acid and the clustering of redundant codons by the third codon position. Certainly the "wobble" rules in the third codon position might reasonably be assumed invariant throughout the history of the code. But different tRNA isoacceptors may be altered in their reading capacity through mutations and modifications of their first anticodon bases, i.e. changed in which wobble rule they use. Furthermore, extant altered genetic codes vary in the number of codons assigned to different amino acids. Our own earlier claim (Ardell and Sella, 2001) notwithstanding, there is clear evidence in extant life that certain amino acids have most likely inherited or invaded codons from others; thus neither the block structure of the genetic code nor its amino acid expressivity has been invariant throughout its evolution.
Authors' response: In the revised version of the manuscript, we added a caveat (in the Discussion section) where we emphasize that we, essentially, explore a toy model of the code's evolution that ignores the expansion of the number of amino acids and involves only codon series swaps. The gist of this paper is the determination of the place of the standard code in the code space, in relation to various classes of random codes, and we believe that, in this respect, the model we employ is adequate.
3. Using the code itself to decide among different measures of the cost of amino acid replacements, or to infer the nature of translational error, without other evidence, is fallacious, especially considering the authors' own conclusion that the code is non-optimal.
Authors' response: We do not actually use the code itself to decide among cost measures. It is another matter that some such measures (e.g., PAM or Blosum matrices) are themselves dependent on the code and therefore hardly appropriate. We do not believe there is anything fallacious in this.
4. Even though it is widely used, the cited experimental justification for the translational misreading probability scheme in equation 2 is weak, especially in that translational misreading is more transition-biased in misreading of the second codon position than in the first. The data are extremely limited on these points! The cited references are: Friedman and Weinstein, 1966, Woese, 1965 and Parker, 1989. In the first reference, only the data translating poly-U are directly interpretable (the poly-UG data has as its highest incorporation phenylalanine, demonstrating that the mRNA was a random copolymer). Their data (Table 2, page 990) do show a transition bias in misreading of this one codon as the overall rate of error is increased. But there is no evidence that this bias is greater in the second codon position than the first codon position. In contrast, the data reviewed by Woese (1965) for poly-U show no sign of transition-biased misreading in the first codon position at all, but a sign of it in the second codon position. Therefore, the data from these two sources are inconsistent. Furthermore, they are for only one codon, and Woese writes that the pattern of misreading of other codons that could be assayed at that time was very different. Importantly, even very recent studies of translational misreading, either experimental in vitro (E. Bouakaz et al. http://publications.uu.se/abstract.xsql?dbid=6324) or using molecular dynamics simulations (Almlöf et al. (2007) Biochemistry 46:200), center only on the UUU or UUC codons. All authors agree that more studies are necessary with other codons to generalize conclusions. Parker's review of in vivo misreading rates (Table 1, page 277) in no way allows the reader to draw general conclusions regarding the form of translational misreading errors in the different codon positions, other than the general position effect.
On the other hand, Kramer and Farabaugh's recent work (cited in this paper) does demonstrate a greater transition bias in misreading in position 2 than position 1, in vivo, for all possible one-mutant neighbors of the lysine codon AAA. Nonetheless, this raises the following two questions for me: 1) what translation system, under which conditions, is the best experimental model for the primordial translation systems under which the genetic code evolved? and 2) are the highly evolved translation systems studied today biased by having co-adapted to the genetic code, so that error frequencies are greatest where the costs of errors are weakest?
Authors' response: In the revised manuscript, we are more cautious about the differences in the transition bias between codon positions. The questions asked by Ardell are interesting and relevant. Like him, we currently have no definitive answers.
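To make the structure of the misreading scheme under discussion concrete, here is a minimal sketch of a position- and substitution-type-dependent weighting of the kind given by equation 2. The numerical weights are placeholders chosen only to illustrate a second-position transition bias; they are not the values used in the paper.

```python
# Minimal sketch of a position- and substitution-type-dependent misreading
# scheme of the kind given by equation 2. The numerical weights below are
# placeholders, NOT the values used in the paper.
PURINES = {"A", "G"}

def is_transition(x, y):
    """True if x -> y exchanges A<->G or C<->U."""
    return x != y and ((x in PURINES) == (y in PURINES))

WEIGHT = {  # assumed relative misreading weights: position -> type -> weight
    0: {"ti": 0.5, "tv": 0.25},   # first codon position (placeholder)
    1: {"ti": 0.1, "tv": 0.02},   # second position: strongest discrimination
    2: {"ti": 1.0, "tv": 1.0},    # third (wobble) position: weakest
}

def misreading_weight(codon, misread):
    """Relative weight of misreading `codon` as `misread`; zero unless the
    two codons differ at exactly one position."""
    diffs = [i for i in range(3) if codon[i] != misread[i]]
    if len(diffs) != 1:
        return 0.0
    i = diffs[0]
    kind = "ti" if is_transition(codon[i], misread[i]) else "tv"
    return WEIGHT[i][kind]
```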
As a general point, this paper would benefit very much from separating Materials and Methods from the Results, for clarity. In many respects the paper is well written, but combining M&M and Results makes the paper badly organized and forces the reader to piece together important details of how the work was done from scattered sections of the paper. For example, only incidentally can the reader learn how many different genetic codes were actually analyzed in the evaluation of the 4 sets o, r, O and R.
Authors' response: We disagree regarding the amalgamation of M&M and Results. We initially attempted to write the paper in a more traditional manner but found that, in this case, the main methodological approaches were virtually inseparable from the results. The numbers of evaluated codes are now indicated explicitly.
(p. 9 and elsewhere): Although we (Ardell, 1998; Sella and Ardell, 2001; Ardell and Sella, 2001, 2002) have shown that 1) mathematical forms such as your eq. 1 are minimized by pairing large terms of p(·|·) with small terms of d(·,·), and that 2) codes that imply such pairings are indeed more fit in certain population genetic models, it only invites confusion and misunderstanding for the authors to use the term "fitness" to describe the quantity being optimized. This point is correctly touched on in the paper, but then treated misleadingly elsewhere. May I suggest to call it what it is, which is "cost"?
Authors' response: Already addressed above. In general, we do not see a conflation of costs and fitness.
Please detail, for reproducibility, the software used and how the linear programming problem was solved. Why not provide source code in the supplementary methods?
Authors' response: The linear programming problem was solved with the standard routine LPSolve from the Optimization package of Maple 9.5.
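An analogue of this step can be sketched in open-source tools; the following is a minimal illustration using scipy, not the authors' Maple code. The single normalization constraint assumed here (the entries of d summing to a constant) merely rules out the trivial all-zero solution and stands in for whatever constraints were actually imposed.

```python
# Illustrative scipy analogue of the linear-programming step. Decision
# variables: the entries d(a,b) of a symmetric amino-acid cost matrix.
# Objective: minimize sum_ij w_ij * d_ij for fixed aggregated misreading
# weights w, subject to an assumed normalization constraint.
import itertools
import numpy as np
from scipy.optimize import linprog

def optimal_cost_matrix(aas, w, total=100.0):
    """aas: list of amino acids; w: dict (a, b) -> misreading weight,
    aggregated over codon pairs. Returns dict (a, b) -> optimal d."""
    pairs = list(itertools.combinations(aas, 2))
    c = np.array([w.get(p, 0.0) for p in pairs])  # objective coefficients
    A_eq = np.ones((1, len(pairs)))               # sum of all d_ij ...
    b_eq = np.array([total])                      # ... fixed to `total`
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    assert res.success, res.message
    return dict(zip(pairs, res.x))
```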
Reviewer 2: Allan Drummond, Harvard University (nominated by Laura Landweber)
Review of "Evolution of the genetic code: partial optimization of a random code for translational robustness in a rugged fitness landscape", submitted to Biology Direct.
The logical development and main results of the paper are as follows. First, a cost function for genetic codes is specified, and its terms explained (including choices for a distance measure between amino acids). A framework for generating alternative codes is introduced, with a set of assumptions to winnow the search space by roughly 66 orders of magnitude to a tractable set, most importantly the assumption that the block structure of the standard code is a mechanistic consequence of the translational apparatus and therefore non-blocked codes may be safely set aside. The standard code is compared with alternative codes and found to outperform the vast majority of them given a few variants of the assumed cost function. Improvement opportunities for the genetic code are identified by an attempt to minimize the cost function via changes to the distance measure. A greedy minimization algorithm is introduced to search locally for improved variants of an initial code via swaps of codon families. Using this algorithm, the question of whether the standard code should be considered optimized for error minimization is addressed: optimized versions of the standard code and random blocked codes are obtained, and it is found that the standard code's cost, and that of its optimized version, can be matched or beaten by optimized versions of many random blocked codes. The paper's major conclusion is that the standard code is rather unremarkable in its error minimization when compared with other blocked codes.
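For concreteness, the kind of cost function summarized above can be sketched in a few lines: the cost of a code is the misreading-weighted sum of amino-acid distances over all single-base misreadings. Equal codon usage is assumed for simplicity; `misreading_weight` is the placeholder scheme sketched earlier in these reviews, and d is any symmetric amino-acid distance (e.g., difference in polar requirement). This is an illustration, not the paper's exact formula.

```python
# Minimal sketch of a translational-error cost function for a genetic code.
import itertools

BASES = "UCAG"

def code_cost(code, d, misreading_weight):
    """code: dict codon -> amino acid (stop codons excluded by the caller);
    d: dict with symmetric (a, b) keys -> replacement cost."""
    cost = 0.0
    for codon, aa in code.items():
        for i, base in itertools.product(range(3), BASES):
            misread = codon[:i] + base + codon[i + 1:]
            if misread == codon or misread not in code:
                continue
            aa2 = code[misread]
            if aa2 != aa:
                cost += misreading_weight(codon, misread) * d[(aa, aa2)]
    return cost
```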
Overall, I find the subject exciting, the approaches as daring as would be expected from this leading group, and the conclusions interesting. The authors make major assumptions with which I'm not entirely happy, with the justification that they are necessary to make progress. My concern is that unless the assumptions are good, progress is not actually being made, and the topic is better left alone.
I suggest that some assumptions be clarified and buttressed with evidence where they conflict with compelling alternative arguments. The results derived from the greedy minimization algorithm should be substantially revised, as several important claims about this algorithm's output (e.g., that it finds shortest paths) are incorrect. Finally, long-standing questions about the inferences one can draw about evolutionary trajectories, possible or actual, from the output of analytical or computational optimization algorithms should be addressed.
To begin, the simplifying assumptions made to render the search for better codes tractable bear closer examination. In particular, the limitation of searches to codes having the block structure of the universal code is defended, and then used, in a novel way. Given the goal of interpreting simulated trajectories of code modification as informative about the actual process of code evolution (so that, for example, the concept of "close to a fitness peak" and the data in Figures 7 and 8 have meaning in biology as well as the simulation), the authors must establish that the simplifications are biologically reasonable. The burden is heavier here than on other works that make similar assumptions (e.g. the works of Freeland and colleagues) but in which no claims about evolutionary trajectories or the mode of evolutionary exploration are made.
The major assumption leading to the reduction in the search space is that "...the block structure of the code is a direct, mechanistic consequence of the mode of interaction between the ribosome, mRNA, and the cognate tRNA". The premise is worded in a way suggesting that biophysics alone suffices to impose the observed block structure, without invoking selective pressure against mistranslation. This is, to my knowledge, a completely novel and exciting idea, and substantial evidence should be presented to support it. I was unable to connect the contents of the cited reference (Spirin, RNA Biol. 2004) with this premise, and would be helped by exposition on what is being assumed and what is known.
By contrast, the authors might mean the alternative where the block structure of the code arises both from the mode of interaction (e.g., third-position binding contributes most weakly to discrimination, and codon-anticodon mismatches involving transitions are more stable than those involving transversions) and selective pressure for error minimization, which jointly favored a code structure in which third-position transition errors are largely synonymous – a blocked code. That natural selection favors translational error minimization seems obvious; the question at issue is whether the structure of the genetic code contributes alongside other adaptations such as ribosomal structure, kinetic proofreading, synthetase editing activity, biased codon usage for translational accuracy (Akashi Genetics 1994), biased codon usage for error minimization (Archetti JME 2004), tolerance of proteins to mistranslation, etc. If this weaker but plausible assumption is what the authors mean, then it becomes less clear how unblocked codes can be eliminated from consideration in evolutionary pathways, since they are merely assumed to be less fit (as are most codes in the reduced landscape), not unviable, and there are overwhelmingly (~10^66-fold) more of them in the space of all codes, such that selection must work hard to eliminate them.
Indeed, there are extant codes that have a more consistent block structure than the standard genetic code, such as the vertebrate mitochondrial code in which there are no single-codon families (unlike Trp and Met in the standard code). Such block structures are apparently consistent with the mechanism of translation, but are not considered in the present study. I recommend that in the manuscript the authors more muscularly defend the omission of unblocked and differently-blocked codes from evolutionary trajectories.
Authors' response: Indeed, what we mean is that the first two bases contribute substantially more to the recognition of a cognate tRNA than the third, wobble base. This was made explicit in the revision, and the references in support of these differential contributions of bases are cited ([49–51]). Drummond's point is well taken in that this does not render codes with different block structures impossible "in principle". However, it does make them improbable, and given that, for any simulation to run to completion, a relatively small domain of the code space needs to be chosen, fixing the block structure seemed like the best choice. We explain all this in the revision. This may not amount to the "more muscular" defense suggested by Drummond, but this is how things are.
The latter half of the work is mainly concerned with how optimized the standard genetic code is. Given that evolution is a stochastic process, the natural way to think about optimization is to ask what proportion of mutations increase versus decrease the score – a truly optimal code will have zero improvement mutations, and a highly optimized code will be improved by only a tiny fraction of the many possible mutations. Many workers have estimated this proportion by locating the standard code's score relative to a sampled distribution of alternative scores. The authors have taken another approach, using the number of "greedy" codon-family swaps separating a given code from a local optimum to measure how optimized it is. The use of distances between a given code's score and an optimum – here, the minimization percentage, MP – to ascertain the strength of selection has been criticized (Freeland, Knight and Landweber, TiBS 2000), essentially because it improperly treats these distances as a linear measure. The present work compares MPs between codes and is subject to the same criticism. The problem is exacerbated here because the MP is computed relative to each code's local greedy minimum. If, for example, the standard code has an MP of 0.93 and a competing code an MP of 0.8, one cannot conclude that the standard code is more optimized in the sense of having fewer mutations which improve it. That is, the difference between MPs is not equivalent to the difference in optimization level. An alternative is that the codes in the standard code's neighborhood generally score well, so that obtaining a high MP is easy and many mutations would improve the standard code, making it poorly optimized, while the competing code's neighborhood is filled with poor-scoring codes, and it is heavily optimized with few better-scoring codes to move to. Rugged landscapes of the sort explored here are more likely to have such features. The authors should directly address the criticisms regarding MP and search-derived versus stochastically derived measures of optimization, and should sample the local landscape around each code to address the concerns about level of optimization.
Authors' response: In order to compare codes, we employed both a statistical approach and an optimization approach. As a measure of the distance between codes, we used not only the number of codon swaps but also the difference in the error cost values, as can be seen in Figs. 3, 4, 5, 7 and 8. It is not clear why we cannot use MP in the context of the present study. The criticisms of Freeland et al. (2000) addressed the conclusion that, considering the low MP value of the standard code, the code could not evolve under selective pressure to reduce the effect of translation errors. We do not argue with that critique. We use MP only to compare different random codes with the standard code under exactly specified rules for fitness landscape search.
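For reference, the MP measure at issue in this exchange can be written down explicitly; the use of the local greedy minimum as the reference point is inferred from the discussion above.

```python
# Minimization percentage (MP): how far a code's cost lies between the mean
# cost of random codes and the cost of a reference minimum (here, following
# the exchange above, the code's local greedy minimum). MP = 1 at the
# reference minimum, MP = 0 for an average random code, MP < 0 for codes
# worse than the random average.
def minimization_percentage(cost, cost_local_min, random_costs):
    mean_random = sum(random_costs) / len(random_costs)
    return (mean_random - cost) / (mean_random - cost_local_min)
```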
The mutation-selection balance/non-linear adaptation argument should be considered in the Conclusion where the authors ask, "Why did the code's evolution stop where it stopped?" An answer I glean from much of the error-minimization literature cited in the present work is that it might be wildly improbable for selection to push the level of error-minimization any higher, given countervailing pressure from mutation. If the genetic code is "one in a million" in the sense favored by Freeland and Hurst (JME 1998), that is a high level of optimization by most standards. (A demonstration that many mutations improve that one-in-a-million code would be compelling contrary evidence.) Algorithmic optimization of the sort carried out here is blind to such statistical features – in greedy minimization, the first optimization step is as easy as the last step, because all possible alternatives must be evaluated each time, whereas in a blind sampling-based process such as evolution, the farther uphill one climbs, the more improbable improvement becomes and the less likely it is to persist once attained. This is the essence of the criticism of Freeland et al. (TiBS, 2000).
In the same vein, there are several standard evolutionary hypotheses which seem to be missing for why the present genetic code should not be optimal:
- Error minimization was not the sole target of selection. If any other traits were under substantial selection in primordial genomes, and these traits were not perfectly congruent with error minimization, then an evolutionary process favoring increased fitness would yield a sub-optimal genetic code.
- The effective population size was not infinite. Natural selection cannot distinguish fitness differences smaller than the reciprocal of the effective population size (see the sketch after this list). As a consequence, any mutations (to tRNAs, synthetases, release factors, ribosomal components, etc.) which improve the error minimization of the genetic code, but confer a selective advantage below this threshold, would not be expected to reach fixation except by drift. One could in principle estimate how many codes have such a property, and thereby estimate how much optimality would be "left on the table" simply because of the nature of the evolutionary process.
- Mutation-selection balance was achieved. Suppose that error minimization has a bell-shaped distribution, and high levels of error minimization are selectively advantageous, but not infinitely so. The higher error minimization is pushed by selection, the more strongly it is opposed by an increasing proportion of deleterious mutations, until equilibrium – likely at a sub-optimal level of EM – is reached. (Mutation-selection balance is a more mainstream term for the "non-linear adaptation" argument touched on above and briefly by the authors in the Introduction.)
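The drift threshold invoked in the second hypothesis can be illustrated with the standard diffusion approximation for fixation probability; this is textbook population genetics, not material from the paper.

```python
# Kimura's diffusion approximation for the fixation probability of a new
# mutant with selection coefficient s, initial frequency 1/(2N), in a
# population of effective size Ne. Illustrative sketch only.
import math

def fixation_probability(s, Ne, N):
    p0 = 1.0 / (2 * N)
    if s == 0:
        return p0
    return (1 - math.exp(-4 * Ne * s * p0)) / (1 - math.exp(-4 * Ne * s))

# A benefit of s = 1e-6 is effectively invisible when Ne = 1e4 ...
print(fixation_probability(1e-6, 1e4, 1e4))   # ~ 5e-5, i.e. ~ 1/(2N)
# ... but behaves as a clear advantage (~ 2s) when Ne = 1e8.
print(fixation_probability(1e-6, 1e8, 1e8))   # ~ 2e-6
```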
If any of these three standard hypotheses has merit, then the genetic code is expected to be sub-optimal with respect to its robustness to mistranslation. The present work should address these hypotheses in the Discussion, where presently the "balance of two forces" argument addressing the same point is made.
Authors' response: We appreciate these interesting comments. However, in our opinion, this demand sets the bar unrealistically high for any analysis of the evolution of the code. We do not know this "expected non-optimality".
The authors measure the stepwise distance from a code to a local peak of the landscape, thereby ascribing significance to this peak. A serious concern is that the algorithm by which this greedy peak is found sheds no light on what general significance the peak possesses. Let us assume that the greedy peak is found to be N steps away from the starting point. The greedy peak is not guaranteed to be a) the closest peak, b) the tallest peak within N steps, or c) the closest or tallest peak approachable using exclusively uphill steps. In a rugged fitness landscape, there is additionally no guarantee that the height of or distance to the greedy peak are informative about the height of or distance to these peaks.
Further, the authors state, "Using this algorithm we can find the shortest evolutionary trajectory from a given starting code to its local minimum of the error cost function (i.e. to a local fitness peak)." This statement is incorrect. Greedy paths will not in general be shortest paths. This can be seen most clearly in Figures 7 and 8, which plot minimization paths of >26 steps, and concluding point #6 which states that a typical code can reach its local peak in 15–30 steps. Given the algorithm used (where a set of mutually accessible codes is uniquely specified by the position of 14 four-codon and 14 two-codon blocks), any code can be changed into any other accessible code in (14-1) + (14-1) = 26 swaps. (The problem is equivalent to a list-reordering problem, and a list of n items can be put in any specified order in n-1 swaps or fewer). It is impossible for a shortest path in the described model to be longer than 26 steps.
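The list-reordering bound invoked here is easy to verify directly; the following is an illustrative sketch, not code from the paper.

```python
# Direct check: any target arrangement of n items can be reached in at most
# n-1 swaps, by putting the correct item into each position in turn (the
# last position is then forced).
def swaps_to_reach(current, target):
    current = list(current)
    n_swaps = 0
    for i in range(len(current) - 1):
        if current[i] != target[i]:
            j = current.index(target[i], i + 1)  # target item sits further right
            current[i], current[j] = current[j], current[i]
            n_swaps += 1
    return n_swaps

assert swaps_to_reach("DCBA", "ABCD") <= 3   # n = 4 items, at most 3 swaps
```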
As a consequence, Figures 7 and 8 and the accompanying text should be revised. The aim of the experiment is to determine how far various codes are from the local minimum. If "how far" is the shortest-path distance to the closest local minimum, then the data should be retaken using a suitable approach such as dynamic programming. If "how far" is meant in an evolutionary sense, then neither the greedy path nor the shortest path is expected to be representative of evolutionary trajectories, which are blind and therefore subject to entropic constraints as well (cf. the arguments of Freeland and colleagues and the mutation-selection balance comment above). The greedy algorithm is a dubious choice for evolutionary studies, since, for example, the probability of an evolving population moving from one code to the next in any evolutionary model should be a function of the probability of occurrence of the proper mutation and the probability of subsequent fixation, and the greedy algorithm ignores the probability of occurrence altogether. It is easy to imagine deterministic algorithms of equivalent computational complexity which do not ignore such statistics – when all alternatives are being assessed, as in the greedy algorithm, the population mean, median, and so on are deterministic.
Authors' response: It is true that, allowing any codon swaps, we can reach any code in (14-1) + (14-1) = 26 steps (not 20). If we knew the final state (the global minimum), this would be an easy problem. Without such knowledge, it is theoretically possible to find the global minimum using dynamic programming but, practically, this problem is not solvable due to the immense number of possible codes. If we allow only swaps that yield the largest fitness increase, then we find the closest peak (we define a peak as a state from which no codon swap yields a fitness increase), which is also the tallest peak for this algorithm (because, under the given algorithm, only one peak can be reached from any starting point). We should note that this is, to the best of our understanding, the most reasonable deterministic algorithm of evolution imaginable. It would be a different approach if we added some kind of stochasticity to the landscape search. We decided to use the aforementioned simple and, therefore, tractable deterministic greedy algorithm. In the revised manuscript, we clarified this point in the description of the search algorithm by making it explicit that the algorithm finds an optimization path in which each step involves the maximum possible increase of the code robustness, and added a statement on caveats in the Discussion.
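As described in this response, the search is a deterministic steepest-descent over codon-block swaps. A minimal sketch follows, building on the illustrative `code_cost` above; the block representation and the equal-size-blocks restriction are assumptions of this sketch, mirroring the fixed block structure.

```python
# Greedy minimization: at every step, evaluate all swaps of the amino-acid
# assignments of two codon blocks and apply the one that lowers the cost
# most, stopping at a local minimum (no improving swap exists).
import itertools

def greedy_minimize(code, d, misreading_weight, blocks):
    """blocks: list of codon lists, each block assigned to one amino acid.
    Returns the locally optimal code and the trajectory of costs."""
    code = dict(code)
    trajectory = [code_cost(code, d, misreading_weight)]
    while True:
        best, best_cost = None, trajectory[-1]
        for b1, b2 in itertools.combinations(blocks, 2):
            if len(b1) != len(b2) or code[b1[0]] == code[b2[0]]:
                continue
            trial = dict(code)
            for c in b1:
                trial[c] = code[b2[0]]   # b1's codons get b2's amino acid
            for c in b2:
                trial[c] = code[b1[0]]   # and vice versa
            cost = code_cost(trial, d, misreading_weight)
            if cost < best_cost:
                best, best_cost = trial, cost
        if best is None:                 # no improving swap: local minimum
            return code, trajectory
        code = best
        trajectory.append(best_cost)
```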
The validity of a fitness landscape, whether quantitative or illustrative (Fig. 9), derives from that of its metric (distance measure) and fitness function (height). The distance metric chosen here is swaps. That is to say, one step across the landscape equals one exchange in the meaning of two families of codons which encode different amino acids. The authors assert that it is likely the genetic code evolved, at least in part, by such swaps. Woese (BioScience 20(8):471–480, 485) considered several mechanisms for the evolution of the code. Like the authors, he noted that codon reassignments were almost certain to be strongly deleterious, but instead argued that this stumbling block favored an alternate class of evolutionary paths, namely the refinement of ancestral stochastic, overlapping codon to amino-acid assignments into more precisely delineated families. Evidence addressing Woese's argument, specifically in support of the importance of swaps, should be provided early in the manuscript, as the plausibility of the conclusions depends on the acceptance of this premise.
Authors' response: Again, this work is not an attempt to reconstruct a truly realistic scenario for the evolution of the code but rather to determine the status of the standard code in the code space, compared to various sets of random codes, and delineate possible evolutionary links between the standard code and different random codes. This is clarified in the revision.
The authors note that 9 or 11 swaps are required to take the standard code to its greedy minimum, and refer to this distance as "relatively small"; this interpretation should be justified. As implied above, 9–11 mutations suffice to move any code halfway across the entire vast space of all blocked codes (but recall that the greedy minimum is not necessarily the closest or deepest local minimum). As these swaps are macromutations – several physical mutations would likely be required to swap two four-codon families – the distance would certainly be even larger in reality. I suggest providing the above calculation of the maximum shortest-path length to put whatever distance is found upon revision in perspective.
Authors' response: "Relatively small" means, literally, relatively small with respect to other random codes. It does not seem to us that additional justification is necessary.
The term "translational robustness" has previously been used to refer to the robustness of individual proteins to mistranslation (Drummond et al. PNAS 2005; Koonin and Wolf, Curr. Op. Biotech. 2006). Here, it is being applied for the first time to the genetic code to denote the idea that certain codes have error spectra which lead to disrupted protein fold or function less than others. These are different phenomena – in the biophysics of how robustness might be obtained and modified, and the scope of consequences if it is altered, among other aspects – and using the same term risks inducing the opposite impression. The phenomenon under consideration has been the subject of many previous works, so the field's common use of "error minimization" might be considered. If a new term is sought, "error robustness" would incorporate the robustness concept while carefully distinguishing it from previous work. If the term must be kept, a short description of how its use here differs from the earlier definition would help to minimize confusion.
Authors' response: We agree that the terminology here deserves more attention. "Translational robustness" could be ambiguous, but "error minimization" is not a good phrase either because the structure of the code does not minimize errors; it minimizes their effect. So we went through the manuscript and made changes, speaking of "robustness to translation errors" or, where no ambiguity is perceived, simply of "robustness".
Reviewer 3: Rob Knight, University of Colorado, Boulder
In this manuscript, Novozhilov et al. provide a more detailed exploration of the level of optimality of the genetic code and the evolutionary trajectory of optimization than has previously been available. Specifically, they use a standard approach to measuring the "cost" of a genetic code in terms of the weighted frequency of errors of different severity, and measure the trajectory of codes using a hill-climbing optimization algorithm. They recapture the uncontroversial result that the genetic code is much better at minimizing errors than a random genetic code (as has been shown by many authors), but is at neither a local nor global optimum (as has also been shown previously). However, the results go beyond what has previously been done by comparing the evolutionary trajectory of the standard genetic code to the trajectories of other, random codes to get an estimate of what the overall process should look like.
I believe that the authors overstate their result that the standard genetic code is "not special". Their own results show that it is difficult to explain except as the result of an optimization process: the argument that the standard genetic code is a global optimum is not to my knowledge taken seriously in the field, so the results cannot be seen as overturning it (see discussion between Steve Freeland, Massimo Di Giulio and myself in TiBS in 2000, which is cited appropriately in the paper). Rather, they show that, like most other features of organisms, the genetic code is optimized but not optimal, and probably reflects a range of constraints beyond the specific feature being examined. The manuscript could also benefit from being shortened substantially, as it appears to be relatively long in relation to its news value.
Authors' response: When we say that the standard code is "not special", we mean that it very well could have evolved by partial optimization of a purely random code of the same block structure. Various edits have been made to clarify this but the gist remains. In a way, this is a shift from a half-full glass touted in several previous studies which emphasized that the standard code was "one in a million" or even better than that, to a half-empty glass whereby the standard code appears rather trivial when the entire landscape is considered.
It is interesting that many of the non-canonical genetic codes in fact do have different block structure or amino acid counts than the canonical genetic codes: indeed, this seems to be the main way that the genetic code is currently evolving (see Knight et al. 2001 Nature Reviews Genetics for a discussion, and Caporaso et al. 2005 J Mol Evol for some simulations involving alternative block structures – the authors might consider citing this latter paper in the discussion of alternative models for code evolution). It might be worth relaxing some of these assumptions in the simulations to model more accurately how we think the code is changing today, although this is perhaps beyond the scope of the present manuscript. We did some work on the optimality of the non-canonical codes in Freeland et al. Mol Biol Evol.
Authors' response: The specifics of the deviant codes in modern organisms, indeed, seem to be beyond the scope of the paper. It is hard to be sure that these in any way recapitulate the original evolution of the code.
I think the statement "the standard code is unremarkable" is misleading, for the reasons mentioned above. It is still far better at minimizing errors than the vast majority of codes: perhaps the authors could restate what they mean more clearly.
Authors' response: Restated and qualified in the revision.
I still disagree that the earlier paper cited, by the same authors, adequately addresses the evidence in favor of a stereochemical effect on the modern code structure. The Caporaso et al. 2005 JME paper shows that there is plenty of "room" for adaptation even if substantial parts of the code were fixed by stereochemistry, for example. Similarly, the present paper does not really discuss the evidence supporting coevolutionary models, although it could be argued that both lines of evidence are outside the scope of this work.
Authors' response: In this context, the previous paper by the same authors (YIW and EVK) is cited as a succinct review of the evidence in support (or lack thereof) of the stereochemical hypothesis. The main conclusion is that there is no compelling evidence in favor of that hypothesis, and we stand by that. As for there being plenty of room for adaptation, we do not seem to be in dispute on this.
The discussion of the circularity of using PAM matrices should cite Di Giulio's 2001 J Theor Biol paper on the topic, and I also show in my PhD thesis that even very small contamination of a substitution matrix with the genetic code matrix can lead to artifactual statistical significance of the optimality of the genetic code.
Authors' response: The Di Giulio reference was added. Unfortunately, at this time, we do not have access to the thesis.
The assertion that "using this algorithm we can find the shortest evolutionary trajectory from a given starting code to its local minimum of the error cost function" is not true – the shortest trajectory might involve a transition to a worse solution to get to a better one. The algorithm employed can only find the shortest continuously improving path, which might be much longer than a direct route. This point should be clarified in the manuscript.
Authors' response: Indeed, this point has been clarified – see the response to Drummond's comments above.