Reviewer 1: David Ardell, University of California, Merced
In this concise and well-written manuscript, Bernhardt and Tate present their case that the tRNA anticodon loop originated in part by duplication of a hypothetical NCCA 3' overhang of a primordial hairpin that was the ancestor of modern tRNAs. They assert that the universally conserved C35, C36 and A37 features of modern glycine tRNAs are therefore homologous to the CCA tail of modern tRNAs, and that – by implication (by the Principle of Continuity, using Crick's name for it) – glycine was the first amino acid encoded in the genetic code and assigned to GGN codons.
This is a compelling argument for many reasons. As the authors document, it synthesizes and rationalizes disparate evidence, observations and claims: that modern tRNAs originated by duplication of a primordial hairpin, that the position of tRNA introns after position 37 is well conserved in all domains of life, that glycine is widely acknowledged to have been encoded early in the genetic code, that acceptor-stem hairpins can be glycylated experimentally by modern-day glycyl-tRNA synthetases, and, of course, that glycine is universally encoded by GGN codons.
Additionally, although the authors do not raise this point, their hypothesis could explain the nucleotide distribution of the so-called "cardinal" nucleotide 37, 3' to the anticodon, which is nearly universally some modification of A, or much less frequently a modification of G, which is also a purine.
As compelling as all of this is, we (the readers) must openly address the logic of the authors' arguments in light of alternative possible explanations and, in particular, critically assess the direct evidence they present for their case. We must also bring out tacit assumptions where necessary and evaluate them critically.
We are accustomed to applying well-understood models of sequence evolution to evaluate statistically whether similarity is due to homology. With tRNAs, however, we must be very careful, as these molecules violate the assumptions of these conventional methods owing to their small size and intense functional constraint. For further discussion of this issue as it applies to tRNAs, see Widmann et al. [3].
The chief alternative to homology as an explanation for features shared by different molecules is common functional constraint. Of course, some – perhaps many – functionally constrained characters in tRNAs may ultimately be explained by common ancestry anyway, because they may have arisen through the "freezing" of specific random "accidents." By "accident" we simply mean that other variants of certain traits may at one time have had comparable fitness, but that, through the evolutionary refinement and augmentation of function, other components of the biological system with which those traits interact became co-adapted to specific variants, locking them in so that they could function well together. Thus, in a kind of biological symmetry breaking, historical accidents may become frozen and acquire novel functional constraint by exaptation.
Authors' response: We would agree that random accidents provide 'opportunity' in the sense of "a combination of circumstances favourable for the purpose" [43], which then, through 'evolutionary refinement and augmentation of function', can indeed become functional constraints. In terms of the nascent anticodon-codon interaction and the subsequent evolution of coded protein synthesis, the proto-anticodon loop (N)CCA sequence created by the ligation of the two hairpins may have been absolutely required (the requirement for a purine at position 37 is discussed further in a later response to this reviewer). Coded protein synthesis was not predestined. As others have argued, it must have arisen in small, incremental steps, each of which had a selective advantage in itself (or, at least, was evolutionarily neutral) in order to be maintained. In fact, in terms of the functional requirements for the development of coded protein synthesis, a better descriptor than 'frozen accident' for what we are proposing might be 'necessary' or 'essential accident'. We acknowledge that coded protein synthesis could possibly have arisen by an alternative mechanism, in which case the presence of the (N)CCA sequence in the proto-anticodon loop may not have been required. However, we would assert that the appearance of this sequence constituted an essential pre-step for the advent of coded protein synthesis as it is seen today.
The major evidence presented by the authors for their hypothesis is the nearly universal conservation of C35, C36 and A37 among glycyl tRNAs and the identity of this CCA sequence with the CCA tail. They do not attempt any further demonstration of extended similarity between the anticodon loop and the 3' half of the acceptor stem, nor do they statistically test the underlying hypothesis, originally due to Di Giulio, that tRNAs originated by hairpin duplication. Their argument therefore relies heavily on prior work, which forces us to evaluate some of that work as well.
Authors' response: Apart from the highly conserved CCA sequence, we have not found extended similarity of the anticodon loop to the 3' half of the acceptor stem in glycine tRNAs. However, it is perhaps interesting to note that a cloverleaf formed by the ligation of hairpins (as well as the hairpins themselves) based on the anticodon arm consensus sequence in Figure 2A, i.e. with the sequence 5'- AG CU...GC C/UU NCCA-3', would be able to form the two terminal base pairs of the (acceptor) stem: U33-A38 and C/U32-G39 (underlined).
I must add at this point that the mitochondrial evidence that they present is entirely superfluous to their claims. The origin and divergence of mitochondria happened long after the origins of tRNA and the genetic code. Therefore, mitochondrial variation or lack thereof can shed little light on their hypothesis. Putting it another way, if mitochondrial glycyl tRNA genes lacked these features, this could have been written off as a derived peculiarity of mitochondria that neither supports nor undermines their claims.
Authors' response: In response to this criticism, we have separated the analysis for cytoplasmic and organellar tRNAs. However, although mitochondrial sequences may not tell us about the origins of tRNA, the fact that the anticodon-loop CCA is retained in these and in chloroplast sequences reinforces the argument that this sequence has indeed been 'frozen' in place in contemporary tRNA sequences: CC owing to its fundamental importance in coding for glycine, and A37 owing to its probable role in strengthening the anticodon-codon interaction through base stacking [17].
Surely, the absolute conservation of C35 and C36 among glycyl-tRNAs is not surprising, since there are no alternative genetic codes that reassign the glycine codons.
Authors' response: There are a number of alternative genetic codes, for example in Candida albicans [38] and the ciliates Tetrahymena thermophila and Paramecium tetraurelia [39], not to mention in mitochondrial genomes [37]. Interestingly, however, in none of these are the GGN glycine codons reassigned to another amino acid, indicating that the assignment of (N)CC anticodons to glycine has been frozen in place. In fact, this is true for the entire bottom row of the genetic code table, which contains codons with the sequence GNN, including the codons for alanine, aspartic acid, glutamic acid and valine as well as glycine, all of which are believed to be 'early' amino acids (see Table 1, left column).
Furthermore, the near universality of A37 among glycyl-tRNAs is not specific to glycine: A37 occurs at an overwhelmingly high frequency in a majority of tRNA classes in eukarya and bacteria [15, 44]. This so-called cardinal nucleotide has been discussed by Yarus [45] as playing a role in an "extended anticodon". Yarus showed that post-transcriptional modifications of A37 (and of the minor variant G37) are correlated in E. coli with anticodon sequence, particularly position 36. Saks and Conery [44] generalized this observation through sequence analysis to many more species of bacteria. Archaea also always have a purine at this position, although the average frequency of G37 is much higher [15].
These modified purines at position 37 have a wealth of well-characterized functions in translation. They are involved in stabilizing the openness of the anticodon loop [46], constraining the motional dynamics of the anticodon stem-loop to better function in codon pairing [47], and stabilizing the interaction of position 36 with the first base of the codon [48]. Finally, modified purines at position 37 are important for maintaining the translational reading frame [49].
Do these facts invalidate the authors' claims? Not necessarily, for the reasons I outlined above: historically once-arbitrary states of tRNAs may become functionally constrained through co-adaptation of other components of the translational apparatus. But until somebody demonstrates that translation could equally well have evolved with other nucleotides at the cardinal position, say pyrimidines instead of purines (more commonly A), the possibility remains that A37 in glycine tRNAs is nearly inevitable for reasons other than the one the authors propose. Put another way, an important question is whether purines at position 37 are exaptations of a frozen accident or whether only purines could do the jobs they do in tRNAs.
Authors' response: The fact that only purines could do the job they do at position 37 in no way invalidates our argument. No doubt A37 has been 'frozen' in place in contemporary tRNAs because it provides a compelling functional advantage in stabilizing the anticodon-codon interaction [17]. We believe it must have been in this position from the beginning, as it would have been in an anticodon loop formed in the way we have proposed, by the ligation of two hairpins with 3'-CCA termini. Had it not been, the anticodon-codon interaction, and indeed protein synthesis itself, might not have arisen. It would be hard to overstate the centrality of the anticodon-codon interaction, which is at the very heart of the mechanism of protein synthesis. A purine (A or G) at position 37 is required to stabilize this interaction and, in fact, to enable it to occur. Equally, and as we have commented in the manuscript, an NCC anticodon was required for the establishment of the nascent anticodon-codon interaction owing to hydrogen-bonding considerations [20]. This can also be seen in the glycine tRNAs from Staphylococcus aureus and Staphylococcus epidermidis, which have been co-opted for cell-wall synthesis and are unable to function in ribosomal protein synthesis [16]: they have a pyrimidine rather than a purine at position 37. Pertinent to this point, an analysis of the post-transcriptional modification of N37 is also instructive. It has been proposed that such modification serves to strengthen anticodon-codon interactions involving weaker A-U base pairs relative to those involving G-C base pairs [50].
Interestingly, in a set of post-transcriptionally modified tRNAs from three species in which the structures of almost all tRNAs, including their modifications, have been elucidated (Escherichia coli (bacteria), Haloferax volcanii (archaea) and Saccharomyces cerevisiae (eukaryote cytoplasm)), none of the nine glycine tRNAs has a modified A37, the lowest proportion for any tRNA (Additional file 1, part A). It has been suggested that early tRNA molecules contained no modified nucleotides [8]. If this is correct, the earliest anticodon-codon interactions would not have depended on post-transcriptional modification, which may have been introduced at a later stage of genetic code evolution (perhaps with the advent of protein enzymes) in order to utilize additional codons/amino acids. Significantly, the glycine tRNAs from these three organisms have the lowest average number (6.2) of post-transcriptionally modified nucleotides in the molecule as a whole compared with the other tRNAs (Additional file 1, part B). The C at position 36 of glycine tRNAs might explain an unmodified nucleotide at position 37 [45] (the five tRNAs with the lowest proportion of modified N37 all belong to the bottom row of the genetic code table, and so possess C36), but the relative lack of modification across the glycine tRNA molecule as a whole cannot be explained on this basis. Although it rests on a slightly different argument, this observation likewise suggests that glycine tRNA represents an early tRNA that did not require extensive post-transcriptional modification to function.
Then what are we left with? I am a bit hungry for further demonstration of the underlying hypothesis that tRNAs originated by duplication of a primordial hairpin, ultimately ancestral to both the D and T arms of modern tRNAs. Were this underlying hypothesis strengthened, one would be more inclined to consider the authors' claims seriously. The literature on the primordial hairpin duplication origin of tRNAs is large and fairly complicated, and I did not evaluate it exhaustively. I will focus on the two papers that are easiest to understand because they most directly use well-understood methods: Di Giulio [51] and Widmann et al. [3]. Di Giulio [51] used parsimony-based methods to study the similarity of the 3' and 5' halves of reconstructed ancestral tRNA sequences. Besides the intrinsic limitations of parsimony analysis noted by Widmann et al. [3], ancestral sequence reconstruction (ASR) with either parsimony or likelihood-based methods is subject to biases. In fact, another study that used ASR to reconstruct ancestral tRNA sequences found that they did not even fold properly into the canonical secondary structure [52].
It is worth noting that objections of a quite similar nature have also been raised about the means by which Jordan et al [11] reported universal trends in amino acid compositional gain and loss in proteomes, which is cited by the authors in support of their hypothesis. These objections were raised by Goldstein and Pollock [53]. Incidentally, and perhaps in support of the authors, similar claims to Jordan et al [11] were made using entirely different methods and comparisons by Ivanov [54].
Widmann et al. [3] took an alternative approach to the question of whether the two halves of the tRNA cloverleaf are paralogous. They compared the distribution of similarity between actual extant 5' and 3' halves of tRNAs with that expected under a null model of random tRNAs. My chief objection to this otherwise excellent approach is that the random tRNAs of their null model (generated by a shuffling procedure) are not filtered or verified to actually fold into a minimum-free-energy cloverleaf structure. The requirements of this structure may place additional constraints on sequence variation, which would reduce the significance of their results.
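To make the shape of such a test concrete, here is a minimal Python sketch of a shuffle-based null model for 5'/3' half similarity. It is not Widmann et al.'s actual procedure: the ungapped position-by-position identity score, the mononucleotide shuffle, and the names `half_identity`, `shuffle_null` and `empirical_p` are illustrative assumptions. The optional `keep` predicate is a hypothetical hook marking where a structural filter of the kind requested above (e.g. "folds into a cloverleaf") would go; no folding algorithm is implemented here, so by default every shuffle is accepted.

```python
import random
from typing import Callable, List


def half_identity(seq: str) -> float:
    """Fraction of identical positions between the 5' half and the 3' half.

    Ungapped, position-by-position comparison; for odd lengths the central
    nucleotide is ignored. An illustrative score, not a published method.
    """
    half = len(seq) // 2
    if half == 0:
        raise ValueError("sequence too short to split into halves")
    left, right = seq[:half], seq[len(seq) - half:]
    return sum(a == b for a, b in zip(left, right)) / half


def shuffle_null(seq: str, n: int = 1000,
                 keep: Callable[[str], bool] = lambda s: True,
                 seed: int = 0, max_attempts: int = 100_000) -> List[float]:
    """Half-identity scores of n mononucleotide shuffles of seq.

    `keep` is a hypothetical filter hook: a real analysis could pass a check
    that the shuffled sequence folds into a cloverleaf. Shuffles it rejects
    are discarded and redrawn.
    """
    rng = random.Random(seed)
    scores: List[float] = []
    attempts = 0
    while len(scores) < n:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("filter rejected too many shuffles")
        chars = list(seq)
        rng.shuffle(chars)  # preserves base composition, destroys order
        candidate = "".join(chars)
        if keep(candidate):
            scores.append(half_identity(candidate))
    return scores


def empirical_p(seq: str, n: int = 1000, **kwargs) -> float:
    """One-sided empirical p-value (add-one corrected) for the observed score."""
    observed = half_identity(seq)
    null = shuffle_null(seq, n=n, **kwargs)
    return (1 + sum(score >= observed for score in null)) / (n + 1)


if __name__ == "__main__":
    toy = "GCGGAUUAGCGCGGAUUAGC"  # toy sequence with identical halves, not a real tRNA
    print(half_identity(toy))  # 1.0
    print(empirical_p(toy, n=500))
```

The point of the `keep` hook is the reviewer's concern in miniature: if only structure-compatible shuffles are admitted, the null distribution of half identity may shift upward, and an observed similarity that looked significant against unfiltered shuffles could become less so.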
Authors' response: We would agree with the reviewer that the hairpin duplication origin of tRNA has not been proven, but we believe it is a credible hypothesis. Problems with phylogenetic analysis of tRNA may be due partly to a loss of evolutionary signal across 4 billion years of evolution in what are very short sequences of RNA [3]. We would also wish to make the following points:
1. The hairpin duplication theory for the origin of tRNA seems to us to rest primarily on two separate strands of evidence: the presence of a canonical intron insertion position between nucleotides 37 and 38 in the middle of the molecule, where one would expect it if tRNA were indeed formed from two similarly sized halves; and the experiments of Schimmel et al. demonstrating that hairpins with 3'-CCA termini can be specifically aminoacylated by contemporary aminoacyl-tRNA synthetases, which indicates, in light of the structural relationship between hairpins and the tRNA cloverleaf (see Figure 1) and on the basis of the Principle of Continuity, that such hairpins were the precursors of tRNA. That these two lines of evidence so beautifully complement and support each other adds, we believe, considerable weight to the theory that tRNA originated from the duplication of hairpins that could be specifically aminoacylated at their 3'-CCA termini and perhaps participated in some form of noncoded protein synthesis.
2. Point 1 notwithstanding, there is some disagreement in the literature as to whether tRNA was formed from a hairpin duplication or from the ligation of two different hairpins [3]. It should be noted that the ligation of two identical hairpins is not actually required for our argument. All that is necessary is that both hairpins contained 3'-CCA termini, one of which was incorporated into the nascent anticodon loop at the position we have proposed. The reason we have discussed the mechanism in terms of hairpin duplication is that it seems the simplest due to considerations of symmetry (see Figure 1). As we know, however, nature is not always simple or symmetrical!
In summary, the ideas in this paper are internally consistent and sound, and pull together many disparate facts. In my opinion, however, the exclusive evidence for the authors' specific claims is not overwhelmingly strong, and more could be done to bolster their claims.
Reviewer 2: Rob Knight, University of Colorado
In this paper, Bernhardt and Tate propose an interesting new mechanism for the origin of the genetic code from primordial glycine tRNAs. Although speculative, the paper suggests that tRNAs might have had a monophyletic origin from duplication and divergence of tRNA-Gly, primarily because CCA occurs immediately upstream of the intron in the vast majority of tRNA-Gly sequences examined (this pattern could be due to the inclusion of the terminal CCA in a duplicated hairpin, as has been proposed by several authors). As Gly is encoded by GGN codons, this pattern would automatically introduce a glycine anticodon adjacent to the intron position. The authors conclude that glycine was the primordial tRNA, produced by hairpin duplication, and that other specificities arose from this original activity.
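The claim that a duplicated CCA would "automatically" yield a glycine anticodon follows from antiparallel pairing: an anticodon written 5' to 3' (positions 34-36) reads the codon given by its reverse complement, so any NCC anticodon reads a GG-initial codon. The Python sketch below assumes strict Watson-Crick pairing and ignores wobble at position 34 and nucleotide modifications; the function name is hypothetical.

```python
# Watson-Crick complements for RNA; wobble pairs (e.g. G-U) and modified
# nucleotides are deliberately ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon: str) -> str:
    """Return the codon (5'->3') read by an anticodon given 5'->3' (positions 34-36).

    Pairing is antiparallel, so the codon is the reverse complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


if __name__ == "__main__":
    for n34 in "ACGU":
        anticodon = n34 + "CC"
        print(anticodon, "->", codon_for_anticodon(anticodon))
    # ACC -> GGU, CCC -> GGG, GCC -> GGC, UCC -> GGA: every NCC anticodon reads GGN.
    print("CCA", "->", codon_for_anticodon("CCA"))  # UGG, the tryptophan codon
```

Under these assumptions, CC at anticodon positions 35 and 36 fixes the first two codon positions as GG whatever occupies position 34, which is why the glycine assignment, rather than any particular N34, is what the CCA argument constrains.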
Several lines of evidence would make this argument more compelling.
First, although CCA is found next to the intron positions in a majority of tRNA-Gly sequences, this does not necessarily mean that this is the ancestral state. Building a phylogenetic tree of the tRNA sequences and demonstrating that the earliest-diverging branches have CCA would be reassuring (i.e. it is necessary to show that the earliest-diverging groups don't retain some rare but ancestral alternative). Rooting the tree is an issue if it is assumed that other tRNAs branch from within modern tRNA-Gly sequences, but several methods are available and should be used.
Authors' response: In response to this helpful critique, we have revised our manuscript to include Figure 3, a phylogenetic tree of genomic glycine tRNAs. This figure shows that the 18 sequences lacking the anticodon-loop CCA sequence lie on isolated branches of the tree rather than being spread throughout it, indicating that the absence of CCA is a derived character, although the tree does not provide evidence of the ancestral glycine tRNA sequence.
Second, some evidence that tRNA-Gly is the ancestral tRNA would be helpful. Again, trees could be built with a sample of tRNAs of different specificities. If other tRNA specificities branch from within tRNA-Gly, we would expect standard tests for monophyly on tRNA-Gly sequences to fail. If, however, tRNA-Gly sequences are monophyletic and branch from within some other specificity, the hypothesis would not be supported. Again, this analysis will be complicated by difficulties in rooting and in building a tree with so few characters, so a negative result will not be conclusive, but clear patterns, if obtained, could greatly aid in confirming or disconfirming the hypothesis.
Authors' response: This has been attempted but without clear patterns emerging, and so the results have not been included in the paper. However, some support for glycine tRNA being the ancestral tRNA has been provided recently by Fujishima et al. (2008) [55]. Carrying out a phylogenetic/network analysis of 1953 archaeal tRNAs, they found that archaeal glycine tRNA might represent the ancestral sequence. Although starting from a different set of suppositions than ours (they believe that the split tRNA genes of Nanoarchaeum equitans represent the ancestral state of tRNA, rather than being derived from intron-containing tRNAs as we would argue), they conclude that 'minigenes encoding 5' and 3' tRNA sequence of tRNAGly were the origins of other tRNA genes in the very early stage of tRNA evolution'.
Third, some justification for why CC rather than CA became the first fixed anticodon seems necessary (as both CC and CA would be present in all sequences duplicating the terminal CCA). There are several arguments based on thermodynamics and/or inspection of the canonical genetic code table that could be advanced here, although none stands out.
Authors' response: Although we have not included with our proposal a possible sequence for the first anticodon loop, it seems reasonable to believe that the CCA sequence has always been in its current location, with (N)CC in the anticodon position. This is because seven-nucleotide loops such as the anticodon loop interact predominantly through their central three nucleotides [17]. Positions 35 and 36 seem to be particularly important for the strength of the anticodon-codon interaction, and with CCA in its current position both are occupied by C, providing two strong G-C pairs. Also, as elaborated in our responses to the first reviewer, it appears likely that an A (or G) is required at position 37 to stabilize the anticodon-codon interaction by base stacking on the anticodon-codon helix [17]. CCA in its current position therefore fulfills two important requirements for a strong intermolecular interaction. In contrast, (C)CA in the anticodon position would probably have required a modified nucleotide at position 37 to stabilize the anticodon-codon interaction (owing to the A at position 36), as previously discussed. In the contemporary genetic code, the (C)CA anticodon occurs in tryptophan tRNA, which usually has a modified nucleotide at position 37 (Additional file 1, part A).
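The comparison between the two possible registers of a duplicated CCA can be tallied directly. In the Python sketch below (an illustration only: the function name and the simple count are assumptions, and stacking by N37 is not modelled), anticodon positions 35 and 36 are the second and third bases of a 5'-to-3' triplet; assuming Watson-Crick pairing with the codon, a G or C at either position contributes a G-C pair.

```python
def gc_pairs_at_35_36(anticodon: str) -> int:
    """Count G-C pairs formed by anticodon positions 35 and 36 with the codon.

    Assumes Watson-Crick pairing at these positions, so a G or C in the
    anticodon always pairs with a C or G in the codon. Position 34 (wobble)
    and position 37 stacking are not considered.
    """
    if len(anticodon) != 3:
        raise ValueError("expected a three-nucleotide anticodon (positions 34-36)")
    return sum(base in "GC" for base in anticodon.upper()[1:])


if __name__ == "__main__":
    print(gc_pairs_at_35_36("UCC"))  # NCC register (glycine): 2
    print(gc_pairs_at_35_36("CCA"))  # (C)CA register (tryptophan): 1
```

The count only restates the argument in the text: the NCC register gives two G-C pairs at the two positions said to dominate binding strength, while the (C)CA register gives one G-C and one A-U pair.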
Some minor points
It might be useful to exclude chloroplasts and mitochondria on the grounds that we know these sequences are derived (from within the cyanobacteria and the alpha-proteobacteria respectively), and therefore cannot tell us about origins. However, examination of these sequences separately, because they can be rooted with outgroups from free-living bacteria, could be useful for testing whether the CCA is maintained by selection in modern tRNA-Gly sequences.
Authors' response: As already noted, the mitochondrial and chloroplast data support the argument that the anticodon-loop CCA sequence has been frozen in the vast majority of glycine tRNAs. That only 1 of the 219 glycine tRNA gene sequences analysed from mitochondria and chloroplasts lacks the anticodon-loop CCA demonstrates that this sequence has been maintained in these lineages.
The issue with SELEX against glycine isn't coupling to the column (indeed, Gly columns are typically used as counterselections in amino acid selections), but rather the belief that a single H side-chain provides a target that will be very difficult to bind specifically given the steric issues with the linker and any protecting group used for the carboxyl or amine (depending on how the amino acid is coupled to the column).
Authors' response: This point was originally unclear in the manuscript and has now been modified.
Selections against a Gly column to the exclusion of other aminoacylated columns are thus unlikely to generate aptamers that bind free Gly in solution with reasonable specificity. It might be possible to work around this using the Breaker lab's allosteric selection paradigm, which allows selection of sequences that bind targets free in solution, although this technique can isolate only very high-affinity binders that might not be relevant to the code's origin (much weaker binders, with higher Kd values, can be obtained through the affinity chromatography approach). However, to my knowledge, these experiments have not been attempted.
Reviewer 3: Eugene Koonin, National Center for Biotechnology Information, NIH
This article proposes a very specific hypothesis on the first steps in the evolution of the genetic code. According to Bernhardt and Tate, tRNAGly was the first tRNA to evolve; more specifically, it evolved via the duplication of an RNA hairpin containing a 3'-CCA sequence and subsequent ligation of the two half-tRNAs, which created the anticodon. The ligation, according to the hypothesis, was catalyzed by the evolutionary predecessor of the group I self-splicing intron that is present in the anticodon loop of tRNAGly. tRNAGly is supposed to have given rise to the rest of the tRNAs, presumably via a series of duplications; these subsequent steps are not really discussed in the paper but are implied by Figure 1.
Authors' response: We have revised the manuscript to include mention of the evolution of the original glycine tRNA into tRNAs specific for other amino acids by a process of duplication and mutation.
This may sound harsh but, for the sake of clarity, I will state my position in straightforward terms: to me, this is more of a free-wheeling speculation than a useful hypothesis. There are, I believe, three lines of argument marshalled in support of the hypothesis: i) glycine is widely believed to be one of the first, primordial amino acids; ii) according to Di Giulio's hypothesis, tRNAs evolved by duplication of "clover leaf halves"; iii) tRNAGly contains a nearly universally conserved CCA sequence located in the anticodon loop, next to the universal intron insertion site. The first argument is reasonable but non-specific and weak; in any case, glycine could not have been the only primordial amino acid, and it is not at all clear why tRNAGly should have come first. The second argument, which the manuscript presents almost as an established fact, is itself only a hypothesis, and not a particularly strong one; again, regardless of its validity, it says nothing about glycine specifically. The third argument is central to the article and is the only one that stems from actual sequence analysis. Unfortunately, the conservation of this CCA sequence is a simple consequence of the fact that all tRNAGly contain CC in the 2nd and 3rd positions of the anticodon, and this in turn is a straightforward consequence of the structure of the genetic code. Why glycine is encoded by GGX is an interesting question, and the answer may or may not be a frozen accident, but there seems to be no connection with the possible presence of the CCA-OH acceptor sequence in the primordial hairpin that gave rise to tRNAGly. To me, this key proposal of the present paper is arbitrary.
The problems that, in my view, invalidate this paper are by no means unique but are common to many ideas on the origin of life. The problem is extremely hard, and the temptation to engage in free speculation is strong and understandable. Unfortunately, this does not yield a useful, let alone falsifiable, hypothesis.
Authors' response: Our responses to the first reviewer are relevant to the issues raised here. To summarize:
1. We are proposing that the first (glycine) tRNA arose within an environment containing up to 11 different RNA hairpins, each aminoacylated with a specific amino acid [4] (Table 1, middle column). If tRNA evolved from a single ancestral molecule, there must have been one that came first. Why glycine tRNA? As previously discussed, a number of theories on the origin of the genetic code (reviewed and summarized by Trifonov [9]), based on a range of suppositions, place glycine as the first amino acid, or among the first group of amino acids, incorporated into the genetic code (see also [10, 11, 19, 20]).
2. Schimmel's experimental work [4] on aminoacylatable hairpins supports and strengthens Di Giulio's theory of the hairpin origin of tRNA [1, 2, 32]. A large number of authors in addition to Di Giulio have proposed theories of the hairpin origin of tRNA. Of these, Ohnishi [56], and Nagaswamy and Fox [57], as well as Di Giulio [32], have proposed a hairpin ligation model in which the 3'-CCA terminus of the upstream hairpin is incorporated into the anticodon loop of the resultant tRNA, although at locations different from the one we propose.
3. Our finding of a highly conserved CCA sequence in the anticodon loop of 96% of contemporary glycine tRNA genes from bacteria, archaea and eukaryote cytoplasm forms the crux of our hypothesis. It seems to us that the presence and precise position of this sequence provide a possible clue to the identity of the first tRNA and the origin of the genetic code. Far from being arbitrary, our theory, we would argue, brings together a number of disparate theories and findings into a logical evolutionary scenario. Nor is it unfalsifiable: observations or experiments showing that tRNA did not arise by hairpin ligation, that the canonical intron insertion position is not ancestral, or that glycine was not an early amino acid would all cast serious doubt on our proposal.
4. We agree with the reviewer that, 'Why glycine is encoded by GGX is an interesting question', and believe that our hypothesis offers a plausible explanation.