The Rodin-Ohno hypothesis that two enzyme superfamilies descended from one ancestral gene: an unlikely scenario for the origins of translation that will not be dismissed
© Carter et al.; licensee BioMed Central Ltd. 2014
Received: 25 March 2014
Accepted: 19 May 2014
Published: 14 June 2014
Because amino acid activation is rate-limiting for uncatalyzed protein synthesis, it is a key puzzle in understanding the origin of the genetic code. Two unrelated classes (I and II) of contemporary aminoacyl-tRNA synthetases (aaRS) now translate the code. Observing that codons for the most highly conserved, Class I catalytic peptides, when read in the reverse direction, are very nearly anticodons for Class II defining catalytic peptides, Rodin and Ohno proposed that the two superfamilies descended from opposite strands of the same ancestral gene. This unusual hypothesis languished for a decade, perhaps because it appeared to be unfalsifiable.
The proposed sense/antisense alignment makes important predictions. Fragments that align in antiparallel orientations, and contain the respective active sites, should catalyze the same two reactions catalyzed by contemporary synthetases. Recent experiments confirmed that prediction. Invariant cores from both classes, called Urzymes after Ur = primitive, authentic, plus enzyme and representing ~20% of the contemporary structures, can be expressed and exhibit high, proportionate rate accelerations for both amino-acid activation and tRNA acylation. A major fraction (60%) of the catalytic rate acceleration by contemporary synthetases resides in segments that align sense/antisense. Bioinformatic evidence for sense/antisense ancestry extends to codons specifying the invariant secondary and tertiary structures outside the active sites of the two synthetase classes. Peptides from a designed, 46-residue gene constrained by Rosetta to encode Class I and II ATP binding sites with fully complementary sequences both accelerate amino acid activation by ATP ~400 fold.
Biochemical and bioinformatic results substantially enhance the posterior probability that ancestors of the two synthetase classes arose from opposite strands of the same ancestral gene. The remarkable acceleration by short peptides of the rate-limiting step in uncatalyzed protein synthesis, together with the synergy of synthetase Urzymes and their cognate tRNAs, introduce a new paradigm for the origin of protein catalysts, emphasize the potential relevance of an operational RNA code embedded in the tRNA acceptor stems, and challenge the RNA-World hypothesis.
This article was reviewed by Dr. Paul Schimmel (nominated by Laura Landweber), Dr. Eugene Koonin and Professor David Ardell.
Keywords: Aminoacyl-tRNA synthetases; Urzymes; Genetic code; Origin of translation; RNA World hypothesis; Amino acid activation; Structural homology; Ancestral genes; Sense/antisense coding
“…there is no single path to creativity. We are constrained not by the necessary discipline of rigor but by the limits of our own imaginations and intellectual courage. In the words of Jazz musician Fats Waller, Dare to be wrong or you may never be right.”
J. Michael Bishop 
“How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?”
Sir Arthur Conan Doyle 
Sergei Rodin (1947-2011) was both a mentor and a collaborator. When the paper that launched this work was challenged, Sergei was so incensed that he and his son, Andrei, wrote a brilliant rebuttal on our behalf. Thus, I also considered him my friend.
The origin of the universal genetic code is one of the most important, fascinating, and vexing questions facing contemporary biologists. Sergei devoted much of his professional life pursuing this question using several perspectives from “outside the box” [5–11]. One of his more unlikely hypotheses was the possible ancestry of Class I and Class II aminoacyl-tRNA synthetases as sense- and antisense- gene products expressed from the same primordial gene . That hypothesis was an elegant realization of thoughts expressed by both Bishop and Doyle. This review considers recent experimental support for, and implications of, their hypothesis.
Absent catalysts, aminoacyl-5’ adenylates form both very slowly and in very low equilibrium yield. In aqueous solution, the activation step proceeds 10^3 to 10^4 times more slowly than the second step, and therefore requires a correspondingly more potent catalyst. Release and subsequent hydrolysis of the pyrophosphate leaving group are both necessary to ensure that activated amino acids are formed in high yield. Further, once formed, the aminoacyl-5’ adenylate is exceeded in reactivity only by acyl halides. In fact, in both kinetic and thermodynamic terms, amino acid activation is mechanistically by far the most challenging of all the reactions involved in ribosomal protein synthesis.
Just how far back in time the three functions can be traced lies close to the heart of the code’s origins. Consensus holds that aaRS enzymes had essentially assumed their modern configurations in the last universal cellular ancestor, LUCA [28–30]. It is certainly not idle speculation, therefore, that simpler ancestral aaRS preceded LUCA by eons. Nevertheless, because LUCA represents a localized “Big Bang” [31, 32], it was associated with intense genetic exchange. Thus, it becomes much more difficult to trace phylogenetic lineages for either activity much beyond that hypothetical landmark. One possible avenue lies in the identification and functional annotation of broadly conserved tertiary packing motifs, illustrated pointedly by a nearly invariant core packing motif belonging to ~125 families in the Rossmannoid superfamily. That motif is associated with a discrete supersecondary structure that binds ATP and nucleotides in general, in keeping with its possible role in primordial chemical free energy conversion, and has been identified as a “protoallosteric site”. A related effort is the expression and engineering of invariant cores from enzyme superfamilies [36–40]. We call these constructs Urzymes, from Ur = primitive, original; they are our central focus here.
Aminoacyl-tRNA synthetases: why two families?
The most highly-conserved aaRS active-site amino acids occur in three sets of signature sequences. We’ll focus on Class I HIGH and KMSKS and Class II Motif 1 and 2. With rare exceptions, conserved amino acids with a direct, catalytic role in Class I active sites are drawn from amino acid substrates activated by Class II enzymes, and conversely (Figure 2). It is hard to imagine how this came about unless the evolutionary ancestors appeared simultaneously, rather than sequentially, as is often argued [43, 44].
Coding simplicity. The earliest proteins probably were encoded by a much simpler genetic code than the code of 20 amino acids we have today. In fact, a binary code of two amino acid TYPES that specify “inside” and “outside” seems to represent almost sufficient information (turns excepted) to encode globular objects with selectable functions and hence, to launch natural selection. Combinatorial libraries of polypeptides based on a binary, middle-base code that differentiates only between core and surface amino acids contain high proportions of products with the biophysical characteristics of molten globules , and give rise to significant functionalities [46, 47]. Two different kinds of synthetases with rudimentary specificities reflected in median hydropathies of the contemporary Class I and II aaRS substrates might thus have been sufficient to launch codon-directed protein synthesis.
Physical chemistry. Amino acid substrates of the two classes sort into just such a distinction. The apparent symmetry relating the three subclasses and the inordinate water preference of Class I arginine mask an overwhelming difference between the hydrophobic character of Class IA and Class IIA amino acids. Despite exceptions, Class II amino acids generally prefer the aqueous phase, Class I amino acids the hydrocarbon phase. Their median free energies of transfer between water and cyclohexane differ by -4.6 kcal/mole. Class I (larger) and Class II (smaller) amino acids are also distinguished by size. Solvent transfer free energy (P < 10^-7) and mass difference (P < 10^-4) contribute synergistically (P < 10^-2) to the solvent accessible surface area in folded proteins (Carter, CW Jr & Wolfenden, R tRNA Acceptor-Stem and Anticodon Bases Form Independent Codes Related to Protein Folding, in preparation); Class II amino acid side chains are, on average, 54% exposed whereas Class I amino acids are 32% exposed (P ~0.03).
Genetic linkage. A pre-cellular world populated by quasispecies may have placed a premium on efficient information storage . We further believe that a substantial selective advantage would arise if both required kinds of synthetases were present at the same time and place. Coding Class I and II on opposite strands would link genes for the two classes genetically, assuring that when one was present, so was the other.
Rodin and Ohno observed a statistically significant complementarity between consensus coding sequences for class-I defining PxxxxHIGH and KMSKS peptide signatures and Class II Motif 2 and Motif 1 sequences, and conversely . They inferred from this that ancestral Class I and II aaRS descended from opposite strands of the same gene, a proposal we call the Rodin-Ohno (RO) hypothesis. Despite the strength of their statistical tests (vide infra), it was not obvious when the idea first appeared that experiments could either falsify or confirm this extraordinary proposal. In the interim, however, specifying the hypothesis more precisely in terms of the respective tertiary structures has clarified its more important implications, opening experimental and bioinformatic avenues to assess its validity.
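The sense/antisense reading that underlies the RO hypothesis can be made concrete with a short sketch. The Python below is illustrative only: the 12-nucleotide sequence is invented (chosen so that its sense strand spells the Class I HIGH signature), the helper names are hypothetical, and the standard genetic code is assumed. It is not one of the consensus sequences analyzed by Rodin and Ohno.

```python
# Minimal sketch: read one stretch of duplex DNA from both strands.
# The sequence is made up for illustration; it is not a real aaRS gene.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def translate(dna):
    """Translate complete codons from the first base onward."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


sense_strand = "CACATTGGTCAT"              # hypothetical example sequence
antisense_strand = reverse_complement(sense_strand)

sense_peptide = translate(sense_strand)          # "HIGH"
antisense_peptide = translate(antisense_strand)  # "MTNV"
```

In this toy case the antisense reading yields a different peptide from the same stretch of duplex, and each codon's middle base is, by construction, complementary to the middle base of the codon opposite it.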
In the following, we first summarize the salient features of Urzymology, the study of Urzymes. Then we describe how Urzymology facilitates the modular deconstruction of Class I TrpRS and Class II HisRS and recapitulation of their evolution. Finally, we summarize how these two Urzymes help validate the Rodin-Ohno hypothesis.
Urzymology: structural biology yields insights about the invariant cores
Comparative anatomy has always been the sine qua non of phylogenetic inference. Because our interest here concerns molecules far earlier than LUCA, our approach begins with structural biology and 3D superposition, whose application to aaRS has been reviewed. We used a variety of manual least squares and automated algorithms [51, 52] to perform similar analyses. Contemporary aaRS are moderately large enzymes, in keeping with their sophisticated tasks. Their structures also exhibit considerable variation within the two classes (Figure 3). From the outside, the four Class I and II monomers superimposed in each part of the figure look quite different. Inside, however, a much smaller, invariant core of ~120-130 amino acids is nearly identical in all 10 examples of each class.
Procedures used in Urzymology
POSA server: http://fatcat.burnham.org/POSA/
ID shared invariant subset
Usually 10-30% of monomeric forms
Re-design exposed surface
Protein design: Rosetta http://www.rosettacommons.org/
Clone & Express
Maltose Binding Protein (MBP) fusions; TEV cleavage
Rate accelerations, substrate specificities
Single turnover active-site titration, Steady-state kinetics, genetic mutation and manipulation
Multi-scale modular de- and reconstruction
On the other hand, removing all insertion and anticodon-binding domains from both classes leaves the potential sense/antisense alignment intact (Figure 5B). Remarkably, both invariant cores include complete ATP- and amino acid-binding sites, together with rudimentary binding sites for the 3’ CCA termini of tRNA (Figure 5C). Figure 5C illustrates the chief experimental prediction of the RO hypothesis: parts of either gene that cannot be related sense/antisense (both anticodon-binding domains, the insertions and Class II Motif 3) appear in some sense functionally superfluous. Removing them leaves precisely the invariant cores we had identified for both enzymes, and these align quite closely, sense/antisense. The key experimental question is: how active are Class I and II aaRS Urzymes? Too little catalytic activity to produce activated amino acids at a sufficient rate to support uncatalyzed assembly into peptides would effectively falsify the RO hypothesis. Urzymes are catalytically very much more active than necessary.
AARS Urzymes both activate, and acylate tRNA with, cognate amino acids
Amino acid activation
Catalysis of amino acid activation by aaRS Urzymes left a key question unanswered: do these peptide catalysts also accelerate aminoacylation of tRNA? A central implication of the RO hypothesis is that sense/antisense encoded fragments should exhibit both activities. Precedent and structural arguments led us to expect recognition of tRNA by aaRS Urzymes, even without the anticodon-binding domain, which is often considered a late addition. A considerable literature describes the acylation of isolated tRNA modules containing the acceptor stem [19, 20, 55, 56]. Comparable experiments have until now not been performed with modular fragments of aaRS, owing to the greater difficulty of constructing and purifying proteins, compared to RNA. Further, simultaneous appearance of a fully-developed genetic code, depending heavily on binding the anticodon loop of most tRNAs, is difficult to envision. Accordingly, Giegé, Schimmel, and others proposed that an earlier, “operational RNA code” in the tRNA acceptor stem was a forerunner of the present day code.
Data shown in Figure 7A, 7B for 32P-3’ adenosine-labeled tRNATrp and tRNAHis demonstrate that TrpRS and HisRS Urzymes catalyze tRNA aminoacylation. TrpRS and HisRS Urzymes therefore retain a full functional repertoire. In particular, the catalytic activities required for protein synthesis are as finely balanced as those of the contemporary enzymes, both between activation and acylation and between Classes. They therefore represent convincing models for ancestral Class I and Class II aaRS.
Are Urzyme activities authentic?
Criteria for the authenticity of Urzyme catalytic activities
1. Empty vector controls. Experiment: purify and assay MBP. Comment: de rigueur, but unconvincing.
2. Renaturation from inclusion bodies. Experiment: tagged Urzymes purified from pellet. Comment: WT enzymes do not segregate with inclusion bodies.
3. MBP fusions release cryptic activity on TEV cleavage. Experiment: assay fusion proteins ± TEV cleavage. Comment: inhibition in fusion proteins is widespread, not universal.
4. Active-site titrations: Urzyme preparations have significant bursts. Experiment: single turnover time-courses. Comment: a key criterion, this is also essential for comparing kcat/KM.
5. Mutations and modular alterations induce predictable changes in activity. Experiment: determine effect of active-site mutations, genetic manipulations. Comment: active-site mutations generally affect Urzyme activities differently and can actually enhance activity because mechanisms are different.
6. Urzymes and WT enzymes have different steady-state KM values. Experiment: measure kcat, KM, kcat/KM. Comment: contamination by WT enzyme would saturate at WT KM.
7. Amino acid specificity is different from full-length. Comment: Urzymes are generally low specificity, high kcat catalysts.
Empty vector controls show essentially no activity. We express all Urzymes as maltose-binding protein (MBP) fusions to improve solubility. No unfused MBP expressed and purified in the same manner on an amylose resin exceeded background when assayed at 12 mg/ml with 32PPi exchange mixes for tryptophan, histidine, and leucine.
Active TrpRS Urzyme can be renatured from inclusion bodies . It is unlikely that native full length enzyme would contaminate inclusion body preparations.
Cleavage of MBP fusion proteins releases cryptic activity. MBP fusions inhibit both TrpRS and HisRS Urzymes ~ 50-fold [38, 39]. Cryptic activity released by Tobacco Etch Virus protease cleavage of purified fusion proteins implicates both the purified fragment and protease cleavage (see also 5 below).
Active-site titrations show significant pre-steady state bursts in single turnover assays, amounting to 10-90% of the total number of molecules. Active-site titrations measure single turnover time courses. If product release is rate-limiting, then turnover will be slower than the first round of catalysis, and extrapolation of the steady-state rate to the origin can be used to estimate the “burst” or the amplitude of the first-order portion of the reaction. Burst size therefore estimates the proportion of active molecules. Contaminating activity from a tiny amount of wild type full-length enzyme 10^5-fold more active than the Urzymes would exhibit an insignificant burst, and the entire time course would represent its steady-state rate. Active fractions also provide more accurate kcat values.
Both Urzymes show substantial bursts, which range between ~10 and 90%—much bigger than those expected from a rare, very active contaminant. Full length aaRS bind tightly to the aminoacyl-adenylate to protect the cell from a highly reactive adenylating reagent and to preserve the specificity achieved by the activation step . It is remarkable that the Urzymes also sequester the intermediate. Pre-steady state bursts are thus a third key function of full-length aaRS (Figure 1) retained by Urzymes from both Classes.
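The extrapolation described for active-site titration can be sketched numerically. The example below assumes the usual burst model, product(t) = A(1 - exp(-k t)) + v t, and recovers the burst amplitude A as the intercept of a straight line fitted to the late, steady-state portion of the time course. All parameter values, time points and function names are hypothetical and chosen only for illustration; they are not data from the Urzyme experiments.

```python
import math


def burst_curve(t, amplitude, k_burst, v_steady):
    """Product formed at time t under a burst-plus-steady-state model."""
    return amplitude * (1.0 - math.exp(-k_burst * t)) + v_steady * t


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical, noise-free time course (arbitrary concentration and time units).
total_enzyme = 1.0
times = [float(t) for t in range(0, 61, 5)]
product = [burst_curve(t, amplitude=0.4, k_burst=1.0, v_steady=0.01) for t in times]

# Fit only points taken well after the first-order burst phase has finished.
late_points = [(t, p) for t, p in zip(times, product) if t >= 20.0]
steady_rate, burst_amplitude = fit_line(
    [t for t, _ in late_points], [p for _, p in late_points]
)

# The intercept at t = 0 estimates the concentration of active sites.
active_fraction = burst_amplitude / total_enzyme
```

With these invented numbers the intercept corresponds to an active fraction of about 0.4. A trace contaminant of much higher specific activity would contribute the slope but essentially no intercept, which is the point of the criterion.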
Active-site mutants and modular variants have altered activities. Molecular biologists recognize that manipulating the gene of a suspected source of activity can implicate that gene product in the observed activity. We therefore tested active-site mutations and modular variants in the TrpRS and HisRS Urzymes. All such experiments significantly altered activity. One result was counter-intuitive: mutation D146A in the TrpRS Urzyme actually increased activity, rather than reducing it as it does in the WT enzyme. However, the catalytic function of D146 in full-length TrpRS likely requires allosteric coupling to the CP1 and ABD modules, which are missing from the Urzyme, and its presence in the Urzyme likely stabilizes the ground state, rather than the transition state.
Of particular interest are thermodynamic cycles that involve an aaRS Urzyme and complementing segments. Very short (6-20 aa) modules accelerate HisRS Urzyme amino acid activation by a small but significant amount (Figure 8). We PCR amplified the 122 residue fragment containing only Motifs 1 and 2, adding either a six-residue N-terminal extension (red), or Motif 3 (yellow), or both, giving us a balanced assay for the effects of both factors. The intrinsic contribution of Motif 3 to transition-state stabilization by HisRS Urzyme, -0.85 kcal/mole, is essentially identical to that of the much shorter and less obvious N-terminal extension; their synergistic effect is nearly twice that. Figure 8 emphasizes that the two modules stabilize the ATP binding site from opposite faces of the molecule.
Full TrpRS specificity and tRNATrp aminoacylation activity both require essentially complete interdomain synergy (Figure 9). The TrpRS Urzyme favors tryptophan activation by ~10-fold over competing tyrosine and is ~400 times less specific than full-length TrpRS. CP1 and the anticodon-binding domain (ABD) must account for the increased specificity of native TrpRS, which raises the question of how each domain contributes.
We addressed this question by comparing specificities of intermediate constructs in which the CP1 and ABD modules were added back individually to the Urzyme  to form a complete factorial experiment in those two variables (Figure 9A).
Quite surprisingly (Figure 9B), although adding back either CP1 or the ABD does enhance tryptophan binding, this effect is non-specific. Addition of either domain also reduced KM for tyrosine, such that the log of the specificity ratio, (kcat/KM)_Trp/(kcat/KM)_Tyr, was actually ~0.0 (Figure 9B). The ~400-fold increase in specificity observed for full-length TrpRS, relative to the Urzyme, depends entirely on cooperative interactions (also called epistasis [60–62]) between the two domains.
tRNATrp aminoacylation requires comparable interdomain synergy (Figure 9B). These experiments with TrpRS Urzyme support the unexpected conclusion that the Urzyme itself, consisting only of peptide segments that can align antisense to the HisRS Urzyme, is actually better at the two tasks required of aaRS (amino acid recognition and tRNA aminoacylation), and hence lies closer to the actual path of aaRS evolution, than either intermediate, potentially more advanced construct. Evolutionary development of contemporary enzymes must be more subtle than simply accumulating one module at a time.
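The factorial logic behind these thermodynamic cycles can be written out as a small calculation. The sketch below converts ratios into free energies with ΔG = -RT ln(ratio) and then extracts the interaction (coupling) term of a 2 x 2 design. The temperature (298.15 K) is an assumption, the function names are hypothetical, and the four specificity ratios are rough readings of the values quoted in the text (~10 for the Urzyme, ~1 for each single-domain intermediate, ~4000 for full-length TrpRS), used only to show the arithmetic.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)
T_ASSUMED = 298.15     # assumed temperature in K; not stated in the text


def delta_g(ratio, temperature=T_ASSUMED):
    """Free energy (kcal/mol) equivalent to a rate or specificity ratio."""
    return -R_KCAL * temperature * math.log(ratio)


def fold_change(free_energy, temperature=T_ASSUMED):
    """Ratio corresponding to a free energy difference in kcal/mol."""
    return math.exp(-free_energy / (R_KCAL * temperature))


# Keys are (has_CP1, has_ABD); values are approximate Trp/Tyr specificity ratios.
specificity = {
    (False, False): 10.0,    # Urzyme
    (True, False): 1.0,      # Urzyme + CP1
    (False, True): 1.0,      # Urzyme + ABD
    (True, True): 4000.0,    # full-length enzyme
}
dg = {construct: delta_g(ratio) for construct, ratio in specificity.items()}

# Effect of adding each domain alone to the Urzyme.
cp1_alone = dg[(True, False)] - dg[(False, False)]
abd_alone = dg[(False, True)] - dg[(False, False)]

# Interaction term: what the double addition achieves beyond the two single effects.
interaction = dg[(True, True)] - dg[(True, False)] - dg[(False, True)] + dg[(False, False)]

# The -0.85 kcal/mole quoted for Motif 3 corresponds to roughly a 4-fold rate effect.
motif3_fold = fold_change(-0.85)
```

Under these assumptions each domain alone makes the specificity free energy less favorable, and the whole gain of the full-length enzyme appears in the interaction term, which is the sense in which specificity is said to depend entirely on cooperativity between the two domains.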
Steady state KM values differ from those of the WT enzymes. Enzymologists recognize that the steady-state KM value is an independent signature. Contaminating wild-type enzymes, irrespective of concentration, would saturate at the same amino acid concentrations. Altered KM values are thus strong evidence against contaminating wild-type enzyme activity. ATP- and amino acid-dependent Michaelis-Menten data for the TrpRS and HisRS Urzymes show that ATP binding affinity is either the same or tighter to Urzymes than to contemporary aaRS [38, 39]. The kcat values are nearly comparable to those of the native enzymes. Urzyme amino acid KMs, however, are quite different from those of full-length, native enzymes. The TrpRS Urzyme KM for tryptophan is ~1 mM, 500 times higher than that of wild type TrpRS. That for HisRS-3, containing Motifs 1, 2, and 3, but lacking the six-amino acid N-terminal extension to Motif 1, is 120 μM, compared to 30 μM for wild-type HisRS and 45 μM for the Ncat catalytic domain.
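The argument that a contaminant would saturate at the wild-type KM whatever its concentration follows directly from the Michaelis-Menten equation, v = kcat [E] [S] / (KM + [S]). The sketch below illustrates it with the tryptophan KM values implied by the text (~1 mM for the Urzyme and a value 500 times lower, ~2 μM, for wild-type TrpRS). The kcat value, the enzyme concentrations and the function names are hypothetical.

```python
def michaelis_menten(substrate, kcat, km, enzyme):
    """Steady-state velocity for a single enzyme species."""
    return kcat * enzyme * substrate / (km + substrate)


def fractional_saturation(substrate, kcat, km, enzyme):
    """Velocity as a fraction of Vmax = kcat * [E]."""
    return michaelis_menten(substrate, kcat, km, enzyme) / (kcat * enzyme)


KM_URZYME_UM = 1000.0            # ~1 mM, from the text
KM_WT_UM = KM_URZYME_UM / 500.0  # 500 times lower, i.e. ~2 uM
KCAT = 1.0                       # hypothetical; cancels in the saturation ratio

tryptophan_um = 2.0

# A wild-type contaminant is half-saturated at 2 uM Trp at any concentration.
wt_saturation = [
    fractional_saturation(tryptophan_um, KCAT, KM_WT_UM, enzyme)
    for enzyme in (1e-6, 1e-3, 1.0)
]

# The Urzyme is far from saturation at the same tryptophan concentration.
urzyme_saturation = fractional_saturation(tryptophan_um, KCAT, KM_URZYME_UM, 1.0)
```

Because enzyme concentration cancels out of the fractional saturation, an activity that only approaches saturation near millimolar tryptophan cannot be attributed to trace wild-type enzyme.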
HisRS and TrpRS Urzymes have reduced, but Class-dependent specificities. Weaker amino acid affinities imply that the Urzymes likely have reduced specificity for their cognate amino acids. The TrpRS Urzyme retains a 10-fold preference for tryptophan vs tyrosine [36, 39]. Second-order rate constant free energies, -RT ln(kcat/KM), for amino acid substrates activated by Class I LeuRS and Class II HisRS Urzymes (Figure 6C) show that both Urzymes are promiscuous. However, they both preferentially activate amino acids similar to the original substrate (i.e., Leu and His). By a factor of ~5-fold, they both prefer amino acid substrates from the Class to which they belong. This modest, class-dependent substrate specificity also rules out adventitious activities unrelated to aaRS-derived constructs. As with active-site titration and steady-state kinetic parameters, contaminating wild-type aaRS would have a specificity of ~4000-fold.
Experiments in (1-7) leave little doubt that the Urzymes are the authentic source of observed amino acid activation activities.
Bioinformatic evidence from multiple sense/antisense alignments
Three additional comparisons between subclasses IC and IIA using sequences for Class IC TyrRS and Class IIA ProRS, with which we rooted the Class I and II phylogenetic trees, also all have <MBP> = 0.34 ± 0.002. Middle-base pairing is quite similar in antisense alignments of TrpRS with ProRS, TyrRS with HisRS, or TyrRS with ProRS. Extending this metric to multiple sequences from other subclasses may lead to a deeper phylogeny of aaRS subclasses.
Reconstructed ancestral sequences derived by maximum likelihood methods from the phylogenetic trees show that the middle-base pairing frequencies, which are already markedly higher for bacterial sequences (Figure 10D), also increase for nodes closer to the root (Figure 10E) . Increased ancestral frequencies are consistent with the conclusion that middle codon-base pairing decreases with time, and hence that it was even higher and perhaps equal to 1.0 in the original gene, now inaccessible from the contemporary multiple sequence alignments. Thus, they add weight to the RO hypothesis. Over-represented extant taxa can bias reconstructions . Although middle bases should be least vulnerable to such biases, it is not known how, or by how much, they might affect sense/antisense alignments based on reconstructed sequences from different protein families, and thus how much they strengthen the RO hypothesis.
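A middle-base pairing frequency of the kind discussed here can be sketched as follows. The definition used is an assumption made for illustration: two coding sequences of equal length are aligned antiparallel, codon by codon, and the score is the fraction of opposed codon pairs whose middle bases are Watson-Crick complementary. Under that definition two unrelated sequences with equal base frequencies would be expected to score 0.25, and a true sense/antisense pair scores 1.0. The function name and the short test sequences are hypothetical.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def middle_base_pairing(sense, antisense):
    """Fraction of antiparallel codon pairs with complementary middle bases.

    Both sequences are written 5' to 3' and must contain the same number of
    complete codons; codon i of `sense` is opposed to codon n-1-i of `antisense`.
    """
    if len(sense) != len(antisense) or len(sense) % 3 != 0:
        raise ValueError("sequences must have equal length, a multiple of 3")
    n_codons = len(sense) // 3
    sense_middle = [sense[3 * i + 1] for i in range(n_codons)]
    antisense_middle = [antisense[3 * i + 1] for i in range(n_codons)]
    paired = sum(
        (s, a) in WATSON_CRICK
        for s, a in zip(sense_middle, reversed(antisense_middle))
    )
    return paired / n_codons


# Invented 4-codon examples.
perfect = middle_base_pairing("CACATTGGTCAT", "ATGACCAATGTG")   # exact reverse complement
diverged = middle_base_pairing("CACATTGGTCAT", "ATGACCAATGGG")  # one middle base changed
```

The two toy calls give 1.0 and 0.75. The real analysis averages over many sequences and depends on how the Class I and II alignments are constructed, which this sketch does not attempt to reproduce.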
An existence proof for embedded modularity and sense/antisense coding
For these reasons, we characterized the 46-residue peptides from TrpRS and HisRS by several functional assays. Isolated 46-mers from both aaRS classes bind tightly to ATP (KD = 10-65 μM), as has been observed for structurally homologous 45-51 residue fragments isolated from a variety of nucleotide-binding proteins [67–71] containing the Walker-A motif and, by this criterion, may prove to be distant relatives of the Class I aaRS .
Table. Regression analysis of the time, [amino acid], and [catalyst] dependence of amino acid activation by Class I and II designed sense/antisense ATP binding sites (column shown: Prob > |t|).
Evolutionary enzymology: “life-enabling” catalysis
Arguably, the most important function of enzymes is to equalize reaction rates for the diverse chemistry necessary for life. Catalyzed rates of chemical reactions important for life span at most five orders of magnitude, whereas the corresponding uncatalyzed rates range over 25 orders of magnitude (Figure 6A) [73–76]. This quantitative framework (Figure 6) lets us assess rate enhancements of the aaRS Urzymes and related catalysts that we have characterized (Figure 6B). Reactions much slower than the fastest uncatalyzed reaction must first be accelerated to about the same rate; otherwise life would be impossible. A key contribution of catalysis, therefore, is to ensure that different kinds of chemistry will happen at close to the same rates.
Assembly of activated amino acids to form peptides is among the faster of the uncatalyzed reactions. Uncatalyzed amino acid activation is three orders of magnitude slower. Uncatalyzed amino acid activation must be accelerated ~1000-fold to provide material for protein synthesis, even in the absence of ribosomes. As described in the previous section, we have found experimentally that catalysts as small as 46 amino acids derived from both aaRS classes afford approximately “life-enabling catalysis” of this essential reaction.
Data in Figure 6C afford a preliminary glimpse at the potential coding properties of the LeuRS and HisRS Urzymes. These are also the first experimental data suggesting that the first proteins were indeed statistical peptides as proposed by Woese [49, 77, 78] that likely contained sufficient functionality to seed natural selection. We believe that they represent a crude and probably rather late snapshot from the extended process by which decoding proteins (e.g. aaRS) became able to reproduce themselves from self-contained RNA/DNA genes . Additional and more definitive data of this kind, and similar studies of tRNA specificities now afford a new experimental basis from which to unravel that subtle process, e.g. by posing questions such as: can the 46-residue peptide catalysts function using simpler amino acid alphabets?
Further, subsequent evolution from the earliest biological catalysis must have proceeded coordinately, in that reaction rates remained roughly synchronous as their proficiency increased. The histogram in Figure 6B shows experimental catalyzed rates for the aaRS constructs we have made. Class I, II catalytic proficiencies track the structural modularity of the two classes of contemporary aminoacyl-tRNA synthetases, with comparable rate increases over 11 orders of magnitude, providing a crude, but realistic, existence proof of the kind of evolutionary trajectory that led to the contemporary enzymes. Important expeditions forward in time from Urzyme base-camp have now shown that specificity in Class I aaRS requires allosteric behavior in the synergy of two domains, neither of which by themselves enhance fitness [35, 36].
Urzymology has strengthened the posterior probability of the Rodin-Ohno hypothesis
Popper  provides an appropriate stance from which to evaluate the RO Hypothesis. It is appropriate to ask whether or not the idea can be falsified by articulating specific predictions derived from the hypothesis, and assessing how new data gathered to test those predictions confirm or invalidate them. Bayes’s Theorem, in turn, provides a quantitative framework for how new data impact confidence in a hypothesis . It asserts that new data update the prior probability of a hypothesis via their conditional probability given the hypothesis, i.e., the likelihood.
Rodin and Ohno adduced a prior probability much higher than generally appreciated. Jumble tests for the observed complementarity relating the conserved Class I and II catalytic signatures had Z-scores ranging from 5.7 to 8.8. Rodin and Ohno understated the corresponding P-values, placing them at “<< 0.01”, rather than citing the actual values, 5 × 10^-8 and 7 × 10^-18 under the null hypothesis. Sense/antisense ancestry thus begins with a very strong prior probability based on the small statistical chance of otherwise observing the high complementarity between coding sequences for the class-defining motifs.
Invariant cores of Class I and II aaRS coincide with the only segments that align sense/antisense . By consensus, the most conserved amino acid sequences—often catalytic residues at the active site—are the oldest remnants in protein superfamilies. Ancestral gene reconstruction rests on this assumption, whose reliability has been established beyond reasonable doubt for more recent nodes since LUCA with reconstructed nodes defined by large multiple sequence alignments [61, 82–89]. Sequence conservation becomes intrinsically less reliable, the further back we reach in time, so structural biology inherits its mantle; we invest conservation of 3D structure with comparable significance.
AARS Urzymes from both classes, solubilized forms of the invariant cores, retain 60% of the transition-state stabilization of contemporary aaRS in both amino acid activation and tRNA aminoacylation, retaining rate accelerations proportional to the respective uncatalyzed rates . The gap between Urzyme and 46-mer rate accelerations (Figure 6B) is much larger than expected, showing that the most highly conserved secondary and tertiary scaffolds identified using 3D structural alignment are both necessary and sufficient to position conserved active site residues correctly for transition-state stabilization of both amino acid activation and tRNA aminoacylation [36–40]. As the most highly conserved cores actually catalyze the same reactions it becomes increasingly difficult to imagine that ancestors catalyzing both reactions were actually based on non-homologous structures, including ribozymes.
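Because transition-state stabilization is a free energy, it scales with the logarithm of the rate acceleration, so a stated percentage of the stabilization is a fraction of the exponent, not of the rate itself. The sketch below makes that conversion explicit. The full-length rate acceleration used (10^15) is a purely hypothetical round number chosen for illustration, as are the function names; only the 60% and 400-fold figures come from the text.

```python
import math


def fraction_of_stabilization(accel_fragment, accel_full):
    """Share of the full enzyme's transition-state stabilization (log scale)."""
    return math.log(accel_fragment) / math.log(accel_full)


def acceleration_for_fraction(fraction, accel_full):
    """Rate acceleration that corresponds to a given share of the stabilization."""
    return accel_full ** fraction


ACCEL_FULL = 1e15  # hypothetical full-length rate acceleration

# 60% of the stabilization corresponds to 10^(0.6 * 15) = 10^9 in rate.
urzyme_like_accel = acceleration_for_fraction(0.60, ACCEL_FULL)

# A 400-fold acceleration expressed as a share of the same hypothetical total.
peptide_share = fraction_of_stabilization(400.0, ACCEL_FULL)
```

On this hypothetical scale a catalyst retaining 60% of the stabilization is still a million-fold slower than the full enzyme, while a 400-fold peptide catalyst accounts for somewhat under a fifth of the total.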
Approximately 70% of the coding sequences (94 of ~130 residues) derived from Class I and II Urzymes exhibit codon middle-base pairing frequencies that exceed those expected under the null hypothesis by several hundred times the standard error.
Codon middle-base pairing is not significantly different for any of the four combinations between Class IC and Class IIA.
Codon middle-base pairing increases toward the root of TrpRS and HisRS urgene trees.
Highly conserved ATP binding motifs 46 residues long from the Class I and II active sites can be coded by fully complementary nucleic acid sequences, and exhibit 400-fold stimulation over the uncatalyzed rate of amino acid activation (Figure 11). They are phylogenetically and functionally reasonable ancestors of the respective Urzymes.
Probabilities associated with hypothesis testing with Bayes’s Theorem often take the form of odds ratios comparing posterior probability to that of the null hypothesis; the larger that ratio the stronger the case for rejecting the null hypothesis. Alternatively, ignoring the prior probabilities, we can examine likelihood ratios, or how much more probable the new data are under the hypothesis to be tested than under the null hypothesis. For large numbers, the logarithm of the likelihood ratio or log-likelihood gain affords the relative “support” .
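The odds form of Bayes's Theorem referred to here can be written out explicitly. In the sketch below, H denotes the RO hypothesis, H_0 the null hypothesis and D the new data; these symbols are introduced for illustration and do not appear in the original text.

```latex
% Posterior odds = likelihood ratio x prior odds
\[
\frac{P(H \mid D)}{P(H_0 \mid D)}
  = \frac{P(D \mid H)}{P(D \mid H_0)} \times \frac{P(H)}{P(H_0)}
\]

% Log-likelihood gain ("support") contributed by the data alone
\[
\ln \Lambda = \ln P(D \mid H) - \ln P(D \mid H_0)
\]

% For independent data sets D_1, ..., D_n the supports add
\[
\ln \Lambda_{\mathrm{total}} = \sum_{i=1}^{n} \ln \Lambda_i
\]
```

The last line holds only if the data sets are conditionally independent under both hypotheses, an assumption that would need to be examined before combining the separate lines of evidence listed above.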
The RNA World hypothesis
Copernicus famously said that Earth revolves around the sun. But opposition to this revolutionary idea didn’t come just from the religious authorities. Evidence favored a different cosmology.
“People may spend their whole lives climbing the ladder of success only to find, once they reach the top, that the ladder is leaning against the wrong wall” - Thomas Merton
The genetic code is undoubtedly the nexus between chemical evolution, where genetic inheritance is meaningless, and biological evolution, from which we can in principle trace phylogenies. Conventional wisdom holds that this nexus was traversed exclusively by RNA molecules. That hypothesis is broadly held to be the only likely scenario for simultaneous introduction of genetic information and catalysis. Belief in the early importance of RNA-only metabolism continues to be strong and actively pursued, to the exclusion of alternatives. Sergei Rodin was a persuasive and imaginative advocate of this notion.
There is broad consensus, which we share, that RNA was a carrier of information mediating the origin of codon-dependent translation. Catalysis, however, is an entirely different matter. The RNA World hypothesis rests almost exclusively on engineering and selection of RNA aptamers capable of RNA replication [93, 94], amino acid recognition, aa-5′AMP synthesis, and tRNA aminoacylation. The relevance of such experiments is hard to assess. They arise because oligonucleotide syntheses are so accessible, and because SELEX technology can select from extraordinarily large combinatorial libraries. Indeed, given such awesome selective power, it would be surprising not to have identified such aptamers. Without genetic evidence linking them to biology it is difficult to attach significance to their catalytic activities.
Before our work on aaRS Urzymes, the prior, alternative case for a peptide/RNA origin rested on three arguments. (i) Modeling suggests that even the emergence of RNA and the establishment of the code required the catalytic repertoire of stereochemically complementary polypeptide hairpins. (ii) RNA in biology is made entirely by proteins and conversely, proteins are assembled by ribozymes, suggesting that this may always have been the case. (iii) The complete absence of contemporary catalytic RNA genes for either ribozyme-catalyzed free-energy conversion (amino acid activation by ATP) or codon-dependent translation (specific recognition and catalysis of acyl-transfer to tRNA) argues persuasively that these processes may never have been catalyzed by ribozymes. The magnitude of this gap in phylogenetic support is substantial and, in our view, decisive.
The unexpected catalytic power of relatively simple peptides allows us to invest the peptide/RNA alternative scenario with a much higher probability [37, 98, 99], in part because it implements rudimentary sense/antisense stereochemical coding of two amino acids per base of an RNA double helix. The unexpected sophistication of aaRS Urzymes implies that they had even simpler ancestors. The two aaRS classes are certainly among the oldest, if not the oldest, protein superfamilies. The RO hypothesis implies that they arose at nearly the same instant in geological time because, at the nucleic acid level, the information necessary for function of each Class is indistinguishable from that necessary for function of the other. Complementarity means that one implies the existence of the other. Sense/antisense coding thus projects back past the genetic coding nexus to chemistry.
By greatly reducing the information necessary to launch natural selection, Urzymes strengthen the case that it arose from a balanced peptide-RNA partnership. The origins of natural selection are rooted in catalysis—producing some molecules faster than others. As amino acid activation rates limit spontaneous peptide synthesis, the initial selective advantage of the earliest catalysts was probably the ability of ancestral synthetases to mobilize ATP (NTPs) and to activate amino acids, enhancing rates at which peptides could be made. In this scenario, information and catalysis both began simply and evolved together. Rather than an unaccountable burst of both information and catalysis, the peptide/RNA scenario lays out a credible path to complexity.
The transition from chemical evolution to genetic biology nevertheless remains baffling. Questions remain about how Urzyme coding sequences began to function as genes. However, at some point, that did happen. Codon middle-base pairing frequencies establish a phylogenetic lineage that projects peptide-mediated catalysis further toward the origin of life than was previously considered possible, to events at the origin of translation in a peptide/RNA world.
The RO hypothesis is certainly falsifiable; three orthogonal but equally rigorous tests fail to falsify it. Invariant cores that align sense/antisense have considerable and comprehensive catalytic activities. Coding sequences for 70% of Class I, II Urzymes exhibit unexpectedly high middle-base pairing. Products from a designed, sense/antisense gene for Class I and II ATP binding sites both exhibit appropriate catalytic activities. Our formulation is Popperian and Bayesian: these data have very high probability under the RO hypothesis, which is thus a much more probable explanation than others of how the two aaRS Classes arose.
Urzymes demonstrate that the most highly conserved segments, by themselves, have high activity but low specificity, the very properties expected of catalysts implementing the genetic code. By Ockham’s razor, the true ancestral aaRS were unlikely to have differed greatly from the TrpRS and HisRS Urzymes. More generally, Urzymes are authentic catalysts that model very early enzymes. Their enzymatic activities provide valid metrics for testing improvements in fitness from modules as small as 6-20 aa and for novel thermodynamic cycle analysis of contemporary enzyme function [35, 36].
Aspects reviewed in this work were carried out using methods described in detail in the original publications [36–40, 63]. Briefly, invariant cores from both aaRS superfamilies were identified by 3D superposition, re-designed if necessary using Rosetta Design, expressed either with FLAG and His6 tags or as maltose-binding protein fusions, purified using these tags, and assayed as described.
All statistical calculations were performed using JMP. P values under the null hypothesis in the description of amino acid physical chemistry were estimated from multivariate linear regression models in which the dependent variable (i.e., solvent accessible surface area of each amino acid in folded proteins) is expressed as a linear combination of other predictors, amino acid mass and solvent transfer free energies for each amino acid, plus their interaction.
Data presented in Figure 6C were obtained by Michaelis-Menten steady-state kinetics in which all amino acids (excepting tyrosine because of its limited solubility) were substituted individually for cognate amino acids. Four-fold replicate assays were performed on two occasions and all replicated data were treated independently in nonlinear fits to the Michaelis-Menten equation using JMP. Maximum velocities were divided by the concentration of active sites to give kcat values. Proficiencies, kcat/KM, were converted to free energies and plotted.
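The conversion described above can be sketched as follows. The sketch assumes the conventional relation ΔG = -RT ln(kcat/KM), a temperature of 298.15 K, and energies in kcal/mol; the temperature, the units, and the function names are assumptions made for illustration, and the nonlinear fitting step performed in JMP is not reproduced.

```python
from math import log

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def michaelis_menten_rate(substrate, vmax, km):
    """Steady-state velocity v = Vmax [S] / (KM + [S])."""
    return vmax * substrate / (km + substrate)


def kcat_from_vmax(vmax, active_site_conc):
    """Turnover number: maximum velocity divided by active-site concentration."""
    return vmax / active_site_conc


def proficiency_free_energy(kcat, km, temp_k=298.15):
    """Free energy equivalent of kcat/KM, assuming dG = -RT ln(kcat/KM)."""
    return -R_KCAL * temp_k * log(kcat / km)
```

Under these assumptions, a tenfold difference in kcat/KM between two amino acid substrates corresponds to a difference of about 1.36 kcal/mol at 298.15 K.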
ATP was titrated at pH = 4.5 with increasing amounts of 46-residue segments isolated from TrpRS and HisRS by PCR amplification and purified by affinity chromatography as noted in the following paragraph. This assay, described by Mildvan, detects fluorescence changes as the peptide orders and binds to ATP. ATP affinities were estimated from the titration curves using JMP. TrpRS and HisRS 46-residue peptides were also assayed by 32P PPi exchange, essentially as noted in the next paragraph.
Data presented in Figure 11 were obtained as follows: Rosetta was adapted (OE, XA, BK) to constrain sequences simultaneously for two backbones provided as scaffolds with the additional constraint that substitutions at each position have complementary codons, assuring that the resulting gene was fully sense/antisense. The resulting genes were inserted separately in opposite directions for expression as MBP fusion proteins, expressed and purified by affinity chromatography on amylose, nickel-NTA, and blue sepharose supports and stored in 50% glycerol at -20 °C. Time-dependent assays were sampled at 0, 3, 6, 9, and 12 days in parallel with background controls using the standard 32P PPi exchange assay conditions. After determining that tryptophan over such long incubations induces elution from charcoal of a yellow compound that independently enhances scintillation counting, leucine was used to assay the Class I ATP binding site. Histidine did not show such behavior and was used in the assay of the Class II ATP binding site. Additional controls were done separately using maltose-binding protein itself with both amino acids, as isolated from amylose chromatography.
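The complementary-codon constraint described above means that one duplex encodes two peptides, one read from each strand. The sketch below shows only that bookkeeping: it translates a DNA coding strand and its reverse complement with the standard genetic code. It is not the adapted Rosetta protocol; the function names and the example sequence are hypothetical, chosen for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest). '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons in frame 1; trailing partial codons are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def sense_antisense_products(dna):
    """Peptides encoded by a strand and by its fully complementary strand."""
    return translate(dna), translate(reverse_complement(dna))


if __name__ == "__main__":
    # Hypothetical 9-nucleotide example, not a sequence from the designed gene.
    print(sense_antisense_products("ATGGCTCAT"))
```

A design procedure subject to this constraint would have to accept a substitution at one position only if the residue it implies on the opposite strand is also acceptable for the second backbone.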
Reviewer 1: Dr. Paul Schimmel, Skaggs Institute for Chemical Biology at The Scripps Research Institute (nominated by Prof Laura Landweber)
This review recapitulates the long-standing Rodin-Ohno hypothesis that, by postulating that complementary strands of early genes encoded members of the two classes of tRNA synthetases, the mystery of two classes is solved. Specifically, this complementarity is proposed to come from the active-site-encoding mRNA of one synthetase (say, from class I) being the anti-sense of the active-site-encoding mRNA of a synthetase from the opposite class (say, class II). Thus, one duplex encodes both types of synthetases. In the active-site encoding region, there is a group of ‘codons’ in the strand encoding a class I synthetase that are paired (in the duplex gene) with the corresponding group of codons in the strand encoding a class II tRNA synthetase. In their original work, RO presented evidence from the existing sequence databases to support their hypothesis. These databases have grown enormously since then and have provided further opportunity to test this hypothesis.
Carter, in a virtually single-handed way, has attempted to dig more deeply into the predictions of the hypothesis through experiments and bioinformatics. His first paper in Molecular Cell (Carter and Duax (2002)) was a provocative discovery of how the complementary strands of the NAD-GDH gene, in the fresh water mold Achlya klebsiana, code for the two different class-associated synthetase signature motifs. Thus, these gene-encoded signature motifs are in exact complementary alignment with each other in the A. klebsiana NAD-GDH gene. (Surprisingly, this paper is not cited.) This remarkable finding gave Carter the impetus to search out experimental proofs of the RO hypothesis, using peptide motifs that embodied the signatures of the class I and class II synthetases, and testing their abilities to stimulate amino acid activation. It also stimulated him to dig more deeply into the bioinformatics.
This review is a compendium of much of that work. The paper recapitulates the experiments of his laboratory, and also summarizes some deeper bioinformatics. The experimental work is outlined in great detail. An impressive long summary is given of experiments to prove that the results are not artifacts. And yet, by giving this long list, this summary has the appearance of being defensive. There also are ‘bumps’, such as the failure to obtain clear results with the ‘mutants’ of the peptide motifs. And there is always the philosophical problem of ‘the absence of evidence of an artifact is not evidence of absence’. Although the work described is well done and rigorously thought through, Carter et al’s strong enthusiasm for their work on the peptides gives a sense of bias.
The section entitled “Bioinformatics evidence from multiple sense/antisense alignments” impressed me. Carter has great strength in this sort of analysis and presents an excellent update and extension of the earlier RO work. Likewise, the section of the Discussion entitled “Urzymology has…the RO hypothesis” is an excellent point-by-point of the state of affairs on the informatics side. I was quite impressed by the depth of this recapitulation.
He views the work as providing a challenge to the RNA World hypothesis. I found this viewpoint somewhat curious. For myself, the two lines of thinking can be harmonized in a straightforward way, by an extension of the idea first described in Figure 2 of a rather obscure publication (Henderson, B. S. and Schimmel, P. (1997). RNA-RNA Interactions Between Oligonucleotide Substrates for Aminoacylation. Bioorgan. Med. Chem. 5: 1071-1079.) In that publication, the authors suggest that ribozymes first aminoacylated small RNAs, and then these aminoacyl RNAs formed clusters that brought together the activated amino acids to form peptides. I can imagine some of these peptides eventually associated with the ribozymes and enhanced their catalytic activities. I would also imagine that these peptides could be related in some way to the ones that Carter and his students have so well studied. Whether or not these ideas are correct, my main point is that Carter et al. have nothing to gain by attempting to provide a contrast with the RNA World hypothesis. Their work stands well on its own merits.
Overall the review is written in a somewhat rambling, diffuse style, and with emotional content. (I noted 5 exclamation points scattered throughout the text, and recommend that these be removed to give a lack of bias and to convey “academic sobriety”).
I am a fan of Carter and his work. He is widely respected for his thorough and deep understanding of thermodynamics and protein structure-function. His experiments are generally thorough and self-critical. He is the only person who not only expanded the analytical side of informatics that is relevant to the RO hypothesis, but also has done specific experiments to test the hypothesis. This sort of combined effort is rare in any field. My recommendation is that the paper be published. It is a fascinating topic that Carter alone is in the best position to summarize. But the text needs to be tightened up, shortened, and recast in a more sober style, considering some of the points raised above.
Authors’ response: We appreciate both the complimentary assessment of the work overall and the criticism of the writing. We readily eliminated all exclamation marks. It takes nothing away from the text to suppress hyperbolae. We have three substantive replies:
The extant A. klebsiana sense/antisense gene. Professor Schimmel asks why our first publication on the RO hypothesis was not cited. In fact, it was cited implicitly in the dedication to Sergei Rodin. That section now includes explicit references both to our paper (Carter & Duax, 2002) and to the paper challenging aspects of the work (Williams, et al., 2008), as well as the rebuttal published by Rodin, Rodin, & Carter (2009). Carter & Duax based their work on that of H. LeJohn, in which he used antibody precipitation to clone a stress-induced glutamate dehydrogenase [103–105]. The cloned gene expressed both the putative dehydrogenase and HSP70. Although LeJohn’s work was described in some detail in the three resulting papers, Williams, et al. (2008) concluded that the evidence that the protein expressed from the putative dehydrogenase clone was indeed the dehydrogenase was weak, and hence that the interpretation reported in Carter & Duax might be flawed. We decided against re-opening this issue here, in part because it contributes little to the subsequent work reviewed here on aaRS Urzymes, and because we have done little to resolve discussions with Dr. Koonin about a related issue concerning possible ancestral relationships of the Class II aaRS, actin-HSP70, and RNAse H superfamilies.
Mutational analysis of Urzymes and 46-mers. Professor Schimmel writes “There also are ‘bumps’, such as the failure to obtain clear results with the ‘mutants’ of the peptide motifs.” His reference fails to distinguish between two possibilities: (i) the D146A active site mutation in the TrpRS Urzyme actually increases activity and (ii) we have yet to obtain similar results for the 46-mer SAS gene products. Regarding (i), we can now postulate and are testing a coherent explanation for the unexpected activation of the TrpRS Urzyme by the D146A mutation. This explanation is now outlined in the appropriate section of “Are the Urzyme activities authentic?” point 5. Regarding (ii), we are in the process of testing active-site mutants to both 46-mers. The activities of these peptides are more difficult to validate than those of the Urzymes, owing to the fact that it is unlikely that they retain activated aminoacyl-adenylates and their active site titers cannot be determined as they were for the Urzymes. The authors feel that data in Table 3 afford compelling, though admittedly not definitive evidence of authenticity and hence justify publication of the data in Figure 11 C,D.
The RNA World hypothesis. Professor Schimmel questions our assessment of the RNA World hypothesis, suggesting that the essential validity of the RO hypothesis is neither evidence for nor against that scenario, and arguing that hypothetical RNA and peptide/RNA scenarios can be reconciled along lines he developed in an earlier paper. We disagree substantively on both points. The paper he cites documents an intriguing model for peptide synthesis from acylated RNA stems. However, it fails to address the fundamental issue posed by the RNA World hypothesis: where did RNA arise if not via rudimentary catalysis by peptides? The absence of present day ribozymes related in any phylogenetic sense to any of the required activities of the aminoacyl-tRNA synthetases, or indeed to nucleic acid polymerases, should be a massive red flag. The sense/antisense ancestry of the aaRS appears to be solidly established. It points, intrinsically, far further back in time than do multiple sequence alignments for any gene, establishing phylogenetic roots of the earliest coded peptides.
The RNA World hypothesis suppresses entire domains of important questions related to the physical chemistry of proteins and catalysis, including the absence of phylogenetic evidence for ribozymal nucleic acid polymerases. On the other hand, the contributing author’s alternative scenario, published a dozen years before Gilbert’s proposal, suggests coherent answers to many of these questions, and in addition affords a rudimentary, but consistent, path backward to a putative earlier sense/antisense genetic coding. Thus, there is much for both the authors and the literate scientific public to gain by revisiting that alternative hypothesis, especially as we are the ones who have resurrected it from oblivion with key catalytic activities that establish its credibility as an alternative. Suppressing discussion can hardly be productive.
Professor Schimmel’s discomfort with our discussion of competing hypotheses is, however, likely exacerbated by the imbalance of substance and polemic in that particular section of the submitted manuscript. We have re-balanced this section by removing >25% of the text, most of which was either polemical or redundant, and by supplementing it with additional references on ribozymal aptamers. Elsewhere, the manuscript has been tightened throughout and is now ~5% shorter overall despite the inclusion of new material responding to these and other reviewers’ comments.
Reviewer 2: Dr. Eugene Koonin, National Center for Biotechnology, NIH
This very lengthy, yet carefully and elegantly written article summarizes experimental data from the senior author's laboratory that is perceived to support the Rodin-Ohno hypothesis on the origin of the two classes of aminoacyl-tRNA synthetases from complementary strands of the same gene. I expect this paper to become an important contribution to the literature on the origin of codon-dependent translation. It contains a plethora of interesting ideas and descriptions of ingenious experiments. After quite some thought, I have decided not to comment in specific detail on the Rodin-Ohno hypothesis and the validity of the presented argument in that regard. Again, Carter and colleagues present their argument in detail and with great care, so an interested and qualified reader will be able to judge it.
Authors’ response: We are grateful to Dr. Koonin both for his generous remarks and for having provided in Biology Direct an appropriate venue in which to generate public dialog of topics growing from the work described. As noted above, the revision benefits from tightening and some re-structuring.
Reviewer 3: Professor David Ardell, University of California, Merced
This work by Carter et al. reviews a substantial and growing literature on testing and extending the Rodin-Ohno hypothesis using ‘urzymes’—experimentally tractable models of early aaRSs. It presents new data on the substrate specificities of urzymes—and describes a designed experimental ‘existence proof’ of complementary antisense coding of Class I-type and Class II-type amino acid activation activities.
The Rodin-Ohno hypothesis is consistent with persuasive ideas about early life. For instance, in their original work, Rodin and Ohno speculate that sense-antisense coding may, via compression, bring replication advantages to quasispecies. Overlapping genes may also be favored through co-transfer of co-dependent ‘decoding genes’ during code evolution in structured populations (Vetsigian et al. (2006) in “Collective evolution and the genetic code” PNAS 103(28):10696).
The present work by Carter et al. places weight on Rodin and Ohno’s own statistical analyses using permutation tests: ‘jumbling’ as according to R. Doolittle. By design, these jumbles are not constrained to conserve sequence, particularly the critical active site motifs of the two classes of aaRS. Should we not instead attempt to model the space of all possible sequences with primordial Class I-type and Class II-type aaRS activities, and use this as a condition when measuring the probability of complementarity? What indeed are the spaces of all shortest protein sequences with Class I-type or Class II-type enzymatic activities comparable to (those of) urzymes? Or to frame the question differently, how ‘designable’ are the Class I and Class II aaRS active sites (‘Sequence optimization and designability of enzyme active sites’ by Chakrabarti et al. (2005) PNAS 102(34):12035)? Were the four motifs (HIGH, KMSKS, Motifs I and II and the extended secondary structures studied by the authors of the present work) inevitable products of selection for these activities? If the motifs and structures were inevitable, then perhaps their encoded head-to-tail antisense complementarity is just a remarkable coincidence.
Of the new data presented by the authors, I found the data in Figure 6C of interest, reporting activities of urzymes in activating different amino acids in ambiguous, yet apparently class-specific, ways. These data have not been published elsewhere, and I caution that I was unable to evaluate them critically. At face value they raise the question whether the ‘statistical urzymes’ implied by them have class-specific amino acid activation activities or not. It would seem to strengthen the Rodin-Ohno hypothesis if they did, but not necessarily refute it if they did not. A refinement of this question would take into consideration that not all amino acids were available in early ancestral genetic codes.
Generally speaking, at what rates could antisense-coded urzymes regenerate themselves in model prebiotic translation systems? For a relevant theoretical treatment of this question, please see Bedian (2001). “Self-description and the origin of the genetic code”. BioSystems 60:39.
Part of the authors’ case in this review rests on ancestral reconstructions of aaRS coding sequences. These reconstructions must be among the most ambitious possible, in terms of the depth of reconstruction and base composition nonstationarity of the data being modeled over the Tree of Life. Nonstationarity is especially problematic in causing bias in ancestral sequence reconstructions (Susko and Roger (2013). “Problems With Estimation of Ancestral Frequencies Under Stationary Models”. Syst Biol 62:330). I don’t believe that the results discussed in this work, on complementarity in reconstructed ancestors, adequately control for this bias.
The following statement “Without exception, conserved amino acids with a direct, catalytic role in Class I active sites are drawn from amino acid substrates activated by Class II enzymes, and conversely” seems misleadingly strong to me given data from single structures shown in Figure 2. This is a much stronger claim than Rodin and Ohno themselves made in their analysis of sequence variation (on page 568 of their work).
The statistical methods leading to the p-values reported in the Physical Chemistry section on page 8 should be briefly summarized.
Authors’ response: The authors are especially grateful for the careful reading and thoughtful criticism from Dr. Ardell, who identified several intriguing questions, some that we feel largely lie outside the scope of this article, but which point directly toward future investigations. We were unaware of several references provided and have tried to cite them appropriately in the revision.
Self description and the origin of the genetic code. We appreciate the identification of previously unpublished work presented in two sections, the specificity spectra of Class I TrpRS and Class II HisRS Urzymes, and characterization of the sense/antisense gene for the 46-residue ATP binding sites of Class I and II synthetases. In order to facilitate more critical assessment, we have amplified the description in Methods of how specificities for each amino acid were determined. As with much of what is reviewed in this paper, these data represent the first attempt to bring the underlying question raised by Professor Ardell further in par 4 of his review into an experimental context. These preliminary data are certainly neither comprehensive nor definitive. Nonetheless, they represent an honest attempt to provide experimental bases from which, eventually, to approach the question posed by Bedian. We agree wholeheartedly with Professor Ardell that the aminoacyl-tRNA synthetases lie at the center of the genesis of biological self-reference represented by the genetic code. Our work thus far has helped only to define experimental systems with which this problem can be fruitfully addressed. Our revision includes a new paragraph just before the Discussion in which we touch briefly on the questions posed by Bedian. Our work is still at the beginning of the effort to answer the question; thus it seemed inappropriate to speculate further.
Designability. This is an excellent question, one about which we have thought quite a lot. In an as yet unpublished study of the LeuRS Urzyme construction, we characterized eight different Urzymes designed by Rosetta. These Urzymes had a narrow range of specific activities. That is, they all had almost the same activity as the re-designed TrpRS Urzyme. That Rosetta design experiment does not fully address Professor Ardell’s question, however, because we did not allow changes to active-site residues; nor did we constrain the design of catalytic hydrogen-bonding interactions. We note here, however, that this question is in some ways simply a re-phrasing of the question addressed in the preceding paragraph. Curiously, however, more in-depth comparison now reveals functionally important differences between some of the LeuRS Urzymes we selected for further work. Notably, variation in the loop connecting the specificity determining helix to the GXDQ motif at the N-terminus of the C-terminal alpha helix generates two LeuRS Urzymes, one of which exhibits a pre-steady state burst, the other of which does not. Pursuit of that question is obviously a valid future research project.
Non-stationarity of ancestral character states. This is also an excellent question. We were unaware of the potential bias identified in the paper by Susko and Roger, which appeared online only in September 2013, by which time our own paper had been published for two months. We certainly will attempt to take the bias into account in future work. We have qualified our interpretation of the reconstructed sequences in the revision. The middle-base pairing frequency statistic, which as Professor Ardell correctly states is among the most ambitious metrics ever proposed for phylogenetic comparisons, has independent validity irrespective of whether or not we reconstruct ancestral states. The data in Figure 10 B and 10 D reflect frequencies from contemporary sequences, not reconstructed ancestral states.
Interdependence of Class I, Class II aaRS. Professor Ardell questioned the strength of the phrase “Without exception..” in referring to the active-site compositions in Class I aaRS, which are constructed from Class II substrates, and vice versa. We have qualified the statement in the revision. Figure 2 is indeed drawn based on only a single Class I active site and a single Class II site. However, a comprehensive examination of active site compositions across the ten members of each family from several hundred species does reinforce that description. The HIGH signature contributes three residues H, G, and H that interact with ATP. G is essentially invariant because it is in van der Waals contact with the adenosine ring, whereas the two Hs are replaced only by T and N, and for example never by Q, which is in other contexts a functional substitute for both H and N. Similarly in the KMSKS motif the four catalytic residues are always K, S, and T. The only exceptions we know of are from eukaryotic TrpRSs and TyrRSs in which the terminal lysine is absent (replaced by A) and its function is taken by an arginine that occurs uniquely in these enzymes much closer to the amino terminus. Similarly all 10 Class II active sites invariably use R for transition state stabilization. The active site E is only very rarely a D. Thus, the statement discussing Figure 2 is scarcely hyperbole. The distinction between our treatment and that of Rodin and Ohno is that the latter authors included nonpolar amino acids that couple the active site residues to the rest of the protein, whereas we consider only those residues that interact directly with ATP.
Statistical methods. It is fair to ask for clarification of how P-values were calculated. This is explained more fully in a new addition to the Methods section.
This review originated in a presentation by CWCjr at a memorial symposium celebrating scientific contributions of Sergei Rodin (1947-2011) at the Beckman Center, City of Hope, Duarte, CA. It is dedicated to the memory of Rodin and Ohno. None of the authors has a conflict of interest.
Last universal cellular/common ancestor
Amino acid activation
Sweet potato β-amylase
This work was supported by NIGMS R01-78227 and R01-90406. Gurkan Yardimci contributed to some of the bioinformatic experiments. R. Wolfenden assisted significantly with preparation of Figure 6.
- Bishop JM: How to Win the Nobel Prize: An Unexpected Life in Science (Jerusalem-Harvard Lectures). 2004, Cambridge, MA: Harvard University Press
- Doyle SAC: The Sign of the Four. 1894, London: Spenser Blackett
- Carter CW, Duax WL: Did tRNA synthetase classes arise on opposite strands of the same gene?. Mol Cell. 2002, 10: 705-708.PubMed
- Williams T, Wolfe KH, Fares MA: No rosettta stone for a sense-antisense origin of aminoacyl tRNA synthetase classes. Mol Biol Evol. 2008, 26: 445-450.PubMed
- Rodin A, Rodin SN, Carter CW: On primordial sense-antisense coding. J Mol Evol. 2009, 69: 555-567.
- Rodin AS, Szathmáry E, Rodin SN: On origin of genetic code and tRNA before translation. Biol Direct. 2011, 6: 14.
- Rodin SN, Rodin AS: On the origin of the genetic code: Signatures of its primordial complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity. 2008, 100: 341-355.
- Rodin SN, Rodin A: Origin of the genetic code: first aminoacyl-tRNA synthetases could replace isofunctional ribozymes when only the second base of codons was established. DNA Cell Biol. 2006, 25: 365-375.
- Rodin SN, Rodin A: Partitioning of aminoacyl-tRNA synthetases in two classes could have been encoded in a strand-symmetric RNA world. DNA Cell Biol. 2006, 25: 617-626.
- Rodin SN, Rodin A, Ohno S: The presence of codon-anticodon pairs in the acceptor stem of tRNAs. Proc Nat Acad Sci USA. 1996, 93: 4537-4542.
- Rodin SN, Ohno S: Two types of aminoacyl-tRNA synthetases could be originally encoded by complementary strands of the same nucleic acid. Orig Life Evol Biosph. 1995, 25: 565-589.
- Danchin A, Sekowska A: The logic of metabolism and its fuzzy consequences. Environ Microbiol. 2014, 16: 19-28.
- Binder PM, Danchin A: Life’s demons: information and order in biology: What subcellular machines gather and process the information necessary to sustain life?. EMBO Rep. 2011, 12: 495-499.
- Danchin A: Archives or palimpsests? Bacterial genomes unveil a scenario for the origin of life. Biological Theory. 2007, 2: 1-10.
- Koonin EV: The Logic of Chance: The Nature and Origin of Biological Evolution. 2011, Upper Saddle River, NJ: Pearson Education; FT Press Science
- Szostak JW: Systems chemistry on early Earth. Nature. 2009, 459: 171-172.
- Fersht AR: Dissection of the structure and activity of the tyrosyl-tRNA synthetase by site-directed mutagenesis. Biochem. 1987, 26: 8031-8037.
- Fersht AR, Knill Jones JW, Bedouelle H, Winter G: Reconstruction by site-directed mutagenesis of the transition state for the activation of tyrosine by the tyrosyl-tRNA synthetase: a mobile loop envelopes the transition state in an induced-fit mechanism. Biochemistry. 1988, 27: 1581-1587.
- Francklyn C, Musier-Forsyth K, Schimmel P: Small RNA helices as substrates for aminoacylation and their relationship to charging of transfer RNAs. Euro J Biochem. 1992, 206: 315-321.
- Francklyn C, Schimmel P: Aminoacylation of RNA Minihelices with Alanine. Nature. 1989, 337: 478-481.
- Ribas de Pouplana L, Schimmel P: Operational RNA code for amino acids in relation to genetic code in evolution. J Biol Chem. 2001, 276: 6881-6884.
- Schimmel P: Origin of genetic code: A needle in the haystack of tRNA sequences. Proc Nat Acad Sci USA. 1996, 93: 4521-4522.
- Schimmel P, Giegé R, Moras D, Yokoyama S: An operational RNA code for amino acids and possible relationship to genetic code. Proc Nat Acad Sci USA. 1993, 90: 8763-8768.
- Ibba M, Soll D: Aminoacyl-tRNAs: setting the limits of the genetic code. Genes Dev. 2004, 18: 731-738.
- Rogers MJ, Weygand-Durasevic I, Schwob E, Sherman JM, Rogers KC, Adachi T, Inokuchi H, Söll D: Selectivity and specificity in the recognition of tRNA by E. coli by glutaminyl-tRNA synthetase. Biochimie. 1993, 75: 1083-1090.
- Woese CR, Olsen GJ, Ibba M, Soll D: Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol Mol Biol Rev. 2000, 64: 202-236.
- McMurry J: Organic Chemistry. 2009, Independence, KY: Cengage Learning, Enhanced
- Aravind L, Anantharaman V, Koonin EV: Monophyly of Class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implication for protein evolution in the RNA World. Proteins: Struct Funct Gen. 2002, 48: 1-14.
- Wolf YI, Aravind L, Grishin NV, Koonin EV: Evolution of aminoacyl-tRNA synthetases—analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res. 1999, 9: 689-710.
- Fournier GP, Andam CP, Alm EJ, Gogarten JP: Molecular Evolution of Aminoacyl tRNA Synthetase Proteins in the Early History of Life. Orig Life Evol Biosph. 2011, 41: 621-632.
- Dokholyan NV, Shakhnovich EI: Understanding hierarchical protein evolution from first principles. J Mol Biol. 2001, 312: 289-307.
- Dokholyan NV, Shakhnovich B, Shakhnovich EI: Expanding protein universe and its origin from the biological big bang. Proc Nat Acad Sci USA. 2002, 99: 14132-14136.
- Koonin EV: The Biological Big Bang model for the major transitions in evolution. Biol Direct. 2007, 2: 21.
- Cammer S, Carter CW: Six Rossmannoid folds, including the Class I aminoacyl-tRNA synthetases, share a partial core with the anticodon-binding domain of a Class II aminoacyl-tRNA synthetase. Bioinformatics. 2010, 26: 709-714.
- Weinreb V, Li L, Chandrasekaran SN, Koehl P, Delarue M, Carter CW: Enhanced amino acid selection in fully-evolved tryptophanyl-tRNA synthetase, relative to its urzyme, requires domain movement sensed by the d1 switch, a remote, dynamic packing motif. J Biol Chem. 2014, 289: 4367-4376.
- Li L, Carter CW: Full Implementation of the genetic code by tryptophanyl-tRNA synthetase requires intermodular coupling. J Biol Chem. 2013, 288: 34736-34745.
- Li L, Francklyn C, Carter CW: Aminoacylating Urzymes challenge the RNA World hypothesis. J Biol Chem. 2013, 288: 26856-26863.
- Li L, Weinreb V, Francklyn C, Carter CW: Histidyl-tRNA synthetase Urzymes: Class I and II aminoacyl-tRNA synthetase Urzymes have comparable catalytic activities for cognate amino acid activation. J Biol Chem. 2011, 286: 10387-10395.
- Pham Y, Kuhlman B, Butterfoss GL, Hu H, Weinreb V, Carter CW: Tryptophanyl-tRNA synthetase Urzyme: a model to recapitulate molecular evolution and investigate intramolecular complementation. J Biol Chem. 2010, 285: 38590-38601.
- Pham Y, Li L, Kim A, Erdogan O, Weinreb V, Butterfoss G, Kuhlman B, Carter CW: A minimal TrpRS catalytic domain supports sense/antisense ancestry of Class I and II aminoacyl-tRNA synthetases. Mol Cell. 2007, 25: 851-862.
- Carter CWJ: Urzymology: experimental access to a key transition in the appearance of enzymes. J Biol Chem. 2014, 289: In Press
- Carter CW: Cognition, mechanism, and evolutionary relationships in aminoacyl-tRNA synthetases. Annu Rev Biochem. 1993, 62: 715-748.
- Weiner AM: Molecular evolution: Aminoacyl-tRNA synthetases on the loose. Curr Biol. 1999, 9: R842-R844.
- Klipcan L, Safro M: Amino acid biogenesis, evolution of the genetic code and aminoacyl-tRNA synthetases. J Theor Biol. 2004, 228: 389-396.
- Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH: Protein design by binary patterning of polar and non-polar amino acids. Science. 1993, 262: 1680-1685.
- Moffet DA, Foley J, Hecht MH: Midpoint reduction potentials and heme binding stoichiometries of de novo proteins from designed combinatorial libraries. Biophys Chem. 2003, 105: 231-239.
- Patel SC, Bradley LH, Jinadasa SP, Hecht MH: Cofactor binding and enzymatic activity in an unevolved superfamily of de novo designed 4-helix bundle proteins. Prot Sci. 2009, 18: 1388-1400.
- Wolfenden R: Experimental measures of amino acid hydrophobicity and the topology of transmembrane and globular proteins. J Gen Physiol. 2007, 129: 357-362.
- Vetsigian K, Woese C, Goldenfeld N: Collective evolution and the genetic code. Proc Nat Acad Sci USA. 2006, 103: 10696-10701.
- O’Donoghue P, Luthey-Schulten Z: On the evolution of structure in aminoacyl-tRNA synthetases. Microbiol Mol Biol Rev. 2003, 67: 550-573.
- Roach JM, Sharma S, Kapustina M, Carter CW: Structure alignment via Delaunay tetrahedralization. Proteins: Struct Funct Bioinf. 2005, 60: 66-81.
- Ye Y, Godzik A: Multiple flexible structure alignment using partial order graphs. Bioinformatics. 2005, 21: 2362-2369.
- Burbaum JJ, Paul S: Assembly of a Class I tRNA synthetase from products of an artificially split gene. Biochem. 1991, 30: 319-324.
- Burbaum JJ, Starzyk RM, Schimmel P: Understanding structural relationships in proteins of unsolved three-dimensional structure. Proteins: Struct Funct Gen. 1990, 7: 99-111.
- Frugier M, Florentz C, Giegé R: Anticodon-independent valylation of an RNA minihelix. Proc Nat Acad Sci USA. 1992, 89: 3900-3904.
- Giegé R, Sissler M, Florentz C: Universal rules and idiosyncratic features in tRNA identity. Nucleic Acids Res. 1998, 26: 5017-5035.
- Yang X-L, Otero FJ, Ewalt KL, Liu J, Swairjo MA, Köhrer C, RajBhandary UL, Skene RJ, McRee DE, Schimmel P: Two conformations of a crystalline human tRNA synthetase–tRNA complex: implications for protein synthesis. EMBO J. 2006, 25: 2919-2929.
- Wolfson AD, Pleiss JA, Uhlenbeck OC: A new assay for tRNA aminoacylation kinetics. RNA. 1998, 4: 1019-1023.
- Weinreb V, Li L, Carter CW: A master switch couples Mg2+-assisted catalysis to domain motion in B. stearothermophilus tryptophanyl-tRNA Synthetase. Structure. 2012, 20: 128-138.
- Gong LI, Suchard MA, Bloom JD: Stability-mediated epistasis constrains the evolution of an influenza protein. eLife. 2013, 2: e00631.
- Ortlund EA, Bridgham JT, Redinbo MR, Thornton JW: Crystal structure of an ancient protein: evolution by conformational epistasis. Science. 2007, 317: 1544-1548.
- Nasrallah CA, Huelsenbeck JP: A phylogenetic model for the detection of epistatic interactions. Mol Biol Evol. 2013, 30: 2197-2208.
- Chandrasekaran SN, Yardimci G, Erdogan O, Roach JM, Carter CW: Statistical evaluation of the Rodin-Ohno hypothesis: Sense/antisense coding of ancestral Class I and II aminoacyl-tRNA synthetases. Mol Biol Evol. 2013, 30: 1588-1604.
- Susko E, Roger AJ: Problems with estimation of ancestral frequencies under stationary models. Syst Biol. 2013, 62: 330-338.
- Weinreb V, Li L, Chandrasekaran SN, Koehl P, Delarue M, Carter CW: Enhanced amino acid selection in fully-evolved tryptophanyl-tRNA synthetase, relative to its Urzyme, requires domain movement sensed by the D1 switch, a remote, dynamic packing motif. J Biol Chem. 2014, 289: 4367-4376.
- Retailleau P, Huang X, Yin Y, Hu M, Weinreb V, Vachette P, Vonrhein C, Bricogne G, Roversi P, Ilyin V, Carter CW: Interconversion of ATP binding and conformational free energies by Tryptophanyl-tRNA synthetase: a closed, pre-transition-state ATP complex at 2.2 Å resolution. J Mol Biol. 2003, 325: 39-63.
- Chuang W-J, Abeygunawardana C, Gittis AG, Pedersen PL, Mildvan AS: Solution Structure and Function in Trifluoroethanol of PP-50, an ATP-Binding Peptide from F1ATPase. Arch Biochem Biophys. 1992, 319: 110-122.
- Fry DC, Byler DM, Sisu H, Brown EM, Kuby SA, Mildvan AS: Solution structure of the 45-residue MgATP-binding peptide of adenylate kinase as examined by 2-D NMR, FTIR, and CD spectroscopy. Biochem. 1988, 27: 3588-3598.
- Fry DC, Kuby SA, Mildvan AS: NMR studies of the MgATP binding site of adenylate kinase and of a 45-residue peptide fragment of the enzyme. Biochem. 1985, 24: 4680-4694.
- Mullen GP, Shenbagamurthi P, Mildvan AS: Substrate and DNA binding to a 50-residue peptide fragment of DNA polymerase I. J Biol Chem. 1989, 264: 19637-19647.
- Mullen GP, Vaughn JB, Mildvan AS: Sequential proton NMR resonance assignments, circular dichroism, and structural properties of a 50-residue substrate-binding peptide from DNA polymerase I. Arch Biochem Biophys. 1993, 301: 174-183.
- Jimenez M, Williams T, González-Rivera AK, Li L, Erdogan O, Carter CWJ: Did Class 1 and Class 2 aminoacyl tRNA synthetases descend from genetically complementary, catalytically active ATP-binding motifs?. Biophys J. 2014, In Press:14-A-4093-BPS
- Radzicka A, Wolfenden R: A proficient enzyme. Science. 1995, 267: 90-93.
- Schroeder GK, Wolfenden R: The rate enhancement produced by the ribosome: An improved model. Biochem. 2007, 46: 4037-4044.
- Wolfenden R, Snider MJ: The depth of chemical time and the power of enzymes as catalysts. Acc Chem Res. 2001, 34: 938-945.
- Wolfenden R: Benchmark reaction rates, the stability of biological molecules in water, and the evolution of catalytic power in enzymes. Ann Rev Biochem. 2011, 80: 645-667.
- Woese CR: The Universal ancestor. Proc Nat Acad Sci USA. 1998, 95: 6854-6859.
- Woese CR: On the origin of the genetic code. Proc Nat Acad Sci USA. 1965, 54: 1546-1552.
- Bedian V: Self-description and the origin of the genetic code. BioSystems. 2001, 60: 39-47.
- Popper K: The Logic of Scientific Discovery. 1959, Florence, KY: Routledge
- Sivia DS: Data Analysis: A Bayesian Tutorial. 1996, Oxford, UK: Clarendon Press
- Bridgham JT, Ortlund EA, Thornton JW: An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature. 2009, 461: 515-519.
- Bridgham JT, Carroll SM, Thornton JW: Evolution of Hormone-Receptor Complexity by Molecular Exploitation. Science. 2006, 312: 97-101.
- Dean AM, Thornton JW: Mechanistic approaches to the study of evolution: the functional synthesis. Nat Rev Gen. 2007, 8: 675-
- Thornton JW: Evolution of vertebrate steroid receptors from an ancestral estrogen receptor by ligand exploitation and serial genome expansions. Proc Natl Acad Sci U S A. 2001, 98: 5671-5676.
- Thornton JW, Need E, Crews D: Resurrecting the ancestral steroid receptor: ancient origin of estrogen signaling. Science. 2003, 301: 1714-1717.
- Benner SA, Sassi SO, Gaucher EA: Molecular paleoscience: systems biology from the past. Adv Enzymol Relat Areas Mol Biol. 2007, 75: 9-140.
- Gaucher EA, Govindarajan S, Ganesh OK: Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Nature. 2008, 451: 704-707.
- Liberles DA: Ancestral Sequence Reconstruction. 2007, Oxford: Oxford University Press
- Edwards AWF: Likelihood; Expanded Edition. 1972, Baltimore: Johns Hopkins University Press
- Danielson D, Graney CM: The case against Copernicus. Scient Am. 2013, 310: 72-77.
- Akst J: RNA World 2.0. The Scientist. 2014, 28: 34-40.
- Lincoln TA, Joyce GF: Self-sustained replication of an RNA enzyme. Science. 2009, 323: 1229-1232.
- Wochner A, Attwater J, Coulson A, Holliger P: Ribozyme-catalyzed transcription of an active ribozyme. Science. 2011, 332: 209-212.
- Yarus M, Widmann J, Knight R: RNA-amino acid binding: A stereochemical era for the genetic code. J Mol Evol. 2009, 69: 406-429.
- Kumar RK, Yarus M: RNA-catalyzed amino acid activation. Biochem. 2001, 40: 6998-7004.
- Niwa N, Yamagishi Y, Murakami H, Suga H: A flexizyme that selectively charges amino acids activated by a water-friendly leaving group. Bioorg Med Chem Lett. 2009, 19: 3892-3894.
- Carter CW, Kraut J: A proposed model for interaction of polypeptides with RNA. Proc Natl Acad Sci U S A. 1974, 71: 283-287.
- Carter CWJ: Cradles for molecular evolution. New Scientist. 1975, 27: 784-787.
- Dantas G, Kuhlman B, Callender D, Wong M, Baker D: A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins. J Mol Biol. 2003, 332: 449-460.
- SAS: JMP Statistics and Graphics Guide. V.6 edition. 2007, Cary NC: SAS Institute
- Chuang W-J, Abeygunawardana C, Pedersen PL, Mildvan AS: Two-dimensional NMR, circular dichroism, and fluorescence studies of PP-50, a synthetic ATP-binding peptide from the b-subunit of mitochondrial ATP synthase. Biochem. 1992, 31: 7915-7921.
- LéJohn HB, Cameron LE, Yang B, MacBeath G, Barker DS, Williams SA: Cloning and analysis of a constitutive heat shock (Cognate) protein 70 gene inducible by L-glutamine. J Biol Chem. 1994, 269: 4513-4522.
- LéJohn HB, Cameron LE, Yang B, Rennie SL: Molecular characterization of an NAD-specific glutamate dehydrogenase gene inducible by L-glutamine: Antisense gene pair arrangement with L-glutamine-inducible heat shock 70-like protein gene. J Biol Chem. 1994, 269: 4523-4531.
- Yang B, LéJohn HB: NADP+-activable, NAD+-specific glutamate dehydrogenase. Purification and immunological analysis. J Biol Chem. 1994, 269: 4506-4512.
- Stockbridge RB, Wolfenden R: The intrinsic reactivity of ATP and the catalytic proficiencies of kinases acting on glucose, N-acetylgalactosamine, and homoserine: a thermodynamic analysis. J Biol Chem. 2009, 284: 22747-22757.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.