Temporal order of evolution of DNA replication systems inferred by comparison of cellular and viral DNA polymerases

Background The core enzymes of the DNA replication systems show striking diversity among cellular life forms and more so among viruses. In particular, and counter-intuitively, given the central role of DNA in all cells and the mechanistic uniformity of replication, the core enzymes of the replication systems of bacteria and archaea (as well as eukaryotes) are unrelated or extremely distantly related. Viruses and plasmids, in addition, possess at least two unique DNA replication systems, namely, the protein-primed and rolling circle modalities of replication. This unexpected diversity makes the origin and evolution of DNA replication systems a particularly challenging and intriguing problem in evolutionary biology. Results I propose a specific succession for the emergence of different DNA replication systems, drawing argument from the differences in their representation among viruses and other selfish replicating elements. In a striking pattern, the DNA replication systems of viruses infecting bacteria and eukaryotes are dominated by the archaeal-type B-family DNA polymerase (PolB) whereas the bacterial replicative DNA polymerase (PolC) is present only in a handful of bacteriophage genomes. There is no apparent mechanistic impediment to the involvement of the bacterial-type replication machinery in viral DNA replication. Therefore, I hypothesize that the observed, markedly unequal distribution of the replicative DNA polymerases among the known cellular and viral replication systems has a historical explanation. I propose that, among the two types of DNA replication machineries that are found in extant life forms, the archaeal-type, PolB-based system evolved first and had already given rise to a variety of diverse viruses and other selfish elements before the advent of the bacterial, PolC-based machinery. 
Conceivably, at that stage of evolution, the niches for DNA-viral reproduction had already been filled with viruses replicating with the help of the archaeal system, and viruses with the bacterial system never took off. I further suggest that the two other systems of DNA replication, the rolling circle mechanism and the protein-primed mechanism, which are represented in diverse selfish elements, also evolved prior to the emergence of the bacterial replication system. This hypothesis is compatible with the distinct structural affinities of PolB, which has the palm-domain fold shared with reverse transcriptases and RNA-dependent RNA polymerases, and PolC, which has a distinct, unrelated nucleotidyltransferase fold. I propose that PolB is a descendant of polymerases that were involved in the replication of genetic elements in the RNA-protein world, prior to the emergence of DNA replication. By contrast, PolC might have evolved from an ancient non-templated polymerase, e.g., polyA polymerase. The proposed temporal succession of the evolving DNA replication systems does not depend on the specific scenario adopted for the evolution of cells and viruses, i.e., whether viruses are derived from cells or virus-like elements are thought to originate from a primordial gene pool. However, arguments are presented in favor of the latter scenario as the most parsimonious explanation of the evolution of DNA replication systems. Conclusion Comparative analysis of the diversity of genomic strategies and organizations of viruses and cellular life forms has the potential to open windows into the deep past of life's evolution, especially with regard to the origin of genome replication systems. When complemented with information on the evolution of the relevant protein folds, this comparative approach can yield credible scenarios for very early steps of evolution that otherwise appear to be out of reach. Reviewers Eric Bapteste, Patrick Forterre, and Mark Ragan.


Open peer review
This article was reviewed by Eric Bapteste, Patrick Forterre, and Mark Ragan.
For the full reviews, please go to the Reviewers' comments section.

Background
DNA replication is central to the reproduction of all cellular life forms and many viruses. Indeed, inasmuch as accurate DNA replication is strictly required for the faithful transmission of the information stored in the genomes of all known cellular life forms, it can be legitimately viewed as the quintessential biological process, the crucial manifestation of the proverbial double helix. Furthermore, the DNA replication processes in all cells appear to be mechanistically very similar [1]. Thus, it came as an extraordinary surprise when comparative genomics ushered in the realization that the protein components of DNA replication systems are not at all universally conserved [2][3][4][5], in sharp contrast to the core parts of the translation and transcription systems that are, indeed, shared by all cellular life [6,7]. Notably, this dramatic disparity of DNA replication systems had been predicted in the seminal early work of Woese and Fox in the context of their concept of the Last Universal Common Ancestor (LUCA) of modern cellular life forms as a primitive entity, the progenote [8].
The DNA replication systems of bacteria, on the one hand, and archaea and eukaryotes, on the other hand, are a peculiar mix of conserved and unrelated proteins. Notably, the central parts of the machinery, in particular, the polymerases that are responsible for DNA chain elongation, primer formation, and gap filling after primer removal, and replicative helicases, are either unrelated or distantly related and are thought to derive independently from proteins with other functions [5,9]. The conclusion that the main replicative polymerases and primases of archaea (and eukaryotes) and bacteria are unrelated was, originally, reached through exhaustive protein sequence analysis [5]. Subsequently, this conclusion received crucial support from the solution of the crystal structures of the bacterial and archaeal primases [10][11][12][13] and replicative polymerases [14][15][16][17]. The comparison of the respective structures unequivocally demonstrated that they, indeed, have unrelated folds. The distinction between the archaeal and bacterial DNA replication systems is additionally emphasized by the discovery of a unique DNA polymerase that is involved in replication in euryarchaea [18,19]. However, several ancillary components, such as the sliding clamp (the proliferating cell nuclear antigen, PCNA, and its homologs), the clamp loader ATPase, and RNAse H are represented by well-conserved orthologs in bacteria and archaea (eukaryotes) [5].
This unexpected divergence of cellular DNA replication systems is, in principle, compatible with at least three distinct evolutionary scenarios, each of which takes as the focal point the nature of the replication system that is inferred for the LUCA [5,20]. These three views of LUCA's genome replication are as follows: i) LUCA had no DNA replication per se but instead had a retrovirus-like replication cycle, with segments of genomic RNA reverse-transcribed into a DNA provirus, which is transcribed back into RNA; the existence of a DNA stage would explain the conservation of some proteins involved in DNA replication [5]; ii) LUCA had one of the two modern types of DNA replication systems, either (proto)archaeal or (proto)bacterial; subsequent non-orthologous gene displacement of the key components in one of the primary lines of descent, possibly, via a virus vector, resulted in the current dichotomy [4,21]; and iii) LUCA had both DNA replication systems (with one, possibly, involved in repair), with subsequent differential loss of the central components in the respective common ancestors of archaea and bacteria [2,4,21].
The notion of a possible contribution of DNA-containing viruses to the evolution of the DNA replication systems of cellular life forms has been presented in a series of publications by Forterre [21][22][23][24]. Recently, this line of thought has been further developed in two more general treatises each of which emphasized the integral connection between the evolutionary histories of cells and selfish genetic elements. The first of these studies posited that LUCA was a cell with an RNA genome, and the transition to the modern-type DNA replication system occurred after the divergence of the three primary lines of descent of cellular life forms, the progenitors of bacteria, archaea, and eukaryotes [25]. The second study laid out the argument for a non-cellular, although complex and compartmentalized LUCA, envisaged as a stage of evolution at which the progenitors of the main lineages of extant viruses already coexisted with elements that gave rise to bacterial and archaeal genomes [26]. Here I employ comparative genomics of viruses and cellular life forms to address a specific aspect of the evolution of DNA replication systems, namely, the temporal order of their emergence, and discuss the conclusions in conjunction with the general views on the nature of LUCA.

The hypothesis: inferring the temporal order of the origin of DNA replication systems from comparison of viral and cellular genomes
Viruses possess a remarkable collection of diverse genome replication and expression strategies, in sharp contrast to the uniformity of the cellular genetic cycle [27,28]. Since the subject of this article is the origin and evolution of DNA replication systems, I concern myself only with those viruses that possess DNA genomes; most, though not all, of these viruses encode their own core replication proteins. There are three basic types of viral DNA replication, one of which is, essentially, the same as the replication mode of cellular life forms (some interesting variations in viruses notwithstanding), whereas the remaining two, the protein-primed and the rolling circle replication (RCR) systems, seem to be unique to viruses and other selfish genetic elements (Table 1). A remarkable aspect of viral DNA replication that, to my knowledge, has never been interpreted in evolutionary terms, is that the vast majority of viruses with the "cell-like", RNA-primed, and terminal-protein-primed replication strategies encode the archaeal-type B-family DNA polymerase (hereinafter PolB) [29,30]. A series of exhaustive, iterative PSI-BLAST [31,32] searches of the viral subset of the nonredundant protein sequence database (NCBI, NIH, Bethesda) with bacterial PolC sequences as queries yielded only 8 bacteriophage PolC homologs (Table 1), in sharp contrast to the thousands of viruses that possess PolB homologs (Table 1 and data not shown). The remaining viruses either have the A-family polymerase (hereinafter PolA) that performs, mostly, repair-related functions in bacteria (Table 1) or no DNA polymerase at all. No virus-encoded homologs of the unique euryarchaeal polymerase (hereinafter PolD) were identified. The other genes involved in viral DNA replication are a complex mix of homologs of bacterial and archaeal replicative proteins and virus-specific proteins [26,33,34], but the decidedly non-uniform distribution of polymerases is striking.
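The tally of homologs reported above can be illustrated with a minimal sketch. It assumes the search results are available as a 12-column tab-separated table (the layout BLAST+ programs write with `-outfmt 6`); the function name `distinct_hits`, the E-value cutoff, and all sequence identifiers are hypothetical and are not taken from the searches described in this article.

```python
# Column order of the standard 12-column BLAST tabular format (-outfmt 6).
OUTFMT6_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def distinct_hits(tabular_text, max_evalue=1e-3):
    """Return the set of subject IDs with at least one hit at or below max_evalue.

    max_evalue is an illustrative cutoff, not the threshold used in the article.
    """
    subjects = set()
    for line in tabular_text.splitlines():
        fields = line.split("\t")
        # Skip blank lines, comments, and status lines such as convergence notices.
        if len(fields) != len(OUTFMT6_FIELDS) or line.startswith("#"):
            continue
        record = dict(zip(OUTFMT6_FIELDS, fields))
        if float(record["evalue"]) <= max_evalue:
            subjects.add(record["sseqid"])
    return subjects


# Hypothetical rows: the same phage protein is reported in two iterations.
EXAMPLE_ROWS = "\n".join([
    "polC_query\tphage_A_pol\t31.2\t950\t610\t12\t5\t940\t8\t955\t2e-80\t290",
    "polC_query\tphage_A_pol\t30.8\t948\t612\t13\t5\t940\t8\t955\t1e-95\t330",
    "polC_query\tphage_B_pol\t24.0\t400\t280\t9\t100\t490\t60\t455\t4e-12\t70.1",
    "polC_query\tunrelated_protein\t22.0\t80\t60\t1\t10\t90\t5\t84\t3.1\t28.5",
    "",
    "Search has CONVERGED!",
])

homologs = distinct_hits(EXAMPLE_ROWS)
print(sorted(homologs))
```

Collecting subject identifiers in a set means that a protein retrieved in several iterations is counted once, which is the quantity of interest when comparing the number of viral PolC homologs with the number of viral PolB homologs.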
There is hardly any mechanistic impediment to the involvement of PolC in viral DNA replication as evidenced, in particular, by the presence of the gene for the PolC homolog in 8 phage genomes (Table 1). Additionally, a variety of bacteriophages, such as temperate phages of the family Siphoviridae, successfully recruit the bacterial PolC for their replication without having acquired the polC gene in their genomes. Furthermore, a direct comparison of the catalytic efficiencies of the three DNA polymerases of E. coli, PolA, PolB, and PolC, shows that PolC has a much greater turnover rate than the other two, i.e., is a substantially more efficient enzyme ([35,36]; see also Table 5-I in [1]). Thus, inasmuch as mechanistic causes for the dominance of PolB among viruses and the near absence of PolC are unlikely to exist, I propose a historical explanation (Figure 1). I hypothesize that PolB is the most ancient replicative DNA polymerase and, accordingly, the archaeal-type DNA replication system centered around this polymerase was the first to have evolved among the two known cellular replication systems. Moreover, there was a time interval after the emergence of the PolB-centered, archaeal-type DNA replication system and before the advent of the bacterial, PolC-centered one, during which several lineages of selfish genetic elements with diverse lifestyles emerged. In particular, the divergence between the RNA-primed and protein-primed branches of the PolB family of polymerases, each of which spans a broad range of viruses and other selfish elements [30,37], can be confidently assigned to this early stage of evolution. Perhaps, along with RCR elements, which also display remarkable diversity [38] and are likely to be of ancient origin, these viruses and virus-like entities occupied the major existing biological niches and thus prevented any significant diversification of selfish elements carrying PolC.
The presence of PolC in several bacteriophages (Table 1) might be the result of relatively late non-orthologous gene displacement, a phenomenon that seems to have occurred on several occasions during the evolution of DNA polymerases [30]. Indeed, the phage PolC sequences did not appear to be closely related to each other but instead showed the closest similarity to different bacterial polymerases (data not shown).

Complications and caveats
Several compounding factors merit consideration in connection with this hypothesis. Firstly, and probably most importantly, the current sampling of the "virosphere" is obviously incomplete. Only four major bacterial lineages (Proteobacteria, Cyanobacteria, low-GC Gram-positive bacteria), two lineages of archaea (Sulfolobales and Halobacteria), and animals among the eukaryotes (as far as DNA viruses with large genomes are concerned) have been extensively sampled by viral genomics; there are only a few sequenced genomes of viruses infecting organisms outside these taxa. However, despite this limited sampling, the diversity of viruses with sequenced genomes is substantial by any criterion, be it replication strategy, genome size, gene repertoire, or virion structure. Therefore, it seems unlikely (although, certainly, not impossible) that sequencing of viruses from other lineages will radically change the distribution of DNA polymerases among viruses by revealing a dominant presence of PolC. Interestingly, in a recent study on viral metagenomics, a claim has been made that PolC is one of the dominant viral enzymes in three distant and diverse habitats [39]. However, examination of the lists of other enzymes that appeared to dominate these "viromes" indicates that these are uncharacteristic of viruses and, at least in some cases, unlikely to be present in a virus given their well-characterized functions (see the Author Response to Forterre below for additional details). Thus, these metagenomic results, most likely, reflect contamination of the analyzed viral samples with bacterial DNA and do not point to a hidden diversity of viruses replicated by PolC.
The second complication for the present hypothesis is that many DNA viruses of archaea and bacteria encode no DNA polymerase of their own and employ the respective host enzymes. Among bacteriophages, the forms with and without virus-encoded replicative enzymes are interspersed within the same viral families which is best compatible with the latter having been derived by degenerative evolution. However, the case of the viruses of hyperthermophilic crenarchaea is truly mysterious. None of these viruses encode their own polymerase, and mostly, they do not possess any other viral hallmark genes either, with the exception of a single group that has the widespread icosahedral capsid protein [40,41]. The provenance of these viruses remains unclear: they might be ultimate derivatives of the virus world that have lost all its hallmarks (the more likely possibility in the context of the virus world concept [26]), or else, they might have evolved anew via assembly of genes derived from the host. Whichever of these scenarios turns out to be correct, these viruses do not possess PolC but, instead, are replicated by the host PolB. Thus, their unique gene repertoire might pose a challenge to the virus world concept but hardly undermines the present hypothesis.
The third problem is the relevance (or lack thereof) of the eukaryotic DNA viruses, which account for a good part of the overall viral diversity, and in particular, the preponderance of PolB in the replication systems (Table 1), for the problem of the ultimate origins of those systems. Indeed, origin of eukaryotes via the archaeal-bacterial symbiosis which, I believe, is, by far, the most likely scenario [42,43], implies that eukaryotic viruses are much younger than the viruses of archaea and bacteria. However, that does not automatically mean that the gene composition of eukaryotic viruses tells us nothing about the earliest stages of the evolution of genome replication. Indeed, considering the accumulating evidence that sampling of genes from bacterial and archaeal viruses was the primary route of origin of eukaryotic viruses [26], the gene repertoire of eukaryotic viruses would reflect the composition of the gene pool of archaeal and bacterial viruses at the time of eukaryogenesis, perhaps, ~2 billion years ago. Hence, the predominance of PolB in eukaryotic viruses suggests that this was the primary viral DNA polymerase at that stage of evolution.
Finally, from the most general standpoint, the approach employed here is an extension of the traditional logic of the argument from diversity that is common, e.g., in phylogeography. Under this view, the area with the greatest diversity of representatives from a given taxon is considered to be the birthplace of the group (e.g., [44]). This is, essentially, a parsimony-type argument that might fail under special circumstances, such as a sweep of the entire habitat by a particularly fit form leading to the obliteration of the ancestral diversity and followed by a new diversification. Applied to the evolution of the replication systems, this would translate into the sweep of the virus world by PolB via extensive horizontal gene transfer (HGT), at a relatively late stage of evolution. It has been demonstrated that HGT is common in the evolution of DNA polymerases including the B family [30]. However, as discussed above, it is hard to think of a selective advantage of PolB that would trigger a massive sweep. The alternative possibility of a major bottleneck in the evolution of viruses followed by a non-selective takeover by PolB is not supported by any concrete evidence either. Thus, although it is impossible to formally rule out the possibility of a PolB sweep, this scenario appears unlikely.
Table 1 footnotes: Viral taxonomy was taken from the NCBI Taxonomy site [58] and is based on the 7th report of the International Committee on Taxonomy of Viruses [59]. a NCLDV, nucleo-cytoplasmic large DNA viruses (poxviridae, asfarviridae, ascoviridae, iridoviridae, phycodnaviridae, mimivirus) [33].
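The argument from diversity discussed in this section can be made concrete with a small sketch that compares, for each polymerase family, the number of distinct viral lineages carrying it (richness) and the Shannon index of their distribution. This is only an illustration under stated assumptions: the function name `lineage_diversity` and the toy lineage labels are hypothetical placeholders and do not reproduce the counts in Table 1.

```python
import math
from collections import Counter


def lineage_diversity(assignments):
    """Summarize lineage diversity per polymerase family.

    assignments: iterable of (polymerase_family, viral_lineage) pairs,
    one pair per sequenced genome.
    Returns {family: (richness, shannon_index)}.
    """
    per_family = {}
    for family, lineage in assignments:
        per_family.setdefault(family, Counter())[lineage] += 1

    summary = {}
    for family, counts in per_family.items():
        total = sum(counts.values())
        # Shannon index H = sum over lineages of p * ln(1 / p).
        shannon = sum((n / total) * math.log(total / n) for n in counts.values())
        summary[family] = (len(counts), shannon)
    return summary


# Toy placeholder data, NOT the distribution reported in the article.
TOY_GENOMES = [
    ("PolB", "lineage_1"), ("PolB", "lineage_1"),
    ("PolB", "lineage_2"), ("PolB", "lineage_3"),
    ("PolC", "lineage_4"), ("PolC", "lineage_4"),
]

toy_summary = lineage_diversity(TOY_GENOMES)
for family, (richness, shannon) in sorted(toy_summary.items()):
    print(family, richness, round(shannon, 3))
```

Under the parsimony-type reasoning described above, the family with the greater richness and Shannon index would be the candidate for the older system; as noted in the text, a late sweep or a bottleneck would invalidate this inference, and no summary statistic of this kind can rule those out.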

Support from the evolutionary relationships between DNA and RNA polymerases
The order of emergence of the replication systems proposed here appears to be supported by the homologous relationships between DNA and RNA polymerases inferred from structural and sequence comparisons. The catalytic domain of PolB has the widespread palm-and-fingers fold [14,45], various modifications of which are also found in PolA [46] and in RNA-dependent RNA polymerases (RdRp) of RNA viruses and reverse transcriptases [47]. Notably, the key protein of rolling circle replication, the initiation endonuclease (RCRE), has a derived form of the same fold [48,49]. By contrast, the core domain of PolC [16,17] belongs to the unrelated fold of the polβ family that includes a variety of non-replicative nucleotidyltransferases, such as polyA polymerases [50]. The prevailing current scenario for the early evolution of life has DNA replication evolving from within an RNA-protein world where only RNA replication occurred, with reverse transcription being a likely intermediate stage of evolution [5,23,25,26]. Under this scenario, it appears most likely that PolB, PolA, and RCRE evolved from the ancient replicative enzymes (RdRp or, more likely, reverse transcriptase). In contrast, the ancestor of PolC, probably, originated as a non-specific, non-replicative polymerase, such as a polyA polymerase, and was recruited for the bacterial-type replication system at a later stage of evolution (Figure 1).

Discussion and Conclusion
The hypothesis on the temporal order of the emergence of DNA replication systems proposed here is drawn directly from the data on the remarkably non-uniform distribution of DNA polymerases among viruses and virus-like elements and, accordingly, is not tightly linked to any specific model of the origin of cells and viruses. It is, nevertheless, interesting to consider how this hypothesis plays out in the context of two classes of such models. The first view which, conceivably, represents the orthodoxy, holds that the main classes of viruses emerged from already formed cells, probably, at early stages of evolution. Under this model, the present hypothesis implies that LUCA had the archaeal-type system of DNA replication, whereas the displacement of this ancestral system in bacteria, possibly, mediated by a virus [21,24], was a relatively late event (Figure 2).
Figure 1. The inferred temporal order of evolution of DNA replication systems.

The alternative scenario [26] derives both virus-like elements and cells directly from a primordial gene pool (Figure 3). Under this view, LUCA did not have a cellular organization at all but instead consisted of a population of genetic elements that replicated and expressed proteins within networks of inorganic compartments [51,52]. This model stems from the lack of homology between the core components of the DNA replication systems and membrane biogenesis pathways in archaea and bacteria [5,53]. Accordingly, it is proposed that proto-archaeal and proto-bacterial cells escaped from these networks independently, following the evolution of the corresponding distinct versions of the membrane biogenesis machinery [51,52]. In conjunction with this model, the concept of the ancient virus world has been recently developed, according to which the major classes of viruses (more precisely, virus-like elements inasmuch as a pre-cellular stage of evolution is concerned) evolved already in the primordial gene pool, and distinct complements of viruses were captured by the escaping proto-archaeal and proto-bacterial cells [26]. Under this scenario, the present hypothesis implies that genetic elements encoding PolB as well as those with the RCR mode of replication had evolved considerable diversity prior to the emergence of cells. By contrast, the bacterial replication system was "invented" later and was recruited by a very limited range of bacteriophages, possibly, at much later stages of evolution. The evolutionary status of the PolA-based replication system, which is found in a limited range of bacteriophages (Table 1) and is centered around a polymerase that is involved in gap-filling during replication and in repair in all bacteria [1], is less clear.
A progenitor of the PolA-replicated phages might have evolved already at an early, perhaps, pre-cellular stage of evolution (Figure 3) but, alternatively, it is hard to rule out that the presence of PolA in some phages is the result of a relatively late non-orthologous gene displacement.
It might not be possible to come up with decisive arguments rejecting one of the above scenarios, at least, at present. However, as already argued in some detail elsewhere [26], the primordial pool hypothesis (Figure 3) is simple, connects the origin of viruses and cells into one coherent scenario that is also linked to earlier stages of life's evolution, and seems to be best compatible with several lines of evidence. Perhaps, the most compelling of these is the existence of several "viral hallmark genes" that are shared by numerous, extremely diverse groups of viruses but are not found in any sequenced genomes of cellular life forms [26]. The primordial gene pool appears to be the natural source of the hallmark genes. Logically, the evolutionary succession of the DNA replication systems that is inferred here from comparative-genomic evidence also seems to be better compatible with this scenario for the early evolution of life. Indeed, under this scenario, the origin of the bacterial replication system is seen as evolution of a genetic element with a novel DNA polymerase that had limited success in an environment already inhabited by numerous elements with other, older replication systems, and gave rise to a single surviving line of descent, the bacteria. By contrast, the "protoarchaeal-LUCA" scenario (Figure 2) includes an additional, non-trivial step, the displacement of the ancestral replication system with a new one in bacteria. This step appears to be all the less likely considering the paucity of PolC-based replication systems among modern viruses (Table 1) and their probable absence from ancient viruses: a virus to displace the archaeal-type replication system in the LUCA might not have been readily available. The primordial gene set scenario faces its own difficulties that, primarily, have to do with the conservation of several key membrane proteins in archaea and bacteria [54].
Ideas on the possibility of the evolution of such proteins in the context of intermediate stages of membrane evolution have been proposed [51,52,55] but remain to be developed into a coherent scheme. It should be noted that, under this scenario, the emergence and diversification of PolB-based replication systems prior to the recruitment of PolC for the replicative function does not necessarily imply that proto-archaeal cells escaped from the networks of inorganic compartments earlier than proto-bacterial cells. Indeed, the capture of replication systems by emerging cells and their subsequent escape are likely to be uncoupled from the evolution of diverse replicons that might have reached considerable complexity during the pre-cellular stage of life's history.
The recent progress in comparative genomics of viruses has triggered a number of conceptual endeavors into the crucial links between origins and evolution of cells and viruses [23][24][25][26]56,57]. The (sometimes substantial) differences in the proposed evolutionary scenarios notwithstanding, these studies converge on the notion that "viruses take center stage in cellular evolution" [57]. The analysis presented here follows along the same lines by showing that a joint survey of viral and cellular genomes, in this particular case, for the presence of different enzymes of DNA replication, complemented by the comparative analysis of the respective protein folds, allows one to propose a provisional order for ancient evolutionary events (in this case, the origin of the archaeal-type and bacterial-type DNA replication systems) that otherwise appeared to be undecipherable. The hypothesis will be falsified if and when multiple and diverse groups of viruses are discovered that use bacterial PolC for their DNA replication; as discussed above, there is, presently, no indication of the existence of such a hidden continent of the virus world.

Reviewers' comments
Reviewer's report 1
Eric Bapteste, Dalhousie University
Eugene V. Koonin's contributions to the field of evolutionary biology have been numerous although quite different in nature. This author (and his team) have provided rigorous scientific explanations for biological phenomena as well as contributed more prospective works, which point out important issues rather than propose robust answers. Such prospective works are important because they can indicate future directions for biological research. I consider the present manuscript to be of this second kind and to be meant to provide us with a hint of deeper evolution analyses still to come.
That is to say the reader should not consider the present manuscript as the last word on the question of the temporal order of evolution of the DNA replication systems, but as a good opportunity to think about it again. There could be and likely will be more to be said on this issue. If he wanted, and I think he could (thus maybe should), Eugene Koonin himself could contribute further and [...]
Figure 2. Origin of DNA replication systems: the "archaeal LUCA" scenario.
Before I suggest some of the additional studies that could help test and maybe strengthen E. Koonin's current claim, I would like to stress an interesting perspective this paper could contribute to put forward. I take it to be the general and quite elegant idea, that "comparisons of viral genomes with the genomes of cellular life forms might provide windows into the deep past of life's evolution". This mine of genetic information is indeed not systematically explored in evolutionary analyses of cellular life forms, although, because it broadens the portion of the metagenome investigated, it would be certainly capable of highlighting the dynamics of cellular genome evolution. However, presenting a strong case for the use of the phylogenetic information stored in the DNA of viral communities remains challenging. In this regard, I am still unconvinced by the robustness of the claim of the present manuscript, although the temporal order of evolution of DNA replication systems presented here might be absolutely correct.
In this paper, Koonin presents a striking observation regarding the distribution of polymerases in viruses: the archaeal type (polB) is almost ubiquitous, the bacterial type (polC) is very rare. He thus legitimately looks for an explanation of this fact.
Figure 3. Origin of DNA replication systems: the primordial gene pool scenario. The schematic is based on models of pre-cellular evolution discussed in [21,38]. The walls of the inorganic compartments are shown by dotted lines to emphasize their porosity. Double-headed arrows denote inter-compartmental horizontal gene transfer. C denotes a hypothetical precursor of PolC, probably, a non-templated polymerase (see text). Other designations are as in Figure 2.
Several possibilities could a priori be considered:

(i) PolB is more broadly distributed because its fitness is better than polC fitness, and it invades viral genomes more efficiently: polB then replaces polC. One could not exclude that polB is a successful newcomer.
(ii) PolC is a newcomer in viral genomes, which was never able to successfully replace the efficient polB because polB is ancient and successful.
(iii) The present distribution does not really tell us much about which of these two polymerases appeared first, because we lack essential knowledge regarding the dynamic and mode of inheritance of polymerases within viral genomes. PolB and polC, in that regard, might well be equally ancient (and their present, highly unbalanced distribution could reflect the stochastic result of a very ancient competition between these two forms in the smallest first LUCAn population). Agnosticism is a scientific answer too.
In my view, a phylogeny of the viral polB to decide between these three options is currently lacking. Phylogenetic analysis of polB could test the presence of signs of high mobility and a tendency to spread across viral genomes. I would encourage Eugene Koonin to build such a phylogenetic tree of viral polB and to comment on it in a revised version of the manuscript. Furthermore, it would be interesting to test the congruence between the phylogeny of this marker and those of the other components of the replication systems. Congruence within the latter trees but disagreement with the former would suggest that polB has a tendency to be highly mobile, and capable of replacing native polymerases. The manuscript could also contain a more in-depth biochemical presentation of PolB and PolC (structures, length, stability properties, etc.), which would help in the discussion of which polymerase has a higher chance of being carried/recruited by viral mosaic genomes, because some slight physico-chemical differences could not explain the predominance of one marker versus the other.

Author response: I strongly believe there is no reason to think PolB is intrinsically "better" than PolC. Additional information and references on the catalytic efficiency of each enzyme family have been included in the revision.
Finally, in all naivety, I am curious to know if some viral genomes investigated by Eugene Koonin happened to lack any of the currently known polymerases. If yes, does that mean that there are alternative replication systems, which could put the problem considered here in a different perspective by drawing more attention to intermediary stages in the evolution of replication systems, where neither polB nor polC was the decisive element? After all, in the egg-or-chicken-first controversy, the answer is a third term... Is it conceivable that there are even more polymerases (or proteins playing their role in an older replication system) to be discovered?

The restricted distribution of a particular product of life evolution does not imply per se a more recent origin. For instance, when Neanderthals were still present in a corner of southern Europe, it would have been wrong (in the absence of a fossil record) to conclude that Homo sapiens sapiens appeared first and occupied the major existing biological niches, thus preventing any significant diversification of the more recent Homo sapiens neanderthalensis! A protein with a restricted distribution thus could simply be an ancestral enzyme which was later displaced by a more successful functional analogue in the majority of lineages. It can also be a protein which was once widely distributed, but mainly in lineages without descendants today. The distribution of various proteins should be influenced by many factors, including their respective contribution to the fitness of the organisms in which they operate and the respective evolutionary success of the organisms bearing these proteins (two factors that can be either related or completely independent).
Coming back to DNA polymerases, the occurrence of Pol B and C in the three cellular domains cannot tell us much about their history, since we don't know the relative order of appearance of these domains (for instance, even if we were sure of the rooting of the universal tree in the "bacterial branch", the last common ancestor of bacteria might have existed either before or after the last common ancestors of the other two domains).

A recent metagenomic analysis by Rohwer and colleagues (Angly et al. 2006) even suggests that Pol C might in fact be even more widespread than Pol B in viruses infecting bacteria. Indeed, these authors found polC genes among the five most abundant enzyme-coding genes in three out of four oceanic viral metagenomes analyzed, whereas they never recovered genes encoding Pol B! Interestingly, they only recovered genes encoding the alpha subunit of Pol C, suggesting that in viruses, this enzyme can be very processive alone. The existence of a large reservoir of polC genes in viruses supports the hypothesis that this bacterial replicase was initially recruited from a virus before the diversification of the bacterial domain (Forterre, 1999). Besides the sampling bias, one cannot exclude a historical bias. Viruses encoding Pol C polymerases might have existed for a very long time (predating or not those with PolB), but most of them might have disappeared with their hosts (except those infecting bacteria). The concept of lost lineages is presently neglected by Koonin and other evolutionists who base their argumentation on the principle of parsimony (see discussion with the reviewers in ). Koonin has several times put forward the argument of simplicity in favor of his hypotheses; he tells us that his general view of life history is attractively simple. I think that the path of history is precisely never so simple but usually extremely rich and complex. I would suggest that Koonin and others with similar views are in fact using an extreme form of "actualism", i.e. they want to explain all life evolution from its very beginning to the present state by only considering modern molecules and organisms (either cells or viruses). The combination of archaea and bacteria to produce eukaryotes is characteristic of this viewpoint. Incidentally, if Eukaryotes indeed originated from the association of a bacterium and an archaeon (as supported by Koonin)  However, the use of actualism in historical sciences is always delicate! In my opinion, the correct use of actualism in the present situation is to consider that known evolutionary patterns that occurred "recently" in the history of modern species also occurred much earlier in early life evolution. For example, we know that many lineages went extinct during the evolution leading from the first animals to the modern fauna. Similarly, many Homo species disappeared during the evolution leading from the ancestor of all Homo species to modern Homo sapiens. We can thus suppose, from the principle of actualism, that many cellular lineages also disappeared both during the evolution from the first cells to LUCA and from LUCA to the modern cellular world. In my opinion, an example of such lineages was probably the proto-eukaryotic lineages (urkaryote, sensu Woese), which were subsequently eliminated by modern eukaryotes harboring mitochondria (Kurland et al., 2006). In summary, I think that we cannot presently define the order of appearance of Pol B and Pol C because we still do not have enough data to correctly estimate the sampling bias. Furthermore, our answer to this question will remain speculative forever since we will never be able to fully reconstruct the history of lost lineages.
Author response: It is hard to deny that an element of speculation will remain in inferences on very early evolutionary events, perhaps "forever". Nevertheless, as already argued, I believe that the time is ripe to start seriously considering such scenarios, without forgetting, of course, that corrections, perhaps substantial ones, will be required once we have more complete data.

It is my hope that the new section on Complications and Caveats makes this clear.
The notion of lost lineages explains several puzzling observations, including the existence of hallmark viral genes or the combination of "orthologous" and non-homologous proteins in the core of the DNA replication apparatus. Koonin and his colleagues explain this combination by the presence of DNA (but no DNA replication) in LUCA. Alternatively, I have suggested that the homologous DNA replication proteins present in the universal protein set were not present in LUCA but were delivered by two or three different viruses at the onset of the three domains (Forterre, 2006b).
A final argument proposed by Koonin to suggest that Pol B appeared before Pol C is that the superfamily including Pol B includes the reverse transcriptase and cellular RNA polymerases (likely direct ancestor of cellular DNA polymerases) whereas the superfamily including Pol C includes nucleotidyl transferases (Bailey et al., 2006). However, the superfamily that includes Pol C also includes, besides Pol X and PolE, the PolyA polymerases (template-independent RNA polymerases) and CCA-adding enzymes (maturation of tRNA). The two superfamilies thus appear to be very old and were probably already diversified in the RNA world (with possibly some yet unknown RNA polymerases in both). Several DNA polymerases have probably originated independently in these two superfamilies from enzymes used in the RNA world. Although Koonin is still one of the rare evolutionists who fully recognizes the role played by viruses in early evolution (even suggesting that they originated before cells!), I have the feeling that he remains somewhat biased in this paper toward a cellular view of the world. Furthermore, this view is itself strongly biased toward his favourite scenario of cellular evolution. For instance, he divides the DNA replication mechanisms based on either Pol B or Pol C into two families, the "Archaeal-type B family DNA polymerase (either viral or cellular)" and the "Bacterial-type C family DNA polymerase" (especially in Table 1). In my opinion, these are not good expressions. The eukaryotic DNA replication mechanisms should not be labelled "Archaeal-type", since they include topoisomerases (Topo IB, Topo IIA) which have no orthologues in Archaea (Gadelle et al., 2003), several Pol B very distantly related to archaeal ones (Filée et al., 2002) and several proteins involved in the initiation step which have no homologues in Archaea.
Why use the term Archaeal-type instead of eukaryotic type? This clearly comes from a gradist view of evolution, supported by Koonin and others, in which eukaryotes derived from prokaryotes. Another consequence of the emphasis given to the prokaryote/eukaryote dichotomy can be seen in the expression "prokaryotic and eukaryotic viruses" (used in the abstract). This formulation mixes archaeal and bacterial viruses (bacteriophages). As a consequence, archaeal viruses (grouped with bacteriophages) are presented under the headline "bacterial viruses" in the last edition of the viral taxonomy handbook (Fauquet et al., 2005). I suggest that Koonin and others replace such expressions with "viruses infecting archaea, bacteria and eukaryotes".
In the same vein, I find it misleading to characterize a viral system by a cellular one and to talk about "viruses replicating with the help of the archaeal system". This reinforces the old conception in which viruses derived their proteins from modern cells. Also strange to me is the sentence "distinct complements of viruses were captured by escaping archaeal and bacterial cells". This formulation gives the active role to the cells, whereas it should be given to the viruses. Very helpful here is the important advice of Jean-Michel Claverie to focus on the viral factory rather than on the virion, forcing us to consider that viruses are real living organisms which, besides cells, also have an active role in life evolution (Claverie, 2006).
Author response: I edited the paper, modifying the wording in places, to avoid the impression that I stick to the old concept under which "viruses derived their proteins from modern cells"; obviously, I do not support this view (although some of them, like the NCLDV, indeed have derived a whole lot). However, I do not quite agree with the criticism of phrases such as "prokaryotic and eukaryotic viruses" and "viruses replicating with the help of the archaeal system". Description of these replication systems by the name of the respective cellular domain is succinct and unequivocal, and does not at all imply "primacy" of cells in evolution (of course, Forterre and I fully agree that there is no such primacy; see Refs. [24][25][26]). Furthermore, I believe that the distinction between the viruses of prokaryotes (archaea and bacteria) and those of eukaryotes is meaningful inasmuch as the entirely real and substantial differences in cellular organization between prokaryotes and eukaryotes affect many aspects of virus-cell interaction. Of course, this is an issue of biology (life style), not of taxonomy, and the position of the ICTV that lumps archaeal viruses with bacteriophages in the current taxonomic scheme is disingenuous. Finally, I should note that the notion of the derivation of eukaryotes from prokaryotes (via endosymbiosis) has nothing to do with gradism or any other pre-conceived "ism". From my point of view, it is, simply, the most economical explanation for the origin of the eukaryotic cell we can think of today. Of course, one can be swayed by philosophical pre-conceptions unwittingly and subconsciously, but I strongly doubt it is the case for this particular conundrum.
Finally, I have one historical remark. In the Background section, line 6, Koonin writes that "it came as an extraordinary surprise when comparative genomics ushered in the realization that the protein components of the DNA replication systems are not universally conserved".
Interestingly, this was in fact predicted by Carl Woese and George Fox in 1977 in their very important paper on "The concept of cellular evolution". These two authors wrote that "certain enzymes involved in DNA replication should appear quite dissimilar in the two cases" (eukaryotes and bacteria), because they predicted that their common ancestor (the progenote) was still a member of the RNA world. The big divide between the two systems became apparent as soon as the genes encoding E. coli DNA Pol III and eukaryotic DNA Pol α, δ or ε became available (see Forterre et al., 1994). However, it is true that the problem remained largely ignored until the PNAS paper of Mushegian and Koonin in 1996.
Author response: I am pleased to restore the historical precision. The text has been modified accordingly, and both  and Forterre et al. (1994)