Prof. Patrick Forterre, Institut Pasteur
Several essential proteins involved in eukaryotic DNA replication, such as the initiator protein Cdc6 (and ORC subunits), helicase MCM subunits, the GINS complex, or DNA primases, have closely related homologues (possibly orthologs) in Archaea. From this observation, it seems logical to conclude that the DNA replication machineries in Archaea and Eukarya derived from an already well-elaborated DNA replication machinery present in the DNA-based ancestor of Archaea and Eukarya. In such a scenario, homologous DNA replication proteins in these two domains are true orthologs. Going one step further, it is often concluded that all DNA replication proteins performing similar functions in these two domains SHOULD be orthologs, even if they only exhibit structural similarity, having extensively diverged in terms of primary sequence. In this paper, Makarova, Koonin and Kelman push this reasoning to the limit by assuming that archaeal proteins that only share similar (but very divergent) domains are ultimately also orthologs. They conclude that these proteins (homologs of bacterial RecJ) in fact share a most recent common ancestor with the eukaryotic protein CDC45 but, for unknown reasons, have extensively diverged from the eukaryotic protein, and continued to do so during the diversification of the archaeal domain. Since one of these archaeal proteins, GAN, has been shown to associate with the archaeal GINS complex in Thermococcus kodakaraensis, they conclude that a putative GINS/"CDC45/GAN"/MCM complex in Archaea is orthologous to the GINS/CDC45/MCM complex in Eukarya. This is possible. However, there are other possibilities that are not discussed in this paper.
To explain why several features of the DNA replication machineries are strikingly different in Archaea and Eukarya (such as the absence of type IIA and type IB DNA topoisomerases or of DNA polymerase alpha in Archaea, or the absence of the "archaeal" DNA polymerase D in Eukarya), I suggested a few years ago that the DNA replication machineries in Archaea and Eukarya are in fact not orthologous, but were built independently in these two domains from both homologous and non-homologous proteins recruited from different DNA viruses encoding their own replication machineries. Since then, a type IB DNA topoisomerase has finally been found in thaumarchaea, but the problems raised by Topo IIA or DNA polymerase alpha remain. In my 2006 paper, I suggested an RNA-based ancestor of Archaea and Eukarya. There are also intermediate scenarios: for instance, Archaea and Eukarya could have derived from a DNA-based ancestor, but many ancestral DNA replication proteins could have been replaced by viral ones, or new ones could have been introduced later on by viruses, independently either in the lineages leading to Archaea or Eukarya or during the diversification of these two domains. Unfortunately, in this paper, the authors do not recognize the important role that DNA viruses probably played in the evolution of the DNA replication apparatus. For instance, on p. 5, they state that "multiple MCM paralogs have been identified in archaeal species". In fact, these MCM proteins are not paralogs (they did not originate from gene duplication in cellular genomes); they were introduced into archaeal genomes by viral integration.
Authors' response: The work of Krupovic et al. and that of Chia et al. clearly demonstrate that MCM genes have been independently duplicated in several archaeal lineages. Many of the MCM genes are indeed associated with mobile elements, but in the phylogenetic trees published in the above papers they cluster with the 'main' MCMs from the respective archaeal groups. Thus, there is no evidence that these genes are of viral origin; they are clearly archaeal. The association of MCMs with mobile elements might lead to acceleration of their evolutionary rates and to subfunctionalization, namely dedicated involvement in the replication of these elements. The evolutionary scenario leading to the MCM association with mobile elements is of major interest but currently remains unclear.
I suspect that many other archaeal DNA replication proteins entered cellular genomes in that way, and this might be the case for the proteins discussed in this paper.
Authors' response: We do not see any evidence of this. No RecJ homologs have been detected in any viral genomes, and neither have we observed any associations of genes for RecJ-like proteins with viruses or mobile elements.
The authors have used very powerful analytic tools to detect remote similarities. However, when one performs a BLAST search with ribosomal proteins, RNA polymerase subunits or DNA replication proteins such as MCM, such sophisticated searches are not needed. These proteins exhibit extensive sequence similarities between Archaea and Eukarya. The situation is strikingly different with CDC45 and GAN. Similar BLAST searches with archaeal GAN proteins fail to retrieve significant similarities with eukaryotic Cdc45 proteins. Instead, one indeed recovers more similarities with bacterial RecJ. This is in striking contrast with the situation observed for all other proteins that are true orthologs between Archaea and Eukarya. IN ALL CASES, the archaeal protein is much more similar to its eukaryotic homologues than to its bacterial homologues.
Authors' response: BLAST is generally not a reliable indicator of phylogenetic relationships, especially for diverged proteins. Clearly, the evolution of the RecJ family involved multiple accelerations of evolutionary rate. To characterize the evolutionary relationships between proteins and protein families, phylogenetic analysis, and not direct sequence comparison, is the approach of choice.
The phylogenetic analysis performed by the authors is more a clustering than a phylogenetic analysis, since the various groups analyzed are all very divergent (including archaeal "RecJ" from bacterial RecJ). For me, it is difficult to understand why these proteins, if they are true orthologs, should have diverged much more between Archaea and Eukarya than other DNA replication proteins, including MCM, Cdc6 or Topo IB. It is also difficult to understand why they have instead conserved some similarities with their more remote bacterial relatives! This does not make real sense.
Authors' response: As indicated repeatedly, RecJ-like proteins are a complex family with a convoluted evolutionary history, and there are pitfalls in phylogenetic analysis of such families. Nevertheless, as pointed out in the text, the tree presented here is consistent with all previously established relationships between DHH protein families. Moreover, the diverged arCOGs from Desulfurococcales and Halobacteria cluster with arCOG00427_II, which is compatible with the localization of all these genes in the conserved neighborhood with S15, S3 and Pcc1. Thus, despite the divergence of these sequences, it appears likely that the tree in general accurately reflects the relationships between these families. Furthermore, the grouping of CDC45 with archaeal RecJ homologs is also consistent with the presence of GINS proteins, which interact with CDC45 in eukaryotes and with RecJ homologs in archaea (but not in bacteria, which do not encode any GINS proteins as far as we are aware).
In my opinion, it is more reasonable to think that many variants of a large superfamily, including Cdc45, RecJ, GAN and others, emerged and diverged in the virosphere very early on, i.e. before the divergence of the three domains, and were later recruited independently to improve the efficiency of replication forks (or for various steps in DNA repair) in various domains and lineages.
Authors' response: See the response about viruses above. We are well aware of the importance of the virosphere in cellular evolution as a whole. However, in the case of the RecJ-like protein family, given the absence of any link to viruses or mobile elements, the location of even the most diverged genes in the same conserved gene neighborhoods, and the interaction with GINS proteins (which are likewise highly diverged and so far not found in any viral genomes), the virosphere does not seem to be directly involved here.
With this interpretation in mind, I would suggest being more cautious before concluding that, in all cases, the proteins analyzed in this study are members of a GINS/MCM/"GAN" complex functionally analogous to the eukaryotic CDC45/MCM/GINS complex. This is possible, but confirming it will require much experimental work.
Authors' response: Clearly, the conservation of the complex in all Archaea is a prediction that stems from comparative genomic analysis. However, as far as such predictions go, we believe it is a very strong one.