Evolution of DNA ligases of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes: a case of hidden complexity
© Yutin and Koonin. 2009
Received: 17 December 2009
Accepted: 18 December 2009
Published: 18 December 2009
Skip to main content
© Yutin and Koonin. 2009
Received: 17 December 2009
Accepted: 18 December 2009
Published: 18 December 2009
Eukaryotic Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) encode most if not all of the enzymes involved in their DNA replication. It has been inferred that genes for these enzymes were already present in the last common ancestor of the NCLDV. However, the details of the evolution of these genes that bear on the complexity of the putative ancestral NCLDV and on the evolutionary relationships between viruses and their hosts are not well understood.
Phylogenetic analysis of the ATP-dependent and NAD-dependent DNA ligases encoded by the NCLDV reveals an unexpectedly complex evolutionary history. The NAD-dependent ligases are encoded only by a minority of NCLDV (including mimiviruses, some iridoviruses and entomopoxviruses) but phylogenetic analysis clearly indicated that all viral NAD-dependent ligases are monophyletic. Combined with the topology of the NCLDV tree derived by consensus of trees for universally conserved genes suggests that this enzyme was represented in the ancestral NCLDV. Phylogenetic analysis of ATP-dependent ligases that are encoded by chordopoxviruses, most of the phycodnaviruses and Marseillevirus failed to demonstrate monophyly and instead revealed an unexpectedly complex evolutionary trajectory. The ligases of the majority of phycodnaviruses and Marseillevirus seem to have evolved from bacteriophage or bacterial homologs; the ligase of one phycodnavirus, Emiliana huxlei virus, belongs to the eukaryotic DNA ligase I branch; and ligases of chordopoxviruses unequivocally cluster with eukaryotic DNA ligase III.
Examination of phyletic patterns and phylogenetic analysis of DNA ligases of the NCLDV suggest that the common ancestor of the extant NCLDV encoded an NAD-dependent ligase that most likely was acquired from a bacteriophage at the early stages of evolution of eukaryotes. By contrast, ATP-dependent ligases from different prokaryotic and eukaryotic sources displaced the ancestral NAD-dependent ligase at different stages of subsequent evolution. These findings emphasize complex routes of viral evolution that become apparent through detailed phylogenomic analysis but not necessarily in reconstructions based on phyletic patterns of genes.
This article was reviewed by: Patrick Forterre, George V. Shpakovski, and Igor B. Zhulin.
Viruses are ubiquitous parasites of all cellular life forms. In recent years, extensive genome sequencing and comparative analysis of both viral and host genomes yielded unprecedented insights into the evolution of viruses. In particular, it has been shown that 4 diverse families of large DNA viruses of eukaryotes (NCLDV), namely, Poxviridae, Asfarviridae, Iridoviridae, and Phycodnaviridae, share a set of conserved genes with functions implicated in replication, transcription and virion morphogenesis, suggesting an origin from a single ancestral virus. This apparently monophyletic class of viruses was denoted Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) to emphasize the presence of a cytoplasmic stage in the reproduction of most if not all of these viruses . The existence of such a cytoplasmic stage, during which virus replication is physically separated from the replication and expression of the host genome, and occurs in cytoplasmic viral "factories"[2, 3], can be reasonably thought of as the driving force behind the retention of genes encoding replicative proteins in the NCLDV genomes. The analysis of subsequently sequenced genomes representing 3 additional viral families, namely, Ascoviridae, Mimiviridae, and very recently, a novel family typified by the Marseillevirus, strongly supported the conclusion on the monophyly of the NCLDV [4, 5]. The reconstruction of the ancestral NCLDV gene set using a maximum parsimony method or a more sophisticated maximum likelihood approach  led to the delineation of a set of 40-50 ancestral genes that include the genes for the key proteins required for genome replication, expression and virion morphogenesis.
One of the most dramatic revelations of comparative genomics is that the set of universal genes (defined in terms of orthologous gene sets) in all life forms is much smaller than the set of universal molecular functions [7, 8]. The principal underlying cause is non-orthologous gene displacement (NOGD) whereby the same essential function is performed by unrelated or at least not orthologous genes . Often, especially among prokaryotes, NOGD is coupled to horizontal gene transfer (HGT) of the respective genes. Owing to NOGD and HGT, phyletic patterns (that is, patterns of presence-absence in sequenced genomes) are complex, diverse and patchy for the majority of genes, even those involved in essential functions [10, 11].
The case of viral genes is especially complicated because many viral functions can be complemented and replaced by functionally analogous host proteins that may or may not be homologous to the respective viral proteins; additionally, many viruses integrate host genes into their own genomes, so the functional repertoire of viral genes evolves in an incessant, dynamic interaction with the host gene repertoire. For instance, the RNA polymerase holoenzyme is obviously essential for the expression of any DNA genome. However, many large DNA viruses including certain NCLDV (such as some of the phycodnaviruses) that replicate in the cell nucleus (as well as herpesviruses and baculoviruses) do not encode any RNA polymerase subunits and fully rely on the host transcription machinery . The DNA replication apparatus of the NCLDV is substantially autonomous from the host replication system. All sequenced NCLDV genomes contain genes for 3 essential replication enzymes, namely, DNA polymerase, primase and distinct replicative helicase (the latter two genes typically are fused to produce a two-domain primase-helicase). However, other proteins that are also essential for replication, in particular, DNA ligase and (predicted) flap endonuclease, are found only in subsets of the NCLDV, with the implication that viruses that lack these enzymes employ host analogs . Very recently, this proposition was experimentally validated for the vaccinia virus (VACV) DNA ligase . The DNA ligase gene of VACV can be knocked out with minimal or no adverse effect on virus growth in cell culture . However, inhibition of the mammalian DNA ligase I but not any of the other 2 human ligases (III and IV) with a siRNA specifically abrogated the growth of the ligase knockout virus indicating that this host ligase was specifically recruited for VACV replication .
The DNA ligases of the NCLDV seem to epitomize the haphazard aspect of viral genome evolution. Some of the NCLDV encode ATP-dependent ligases that, among cellular life forms, are ubiquitous and essential for replication in archaea and eukaryotes, but are only sporadically found in bacteria where they seem to contribute to distinct forms of DNA repair [14–16]. Furthermore, the ligases of chordopoxviruses show extensive sequence similarity to mammalian ligase III . Another subset of the NCLDV encode NAD-dependent ligases that are distantly related to the ATP-dependent ligases [18, 19] and are ubiquitous and essential for replication in bacteria but represented only sporadically in archaea and eukaryotes [16, 20]. Finally, a considerable fraction of the NCLDV do not appear to encode any ligases.
The reconstruction of the NCLDV genome evolution tentatively placed the ATP-dependent ligase into the last common ancestor, with the implication that in some lineages the putative ancestral ligase was lost, whereas in others it was replaced with the NAD-dependent ligases . However, we were interested in applying phylogenetic methods along with comparative-genomic analysis, with the aim to reconstruct the history of the NCLDV ligases in greater detail and more definitively. The results of this study suggest an unexpected, complex evolutionary scenario.
DNA ligases of the NCLDV
Goatpox virus Pellor
Sheeppox virus 17077-99
Lumpy skin disease virus NI-2490
Deerpox virus W-848-83
Rabbit fibroma virus
Molluscum contagiosum virus
Variola virus (smallpox virus)
Orf virus, complete genome
Bovine papular stomatitis virus
Yaba monkey tumor virus
Yaba-like disease virus
Amsacta moorei entomopoxvirus
Melanoplus sanguinipes entomopoxvirus
Heliothis virescens ascovirus 3e
Trichoplusia ni ascovirus 2c
Spodoptera frugiperda ascovirus 1a
African swine fever virus
Aedes taeniorhynchus iridescent virus (Invertebrate iridescent virus 3)
Iridovirus (small iridescent insect viruses)
Invertebrate iridescent virus 6
Lymphocystis disease virus 1
Lymphocystis disease virus - isolate China
Infectious spleen and kidney necrosis virus
Singapore grouper iridovirus
Frog virus 3
Ambystoma tigrinum virus
Acanthamoeba polyphaga mimivirus
Paramecium bursaria Chlorella virus AR158
Paramecium bursaria Chlorella virus NY2A
Paramecium bursaria chlorella virus MT325
Acanthocystis turfacea Chlorella virus 1
Paramecium bursaria Chlorella virus FR483
Paramecium bursaria Chlorella virus 1
Emiliania huxleyi virus 86
Feldmannia species virus
Ectocarpus siliculosus virus 1
Ostreococcus virus OsV5
The ATP-dependent ligase of one of the phycodnaviruses, Emiliana huxlei virus (representative of the genus Coccolithovirus), belonged to the eukaryotic ligase I branch (Figure 2). The rest of the ATP-dependent ligases of the NCLDV, namely, those of African Swine Fever virus (the only available representative of the family Asfarviridae), phycodnaviruses and Marseillevirus, were scattered within a large cluster of ATP-dependent ligases of various bacteria and bacteriophages (Figure 2). The AU test indicated that the tree in which the latter 3 groups of viral ligases formed a clade could be effectively ruled out as an likely alternative to the tree in Figure 2 (the alternative topology with monophyletic NCLDV was associated with a zero AU value).
The specific sources of the NCLDV ligases are difficult to pinpoint with certainty but the trees in Figures 1 and 2 provide clues. The clustering of NAD-dependent viral ligases with homologs from (primarily) gamma-proteobacteria and bacteriophages (Figure 1) suggests that the viral ancestor of the NCLDV captured the ligase gene from a bacteriophage, perhaps, one that infected the mitochondrial endosymbiont. This scenario is compatible with the hypothesis that the NCLDV evolved concomitantly with eukaryogenesis . Indeed, the acquisition of the ligase gene was most likely an early event considering the rapid degradation of the mitochondrial endosymbiont . The tree of ATP-dependent ligases (Figure 1) suggests that they displaced the ancestral NAD-dependent ligase on several independent occasions and at different stages of viral evolution (Figure 3). Early replacements of the ancestral NAD-dependent ligase by ATP-dependent ligases of bacterial or bacteriophage origin apparently occurred independently in phycodnaviruses, Marseillevirus, and asfarviruses. By contrast, later displacements of the NAD-dependent ligases with ATP-dependent ligases of eukaryotic origin seem to have occurred in Emiliana huxlei virus and in chordopoxviruses. In particular, ligase III apparently was acquired by chordopoxviruses not only after the radiation of the entompoxviruses and chordopoxviruses (probably concomitant with the radiation of the host animals) but some time into the evolution of chordopoxviruses themselves, given that the earliest branching group in this subfamily, crocodile pox virus, still has an NAD-dependent ligase (Table 1 and Additional File 1). Considering the absence of genes for any ligases in many extant NCLDV and the lack of genomes encoding both an NAD-dependent and an ATP-dependent ligase, it seems likely that the displacements occurred via a ligase-less intermediate as opposed to an intermediate encoding both types of ligases.
Both the major form of bacterial and bacteriophage NAD-dependent ligase (LigA) and eukaryotic ligase III possess a C-terminal BRCT (BRCA1 C-terminal) domain that mediates interactions with components of various protein complexes involved in repair and cell cycle control[27, 28]. The BRCT domain is present in the NAD-dependent ligases of the mimiviruses and iridoviruses but is missing in the NAD-dependent ligases of entomopoxviruses and the ATP-dependent ligases of chordopoxviruses, the apparent origin of the latter from ligase III notwithstanding. Thus, acquisition of cellular ligase genes by distinct NCLDV lineages was accompanied by independent but analogous truncations of the acquired gene.
The phylogenomic analysis described here led to unexpected conclusions. Although ATP-dependent ligases are more common among the NCLDV than NAD-dependent ligases, it is the latter that can be traced back to the last common ancestor of the NCLDV whereas the ATP-dependent ligases apparently were acquired by several viral lineages at different stages of their evolution. The most general message brought about by these findings is that phyletic patterns alone, at least, in some cases, are insufficient for an accurate evolutionary reconstruction and can lead to substantially oversimplified or even false scenarios.
More specifically, the study of the evolution of viral ligases has general implications for understanding the evolution of the NCLDV. The apparent acquisition of the NAD-dependent ligase gene at an early stage of evolution antedating the last common ancestor of the NCLDV is compatible with the scenario of the origin of eukaryotic viruses by assembly of genes from diverse sources including bacteriophages and bacteria in the course of eukaryogenesis . The apparent independent displacement of the NAD-dependent ligase by ATP-dependent ligases of bacterial/bacteriophage origin in several NCLDV lineages implies that the primary radiation of the NCLDV occurred at the earliest stages of the evolution of eukaryotes, conceivably, prior to the radiation of eukaryotic supergroups and before the process of mitochondrial reduction was completed. A similar interpretation of the NCLDV evolution was given previously on the basis of the examination of the host ranges of the major lineages of these viruses . Phylogenomic analysis of other conserved NCLDV genes has the potential to further clarify the evolutionary scenario and, possibly, the origin of this important class of viruses.
The non-redundant protein sequence databases at the NCBI were searched using the BLASTP and PSI-BLAST programs with the expectation (E) value for sequence inclusion in PSI-BLASt iterations set at 0.005 . The sequences for phylogenetic analysis were aligned using the MUSCLE program with the default parameters . To eliminate poorly aligned regions, each alignment column was assigned a homogeneity value by scaling the sum-of-pairs score within the column between those of a homogeneous column (the same residue in all aligned sequences) and a random column (YIW, I. A. Seledtsov, K. S. Makarova, unpublished). Columns with homogeneity of less than 0.2 and/or with more than one-third of gap characters were removed from the alignment.
Maximum Likelihood (ML) phylogenetic trees were constructed using the TreeFinder software , with the estimated site rates heterogeneity and the WAG (Whelan and Goldman) substitution model . The Expected-Likelihood Weights  of 1,000 local rearrangements were used as confidence values of TreeFinder tree branches. The Approximately Unbiased (AU) test of tree topologies  was applied using TreeFinder .
The origin of viral genes is presently a hot topic of discussion. Some authors consider that all viral genes originated from cellular genes robbed by viruses whereas others emphasize that new genes can also appear in viral lineages and be transferred later on from viruses to cells. Phylogenies combining viral and cellular genes are indeed often difficult to interpret, especially the direction of transfer, and the interpretation can be strongly dependent of the prejudice of the author. This paper is an example of such case. Eugene Koonin has recognized in a previous paper the existence of specific viral genes (hallmark genes) that, according to him, even originated before cells, in a primordial hydrothermal vent . However, he is also a proponent of the idea that eukaryotes and their viruses originated after the emergence of Archaea and Bacteria from this primordial vent . In this scenario, eukaryotes originated by the association of an ancient bacterium and an ancient archaeon, and eukaryotic viruses originated by combining genes of bacterial and archaeal viruses together with eukaryotic genes from their hosts. In this scenario, genes present in eukaryotic viruses could not have originated directly from the ancestral viral world, but should have originated either from prokaryotic viruses or from the virus hosts. As a consequence, the authors systematically favour in this paper transfers from cells to viruses to explain the origin of NCLDV ligases.
I have a different prejudice. I think that most core genes of NCLDV are viral hallmark genes that originated directly in ancient viral lineages. Furthermore, I suspect that NCLDV proteins involved in replication might have played an important role in the origin of modern eukaryotic DNA genomes . I thus favour the possibility of ancient LGT from NCLDV to eukaryotes. The phylogenies presented in this paper do not really allow to decide between these different interpretations but raises interesting questions.
An important point, in my opinion, is to determine, if possible, which ligase (s) was (were) present in the last common ancestor of each of the three domains. This is missing in the manuscript. To my understanding, the ancestral bacterium should have contained an NAD-dependent DNA ligase. It should thus be important to present a more exhaustive tree of bacterial NAD ligase in Figure 1, with their distribution among the various bacterial divisions, to clarify this point. For me, the tree in Figure 1 suggests that several subfamilies of NAD ligases were already established before one of them was recruited in the bacterial domain. This tree suggests also the existence of several subfamilies presently encoded by bacterioviruses and distantly related to the NCLDV DNA ligases. For me, it fits very well with the idea that the bacterial DNA ligase has a viral origin, and that Bacteria on one side, Archaea/Eukarya on the other, recruited independently their DNA replication machinery from different viruses.
In the case of ATP-dependent ligases, it should be important to determine which ligases were present in the LECA (the Last Eukaryotic Common Ancestor), in the ancestor of Archaea, and possibly in the common ancestor of Archaea and Eukarya. The trees suggest for me that ATP-dependent ligases were in fact recruited several times independently in the three domains. It should be also important to indicate on the tree where are the hosts of respective viruses. This can sometimes helps to polarize the direction of transfer. For instance, if the host is not located at its expected position in the cellular tree, one can suspect a transfer from the virus to the host.
In these phylogenies, when the authors detect a DNA ligase gene in Bacteria or Archaea, it would be important to determine if this is the bona fide ligase of the domain, or a ligase present in the genome of an integrated virus and/or plasmid . In other words, it is important to distinguish in cellular genomes those genes that are really cellular (they were already present in the ancestor of a particular domain or recruited from another cell by LGT) and those that are in fact viral (present in integrated viruses or related elements). The confusion between these two kinds of genes in phylogenies can explain, in my opinion, the present confusion between networks and trees.
Finally, it should be interesting to have an idea of the structural differences between the various ligase subfamilies studied here. In my opinion, cellular ligases (originating from the ancestor of a particular domain) should be very similar in terms of structure, as it is the case for instance for DNA gyrase in all Bacteria or Topo II in all Eukarya, or else for Topo IB in both Archaea and Eukarya [37, 38]. In contrast, families that diverged from ancient viral lineages before LUCA should exhibit more structural differences (as it is the case between DNA gyrase, T4 and eukaryotic Topo II or else between archaeal/eukayal Topo IB and bacterial or Poxvirus Topo IB) [37, 38].
I thus encourage the authors to consider the two alternative possibilities in the discussion of their results, either LGT from cells to viruses or from viruses to cells.
We greatly appreciate this review that puts the rather limited and specific study described in this article into the much more general context of evolution of viruses and cells, and while doing so, offers a perspective on these general issues that is orthogonal to our own view in some important aspects while congruent in others. There is no need to discuss these concepts as this was done in several previous publications [29, 39, 40]. However, a brief summary of the differences is due. The distinction between our position and that of Forterre that is of primary relevance for the conclusions of this study is the adoption of different scenarios for the origin of eukaryotes. Our position is that the first eukaryotic cells emerged as a result of engulfment of an alpha-proteobacterium, the future mitochondrion, by an archaeon. To the best of our understanding, this scenario is best compatible both with comparative-genomic results and with more general considerations stemming from the parsimony principle [41–44]. This scenario, of course, has critical implications for the origin of viruses infecting eukaryotes: these viruses are thought to have evolved in a « second melting pot of viral evolution », concomitant with eukaryogenesis, through amalgamation of gene from viruses of prokaryotes, bacteria, archaea and the emerging eukaryote . Under this scenario, the possibility of the origin of any genes of the NCLDV directly from the primoridal pool of virus-like elements can be safely ruled out although some of the hallmarks could come from that pool through viruses of prokaryotes, having never been integral genomic components of cellular life forms. Acquisition of genes from the evolving NCLDV by the eukaryotic host remains a possibility but there seems to be no compelling evidence that this was a major route of eukaryote evolution.
Forterre propounds a different scenario of evolution that includes a primordial virus world as well but considers eukaryotes to be one of the primary lines of descent in the evolution of cells [35, 45–47]. It is further proposed that original bacterial, archaeal and eukaryotic cells might have possessed RNA genomes, whereas the DNA replication machineries were invented by viruses and independently grafted onto the 3 cell types . Other authors also have developed evolutionary scenarios under which eukaryotes represent a primordial cellular lineage, possibly, even the first type of cells to evolve [48, 49]. In our view, there is little evidence if any evidence in of the « primordial eukaryotes » scenarios (see specifically ), so we cannot really agree that our adherence to the symbiogenetic scenario is a « prejudice ». Nevertheless, we do recognize that a definitive elucidation of the sequence of events that led to the emergence of eukaryotes is an extremely difficult task that requires much additional phylogenomic analysis and might not be attainable in the near future. Clearly, under the « primordial eukaryotic lineage » scenario, viral hallmark genes in the NCLDV could plausibly originate from the primordial virus world, and transfer of genes from these viruses to the eukaryotic hosts potentially could be a major route of evolution. Thus, interpretation of the phylogenomic analysis of viral genes, to a large extent, hinges on the choice between the two orthogonal scenarios for the origin of eukaryotes that cannot be definitively distinguished at this time.
Having acknowledged this uncertainty, we would like to point out that the history of the NCLDV ligases elucidated in this study is poorly compatible with the contribution of viral genes to the host genomes that is hypothesized by Forterre. Indeed, we show that viral NAD-dependent ligases are monophyletic and by implication were probably present in the common ancestor of the extant NCLDV. However, this class of ligases is only sporadically represented in eukaryotic genomes (it cannot be ruled out that the respective genes were acquired from viruses although so far there are no direct evidence of such transfers). By contrast, ATP-dependent ligases are ubiquitous in eukaryotes but polyphyletic in the NCLDV, a finding that seems to effectively rule out the origin of eukaryotic ligases from viruses of this class. Moreover, there are two « smoking » guns of acquisition of ATP-dependent ligase genes from the hosts by distinct lineages of the NCLDV, namely, the ligase III homolog in poxviruses and the ligase I homolog in Emiliana huxlei virus.
Finally, we should note that a more extensive (comprehensive) phylogenetic analysis suggested by Forterre is certainly of interest. However, in this paper, we focus specifically on the history of the NCLDV ligases; for this analysis, we used representative sets of ATP-dependent and NAD-dependent ligases from bacteria (and bacterial viruses), archaea, and eukaryotes, so we do not expect that inclusion of more exhaustive sequence sets (which complicates the construction of ML trees) would affect the message.
The authors have performed a detailed phylogenomic analysis and reconstructed the history of the DNA ligases in Nuclear-Cytoplasmic Large DNA Viruses of eukaryotes (NCLDV). The study reveals a quite complex evolutionary trajectory of the enzyme involved in NCLDV' genome replication: although ATP-dependent ligases are approx. 3 times more often present in different NCLDVs and previous studies (Ref. 1 & Ref. 4 in the paper) regarded this type of enzyme as the one which was present in the ancestral NCLDV genome (alone or together with the NAD-dependent DNA ligase form), the last common ancestor of the NCLDV probably contained an NAD-dependent enzyme only (Fig. 3). The novel conclusion reported in the manuscript is based mostly on the monophyly of both the NCLDV as a group and all viral NAD-dependent ligases. The story is nicely presented and interpreted but, in my opinion, the scenario suggested is not the only one which could be inferred. Because of clearly demonstrated polyphyletic origin of the viral ATP-dependent ligases (Fig. 2), the presence of enzyme of this type in the ancestral NCLDV genome can be effectively ruled out. But the fourth remaining possibility (that the ancient, primordial NCLDV genome did not contained its own DNA ligase, but employed host analogs instead, like 19 out of 45 currently known viruses do) cannot be excluded. Following this assumption it could be possible to effectively decrease the total number of evolutionary events (exemplified by 'lost' or 'gain' of the DNA ligases - see Fig. 3 of the manuscript) from 17 events (as it is now in the authors' interpretation) to only 10, which will be more in concord with the Occam's razor principle. Additional data and further research will probably clarify the matter.
The hypothesis that the ancestral NCLDV encoded no ligase was implicit in the original manuscript but in the revision we made it explicit. However, we also indicate that this scenario is not particularly plausible, given the monophyly of the NAD-dependent ligases of the NCLDV, because it would require multiple gene transfers between viruses infecting taxonomically distant hosts. Put another way, although the number of the events under this scenario indeed could be smaller than under the ancestral NAD-dependent ligase scenario that we favor, the nature of the inferred events also should be taken into account. As some classes of events (such as the well established gene loss) are more likely than other (such as the dubious gene transfer between viruses from distant hosts).
This is a study of DNA ligases in Eukaryotic Nucleo-Cytoplasmic Large DNA Viruses (NCLDV). Unlike other viruses, which exclusively use the host DNA replication machinery, many of NCLDV encode their own DNA replication machinery. This machinery proves useful when viruses inhabit the cytoplasm of eukaryotes, rather than the nucleus. NCLDV have been shown to encode two types of DNA ligases: NAD-dependent and ATP-dependent. ATP-dependent ligases were known to be more prevalent, but the details of their origins were unknown. Bacteria predominantly use NAD-dependent ligases, whereas Archaea and Eukaryotes primarily use ATP-dependent ligases. Here authors collected the NAD and ATP ligases from NCLDV genomes and examined their distribution patterns as well as their relationships to homologous enzymes in selected Bacteria, Archaea, and Eukaryotes. Phylogenetic analysis of the proteins shows that all viral NAD-dependent ligases are monophyletic, whereas viral ATP-dependent enzymes show sporadic distribution throughout the trees. This information along with distribution patterns mapped onto a viral species tree built from conserved NCLDV genes support that NAD-dependent ligases were present in the NCLDV common ancestor, but were displaced by ATP-dependent ligases on multiple independent occasions in viral evolution.
This is a very clearly organized manuscript. It addresses a specific question, and the results support the conclusions. I'm not as excited about the results as are the authors, but that is because I do not know much about viruses.
I have one minor question that is more of a personal curiosity than a criticism. The species tree (Fig. 3) is a consensus of the individual trees of conserved NCLDV genes, and I wonder how this would compared to a tree built from a concatenated alignment of these genes/proteins?
My only issue with the paper is the methods section on sequence identification and alignment. It is too short and contains no detail whatsoever. There is nothing I can say that is not illustrated by simply copying it below:
"Protein sequence databases at the NCBI were searched using the BLASTP and PSIBLAST programs . The sequences for phylogenetic analysis were aligned using the MUSCLE program . Poorly conserved positions and positions including gaps in more than one-third of the sequences were removed prior to tree computation".
What databases? What cutoffs and parameters? How were the poorly conserved positions identified?
Providing the necessary methodology details is essential, especially for a computational paper.
Authors' response: The details are included in the revised manuscript.
The authors thank Bernard Moss for providing results prior to publication and critical reading of the manuscript, and Tatiana Senkevich for helpful discussions. The authors' research is supported by the DHHS (National Library of Medicine, National Institutes of Health) intramural research program.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.