Pandoraviruses are highly derived phycodnaviruses

The recently discovered Pandoraviruses are by far the largest viruses known, with their 2 megabase genomes exceeding in size the genomes of numerous bacteria and archaea. Pandoraviruses show a distant relationship with other nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes, lack some of the NCLDV core genes and in particular do not appear to be specifically related to the other, better characterized family of giant viruses, the Mimiviridae. Here we report phylogenetic analysis of 6 core NCLDV genes that confidently places Pandoraviruses within the family Phycodnaviridae, with an apparent specific affinity with Coccolithoviruses. We conclude that, despite their many unusual characteristics, Pandoraviruses are highly derived phycodnaviruses. These findings imply that giant viruses have independently evolved from smaller NCLDV on at least two occasions. This article was reviewed by Patrick Forterre and Lakshminarayan Iyer. For the full reviews, see the Reviewers’ reports section.

The discovery of giant viruses infecting unicellular eukaryotes, in particular amoebae, eliminated the distinction between viruses and cellular life forms in terms of size and genomic complexity [1]. Until very recently, all the discovered true giants of the virus world, with genomes exceeding 1 megabase (Mb) and encompassing more than 1,000 genes, were closely related members of the family Mimiviridae [2][3][4]. The gap between the members of the Mimiviridae and viruses outside this family was dramatic: apart from the mimiviruses, the largest viral genome, that of Emiliania huxleyi virus 86, was approximately 0.41 Mb in size [5]. The unexpected recent discovery of two strains of Pandoraviruses, Pandoravirus salinus and Pandoravirus dulcis, with genomes of at least 2.5 and 1.9 Mb, respectively, dramatically expanded the range of viral giantry [6]. In addition to being enormous, Pandoravirus genomes turned out to be highly unusual in that they showed little similarity to other viruses, lacked some of the core genes of the Nucleo-Cytoplasmic Large DNA Viruses (NCLDV, or the proposed order Megavirales) of eukaryotes [7][8][9][10] and failed to show clear-cut affinities in phylogenetic analysis [6]. We set out to investigate the repertoire of core NCLDV genes in pandoraviruses and their phylogenies in greater detail.

Ancestral NCLDV genes in Pandoraviruses
The sequences of the predicted proteins of Pandoraviruses were compared to the sequences of the NCLDV included in the clusters of orthologous viral genes (NCVOGs) [11], resulting in the inclusion of Pandoraviruses in 67 NCVOGs (Additional file 1). In particular, we found that, of the 49 inferred ancestral genes (NCVOGs), only 17 were represented in one or both Pandoraviruses (Table 1). The low representation of Pandoraviruses in the NCVOGs and, specifically, the absence of so many of the core, ancestral genes is anomalous among the NCLDV. To examine the extent of this anomaly, we tallied the number of ancestral NCVOGs that are represented in members of each of the 7 NCLDV families. The results indicate that Pandoraviruses stand out among the NCLDV with respect to the paucity of the (putative) ancestral viral genes (Figure 1). This lack of conservation of core NCLDV genes is all the more striking considering the huge genome size of Pandoraviruses compared to the other NCLDV (Figure 1) and suggests that Pandoraviruses are highly derived forms. Nevertheless, it should be stressed that the inclusion of Pandoraviruses into the NCLDV (in other words, their membership in the proposed order Megavirales [10]) is strongly supported by the presence of signature genes such as the primase-helicase fusion, packaging ATPase and thiol-disulfide oxidoreductase (Table 1). The obvious glaring gap in the repertoire of conserved genes in pandoraviruses is the absence of detectable capsid proteins. The most abundant virion proteins detected by proteomic analysis failed to show significant similarity to any known capsid proteins [6]. Furthermore, our attempts to identify putative derived capsid proteins by screening the pandoravirus protein sequences with position-specific scoring matrices obtained from multiple alignments of capsid proteins of different groups of NCLDV failed to identify any plausible candidates (data not shown).

Phylogenetic analysis of conserved genes places Pandoraviruses within Phycodnaviridae
The pattern of best database hits in the BLASTP searches for the ancestral gene products of Pandoraviruses yielded a hint of a possible evolutionary relationship between Pandoraviruses and Phycodnaviridae, an expansive family of NCLDV that infect algae and other unicellular eukaryotes [12]. Indeed, among the best hits to homologous proteins from other NCLDV all but one were to homologs from the Phycodnaviridae family (Table 1).
To gain further insight into the origin of the Pandoraviruses, we then performed phylogenetic analysis of the 17 ancestral NCLDV genes that are represented in the pandoravirus genomes. In 6 of the 17 phylogenetic trees, Pandoraviruses grouped within the Phycodnaviridae clade, or in cases when such a clade was absent, with members of the family Phycodnaviridae (Figures 2-3 and Additional file 2). In 10 of the remaining trees, the Pandoravirus genes clustered with eukaryotic homologs (Additional file 2), suggestive of replacement of ancestral NCLDV genes with homologs derived from the hosts, as observed for multiple genes in the previous phylogenomic analysis of the NCLDV [13]. Only the gene for the dual specificity phosphatase (NCVOG0040) showed an apparent phylogenetic affinity with NCLDV outside Phycodnaviridae, namely with Marseilleviruses (Additional file 2). Similar to several other genes in the ancestral NCLDV gene set [13], the tree for the dual specificity phosphatases shows NCLDV scattered among homologs from cellular life forms (Additional file 2). This pattern suggests that the evolution of the phosphatase gene in the NCLDV involved multiple gene transfers and replacements. One such gene transfer might have involved the phosphatase genes of pandoravirus and marseillevirus. Additional intervirus gene transfers could have involved non-ancestral viral genes, as implied by the detection of 17 pandoravirus genes with best database hits to mimivirus homologs [6]. Gene exchange between diverse viruses infecting amoebae has been reported previously. Indeed, amoebal cells, with their omnivorous phagocytic lifestyle, have been recognized as "melting pots" of horizontal gene transfer, so such intervirus gene exchanges could be expected.
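The counts given in this section can be cross-checked with a short tally. The following is a minimal sketch only: the numbers are taken from the text, whereas the category labels and variable names are hypothetical and introduced purely for illustration.

```python
# Tree placements of the ancestral NCLDV genes found in Pandoraviruses,
# as reported in the text. The category labels are illustrative only.
tree_placements = {
    "within or with Phycodnaviridae": 6,
    "with eukaryotic homologs": 10,
    "with Marseilleviruses (NCVOG0040)": 1,
}

# Number of inferred ancestral NCVOGs, as stated in the text.
ANCESTRAL_NCVOGS = 49

# The three categories should account for every analyzed tree.
total_trees = sum(tree_placements.values())

# Fraction of the ancestral gene set that is retained in Pandoraviruses.
fraction_retained = total_trees / ANCESTRAL_NCVOGS

for label, count in tree_placements.items():
    print(f"{label}: {count} of {total_trees} trees")
print(f"Ancestral NCVOGs retained: {total_trees} of {ANCESTRAL_NCVOGS} "
      f"({fraction_retained:.1%})")
```

The three categories sum to the 17 analyzed genes, which correspond to roughly one third of the 49 inferred ancestral NCVOGs.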
Within the Phycodnaviridae, the preferred grouping of Pandoraviruses was with Emiliania huxleyi virus (the type member of the genus Coccolithovirus [5]), as exemplified by the phylogenetic tree of the DNA polymerase, one of the most highly conserved genes of the NCLDV for which a reliable phylogeny can be obtained (Figure 2A). The highly conservative Approximately Unbiased (AU) test rejected all tested tree topologies with Pandoraviruses placed outside the Phycodnaviridae branch for the D5-like helicase-primase; for the other genes, some of the alternative topologies were not rejected by the AU test but all were assigned lower likelihood values (Additional file 2). Perhaps the strongest evidence of an evolutionary link between Pandoraviruses and Coccolithoviruses comes from the phylogenetic trees of two RNA polymerase (RNAP) subunits, in which the two groups of viruses confidently clustered together, as indicated by the bootstrap support value of 0.99 (Figure 3). Coccolithoviruses are the only genus of phycodnaviruses that encode the RNAP subunits; the rest of the phycodnaviruses have lost the ancestral RNAP genes, presumably because these viruses employ the host RNAP during a nuclear phase of their reproduction cycle [11,12]. Thus, the presence of the two monophyletic RNAP subunit genes in both Pandoraviruses and Coccolithoviruses is a shared derived character that supports the common origin of these viruses.
Taken together, the phylogenetic analysis results indicate that the ancestral NCLDV genes in Pandoraviruses largely share the evolutionary history with the homologous genes of Phycodnaviruses, and more specifically, appear to have evolved from a common ancestor with Coccolithoviruses.
Figure 1 Representation of Pandoraviruses and 7 NCLDV families in the NCVOGs vs the total number of (predicted) protein-coding genes. 'Extended Mimiviridae' stands for Mimiviridae, Cafeteria roenbergensis virus, Phaeocystis globosa virus 12T, and Organic Lake phycodnaviruses that have been shown to comprise a monophyletic group [16].

Implications for the evolution of giant viruses
Despite their enormous size, Pandoraviruses show no evolutionary connection with the other family of giant viruses, the Mimiviridae. Instead, phylogenetic analysis of the ancestral NCLDV genes points to an affinity between Pandoraviruses and Phycodnaviruses. Moreover, Pandoraviruses appear to belong within the Phycodnavirus branch, being a sister group of Coccolithoviruses. Certainly, the phylogenomic analysis that leads to this conclusion involves a proverbial "tree of 1%" [14]. Indeed, the entire evidence hinges on the topologies of 6 phylogenetic trees, albeit those for key NCLDV genes, and on the finding that two RNAP subunit genes are shared between Pandoraviruses and Coccolithoviruses, to the exclusion of other Phycodnaviruses. However, given that altogether Pandoraviruses retain only 17 of the 49 inferred ancestral NCLDV genes, there is not much potential for obtaining additional evidence on the relationship between these viruses and the other NCLDV although, as noted above, some interviral gene exchanges within amoebae might have occurred.
Figure 2 Maximum-Likelihood trees of ancestral NCLDV genes present in Pandoraviruses. A, DNA polymerase. B, D5 primase-helicase. C, Poxvirus Late Transcription Factor VLTF3 like (A2L). D, A32-like packaging ATPase. Branches with bootstrap support less than 0.5 were collapsed. For individual sequences, the species name and the gene identification numbers are indicated; triangles denote multiple, collapsed sequences; env stands for environmental sequences (marine metagenome). Taxa abbreviations: c1, Asfarviridae; q2, Coccolithovirus; q3, Phaeovirus; q7, Raphidovirus.
Thus, it appears that, despite their extremely unusual gene repertoires, Pandoraviruses are highly derived Phycodnaviruses. This conclusion implies that giant viruses have evolved from less complex NCLDV on at least two independent occasions, within the families Mimiviridae and Phycodnaviridae (Figure 2A). Given the much smaller genomes of the other NCLDV and the lack of substantial similarity between the gene repertoires of Pandoraviruses and Mimiviruses, the scenario of independent gain of numerous genes in two lineages of NCLDV appears much more plausible than the alternative that would involve extensive degradation of extremely complex ancestors in multiple lineages. The discovery of additional, perhaps independently evolving giant viruses appears likely, and identification of the aspects of virus biology that favor such dramatic genome expansions is of major interest.

Conclusions
Phylogenomic analysis indicates that the giant Pandoraviruses, by far the largest viruses discovered to date, are highly derived Phycodnaviruses, most likely, the sister group of Coccolithoviruses. The more general implication of these findings is that giant viruses independently evolved in at least two lineages of the NCLDV.

Methods
P. dulcis and P. salinus protein sequences were retrieved from the non-redundant database at the National Center for Biotechnology Information (NIH, Bethesda). The non-redundant protein sequence database was searched using the PSI-BLAST program [15], with default parameters and the predicted Pandoravirus protein sequences used as queries. The reported results reflect searches performed in August 2013. The sequences for phylogenetic analysis were collected using (i) BLAST searches against nr and environmental (env_nr) databases initiated by Pandoravirus protein sequences; (ii) the corresponding NCVOG sequences [11]; and (iii) the corresponding mimiCOG sequences [16]. Nearly identical sequences were eliminated using BLASTCLUST (http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html). Protein sequences were aligned using the MUSCLE program with default parameters [17]; columns containing a large fraction of gaps (greater than 30%) and non-homogenous columns defined as described previously [18] were removed from the alignment prior to phylogenetic analysis. A preliminary maximum-likelihood tree was constructed using the FastTree program with default parameters (JTT evolutionary model, discrete gamma model with 20 rate categories) [19]. The preliminary tree and the alignment were then used to determine the best substitution matrix using ProtTest [20]. Maximum-likelihood trees were constructed using TreeFinder (1,000 replicates, Search Depth 2), with the substitution matrix that was found to be the best for a given alignment [21]. The Expected-Likelihood Weights (ELW) of 1,000 local rearrangements were used as confidence values of TreeFinder tree branches [21]. For tree topology testing, whenever applicable, alternative (constrained) topologies were constructed and compared to the initial trees using TreeFinder [21]. An approximately unbiased (AU) test P value cutoff of 0.05 was used for rejecting tree topologies [22].
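The gap-filtering step described above can be illustrated with a minimal sketch. This is not the code used in the study: the function name, its parameters and the toy alignment are hypothetical, and the sketch assumes that an alignment is given as a list of equal-length strings with '-' as the gap character. It covers only the removal of columns with more than 30% gaps; the removal of non-homogenous columns [18] is not reproduced here.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.30, gap_char="-"):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    `alignment` is a list of aligned sequences (strings of equal length).
    Returns a new list of sequences with the gappy columns removed.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    # Keep a column only if its fraction of gaps is at or below the cutoff.
    kept = [
        i for i in range(length)
        if sum(seq[i] == gap_char for seq in alignment) / n_seqs
        <= max_gap_fraction
    ]
    return ["".join(seq[i] for i in kept) for seq in alignment]


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = ["MK-LV", "MKA-V", "M--LV", "MK-LV"]
filtered = drop_gappy_columns(toy_alignment)
print(filtered)
```

In the toy alignment, the third column consists of 75% gaps and is removed, whereas the columns with 25% gaps are retained because they do not exceed the 30% cutoff.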

Reviewers' reports
Reviewer 1: Patrick Forterre (Institut Pasteur)
Pandoraviruses are fascinating new organisms, which illustrate the capacity of viruses to produce drastically different types of virions, with strikingly different structures and genomes encoding from 2 genes up to 2500 genes [1]. In this paper, Yutin and Koonin have revisited the genomes of the two isolated Pandoraviruses and identified 6 of the 17 core NCLDV genes which consistently group within Phycodnaviridae (one of the NCLDV, or Megavirales, families) in phylogenetic analyses. They concluded that Pandoraviruses evolved from smaller Phycodnaviridae. The implication is that giant viruses (Mimiviridae and Pandoraviruses) evolved twice independently from smaller viruses and not from cellular organisms.
The authors did not discuss the possibility that some Pandoravirus ancestor captured these 6 genes as an operon from a Phycodnavirus. We know that LGT can indeed occur between viruses co-infecting the same hosts. The authors state that: "in none of the trees pandoraviruses would cluster with any viruses outside the family Phycodnaviridae". However, it seems that the dual specificity phosphatase NCVOG0040 branches with Mimiviridae (Lausannevirus and Marseillevirus), suggesting that LGT has indeed occurred between Pandoraviruses and Mimiviridae. In their paper, Philippe and co-workers mentioned the existence of 17 genes of P. salinus that have their closest homolog (34% identical residues on average) within the Megaviridae [6]. This seems to be in contradiction with the results reported here.
Authors' response: The exceptional case of the dual specificity phosphatase was overlooked in the original submission (although the tree was included in Additional file 2), and we appreciate the reviewer pointing out this omission. Indeed, this case of apparent phylogenetic affinity between ancestral genes of Pandoraviruses and Marseilleviruses (sic! not Mimiviridae) is likely to originate from intervirus gene exchange within amoeba, and so do the non-ancestral genes apparently shared between Pandoraviruses and Mimiviruses. This aspect of the evolution of the giant viruses is briefly discussed in the revised manuscript. The full characterization of such gene exchanges requires a comprehensive phylogenomic analysis of giant viruses that is currently underway in our group. It should be noted, however, that ancestral genes of the NCLDV do not form operons or clusters, so the scenario under which pandoraviruses acquired the ancestral genes from Phycodnaviruses "as an operon" is hardly justified. More importantly, there is no contradiction between the conclusions of this work and the possibility of horizontal gene transfer between Pandoraviruses and Mimiviruses (and/or other viruses of amoeba) as the latter involved non-ancestral genes.
The presence among the 6 core genes related to Phycodnavirus of the packaging ATPase typical of viruses whose major capsid protein (MCP) contains a double-jelly roll fold structure is intriguing, since such MCP has not been detected in Pandoraviruses. This suggests several possibilities:
1) Pandoraviruses do encode an MCP that shares ancestry with that of Phycodnaviruses, but is highly divergent and cannot be detected by sequence similarity.
2) The structural proteins of Pandoraviruses are unrelated to those of NCLDV, but the detected ATPase is involved in packaging.
3) The structural proteins of Pandoraviruses involved in formation of the virion are unrelated to those of Megavirales and the detected ATPase is not involved in packaging.
Could the authors discuss these different possibilities? Did they use sensitive methods to specifically search for MCP? Philippe et al. identified two abundant proteins that could be involved in formation of the virion. Did the authors analyse these proteins?
Authors' response: indeed, the absence of detectable capsid proteins in Pandoraviruses is most intriguing and is emphasized in the revised manuscript. Of the three hypotheses brought up in this comment, (1) and (2) appear to be most plausible. We did employ a sensitive search strategy to detect possible diverged capsid proteins homologous to those of other NCLDV as pointed out in the revised manuscript. With regard to the abundant virion proteins of pandoraviruses, we prefer to cite the original publication [6]. An exhaustive analysis of the sequences and predicted structures of these and other proteins of Pandoraviruses is a separate undertaking that will be published in due course.
Viral lineages are better defined by their capsid proteins, because these proteins are hallmarks of viruses (I use here capsid in a broad definition, including all types of structural assemblage involved in the formation of a virion) [1]. It has been shown that viruses producing homologous capsids can use different types of replicons, and that exchanges of replicon cassette genes have rather frequently occurred between viruses [23]. At the moment, it is therefore a bit premature to definitely classify Pandoraviruses as an NCLDV, because we know nothing about their virion structural proteins. One could thus imagine that Pandoraviruses belong to a novel major viral lineage and recruited in the past a cassette of replication/transcription genes from a Phycodnavirus. However, this scenario, gene cassette shuffling, is especially prevalent in viruses with small DNA genomes and has never or rarely been observed in large DNA viruses. Could the authors comment on this last point?
Authors' response: The nature of viral "lineages" and the comparative utility of structural and replicative proteins for reconstructions of virus evolution are matters of a long, storied debate [23][24][25][26][27]. Probably, the key message is that viral evolution is a complex network of relationships that involves both numerous gene exchanges and intervals of vertical evolution of gene modules [28,29]. Accordingly, both structural proteins and replicative proteins are important for evolutionary reconstructions. As repeatedly argued, replicative proteins are more informative because they retain more sequence conservation, show a strong tendency to come in coevolving modules, and most crucially, provide the potential for reconstructing evolutionary relationships between viruses and capsid-less selfish elements. As demonstrated in detail elsewhere, such relationships are pervasive in the evolution of different classes of selfish agents and essential for understanding the routes of their evolution [30]. Under the weight of all these considerations, we stick to our classification of Pandoraviruses as bona fide members of the NCLDV (Megavirales). As for the transfer of cassettes of replicative genes, we are indeed unaware of such events in the evolution of NCLDV.
My feeling is that the authors' interpretation (independent evolution of "giant" viruses from "big" viruses) is the correct one, in agreement with the previous suggestion that NCLDV originated from smaller viruses predating LUCA [31] and the recent accordion model for genome evolution of Megavirales proposed by Jonathan Filée [32]. However, it will be important to obtain more insights into the origin and history of other genes of Pandoraviruses, especially those involved in the formation of the virion.
Authors' response: we could not agree more.
Anticipating criticisms, Yutin and Koonin remark that their analysis is a case of "tree of 1%" or less, since it is based on 7 genes only, out of 2500. However, one should not forget that the rRNA tree (0.1%) was sufficient to identify the three domains structure of the universal tree of life.
Authors' response: true but that criterion makes sense only because rRNA coevolves with numerous other genes, even if not perfectly.
Reviewer 2: Lakshminarayan Iyer
The giant Pandoraviruses are the largest dsDNA viruses sequenced to date, with over 2000 genes. Although the initial sequencing effort recognized the relationship of the Pandoraviruses to the NCLDV, it did not clarify their precise affinities to other viruses within this group. Yutin and Koonin convincingly demonstrate that the Pandoraviruses are divergent Phycodnaviruses and, with the existing data, posit a special relationship to Coccolithoviruses. The observations are independently reproducible and the conclusions justified given the data.