DNA topoisomerases are ubiquitous enzymes that control DNA topology and solve topological conflicts arising during DNA replication, transcription, and recombination [1–3] (for a recent review on DNA topoisomerases, see also [4]). Based on their mechanisms of action, DNA topoisomerases belong to two classes, type I (Topo I) and type II (Topo II): Topo I change the number of DNA topological links by introducing transient single-stranded breaks in the DNA molecule, whereas Topo II introduce transient double-stranded breaks. According to phylogenetic criteria, both the Topo II and Topo I classes group several families of unrelated (i.e. non-homologous) proteins: Topo IIA and IIB on the one hand, and Topo IA (which also includes the so-called Topo III of eukaryotes and bacteria), IB and IC on the other [5, 6]. This indicates that enzymes with either Topo I or Topo II activity originated multiple times independently in the course of evolution. For instance, Topo IIA and IIB share a homologous ATP-binding subunit, but their DNA cleavage-religation subunits are non-homologous and structurally unrelated [2, 7]. Regarding Topo I enzymes, Topo IA, which form a transient covalent link with the 5' end of the DNA break during the topoisomerization reaction, share a Toprim domain with Topo II, some nucleases and primases [8], whereas Topo IB, which form a transient covalent link with the 3' end of the DNA break, are distantly related to tyrosine recombinases [2, 9]. Although Topo IC forms a 3' DNA link similarly to Topo IB, it harbors a novel unique fold, and is unrelated to Topo IB and tyrosine recombinases [10]. The three Topo I families show very distinctive distributions in the living world: Topo IA are present in currently available complete genomes of organisms from the three domains of life [6], whereas Topo IC appears so far specific to one particular species, the archaeon Methanopyrus kandleri [5].
Finally, Topo IB is present in eukaryotes, in poxviruses, in the mimivirus, and in some bacteria [6, 10, 11].
Topo IB (sometimes named swivelase) was first described in mouse and plays an important role in relaxing DNA supercoils [1, 12]. Indeed, whereas Topo IA can only relax negative superturns, Topo IB can relax both positive and negative superturns in vitro. As a consequence, eukaryotic Topo IB may relax the positive superturns that accumulate in front of replication forks or transcription bubbles during DNA replication, transcription, and chromatin assembly. In addition, Topo IB may also relax the compensatory positive superturns that form when the DNA becomes negatively wrapped around the histone octamer during nucleosome formation. Although these tasks can also be fulfilled by Topo II enzymes, genetic analyses have clearly indicated that Topo IB plays a major role in DNA replication, transcription and chromatin assembly in Saccharomyces cerevisiae [13–15]. Testifying to its crucial role in eukaryotes, Topo IB is the target of one of the most important antitumoral drugs, camptothecin [16]. Topo IB was discovered in poxviruses by Bauer and colleagues in 1977 [17], and the vaccinia virus Topo IB has been widely used as a model system to decipher the catalytic activity of this enzyme [18–20] and, more recently, to search for new antiviral drugs [21]. However, viral Topo IB are quite different from their eukaryotic counterparts, since they harbour a specific domain (virDNA-Topo-I_N) at their N-terminus instead of the long Topoisom_I_N domain found in eukaryotic homologues (Figure 1A and Additional file 1). Recently, homologues of Topo IB have been detected in several bacterial genomes, and one of these, from Deinococcus radiodurans, has been characterized [22]. These bacterial Topo IB harbour a domain organisation close to that of the viral enzymes (Figure 1A and Additional file 1).
Until now, Topo IB had never been observed in Archaea, in sharp contrast to members of the Topo IA family, which are present in one or more copies in all archaeal genomes [6] (Additional files 2 and 3). Surprisingly, we recently noticed that a Topo IB coding gene was identified in the genome of the archaeon Cenarchaeum symbiosum [23, 24], but that a Topo IA coding gene was absent [24]. Phylogenetic analyses of the archaeal domain based on concatenation of ribosomal proteins and comparative genome analysis have recently led us to propose that C. symbiosum and its relatives, formerly included in the phylum Crenarchaeota, should be considered as members of a separate and possibly ancient phylum, which we proposed to name Thaumarchaeota [24]. We predicted that the absence of a Topo IA and the presence of a Topo IB might be a distinctive feature of all members of the Thaumarchaeota. As expected, we have detected an archaeal Topo IB homologue (YP_001582656), misannotated as a 2-alkenal reductase, in the recently sequenced genome of a second thaumarchaeon, Nitrosopumilus maritimus [25], which also lacks a Topo IA homologue. Both thaumarchaeal Topo IB display a domain organisation that is very similar to that of their eukaryotic homologues, since they harbour both the N-terminal Topoisom_I_N and the C-terminal Topoisom_I domains (Figure 1A and Additional file 1). The main difference between the eukaryotic and the archaeal Topo IB is that the former possess a long and highly variable extension upstream of the Topoisom_I_N domain that is absent in the archaeal sequences (Figure 1A and Additional file 1). Two hypotheses can be proposed to explain the presence of a Topo IB coding gene in Thaumarchaeota. One is that this gene was acquired by the last common ancestor of Thaumarchaeota via a horizontal gene transfer (HGT) (blue arrow, Figure 1B-a).
In that case, the donor would have been a eukaryote, since both the thaumarchaeal and the eukaryotic Topo IB harbour a similar domain organisation. Alternatively, a Topo IB coding gene might have been present in the last common ancestor of Archaea and Eucarya and then lost in all archaea except in the lineage leading to Thaumarchaeota (Figure 1B-b to 1B-d). To distinguish between these two hypotheses on the origin of thaumarchaeal Topo IB, we have performed an in-depth phylogenetic analysis of Topo IB homologues.
We retrieved homologues of Topo IB from the nr database at the NCBI (117 sequences from Eucarya, 2 from Archaea, 152 from Bacteria and 30 from viruses), as well as some putative thaumarchaeal environmental sequences from the GOS project [26] at the NCBI (for more details, see Additional file 2). We then selected 151 sequences representative of Topo IB diversity for phylogenetic analysis. The resulting maximum likelihood tree (Figure 2) shows that the two archaeal Topo IB group with the few environmental sequences (BV = 100%), confirming that these are likely from as yet uncultivated representatives of Thaumarchaeota. Although thaumarchaeal sequences are not yet abundant in environmental databases, this suggests that the presence of Topo IB is very likely a characteristic of this phylum. Moreover, thaumarchaeal Topo IB form a strongly supported sister-group to their eukaryotic homologues (BV = 100%). This sister-grouping of eukaryotic and thaumarchaeal sequences is also strongly supported when other reconstruction methods are used (not shown). The fact that thaumarchaeal sequences are sister to eukaryotes and do not arise from within them, coupled with the absence of the N-terminal extension in the archaeal sequences, strongly suggests that Thaumarchaeota did not acquire their Topo IB gene from a present-day eukaryotic lineage via a recent HGT. Based on phylogenetic and genomic analysis, we have recently proposed that Thaumarchaeota may represent the deepest branching lineage in the archaeal phylogeny, i.e. that they emerged before the divergence between Euryarchaeota and Crenarchaeota [24]. This proposal is consistent with a large-scale analysis performed by Koonin and collaborators [27]. The basal branching of Thaumarchaeota is also supported by the fact that, as in eukaryotes, the largest subunit of the RNA polymerase is not split in C. symbiosum and N. maritimus, whereas it is split into A'' and A' polypeptides in all other archaea for which sequences are available [28].
To account for the observed distribution of Topo IB in modern archaea, a deep branching of Thaumarchaeota requires only one evolutionary event: the loss of the Topo IB gene in an ancestor of Euryarchaeota and Crenarchaeota, after their divergence from Thaumarchaeota (blue cross, Figure 1B-b). Accordingly, the presence of a Topo IB in C. symbiosum and N. maritimus may represent an ancestral archaeal feature. In contrast, if the root of the archaeal tree is located in either the euryarchaeal or the crenarchaeal branch, two independent losses of Topo IB would be required to explain the observed data (i.e. in the ancestor of Crenarchaeota and in the ancestor of Euryarchaeota; blue crosses, Figure 1B-c and 1B-d). Thus, the evolutionary history of Topo IB provides additional and independent evidence consistent with a rooting of the archaeal tree in the thaumarchaeal branch [24].
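The loss counts in this parsimony argument can be checked with a minimal sketch. It assumes, as in the vertical-inheritance hypothesis, that Topo IB was present in the common ancestor of Archaea and Eucarya and was never regained once lost. The three simplified four-taxon topologies, the nested-tuple encoding and the names `HAS_TOPO_IB`, `ROOTINGS` and `min_losses` are illustrative assumptions, not part of the analysis reported here.

```python
# Presence (True) or absence (False) of a Topo IB gene, as reported in the text.
HAS_TOPO_IB = {
    "Eucarya": True,
    "Thaumarchaeota": True,
    "Crenarchaeota": False,
    "Euryarchaeota": False,
}

# Hypothetical simplified topologies, written as nested tuples.
# Eucarya is the sister group of Archaea in all three cases.
ROOTINGS = {
    "root in thaumarchaeal branch":
        ("Eucarya", ("Thaumarchaeota", ("Crenarchaeota", "Euryarchaeota"))),
    "root in euryarchaeal branch":
        ("Eucarya", ("Euryarchaeota", ("Thaumarchaeota", "Crenarchaeota"))),
    "root in crenarchaeal branch":
        ("Eucarya", ("Crenarchaeota", ("Thaumarchaeota", "Euryarchaeota"))),
}


def _scan(tree):
    """Return (gene_present_somewhere_in_clade, losses_inside_clade)."""
    if isinstance(tree, str):  # a leaf: look up the observed state
        return HAS_TOPO_IB[tree], 0
    results = [_scan(child) for child in tree]
    if not any(present for present, _ in results):
        # The whole clade lacks the gene: a single loss on the branch
        # leading to it is enough, and it is counted by the parent node.
        return False, 0
    losses = sum(n for _, n in results)
    # Each child clade that entirely lacks the gene costs one loss.
    losses += sum(1 for present, _ in results if not present)
    return True, losses


def min_losses(tree):
    """Minimum number of losses, assuming the gene was present at the root."""
    present, losses = _scan(tree)
    return losses if present else 1


for name, tree in ROOTINGS.items():
    print(f"{name}: {min_losses(tree)} loss(es)")
```

Under these assumptions, the sketch reports one loss for the thaumarchaeal rooting and two for each of the alternative rootings, in line with the reasoning above.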
Topo IB were long thought to be absent in Archaea. Our finding now extends the presence of Topo IB homologues to members of all three domains of life. This may suggest that this enzyme was already present in the Last Universal Common Ancestor (LUCA). However, Topo IB homologues are either absent or scarcely distributed in complete genomes from most main bacterial phyla (Additional file 3). Moreover, the bacterial part of the Topo IB tree is not congruent with the bacterial species tree (i.e. the monophyly of the main bacterial groups is not recovered, Figure 2), suggesting that the history of Topo IB in Bacteria was dominated by lateral gene transfers. It was previously suggested that the viral-like Topo IB found in Bacteria was originally introduced from a DNA virus [6]. Our new and more detailed phylogenetic analysis, as well as the similarity of the domain organisation of viral and bacterial Topo IB, confirms the close relationship between these sequences and their probable common ancestry, although the direction of transfer is still unclear.
The likely presence of both a Topo IA and a Topo IB in the last common archaeal ancestor ([6] and this study, respectively) suggests that this ancestor was possibly more "complex" than modern archaea (if complexity is defined in terms of number of genes and/or redundancy of cellular processes). This idea was already proposed by Lecompte et al., who highlighted a streamlining in the evolution of archaeal ribosomes [29]. It is consistent with the recent observation that several proteins common to Archaea and Eucarya are missing in either Crenarchaeota, Euryarchaeota or Thaumarchaeota [27], and may indicate a tendency towards streamlining of some central molecular processes in the archaeal domain. Finally, one of us has recently proposed that a transition from RNA genomes to DNA genomes occurred independently in each of the three domains of life through the contribution of three different DNA viruses to three complex RNA cells [30]. The idea of different DNA viruses at the origin of Archaea and Eucarya sought to explain the existence of several critical differences in their DNA replication systems, including the ancestral presence of a Topo IB exclusively in Eucarya. Our finding that the last common ancestor of Archaea and Eucarya probably contained a Topo IB weakens this argument and instead favours a DNA genome for this ancestor.