Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea
© Makarova et al; licensee BioMed Central Ltd. 2007
Received: 02 November 2007
Accepted: 27 November 2007
Published: 27 November 2007
An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes.
New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome compared to a ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile that, in addition to the core archaeal functions, encoded more idiosyncratic systems, e.g., the CASS systems of antivirus defense and some toxin-antitoxin systems.
The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles. ArCOGs and related information are available at: ftp://ftp.ncbi.nih.gov/pub/koonin/arCOGs/.
This article was reviewed by Peer Bork, Patrick Forterre, and Purificacion Lopez-Garcia.
A robust classification of genes based on accurately deciphered evolutionary relationships is the cornerstone of comparative and evolutionary genomics. Such a classification is indispensable both for the functional annotation of sequenced genomes and for any genome-wide evolutionary reconstruction. The construction of an evolutionary classification of genes is a non-trivial task because of the complexity of homologous relationships between genes. The two principal classes of homologs are orthologs and paralogs. Orthologs are homologous genes that evolved via vertical descent from a single ancestral gene in the last common ancestor of the compared species. Paralogs are homologous genes, which, at some stage of evolution, have evolved by duplication of an ancestral gene [1, 2]. Orthology and paralogy are intimately linked because, if a duplication (or a series of duplications) occurs after the speciation event that separated the compared species, orthology becomes a relationship between sets of paralogs, rather than individual genes (in which case, such genes are called co-orthologs).
Correct identification of orthologs and paralogs is of central importance for both the functional and the evolutionary aspects of comparative genomics. Orthologs typically occupy the same functional niche in different organisms; by contrast, paralogs evolve to functional diversification as they diverge after the duplication [3, 4]. Therefore, the accuracy of genome annotation critically depends on the accurate identification of orthologs. A clear demarcation of orthologs and paralogs is also required for constructing evolutionary scenarios which include, along with vertical inheritance, lineage-specific gene loss and horizontal gene transfer (HGT) [6–8].
In principle, orthologs, including co-orthologs, should be identified by means of phylogenetic analysis of entire families of homologous proteins in the compared genomes, which is expected to define orthologous protein sets as clades. However, for genome-wide protein sets, such analysis remains extremely labor-intensive and error-prone. Accordingly, procedures have been developed for identification of sets of likely orthologs without explicit recourse to phylogenetic analysis. These procedures are based on the notion of a genome-specific best hit (BeT), i.e., the protein from a target genome that is most similar (typically, in terms of similarity scores computed using BLAST or another sequence comparison method) to a given protein from the query genome [10, 11]. The assumption central to this approach is that orthologs have a greater similarity to each other than to any other protein from the respective genomes. When multiple genomes are analyzed, pairs of probable orthologs detected on the basis of BeTs are combined into orthologous clusters represented in all or a subset of the analyzed genomes. This approach, amended with additional procedures for detecting co-orthologous protein sets and for treating multidomain proteins, was implemented in the database of Clusters of Orthologous Groups (COGs) of proteins [11, 12]. The latest COG set, released in 2003, includes ~70% of the proteins encoded in 69 genomes of prokaryotes and unicellular eukaryotes. The COGs have been employed for functional annotation of newly sequenced genomes (e.g., [14, 15]), comparative analysis of gene neighborhoods [16–18] and other types of connections between genes, as implemented in the widely used STRING tool, target selection in structural genomics, and various genome-wide evolutionary analyses [7, 8]. Independently, other groups have developed similar methodologies for identification of orthologs and paralogs in pairwise or multiple genome comparisons [21, 22].
Very recently, a major effort on automatic construction of sets of orthologous genes has culminated in the EggNOG database, which employed the COGs as a prototype and a seed.
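The best-hit logic described above can be illustrated with a minimal sketch. It assumes that pairwise similarity scores are already available as a dictionary, finds the genome-specific best hit (BeT) of every protein in every other genome, keeps only symmetrical pairs, and lists triangles of mutually consistent best hits. The function names, the input layout, and the toy proteins (a1, a2, b1, c1) are hypothetical and are not taken from the COG software.

```python
from collections import defaultdict
from itertools import combinations


def best_hits(scores, genome_of):
    """Map (query protein, target genome) -> best-scoring protein in that genome."""
    best = {}
    for (query, target), score in scores.items():
        target_genome = genome_of[target]
        if target_genome == genome_of[query]:
            continue  # a BeT is defined only between different genomes
        key = (query, target_genome)
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: target for key, (target, _score) in best.items()}


def symmetric_pairs(bets, genome_of):
    """Keep pairs of proteins that are each other's best hit."""
    pairs = set()
    for (query, _genome), target in bets.items():
        if bets.get((target, genome_of[query])) == query:
            pairs.add(frozenset((query, target)))
    return pairs


def triangles(pairs):
    """List triples of proteins connected by three symmetrical best hits."""
    neighbours = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    found = set()
    for a in neighbours:
        for b, c in combinations(sorted(neighbours[a]), 2):
            if c in neighbours[b]:
                found.add(frozenset((a, b, c)))
    return found


# Toy data (hypothetical): a1 and a2 are paralogs in genome A.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
scores = {
    ("a1", "b1"): 200, ("b1", "a1"): 190,
    ("a1", "c1"): 180, ("c1", "a1"): 170,
    ("b1", "c1"): 160, ("c1", "b1"): 150,
    ("a2", "b1"): 90, ("b1", "a2"): 80,
    ("a2", "c1"): 70, ("c1", "a2"): 60,
}
bets = best_hits(scores, genome_of)
pairs = symmetric_pairs(bets, genome_of)
seed_triangles = triangles(pairs)
```

In this toy case the paralog a2 is left out of the triangle because its best hits are not reciprocated, which is the kind of situation that the additional co-ortholog procedures mentioned above are meant to handle.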
The methods for the construction of COGs were developed and originally applied to small sets of genomes; these and other related methods do not guarantee correct identification of the paralogous and orthologous relationships, due to the variability of domain architectures of proteins, differential loss of paralogs in different lineages, extreme divergence of some orthologous and paralogous genes, and other complications [2, 12, 13]. The computational cost of exhaustive genome comparisons also grows almost prohibitively with the steep increase in the number of sequenced genomes, which approached 500 in the beginning of 2007. Thus, several smaller scale studies have been conducted in which COGs were constructed for compact groups of bacteria including the Thermus-Deinococcus group, Cyanobacteria, and Lactobacillales. In each of these analyses, a considerably better resolution of the homologous relationship than in the overall COG set has been achieved.
In the previous comparative-genomic analyses of archaea, we delineated COGs for this domain of life and used them to partition archaeal genes into the evolutionarily stable, conserved core and the "shell" of genes that are often lost during evolution or are characteristic of a narrow group of species; we further traced the dynamics of the drop in the number of the core genes with sequencing of additional archaeal genomes [28, 29].
Here we present the updated set of COGs that includes 41 sequenced archaeal genomes and delineate the core sets of genes that are represented in all archaea or in the major archaeal divisions, Euryarchaeota and Crenarchaeota. We further describe evolutionary reconstructions aimed at inferring the nature of the Last Archaeal Common Ancestor (LACA) and other ancestral forms, and uncovering the trends of gene loss and gain during archaeal evolution.
Results and Discussion
The archaeal genomic data set and construction of archaeal COGs
The 41 archaeal genomes included in the arCOGs
Genome size, Mb
Number of annotated protein-coding genes
Life style and other features
Aerobic chemoorganotroph, sulfur enhances growth
Caldivirga maquilingensis IC-167
Moderate acidophile, heterotroph, anaerobe or microaerophile
Moderate psychrophile, uncultivated symbiont of sponges
Hyperthermophilic neutrophile, anaerobe
Facultative nitrate-reducing anaerobe
Pyrobaculum calidifontis JCM 11548
Same as Pyrae
Pyrobaculum islandicum DSM 4184
Same as Pyrae
Staphylothermus marinus F1
Anaerobic submarine heterotroph
Sulfolobus acidocaldarius DSM 639
Sulfur-metabolizing chemoorganotroph, thermoacidophilic, motile aerobe
Same as Sulso
Thermofilum pendens Hrk 5
Facultative hydrogen-sulfur autotroph, anaerobe
Motile, anaerobic, sulfate-reducing chemolitho- or chemoorgano-autotroph
Haloarcula marismortui ATCC 43049
Chemoorganotrophic obligate halophile
Aerobic chemoorganotroph, obligate halophile, proteolytic, motile, with cell envelope; 2 extrachromosomal elements
Halophilic, aerobic heterotroph
Chemolithoautotroph, strict anaerobe, nitrogen-fixing methanogen
Methanococcoides burtonii DSM 6242
Psychrotolerant, strictly anaerobic, slightly halophilic methylotroph
Chemolithoautotrophic, strictly anaerobic, motile methanogen, 2 extrachromosomal elements
Methanococcus maripaludis C5
Mesophilic hydrogenotrophic, nitrogen-fixing methanogen
Methanococcus maripaludis S2
Same as MetmC
Methanocorpusculum labreanum Z
Strictly anaerobic, CO2 fixing methanogen
Methanoculleus marisnigri JR1
Strictly anaerobic methanogen
Chemolithoautotrophic, strictly anaerobic methanogen, high intracellular salt concentration
Methanosaeta thermophila PT
Strictly anaerobic methanogen
Chemolithoautotrophic, anaerobic, nitrogen-fixing, versatile methanogen, motile, forms multicellular structures
Methanosarcina barkeri fusaro
Same as Mac
Same as Mac
Methanogen, human intestinal inhabitant
Methanospirillum hungatei JF-1
Strictly anaerobic methanogen
Picrophilus torridus DSM 9790
Extremely acidophilic moderate thermophile
Same as Pho
Same as Pho
Anaerobic, motile heterotroph
Thermococcus kodakaraensis KOD1
Chemoorganotrophic, thermoacidophilic, motile facultative anaerobe
Same as Tac
Uncultured methanogenic archaeon
Methanogen isolated from rice rhizosphere
Obligate symbiont of the crenarchaeon Ignicoccus
Coverage of archaeal genomes with arCOGs
Phyletic patterns, conserved cores and variable shells of archaeal genomes
The 10 most common phyletic patterns in the arCOGs
Number of arCOGs
Metac, Metba, Metma
Halma, Halsp, Halwa, Netph
Sulac, Sulso, Sulto
Pyrae, Pyrca, Pyris, Thete
Pyrab, Pyrfu, Pyrho, Theko
Picto, Theac, Thevo
Applications of arCOGs for evolutionary genomics of archaea: gene-content tree, evolutionary reconstructions, and putative phylogenetic affinities of core and shell genes
From the inception of the COG methodology, it had been realized that COGs have potential for straightforward evolutionary-genomic applications. One of these is the construction of gene-content trees whereby the phyletic patterns of COGs are converted into a distance matrix between the analyzed genomes, with an appropriate normalization for genome size [37, 38, 40] (see Materials and Methods).
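As an illustration of how phyletic patterns can be turned into a distance matrix, the sketch below represents each genome as the set of arCOGs it contains and normalizes the number of shared arCOGs by the size of the smaller set. This particular normalization is an assumption chosen for simplicity; the exact formula used for the gene-content tree is not given in this section. The arCOG identifiers are placeholders, and only the genome abbreviations come from the tables above.

```python
from itertools import combinations


def gene_content_distance(pattern_a, pattern_b):
    """Distance between two genomes given the sets of arCOGs present in each.

    Assumed normalization: shared arCOGs divided by the smaller gene set,
    so that a small genome nested in a large one is at distance 0.
    """
    smaller = min(len(pattern_a), len(pattern_b))
    if smaller == 0:
        return 1.0
    shared = len(pattern_a & pattern_b)
    return 1.0 - shared / smaller


def distance_matrix(patterns):
    """Return {(genome_a, genome_b): distance} for all unordered genome pairs."""
    return {
        (a, b): gene_content_distance(patterns[a], patterns[b])
        for a, b in combinations(sorted(patterns), 2)
    }


# Placeholder arCOG identifiers (hypothetical, for illustration only).
patterns = {
    "Sulso": {"arCOG_1", "arCOG_2", "arCOG_3", "arCOG_4"},
    "Sulac": {"arCOG_1", "arCOG_2", "arCOG_3"},
    "Pyrfu": {"arCOG_1", "arCOG_5"},
}
matrix = distance_matrix(patterns)
```

A matrix of this kind can then be passed to any distance-based tree-building method, such as neighbor joining.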
Major features of the reconstructed gene set of LACA
No. of arCOGs
Implication for LACA
Complete translation system and essentially complete set of enzymes for tRNA and rRNA modification
aaRS and related enzymes
Moderately sophisticated transcription control
RNA polymerase subunits
Replication, recombination and repair
Advanced DNA replication and repair system
DNA polymerase subunits
Energy production and conversion
Membrane-based redox bioenergetics; partial TCA cycle
NADH dehydrogenase or Na+/H+ antiporter
V-type ATPase-ATP synthase
Carbohydrate transport and metabolism
Moderately sophisticated sugar metabolism
Amino acid transport and metabolism
Enzymes for the biosynthesis of all amino acids
Amino acid biosynthesis
Nucleotide transport and metabolism
Enzymes for the biosynthesis of all nucleotides
Coenzyme transport and metabolism
Enzymes for the biosynthesis of all essential cofactors
Lipid transport and metabolism
Fully developed membrane
Cell wall, membrane and envelope biogenesis
Fully developed cell wall
Inorganic ion transport and metabolism
Sophisticated ion uptake system
Secondary metabolites biosynthesis, transport and catabolism
Limited or unknown
Limited motility and/or conjugation
Posttranslational modification, protein turnover, chaperones
Sophisticated system of protein fate control
Cell cycle control
Limited or unknown
Signal transduction mechanisms
Limited use of bacterial type signal transduction system; original signal transduction
Intracellular trafficking and secretion
Fully developed secretion system
Viruses abundant at LACA times
Poorly characterized or unknown
Comparing these observations with those presented in Figs. 3 and 5, one comes to the conclusion that, quantitatively, archaeal genomes are dominated by the relatively mobile "shell" genes that belong to the common prokaryotic gene pool and encode the overwhelming majority of metabolic, structural, and signal transduction functions; a sharp contrast is presented by the stable, archaeo-eukaryotic core of information-processing genes. These quantitative conclusions, even if based on a crude analysis, are in good agreement with the early observations on the bimodal distribution of the taxonomic affinities of archaeal genes, the subsequent observations on the affinities of eukaryotic genes [51, 53], and the complexity hypothesis which posited distinct evolutionary fates of information and operational genes.
The arCOGs, which are expected to be updated as genome sequencing progresses, are a resource for genome annotation of the newly sequenced archaeal genomes and the refinement of the existing annotations, as well as evolutionary reconstructions. Crude reconstructions presented here indicate that the ancestral archaeal forms, including LACA, probably, were full-fledged prokaryotes, of approximately the same level of complexity as the simplest of the modern free-living archaea.
Construction of archaeal COGs
Protein sets for 40 completely sequenced genomes of Archaea were downloaded from the NCBI FTP site or from the RefSeq section of GenBank (Caldivirga maquilingensis IC-167, Cenarchaeum symbiosum and Uncultured methanogenic archaeon). Protein sequences of Thermoproteus tenax were kindly provided by Bettina Siebers with permission from the sequencing consortium. The procedure of COG construction involved the following steps.
1. All-against-all BLAST search was used to establish the similarity relationships between the archaeal proteins. Lineage-specific expansions of paralogs were identified essentially as described previously [57, 58]. Initial clusters based on triangles of symmetrical best hits were constructed using a modified COG algorithm [11, 13]; the major difference in the current implementation was the strict symmetry requirement for the "best hit" relationship between proteins. This constraint lowers the number of false-positives but, in the presence of paralogs, leads to substantial underclustering; this was rectified in the subsequent steps.
2. Multiple alignments of the initial cluster members were constructed using the MUSCLE program; alignments were used to construct PSSMs for a PSI-BLAST search against the database of archaeal proteins with the e-value threshold of 0.01; proteins (domains) were added to the corresponding best-scoring original clusters, resulting in a set of expanded clusters.
3. Sequences of expanded cluster members were aligned using MUSCLE, and the PSSMs constructed from these alignments were used for a second round of PSI-BLAST search against the database of archaeal proteins. The search results were used to construct a similarity graph for the relationships between the expanded clusters. Formally, all statistically significant (e-value < 0.01) hits in a search with the PSSM for a particular cluster were classified according to the cluster they belong to; clusters in the hit list were ranked according to the mean score across their members (members missing from the hit list were assigned an arbitrary score 2 bits below the significance threshold). An edge between the i-th and the j-th clusters was given weight equal to the lowest rank among the i→j and j→i relationships (i.e., if cluster j is the top-ranking hit when cluster i is the query but cluster i is the third-ranking hit for cluster j, then the edge connecting i and j is given the rank of 3). Connected components were extracted from the graph; pairs of nodes within a connected component were assigned an edge with a rank of infinity if they were not connected directly. A minimum-linkage clustering procedure was applied to the connected sets of clusters (if clusters i and j are merged, the edge between cluster k and the node representing the merged clusters is given the rank equal to the lowest rank of the k-i and k-j edges), resulting in a rooted dendrogram of relationships between the clusters. Then each node on the tree was labeled with the number of species that were present in all descendant clusters.
Two rules were used to determine if the descendant clusters should be merged: i) if species-coverage of the node is at least 50% greater than that of any of the descendant nodes and ii) if, among the descendants of a node, one is species-rich and the other one is species-poor (formally, if $s_i > 20 s_j / (10 - s_j)$, where $s_i$ and $s_j$ stand for the species-coverage of the species-rich and species-poor descendant nodes, respectively).
4. In parallel to the above procedures, a BLAST search against the COG 2003 database was performed, followed by using a modified COGNITOR program [11, 13] to assign archaeal proteins to prokaryotic COGs. Merged clusters with proteins assigned to different COGs were split into COG-specific clusters to avoid clustering of paralogous proteins that previously have been assigned to different curated COGs.
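The edge-weighting and merging rules of step 3 can be written down compactly. The sketch below is only an illustration under stated assumptions: rule i is read as requiring the node's species coverage to be at least 1.5 times that of its best-covered descendant, either rule alone is taken as sufficient for a merge, and rule ii is treated as inapplicable when the species-poor descendant covers 10 or more species (where the denominator would be zero or negative). None of these readings is confirmed by the text, and all function names are hypothetical.

```python
import math


def edge_rank(rank_i_to_j, rank_j_to_i):
    """Rank of the i-j edge: the worse (numerically larger) directional rank.

    Use math.inf for a direction in which the other cluster was not hit.
    """
    return max(rank_i_to_j, rank_j_to_i)


def rule_coverage_gain(node_coverage, child_coverages):
    """Rule i (assumed reading): node coverage >= 1.5 x the largest child coverage."""
    return node_coverage >= 1.5 * max(child_coverages)


def rule_rich_and_poor(s_rich, s_poor):
    """Rule ii: s_i > 20 * s_j / (10 - s_j) for species-rich s_i and species-poor s_j."""
    if s_poor >= 10:
        # Assumption: a descendant with 10+ species is not considered species-poor.
        return False
    return s_rich > 20 * s_poor / (10 - s_poor)


def should_merge(node_coverage, child_coverages):
    """Assumed combination: merge the descendants if either rule holds."""
    rich, poor = max(child_coverages), min(child_coverages)
    return rule_coverage_gain(node_coverage, child_coverages) or rule_rich_and_poor(rich, poor)


# The example from the text: j is the top hit for i, i is the third hit for j.
example_rank = edge_rank(1, 3)
```

For instance, descendants covering 30 and 2 species would be merged under rule ii (30 > 20 × 2 / 8 = 5), whereas descendants covering 4 and 3 species under a node covering 5 would stay separate.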
Reconstruction of gene gain and loss events during the evolution of Archaea
Reconstruction of gene gain and loss during the evolution of Archaea was performed using a modified weighted parsimony approach implemented in a two-pass algorithm. First, a coarse-resolution multifurcating species tree was compiled from several single-gene phylogenetic reconstructions and taxonomic data. For each arCOG, the phyletic pattern indicating the presence/absence of the respective gene in each analyzed species was mapped onto the leaves of the tree. The first pass is performed in the leaves-to-root direction, and the number of descendant nodes containing the given gene is counted for each internal tree node. If this number is greater than or equal to the first (generally, more stringent) threshold, which is set for each node individually, the node is assigned state "1" (presence of the gene); otherwise it is assigned state "0" (absence of the gene). In the second pass, which is performed in the opposite, root-to-leaves direction, if the gene is absent in the given node (state "0") but present in its ancestor and the number of descendant nodes carrying this gene is greater than or equal to the second (generally more relaxed) threshold, the node is assigned state "1". For the guide tree and the thresholds, see .
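The two-pass procedure can be sketched for a single arCOG as follows. The sketch assumes that "descendant nodes" means the immediate children of a node, that the counts from the first pass are reused in the second pass, and that observed leaf states are never changed; the tree, the thresholds, and all names are hypothetical and do not reproduce the guide tree or the thresholds used in the paper.

```python
def reconstruct_states(children, root, leaf_states, thresholds):
    """Infer presence (1) or absence (0) of one gene at every node of a tree.

    children    : {internal node: [child nodes]}
    leaf_states : {leaf: 0 or 1}, the phyletic pattern of the arCOG
    thresholds  : {internal node: (first_threshold, second_threshold)}
    """
    state = dict(leaf_states)  # leaves keep their observed states
    count = {}                 # per internal node: number of children in state 1

    def leaves_to_root(node):
        kids = children.get(node, [])
        if not kids:
            return
        for kid in kids:
            leaves_to_root(kid)
        count[node] = sum(state[kid] for kid in kids)
        first, _second = thresholds[node]
        state[node] = 1 if count[node] >= first else 0

    def root_to_leaves(node):
        for kid in children.get(node, []):
            if kid in count:  # only internal nodes can be revised
                _first, second = thresholds[kid]
                if state[kid] == 0 and state[node] == 1 and count[kid] >= second:
                    state[kid] = 1
            root_to_leaves(kid)

    leaves_to_root(root)
    root_to_leaves(root)
    return state


# Hypothetical tree with two clades of three species each.
children = {"root": ["A", "B"], "A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}
thresholds = {"root": (1, 1), "A": (2, 1), "B": (2, 1)}
pattern = {"a1": 1, "a2": 1, "a3": 1, "b1": 1, "b2": 0, "b3": 0}
states = reconstruct_states(children, "root", pattern, thresholds)
```

In this example clade B fails the stringent first threshold (one carrier out of three) but is rescued in the second pass because the gene is inferred for the root, so the scenario becomes two losses within B rather than an independent gain in b1.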
Reviewer 1: Peer Bork, European Molecular Biology Laboratory
The paper describes the construction of orthologous groups for archaea.
Given the success of the COGs and KOGs (a subset for eukaryotes with higher resolution) and the inability of current purely automatic procedures to produce reliable orthologous groups and, very importantly, their reliable functional annotation, I see this as an important resource for various studies. Furthermore, it uses a semi-automatic procedure that includes some clever guiding principles, e.g., it takes into account phylogenetic gene presence patterns. The average coverage of 88% at a higher resolution than the current 76% COG coverage of genes in archaeal genomes is another noteworthy and useful feature. As far as I can see, the arCOGs are of high quality and I look forward to using them.
There is no comparison to more recent orthology-built procedures, but I assume that this semi-automatic procedure presented here provides a more accurate picture than purely automatic methods.
The only concerns I have are availability/format issues and some minimalistic figure captions. Both should be easy to solve.
Taken together, I congratulate the authors for this nice, important and very useful piece of work.
Authors' response: The formats of the files on the ftp site were modified to increase transparency, and an extended README file was added. We hope this improves accessibility which is, indeed, crucial. The figure captions were amended.
Reviewer 2: Patrick Forterre, Institut Pasteur and Université Paris-Sud
The «easy to use» COG database has been especially useful for the biological community. It has helped to improve the quality of genome annotation and has been widely adopted by non-bioinformatics experts to perform preliminary rounds of comparative genomic analysis. The main problem with such popular databases is the delay in their updating, a daunting task considering the current avalanche of completely sequenced genomes. The present paper by Kira Makarova and colleagues reports a very welcome update of the COG database that focuses on archaea (arCOGs). The number of completely sequenced archaeal genomes remains quite low (compared to the situation with bacteria), allowing an exhaustive analysis that remains to be done for bacteria and eukarya. The arCOGs database will surely be an extremely important source of information for the community working on archaea and for all scientists interested in comparative genomics and microbial evolution. The new analysis corresponds to a substantial increase in information compared to the previous one, since around 40% of arCOGs are new.
In addition to the description of the arCOGs database, the paper by Kira Makarova and co-workers presents several analyses that bring new (or updated) data and raise several interesting evolutionary questions. In particular, they have built a gene-content tree based on the presence-absence of arCOGs in archaeal genomes and estimated the evolution of the archaeal genome content along the evolutionary tree based on a gene loss and gain analysis. They reported several intriguing observations that are worth discussing in the framework of current debates on archaeal phylogeny and on the nature of the last universal archaeal ancestor.
Makarova and co-workers noticed that the number of strictly specific euryarchaeal and crenarchaeal proteins is very low (one and three, respectively). This seems to strongly argue in favour of the monophyly of Archaea (against the «eocyte» hypothesis). However, it would be interesting to present a slightly «relaxed» version of these cores, by allowing for the possibility for a protein to be missing in a group of related archaea (something quite frequently observed, for instance the lack of the euryarchaeal histone in Thermoplasmatales). More generally, it could be interesting in the future to define a category of conserved arCOGs (carCOGs?) present in all members of at least two archaeal orders in order to discriminate between ORFan arCOGs that are only present in one order (probably «recently» introduced by lateral gene transfer) and arCOGs of probable ancient origin that can tell us something about the evolutionary relationships between the diverse archaeal orders. It would then be interesting to determine if the distribution of such carCOGs correlates with the archaeal phylogeny based on various evolutionary markers.
The parasitic archaeon Nanoarchaeum equitans lacks the largest number (50) of universal arCOGs, confirming that this archaeon probably evolved by «genome reduction». Some authors have suggested that N. equitans is a primitive organism. I suspect that there is a relatively high percentage of these 50 proteins that have homologues in Bacteria or Eukarya. This could be indicated as an argument in favour of the reduction scenario versus the "old nano" hypothesis! Interestingly, the gene content tree based on arCOGs groups N. equitans with Thermococcales among Euryarchaeota. Although gene-content trees can sometimes be highly biased by lateral gene transfer, this observation is in good agreement with a preliminary global analysis based on best BLAST-hits and refined phylogenies based on proteins of the small ribosomal subunits, reverse gyrase, Topo VI and elongation factors (Brochier et al., 2005). This confirms that N. equitans should not be considered as a member of a new archaeal phylum (as already widely found in text-books!!) but as an odd member of the Euryarchaeota, probably distantly related to Thermococcales.
Another puzzling observation is the grouping of Cenarchaeum symbiosum with euryarchaea in the gene-content tree. Interestingly, the COG coverage is quite similar for all archaeal genomes (around 88%) except for C. symbiosum and N. equitans. This can be explained by genome reduction in the case of N. equitans, but not in the case of C. symbiosum whose genome has a «normal» size. Significantly, the authors reported that the coverage of C. symbiosum genome with the old COGs was greater than with the new arCOGs! This indicates that this genome contains COGs present in Bacteria or Saccharomyces cerevisiae but not in any other archaeon. The proposed explanation is that C. symbiosum is a symbiotic crenarchaeon that has acquired lots of bacterial genes. An alternative hypothesis is that C. symbiosum is not a crenarchaeon after all, but represents an early branching archaeal phylum that contains bacterial and archaeal homologues that have been lost in other archaea.
From their reconstruction of gene loss and gain events, Makarova and co-workers suggest that the last Universal archaeal ancestor (LACA) was a hyperthermophile and a chemolithoautotroph with a minimal number of genes around 1000. They conclude that LACA might have been (nearly) as advanced as modern archaeal hyperthermophiles and found this conclusion quite «unexpected». I am not so surprised. It's a prejudice to think that ancestors are always simpler than present-day organisms and that ancient evolution always occurred toward more "complexity". There is no reason why reductive evolution, which has occurred so often in the evolution of modern cells, was not as pervasive in ancient times (Forterre and Philippe, 1999). In fact, an in-depth analysis of ribosomal protein distribution by Poch and co-workers already suggested a few years ago that the ribosome of LACA was probably more complex than the ribosome of any modern archaea (Lecompte et al., 2002).
Authors' response: We do not, exactly, disagree and certainly realize the importance of reductive evolution. Still, whether we should consider the reconstruction of a complex LACA surprising depends on the perspective. Considering that LACA is supposed to be the common ancestor of one of the 3 domains of life, there might be some element of surprise in this observation. After all, at the earliest stages of the evolution of life, there must have been a dramatic increase in complexity. That this complexification stage, apparently, was over by the time the domain of life became distinct (very likely, the same will hold for bacteria) is, certainly, of note. Alternatively, it is conceivable that LACA is actually not as ancient as one might think but represents a more recent bottleneck in archaeal evolution such that there was a complexification stage after the onset of the archaeal domain but it is inaccessible by comparative genomics.
My only criticism of this paper is that the authors have taken a quite conservative view of archaeal phylogeny (only based on 16S rRNA) to analyse gene loss and gain along the archaeal history and to estimate the genome content of LACA. Indeed, several features of their unresolved multifurcation tree are dubious.
N. equitans appears as an isolated lineage (a third phylum).
C. symbiosum is grouped with hyperthermophilic Crenarchaeota.
Methanopyrus kandleri is shown as an isolated branch.
In all these cases, the authors have chosen to follow the 16S rRNA tree, whereas careful analyses based on ribosomal proteins have shown that Methanopyrus kandleri most likely groups with Methanococcales and Methanomicrobiales (Brochier et al., 2004) and that N. equitans is at least a sister group of euryarchaea (if not of Thermococcales). As previously indicated, the grouping of C. symbiosum with crenarchaea could also be highly misleading. It would have been interesting to compare the genome content of LACA based on the 16S rRNA phylogeny and the more robust phylogeny based on ribosomal proteins. My feeling is that the nature of LACA (chemolithoautotroph or not, hyperthermophile or not?) is still a pending question.
Authors' response: We have not really followed the 16S rRNA tree but rather deliberately chose a poorly resolved topology so as not to subscribe to any particular phylogenetic hypothesis with respect to issues that are still considered unresolved. We are well aware of the published work on archaeal phylogenies and the two important papers by Brochier et al. are cited. Out of fairness, the likely position of Methanopyrus with Methanococcales and Methanobacteriales was first reported in Slesarev et al. in 2002, and this is cited as well. The wording on Methanopyrus in the text was modified to reflect these reports but we did not modify the tree in Fig. 7. One has to keep in mind that the reconstruction here is by no means supposed to be the final word on the scenario of archaeal evolution but more of an exercise showcasing the utility of the arCOGs. We expect that there will be many more iterations with more genomes, better resolved trees, and better methods of reconstruction, and we certainly hope to be involved.
Finally, in the discussion of the gene-content tree, the authors wrote «methanogenesis which are spread both vertically and horizontally». In fact, a detailed phylogenetic analysis of genes involved in methanogenesis by Bapteste and co-workers has shown that, surprisingly, although these proteins can be considered as «operational» they have been only transmitted by vertical inheritance in the archaeal domain (Bapteste et al., 2005).
Authors' response: We believe that the issue is not quite resolved yet. The wording in the paper was softened, nevertheless.
Bapteste, E., Brochier, C. and Boucher, Y.
Higher-level classification of the Archaea: evolution of methanogenesis and methanogens.
Archaea, 1, 353–363 (2005).
Brochier, C., Forterre, P. and Gribaldo, S.
Archaeal phylogeny based on proteins of the transcription and translation machineries: tackling the Methanopyrus kandleri paradox.
Genome Biology, 5, R17 (2004).
Brochier, C., Gribaldo, S., Zivanovic, Y., Confalonieri, F. and Forterre, P.
Nanoarchaea: representative of a novel archaeal phylum or a fast evolving euryarchaeal lineage related to Thermococcales?
Genome Biology, 6, R42 (2005).
Forterre, P. and Philippe, H.
Where is the root of the universal tree of life?
Bioessays, 21, 871–879 (1999).
Lecompte, O., Ripp, R., Thierry, J.C., Moras, D. and Poch, O.
Comparative analysis of ribosomal proteins in complete genomes: an example of reductive evolution at the domain scale.
Nucleic Acids Res., 30, 5382–5390 (2002).
Reviewer 3: Purificación López-García, CNRS, Université Paris-Sud
This article describes the analysis of genes present in most of the currently available archaeal genome sequences in view of their classification in clusters of orthologous genes specific to the archaea (arCOG). It represents an updated extension of previous comparative genomic analyses of COGs though exclusively devoted to the archaea. As a consequence, the arCOG database produced is more refined, resulting in an increased coverage and resolution. The latter is reflected in the numerical increase of specific archaeal COGs and the accompanying decrease in the number of clusters containing paralogs. The comparison of arCOGs thus defined allows one to infer the presence of ~166 core arCOGs, which were likely present in the last archaeal common ancestor (LACA), while 282 and 336 arCOGs appear ancestral to the euryarchaeotal and crenarchaeotal branches, respectively. From the nature of the core arCOGs, the authors conclude that the LACA was a rather complex hyperthermophilic chemoautotroph possessing ~1000 genes. Differential gene gain and loss are predicted to have occurred in the two major archaeal branches. The pattern of arCOG distribution in the different archaeal genomes is used to reconstruct a gene-content tree. Despite biases that may be associated with this approach, which are cautiously recognized by the authors, the tree obtained is largely congruent with widely accepted archaeal molecular phylogenies. Interestingly, Nanoarchaeum equitans is placed within the Thermococcales in agreement with recent detailed phylogenetic analyses, reinforcing the idea that the basal placement of N. equitans in some trees was due to long-branch attraction artifacts.
The two major differences of this gene-content tree with respect to previously accepted molecular phylogenies for the archaea are that all methanogenic euryarchaeota, normally split in at least two large groups in molecular phylogenies, cluster together as they share a large number of methanogenesis-related genes, and that Cenarchaeum symbiosum is placed within the Euryarchaeota, in disagreement with its expected position within the Crenarchaeota. Although the type of analyses carried out is not innovative, the new arCOG database presented here will certainly be very useful to improve future genome annotations.
I have only a few minor comments or suggestions, as follows:
- First, it has to be noticed that the euryarchaeal core (282 arCOGs) and the crenarchaeal core (336 arCOGs) are not dramatically larger than the pan-archaeal core, emphasizing the general volatility of archaeal genomes.
The affirmation that 282 and 336 arCOGs are not dramatically larger than the 166 core arCOGs appears quite subjective. It is roughly twice the size. How does this compare with the situation in bacteria? It would be nice to include this information here, and even better, to relate/normalize this information to the average genetic distance in a reference conserved genetic marker, such as the 16S rRNA gene.
Authors' response: "Dramatic", certainly, is in the eye of the beholder. We believe the reader will see it that way, so no changes were made. A comparison with bacteria would be dubious because bacteria have no two major groups analogous to Euryarchaeota and Crenarchaeota. Calibration is a complex exercise that goes beyond the scope of this paper.
Defining genome volatility would also be useful. Genome volatility has been defined in the literature as the mean volatility of all codons weighted by their frequency within the genome, codon volatility being a measurement related to the non-synonymous versus synonymous mutations (e.g. Dagan and Graur, Mol Biol Evol 2004, 22:496). I believe the meaning is more informal and vague here, and also subjective. Can you provide a reference showing that archaeal genomes are "volatile"?
Authors' response: Good point. We changed the wording to avoid any wrong connotations; "volatility" is no longer used.
Horizontal gene transfer from bacteria has apparently contributed to shaping the C. symbiosum genome. On page 14, it is mentioned that C. symbiosum falls within the euryarchaeotal part of the gene-content tree. Would you predict that HGT from euryarchaeota may partly explain this observation as some (although very limited) environmental genomic studies appear to suggest (Lopez-Garcia, Brochier et al, Environ Microbiol 2004, 6:19)?
Authors' response: Yes, a valid point; we included this possibility in the revision and cite the paper.
This work was supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.
- Fitch WM: Distinguishing homologous from analogous proteins. Systematic Zoology 1970, 19: 99-106. 10.2307/2412448
- Koonin EV: Orthologs, paralogs and evolutionary genomics. Annu Rev Genet 2005, 39: 309-338. 10.1146/annurev.genet.39.073003.114725
- Ohno S: Evolution by gene duplication. Berlin-Heidelberg-New York, Springer-Verlag; 1970.
- Lynch M, Katju V: The altered evolutionary trajectories of gene duplicates. Trends Genet 2004,20(11):544-549. 10.1016/j.tig.2004.09.001
- Galperin MY, Koonin EV: Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol 1998, 1: 55-67.
- Kunin V, Ouzounis CA: The balance of driving forces during genome evolution in prokaryotes. Genome Res 2003,13(7):1589-1594. 10.1101/gr.1092603
- Mirkin BG, Fenner TI, Galperin MY, Koonin EV: Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol Biol 2003,3(1):2. 10.1186/1471-2148-3-2
- Snel B, Bork P, Huynen MA: Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res 2002,12(1):17-25. 10.1101/gr.176501
- Sicheritz-Ponten T, Andersson SG: A phylogenomic approach to microbial evolution. Nucleic Acids Res 2001,29(2):545-552. 10.1093/nar/29.2.545
- Tatusov RL, Mushegian AR, Bork P, Brown NP, Hayes WS, Borodovsky M, Rudd KE, Koonin EV: Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr Biol 1996,6(3):279-291. 10.1016/S0960-9822(02)00478-5
- Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science 1997,278(5338):631-637. 10.1126/science.278.5338.631
- Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 2001, 29: 22-28. 10.1093/nar/29.1.22
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4: 41. 10.1186/1471-2105-4-41
- Nolling J, Breton G, Omelchenko MV, Makarova KS, Zeng Q, Gibson R, Lee HM, Dubois J, Qiu D, Hitti J, Wolf YI, Tatusov RL, Sabathe F, Doucette-Stamm L, Soucaille P, Daly MJ, Bennett GN, Koonin EV, Smith DR: Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol 2001,183(16):4823-4838. 10.1128/JB.183.16.4823-4838.2001
- Makarova K, Slesarev A, Wolf Y, Sorokin A, Mirkin B, Koonin E, Pavlov A, Pavlova N, Karamychev V, Polouchine N, Shakhova V, Grigoriev I, Lou Y, Rohksar D, Lucas S, Huang K, Goodstein DM, Hawkins T, Plengvidhya V, Welker D, Hughes J, Goh Y, Benson A, Baldwin K, Lee JH, Diaz-Muniz I, Dosti B, Smeianov V, Wechter W, Barabote R, Lorca G, Altermann E, Barrangou R, Ganesan B, Xie Y, Rawsthorne H, Tamir D, Parker C, Breidt F, Broadbent J, Hutkins R, O'Sullivan D, Steele J, Unlu G, Saier M, Klaenhammer T, Richardson P, Kozyavkin S, Weimer B, Mills D: Comparative genomics of the lactic acid bacteria. Proc Natl Acad Sci U S A 2006,103(42):15611-15616. 10.1073/pnas.0607117103
- Snel B, Lehmann G, Bork P, Huynen MA: STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res 2000,28(18):3442-3444. 10.1093/nar/28.18.3442
- Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment, evolution of prokaryotic genome organization and prediction of gene function using genomic context. Genome Res 2001, 11: 356-372. 10.1101/gr.GR-1619R
- Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL, Szekely LA, Koonin EV: Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res 2002,30(10):2212-2223. 10.1093/nar/30.10.2212
- von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Kruger B, Snel B, Bork P: STRING 7--recent developments in the integration and prediction of protein interactions. Nucleic Acids Res 2007,35(Database issue):D358-62. 10.1093/nar/gkl825
- Cort JR, Koonin EV, Bash PA, Kennedy MA: A phylogenetic approach to target selection for structural genomics: solution structure of YciH. Nucleic Acids Res 1999,27(20):4018-4027. 10.1093/nar/27.20.4018
- Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 2001,314(5):1041-1052. 10.1006/jmbi.2000.5197
- Li L, Stoeckert CJ Jr., Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 2003,13(9):2178-2189. 10.1101/gr.1224503
- Jensen LJ, Julien P, Kuhn M, von Mering C, Muller J, Doerks T, Bork P: eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res 2007, in press. [Epub ahead of print]
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2007,35(Database issue):D5-12. 10.1093/nar/gkl1031
- Omelchenko MV, Wolf YI, Gaidamakova EK, Matrosova VY, Vasilenko A, Zhai M, Daly MJ, Koonin EV, Makarova KS: Comparative genomics of Thermus thermophilus and Deinococcus radiodurans: divergent routes of adaptation to thermophily and radiation resistance. BMC Evol Biol 2005, 5: 57. 10.1186/1471-2148-5-57
- Mulkidjanian AY, Koonin EV, Makarova KS, Mekhedov SL, Sorokin A, Wolf YI, Dufresne A, Partensky F, Burd H, Kaznadzey D, Haselkorn R, Galperin MY: The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci U S A 2006,103(35):13126-13131. 10.1073/pnas.0605709103
- Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV: Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. Genome Res 1999,9(7):608-628.
- Makarova KS, Koonin EV: Comparative genomics of Archaea: how much have we learned in six years, and what's next? Genome Biol 2003,4(8):115. 10.1186/gb-2003-4-8-115
- Makarova KS, Koonin EV: Evolutionary and functional genomics of the Archaea. Curr Opin Microbiol 2005,8(5):586-594. 10.1016/j.mib.2005.08.003
- Waters E, Hohn MJ, Ahel I, Graham DE, Adams MD, Barnstead M, Beeson KY, Bibbs L, Bolanos R, Keller M, Kretz K, Lin X, Mathur E, Ni J, Podar M, Richardson T, Sutton GG, Simon M, Soll D, Stetter KO, Short JM, Noordewier M: The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proc Natl Acad Sci U S A 2003,100(22):12984-12988. 10.1073/pnas.1735403100
- Brochier C, Gribaldo S, Zivanovic Y, Confalonieri F, Forterre P: Nanoarchaea: representatives of a novel archaeal phylum or a fast-evolving euryarchaeal lineage related to Thermococcales? Genome Biol 2005,6(5):R42. 10.1186/gb-2005-6-5-r42
- Archaeal Clusters of Orthologous Genes
- Hallam SJ, Konstantinidis KT, Putnam N, Schleper C, Watanabe Y, Sugahara J, Preston C, de la Torre J, Richardson PM, DeLong EF: Genomic analysis of the uncultivated marine crenarchaeote Cenarchaeum symbiosum. Proc Natl Acad Sci U S A 2006,103(48):18296-18301. 10.1073/pnas.0608549103
- Gaasterland T, Ragan MA: Microbial genescapes: phyletic and functional patterns of ORF distribution among prokaryotes. Microb Comp Genomics 1998,3(4):199-217.
- Galperin MY, Koonin EV: Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol 2000,18(6):609-613. 10.1038/76443
- Makarova KS, Wolf YI, Koonin EV: Potential genomic determinants of hyperthermophily. Trends Genet 2003,19(4):172-176. 10.1016/S0168-9525(03)00047-7
- Snel B, Bork P, Huynen MA: Genome phylogeny based on gene content. Nat Genet 1999,21(1):108-110. 10.1038/5052
- Wolf YI, Rogozin IB, Grishin NV, Koonin EV: Genome trees and the tree of life. Trends Genet 2002,18(9):472-479. 10.1016/S0168-9525(02)02744-0
- Beiko RG, Harlow TJ, Ragan MA: Highways of gene sharing in prokaryotes. Proc Natl Acad Sci U S A 2005,102(40):14332-14337. 10.1073/pnas.0504068102
- Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV: Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol 2001, 1: 8.
- Slesarev AI, Mezhevaya KV, Makarova KS, Polushin NN, Shcherbinina OV, Shakhova VV, Belova GI, Aravind L, Natale DA, Rogozin IB, Tatusov RL, Wolf YI, Stetter KO, Malykh AG, Koonin EV, Kozyavkin SA: The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc Natl Acad Sci U S A 2002,99(7):4644-4649. 10.1073/pnas.032671499
- Brochier C, Forterre P, Gribaldo S: Archaeal phylogeny based on proteins of the transcription and translation machineries: tackling the Methanopyrus kandleri paradox. Genome Biol 2004,5(3):R17. 10.1186/gb-2004-5-3-r17
- Lopez-Garcia P, Brochier C, Moreira D, Rodriguez-Valera F: Comparative analysis of a genome fragment of an uncultivated mesopelagic crenarchaeote reveals multiple horizontal gene transfers. Environ Microbiol 2004,6(1):19-34. 10.1046/j.1462-2920.2003.00533.x
- Rogozin IB, Babenko VN, Wolf YI, Koonin EV: Dollo parsimony and reconstruction of genome evolution. In Parsimony, Phylogeny, and Genomics. Edited by: Albert VA. Oxford, Oxford University Press; 2005:190-200.
- Forterre P: A hot story from comparative genomics: reverse gyrase is the only hyperthermophile-specific protein. Trends Genet 2002,18(5):236-237. 10.1016/S0168-9525(02)02650-1
- Makarova KS, Grishin NV, Shabalina SA, Wolf YI, Koonin EV: A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol Direct 2006, 1: 7. 10.1186/1745-6150-1-7
- Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P: CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007,315(5819):1709-1712. 10.1126/science.1138140
- Aravind L, Koonin EV: DNA polymerase beta-like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history. Nucleic Acids Res 1999,27(7):1609-1618. 10.1093/nar/27.7.1609
- Koski LB, Golding GB: The closest BLAST hit is often not the nearest neighbor. J Mol Evol 2001,52(6):540-542.
- Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV: Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet 1998,14(11):442-444. 10.1016/S0168-9525(98)01553-4
- Esser C, Ahmadinejad N, Wiegand C, Rotte C, Sebastiani F, Gelius-Dietrich G, Henze K, Kretschmann E, Richly E, Leister D, Bryant D, Steel MA, Lockhart PJ, Penny D, Martin W: A genome phylogeny for mitochondria among alpha-proteobacteria and a predominantly eubacterial ancestry of yeast nuclear genes. Mol Biol Evol 2004,21(9):1643-1660. 10.1093/molbev/msh160
- Koonin EV, Mushegian AR, Galperin MY, Walker DR: Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol Microbiol 1997,25(4):619-637. 10.1046/j.1365-2958.1997.4821861.x
- Dagan T, Martin W: The tree of one percent. Genome Biol 2006,7(10):118. 10.1186/gb-2006-7-10-118
- Jain R, Rivera MC, Lake JA: Horizontal gene transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci U S A 1999,96(7):3801-3806. 10.1073/pnas.96.7.3801
- NCBI genomes
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997,25(17):3389-3402. 10.1093/nar/25.17.3389
- Jordan IK, Makarova KS, Spouge JL, Wolf YI, Koonin EV: Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res 2001,11(4):555-565. 10.1101/gr.GR-1660R
- Lespinet O, Wolf YI, Koonin EV, Aravind L: The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res 2002,12(7):1048-1059. 10.1101/gr.174302
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004,32(5):1792-1797. 10.1093/nar/gkh340
- Kawarabayasi Y, Hino Y, Horikawa H, Yamazaki S, Haikawa Y, Jin-no K, Takahashi M, Sekine M, Baba S, Ankai A, et al.: Complete genome sequence of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1. DNA Res 1999,6(2):83-101, 145-52. 10.1093/dnares/6.2.83
- Brugger K, Chen L, Stark M, Zibat A, Redder P, Ruepp A, Awayez M, She Q, Garrett RA, Klenk HP: The genome of Hyperthermus butylicus: a sulfur-reducing, peptide fermenting, neutrophilic Crenarchaeote growing up to 108 degrees C. Archaea 2007,2(2):127-135.
- Fitz-Gibbon ST, Ladner H, Kim UJ, Stetter KO, Simon MI, Miller JH: Genome sequence of the hyperthermophilic crenarchaeon Pyrobaculum aerophilum. Proc Natl Acad Sci U S A 2002,99(2):984-989. 10.1073/pnas.241636498
- Chen L, Brugger K, Skovgaard M, Redder P, She Q, Torarinsson E, Greve B, Awayez M, Zibat A, Klenk HP, Garrett RA: The genome of Sulfolobus acidocaldarius, a model organism of the Crenarchaeota. J Bacteriol 2005,187(14):4992-4999. 10.1128/JB.187.14.4992-4999.2005
- She Q, Singh RK, Confalonieri F, Zivanovic Y, Allard G, Awayez MJ, Chan-Weiher CC, Clausen IG, Curtis BA, De Moors A, Erauso G, Fletcher C, Gordon PM, Heikamp-de Jong I, Jeffries AC, Kozera CJ, Medina N, Peng X, Thi-Ngoc HP, Redder P, Schenk ME, Theriault C, Tolstrup N, Charlebois RL, Doolittle WF, Duguet M, Gaasterland T, Garrett RA, Ragan MA, Sensen CW, Van der Oost J: The complete genome of the crenarchaeon Sulfolobus solfataricus P2. Proc Natl Acad Sci U S A 2001,98(14):7835-7840. 10.1073/pnas.141222098
- Kawarabayasi Y, Hino Y, Horikawa H, Jin-no K, Takahashi M, Sekine M, Baba S, Ankai A, Kosugi H, Hosoyama A, Fukui S, Nagai Y, Nishijima K, Otsuka R, Nakazawa H, Takamiya M, Kato Y, Yoshizawa T, Tanaka T, Kudoh Y, Yamazaki J, Kushida N, Oguchi A, Aoki K, Masuda S, Yanagii M, Nishimura M, Yamagishi A, Oshima T, Kikuchi H: Complete genome sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7. DNA Res 2001,8(4):123-140. 10.1093/dnares/8.4.123
- Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC, Fleischmann RD, Quackenbush J, Lee NH, Sutton GG, Gill S, Kirkness EF, Dougherty BA, McKenney K, Adams MD, Loftus B, Venter JC, et al.: The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus [published erratum appears in Nature 1998 Jul 2;394(6688):101]. Nature 1997,390(6658):364-370. 10.1038/37052
- Baliga NS, Bonneau R, Facciotti MT, Pan M, Glusman G, Deutsch EW, Shannon P, Chiu Y, Weng RS, Gan RR, Hung P, Date SV, Marcotte E, Hood L, Ng WV: Genome sequence of Haloarcula marismortui: a halophilic archaeon from the Dead Sea. Genome Res 2004,14(11):2221-2234. 10.1101/gr.2700304
- Ng WV, Kennedy SP, Mahairas GG, Berquist B, Pan M, Shukla HD, Lasky SR, Baliga NS, Thorsson V, Sbrogna J, Swartzell S, Weir D, Hall J, Dahl TA, Welti R, Goo YA, Leithauser B, Keller K, Cruz R, Danson MJ, Hough DW, Maddocks DG, Jablonski PE, Krebs MP, Angevine CM, Dale H, Isenbarger TA, Peck RF, Pohlschroder M, Spudich JL, Jung KW, Alam M, Freitas T, Hou S, Daniels CJ, Dennis PP, Omer AD, Ebhardt H, Lowe TM, Liang P, Riley M, Hood L, DasSarma S: Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci U S A 2000,97(22):12176-12181. 10.1073/pnas.190337797
- Bolhuis H, Palm P, Wende A, Falb M, Rampp M, Rodriguez-Valera F, Pfeiffer F, Oesterhelt D: The genome of the square archaeon Haloquadratum walsbyi: life at the limits of water activity. BMC Genomics 2006, 7: 169. 10.1186/1471-2164-7-169
- Smith DR, Doucette-Stamm LA, Deloughery C, Lee H, Dubois J, Aldredge T, Bashirzadeh R, Blakely D, Cook R, Gilbert K, Harrison D, Hoang L, Keagle P, Lumm W, Pothier B, Qiu D, Spadafora R, Vicaire R, Wang Y, Wierzbowski J, Gibson R, Jiwani N, Caruso A, Bush D, Reeve JN, et al.: Complete genome sequence of Methanobacterium thermoautotrophicum deltaH: functional analysis and comparative genomics. J Bacteriol 1997,179(22):7135-7155.
- Bult CJ, White O, Olsen GJ, Zhou L, Fleischmann RD, Sutton GG, Blake JA, FitzGerald LM, Clayton RA, Gocayne JD, Kerlavage AR, Dougherty BA, Tomb JF, Adams MD, Reich CI, Overbeek R, Kirkness EF, Weinstock KG, Merrick JM, Glodek A, Scott JL, Geoghagen NSM, Venter JC: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 1996,273(5278):1058-1073. 10.1126/science.273.5278.1058
- Hendrickson EL, Kaul R, Zhou Y, Bovee D, Chapman P, Chung J, Conway de Macario E, Dodsworth JA, Gillett W, Graham DE, Hackett M, Haydock AK, Kang A, Land ML, Levy R, Lie TJ, Major TA, Moore BC, Porat I, Palmeiri A, Rouse G, Saenphimmachak C, Soll D, Van Dien S, Wang T, Whitman WB, Xia Q, Zhang Y, Larimer FW, Olson MV, Leigh JA: Complete genome sequence of the genetically tractable hydrogenotrophic methanogen Methanococcus maripaludis. J Bacteriol 2004,186(20):6956-6969. 10.1128/JB.186.20.6956-6969.2004
- Galagan JE, Nusbaum C, Roy A, Endrizzi MG, Macdonald P, FitzHugh W, Calvo S, Engels R, Smirnov S, Atnoor D, Brown A, Allen N, Naylor J, Stange-Thomann N, DeArellano K, Johnson R, Linton L, McEwan P, McKernan K, Talamas J, Tirrell A, Ye W, Zimmer A, Barber RD, Cann I, Graham DE, Grahame DA, Guss AM, Hedderich R, Ingram-Smith C, Kuettner HC, Krzycki JA, Leigh JA, Li W, Liu J, Mukhopadhyay B, Reeve JN, Smith K, Springer TA, Umayam LA, White O, White RH, Conway de Macario E, Ferry JG, Jarrell KF, Jing H, Macario AJ, Paulsen I, Pritchett M, Sowers KR, Swanson RV, Zinder SH, Lander E, Metcalf WW, Birren B: The genome of M. acetivorans reveals extensive metabolic and physiological diversity. Genome Res 2002,12(4):532-542. 10.1101/gr.223902
- Maeder DL, Anderson I, Brettin TS, Bruce DC, Gilna P, Han CS, Lapidus A, Metcalf WW, Saunders E, Tapia R, Sowers KR: The Methanosarcina barkeri genome: comparative analysis with Methanosarcina acetivorans and Methanosarcina mazei reveals extensive rearrangement within methanosarcinal genomes. J Bacteriol 2006,188(22):7922-7931. 10.1128/JB.00810-06
- Deppenmeier U, Johann A, Hartsch T, Merkl R, Schmitz RA, Martinez-Arias R, Henne A, Wiezer A, Baumer S, Jacobi C, Bruggemann H, Lienard T, Christmann A, Bomeke M, Steckel S, Bhattacharyya A, Lykidis A, Overbeek R, Klenk HP, Gunsalus RP, Fritz HJ, Gottschalk G: The genome of Methanosarcina mazei: evidence for lateral gene transfer between bacteria and archaea. J Mol Microbiol Biotechnol 2002,4(4):453-461.
- Fricke WF, Seedorf H, Henne A, Kruer M, Liesegang H, Hedderich R, Gottschalk G, Thauer RK: The genome sequence of Methanosphaera stadtmanae reveals why this human intestinal archaeon is restricted to methanol and H2 for methane formation and ATP synthesis. J Bacteriol 2006,188(2):642-658. 10.1128/JB.188.2.642-658.2006
- Falb M, Pfeiffer F, Palm P, Rodewald K, Hickmann V, Tittor J, Oesterhelt D: Living with two extremes: conclusions from the genome sequence of Natronomonas pharaonis. Genome Res 2005,15(10):1336-1343. 10.1101/gr.3952905
- Futterer O, Angelov A, Liesegang H, Gottschalk G, Schleper C, Schepers B, Dock C, Antranikian G, Liebl W: Genome sequence of Picrophilus torridus and its implications for life around pH 0. Proc Natl Acad Sci U S A 2004,101(24):9091-9096. 10.1073/pnas.0401356101
- Cohen GN, Barbe V, Flament D, Galperin M, Heilig R, Lecompte O, Poch O, Prieur D, Querellou J, Ripp R, Thierry JC, Van der Oost J, Weissenbach J, Zivanovic Y, Forterre P: An integrated analysis of the genome of the hyperthermophilic archaeon Pyrococcus abyssi. Mol Microbiol 2003,47(6):1495-1512. 10.1046/j.1365-2958.2003.03381.x
- Maeder DL, Weiss RB, Dunn DM, Cherry JL, Gonzalez JM, DiRuggiero J, Robb FT: Divergence of the hyperthermophilic archaea Pyrococcus furiosus and P. horikoshii inferred from complete genomic sequences. Genetics 1999,152(4):1299-1305.
- Kawarabayasi Y, Sawada M, Horikawa H, Haikawa Y, Hino Y, Yamamoto S, Sekine M, Baba S, Kosugi H, Hosoyama A, Nagai Y, Sakai M, Ogura K, Otsuka R, Nakazawa H, Takamiya M, Ohfuku Y, Funahashi T, Tanaka T, Kudoh Y, Yamazaki J, Kushida N, Oguchi A, Aoki K, Kikuchi H: Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3 (supplement). DNA Res 1998,5(2):147-155. 10.1093/dnares/5.2.147
- Fukui T, Atomi H, Kanai T, Matsumi R, Fujiwara S, Imanaka T: Complete genome sequence of the hyperthermophilic archaeon Thermococcus kodakaraensis KOD1 and comparison with Pyrococcus genomes. Genome Res 2005,15(3):352-363. 10.1101/gr.3003105
- Ruepp A, Graml W, Santos-Martinez ML, Koretke KK, Volker C, Mewes HW, Frishman D, Stocker S, Lupas AN, Baumeister W: The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum. Nature 2000,407(6803):508-513. 10.1038/35035069
- Kawashima T, Amano N, Koike H, Makino S, Higuchi S, Kawashima-Ohya Y, Watanabe K, Yamazaki M, Kanehori K, Kawamoto T, Nunoshiba T, Yamamoto Y, Aramaki H, Makino K, Suzuki M: Archaeal adaptation to higher temperatures revealed by genomic sequence of Thermoplasma volcanium. Proc Natl Acad Sci U S A 2000,97(26):14257-14262. 10.1073/pnas.97.26.14257
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.