On the molecular mechanism of GC content variation among eubacterial genomes
© Wu et al; licensee BioMed Central Ltd. 2012
Received: 12 August 2011
Accepted: 10 January 2012
Published: 10 January 2012
As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes.
Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group.
Our studies provide several lines of evidence indicating that the DNA polymerase III α subunit and its isoforms, participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide a comprehensive insight into mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years.
This paper was reviewed by Nicolas Galtier, Adam Eyre-Walker, and Eugene Koonin.
As one of the key parameters of genome sequences, the genomic GC content, confined to between 25% and 75%, has been investigated for over half a century [1–3]. There are several essential questions to be addressed concerning GC content and its variability. First, how does it vary: randomly, gene-centrically, species-specifically, regulated, or selected? Second, at what level does GC content vary: replication, transcription-coupled, or functionally selected (proteins)? Third, what are the outcomes or biological significances of GC content variability: thermostability, protein-coding requirement, or biased mutations? Fourth, could GC content be changed in vitro globally or locally in terms of genes and genomes? It is obvious that we have very limited knowledge of how a genome ends up with a particular GC content.
Codon usage bias, especially GC content at the third codon position, correlates with the trend of GC content variations, and accumulating evidence indicates that it may be selected by gene expression [5–7]. Therefore, it has been proposed that codon usage bias may be driven by GC content changes, but not vice versa [8, 9]. Mutations should generally conform to two patterns--global or transcript-centric--each derived from different mechanisms. The former is attributable to DNA replication and global repair and the latter is mainly the result of transcription-coupled repair [10–12]. Concerning the fundamental role of the environment or habitat in species evolution [13–15], another way to study GC content variation is to differentiate intrinsic from extrinsic (mostly environmental) factors, and to measure their impacts on GC content variability and evolvability, both qualitatively and quantitatively. Different hypotheses have been proposed by numerous authors to explain why GC content varies and how it is related to different intrinsic and extrinsic factors [16–28].
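As a point of reference for the two quantities discussed above, the following minimal Python sketch computes the overall GC content of a sequence and the GC content at third codon positions (often called GC3). The function names and the toy sequence are illustrative assumptions, not part of the original analysis, and ambiguous bases are simply ignored.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / (gc + at)


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence."""
    # Positions 3, 6, 9, ... of the coding sequence (0-based indices 2, 5, 8, ...).
    return gc_content(cds[2::3])


if __name__ == "__main__":
    toy_cds = "ATGGCCAAGTAA"  # hypothetical four-codon sequence, illustration only
    print(f"GC  = {gc_content(toy_cds):.3f}")
    print(f"GC3 = {gc3_content(toy_cds):.3f}")
```

Genomic GC content as used in this paper corresponds to the first function applied to a whole genome sequence; GC3 requires annotated, in-frame coding sequences.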
To better understand the relationship between GC content variation and mutational mechanisms, we attempted to correlate global GC content changes with DNA replication and repair, focusing on prokaryotes [28–30]. We discovered an excellent correlation between GC content variations and the dimeric combinations of DNA polymerase III alpha subunits, which showed that eubacteria can be grouped into different GC variable groups: the full-spectrum or dnaE1 group, the high-GC or dnaE2-dnaE1 group, and the low-GC or polC-dnaE3 group. We have extended our analyses into several mutator genes [31, 32] to further elucidate the potential mechanisms.
In this study, we analyzed GC content variability based on a comprehensive evaluation of its relationship to various intrinsic and extrinsic factors, as well as an in-depth investigation of the translesion synthesis (TLS) pathway and its relevant mutator genes. The results indicated that replication and SOS mutagenesis are the major processes affecting GC content, and other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or indirectly rely on different mutator genes to alter the GC content. Our results provide a comprehensive insight into the robustness of eubacterial genomes in adapting to their ever-changing environments through a basic composition parameter change--the GC content.
GC content variations in the three dnaE-based eubacterial groups
Hypotheses proposed to explain GC content variations in eubacteria
Since ultraviolet radiation induces the formation of thymine dimers, higher GC content gives a selective advantage to organisms living in niches that are susceptible to direct and intense sunlight.
Thermophilic organisms demonstrate a tendency to high GC content because thermostable and thermolabile amino acids are encoded by GC-rich and GC-poor codons respectively.
AT to GC mutation
Practically all organisms are subjected to directional mutation pressure and this offers plausible explanations for the intensive GC content heterogeneity among different chromosomal regions of vertebrate genomes.
Differences in directional nucleotide substitution among lineages of mammals can be explained by changes in metabolic physiology. This relationship is thought to be mediated by the effect of oxygen radicals.
Coding sequence length
The longest coding sequences (exons) of vertebrates and genes of prokaryotes are more GC-rich than the shortest ones.
There is a significantly higher GC content in the nitrogen-fixing members of the genus than in those unable to fix nitrogen.
Aerobic prokaryotes display a significant increment in genome GC% in relation to anaerobic ones.
The GC content of complex microbial communities seems to be globally and actively influenced by the environment, such as bacteria in surface water samples having a GC-content median of around 34%, while for soil samples, it is around 61%.
The relationship between genome size and GC level is valid for aerobic, facultative, and microaerophilic species.
DNA polymerase III
According to the dimeric combination of alpha subunits, GC contents of eubacterial genomes are partitioned into three groups with distinct GC content variation spectra: dnaE1 (full-spectrum), dnaE2/dnaE1 (high-GC), and polC/dnaE3 (low-GC).
Bacteriological features among the dnaE-based eubacterial groups
We explored the correlation between our grouping scheme (which is largely GC content-related and mechanism-based) and a variety of bacteriological features, including oxygen requirement, temperature, habitat, and several metabolic features.
Oxygen requirements of the dnaE-based groups
Temperature adaptations of the dnaE-based groups
Hosts of the dnaE-based bacterial groups
Metabolic features of eubacteria in the dnaE-based groups
The correlation of mutator genes, dnaE2 and polC, to GC content variation
The correlation of genome size with GC content
Mutator genes and GC content variations in the dnaE-based groups
Tropheryma whipplei TW08 27
Anaplasma marginale St Maries
Bifidobacterium adolescentis ATCC 15703
Thermus thermophilus HB27
Leifsonia xyli xyli CTCB0
Deinococcus geothermalis DSM 11300
Flavobacterium psychrophilum JIP02 86
Lactobacillus delbrueckii bulgaricus
Moorella thermoacetica ATCC 39073
Clostridium novyi NT
Clostridium tetani E88
The gain-and-loss of mutator genes underlies GC content variation
Deficiencies in mutator genes can dramatically increase the mutation rate [31, 32, 36, 37]. For example, in the absence of both mutY and mutM, a thousands-fold increase in CG-to-AT mutations was observed, and the same magnitude of mutations is evident in mutT-deficient strains, but with an opposite mutation spectrum, namely AT-to-GC. Therefore, the isolation and characterization of mutator genes have led to a better understanding of mutation mechanisms. Mutation-driven bacterial adaptive strategies to the environment are widely reported to be beneficial for bacteria in surviving periods of stress, such as starvation and drug exposure [37–40]. It could be argued that such mutator loss is very rare in evolution, yet there is evidence indicating that the incidence of mutator strains among pathogenic isolates is quite high [37, 41].
The mutators, dnaE2 and polC, are two major contributors to GC content variation
Our analysis demonstrates that the existence of dnaE2 and polC is associated with higher GC (>50%) and lower GC contents (<50%), respectively. To further verify the association between dnaE dimer asymmetry and GC content variation, we also carried out two case studies on several closely related bacteria to exclude the contribution of phylogenetic distance, because GC content also displays a strong phylogenetic signal. Our results clearly indicated that gain-and-loss of dnaE2 can greatly increase or decrease the GC content, respectively, providing further evidence that dnaE2 is the major contributor to GC content variation in the dnaE1|dnaE2 group. In addition, we also found that a single copy of dnaE2 in S. thermophilum IAM 14863 leads to an Actinobacteria-like high GC content. There has been some debate about the status of this bacterium: whether it belongs to the Actinobacteria because of its high GC (69%) or to Firmicutes, which share its bacteriological features. Recently, it was confirmed that S. thermophilum IAM 14863 is a member of the Firmicutes, and our analysis agrees with that study and suggests that its Actinobacteria-like high GC content is a result of an additional copy of dnaE2, possibly gained through horizontal gene transfer (HGT). Its higher GC content should not be considered as a factor confounding its taxonomic position. Furthermore, increasing evidence indicates that dnaE2 may participate in SOS mutagenesis through the TLS pathway instead of replication [44–47], as it is a possible member of the error-prone Y family polymerases. In addition, bacteria without dnaE2 normally have the TLS-related polV for functional compensation [48, 49]; therefore, we believe that these polymerases are associated with the replication machinery and have strong influences on DNA synthesis, leading to biased compositional changes (e.g., pol η and pol κ lead to AT-rich DNA and pol ζ and Rev1 lead to GC-rich DNA).
As to the relationship between polC and high AT content, we only found one example, namely bacterium P. thermopropionicum SI, whose loss of polC is consistent with its higher GC content as compared to the average of other Firmicutes. In addition, we found that the linear correlation between GC content and genome size in the dnaE3|polV bacteria tends to have a less steep slope compared with that in the dnaE1|polV group, which further suggests that polC may be responsible for the lower level of GC content in the dnaE3|polV group.
The loss of AT-increasing mutator genes may contribute to genome size reduction and GC content variation
Our analysis showed that Treponema pallidum (#3) has lost mutT but possesses mutY. The loss of mutT may be related to its 15% higher GC content as compared to its phylogenetically closely related relative, T. denticola ATCC 35405, which has both mutT and mutY. A similar situation is also found in Anaplasma marginale St Maries (#2). However, the reason it has a higher GC content (8%) than the closely related A. phagocytophilum HZ is not because of its loss of mutT, as neither of them possesses mutT, but may be attributable to the absence of mutY in the latter bacterium. Despite the fact that dnaE1|dnaE2 bacteria were not included in this part of the analysis, we still managed to find an example. Yoji Nakamura et al. found that the GC content of Corynebacterium efficiens is 10% higher than that of C. glutamicum and C. diphtheriae, probably because it lacks mutT. Whether each mutator gene is a causative factor for a particular GC content variation requires further experiments and a larger dataset, which may prove problematic when HGT is factored in.
It is well established that genome size is positively correlated with GC content. Our analyses not only confirmed this notion, but also showed that this correlation is more pronounced in the dnaE1|polV and dnaE3|polV groups, especially when the gene number of each bacterium is less than 2,500. Generally, bacteria with <2,500 genes often experience genome reduction or gene loss. Therefore, the strong and significant positive correlation between genome reduction and AT increase may reflect dramatic gene losses, especially the loss of mutator genes, because mutator gene defects cause AT increase more than GC increase (Additional file 2). To test this hypothesis, the correlation between GC content and gene number for bacteria possessing fewer than 2,500 genes was examined, revealing the underlying reasons for these outliers. For instance, those belonging to 'high-GC' are all confirmed to have lost their mutT gene. In other words, when a genome suffers a significant size reduction, it most likely experiences both loss of mutator gene(s) and AT-increase. The fact that most insect pathogens undergo genome reduction and possess AT-rich genomes is testimony to this hypothesis [51–53]. A more rigorous analysis is required to confirm whether the observed higher number of de novo GC-AT mutations [54, 55] are directly related to the loss of AT-increasing mutator genes.
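The kind of test described above can be illustrated with a minimal Python sketch that restricts a dataset to genomes with fewer than 2,500 genes and computes the Pearson correlation between gene number and GC content. The function names and the toy values are hypothetical, and the choice of Pearson's r is an assumption, since the text does not name the correlation statistic used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def small_genome_correlation(genomes, max_genes=2500):
    """Correlate gene number with GC% for genomes below the gene-number cutoff.

    `genomes` is an iterable of (gene_number, gc_percent) pairs.
    """
    subset = [(genes, gc) for genes, gc in genomes if genes < max_genes]
    gene_numbers = [genes for genes, _ in subset]
    gc_values = [gc for _, gc in subset]
    return pearson_r(gene_numbers, gc_values)


if __name__ == "__main__":
    # Hypothetical (gene number, GC%) pairs, for illustration only.
    toy = [(500, 30.0), (1000, 35.0), (2000, 45.0), (4000, 40.0)]
    print(f"r = {small_genome_correlation(toy):.3f}")
```

Running the same function separately on each dnaE-based group would show whether the correlation differs between groups, which is the comparison made in the text.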
A recent study investigated specificity and rates of different mutational biases of the Salmonella typhimurium genome in the absence of major DNA repair systems, where mutator genes result in GC-to-AT mutations. By sequencing two S. typhimurium mutants grown for 5,000 generations, they observed that the mutation spectrum coincides with the expected pattern, where among the 943 identified nucleotide substitutions, 91% were GC-to-TA transversions and 7% were GC-to-AT transitions. This is the first large-scale genomic level experiment that confirms the relationship between mutator genes and genome GC variation, and strongly supports our hypothesis.
Environmental factors do correlate with GC variation, but to a variable extent
Our dnaE-based grouping scheme not only guides GC content analysis, but also provides a framework for the analysis of different environmental factors. Taking temperature as an example, we found that thermophilic Thermoanaerobacter tengcongensis, presumed to have a higher GC content, and non-thermophilic Streptomyces coelicolor, presumed to have a lower GC content, actually have genomic GC contents of 38% and 72%, respectively. However, our grouping scheme explains the contradiction: the former is a dnaE3|polV bacterium, while the latter is a dnaE1|dnaE2 bacterium.
Another minor correlation between GC content and environmental factors was found when the habitats of various bacteria were examined. It was reported that the environment plays an active role in shaping GC content, such as surface water vs. soil, and indeed, bacteria living in aquatic conditions have an average GC content of ~34%, whereas soil-dwellers have an elevated GC content of ~61%. Our grouping scheme confirms that the former are mostly dnaE1|polV bacteria and the latter are mostly dnaE1|dnaE2 bacteria. But the six aquatic bacteria are observed to have higher GC content than soil-dwelling bacteria within the dnaE3|polV group. Further analysis reveals that, among the six aquatic bacteria analyzed, five are thermophiles and one is uranium/chromium-reducing. This also raises the question as to whether dwelling conditions are relevant or if they are simply an ascertainment bias introduced by the difference of species distribution under different environmental conditions or metagenomics. Therefore, we should be very cautious when addressing the relationship between environmental or bacteriological features and genomic GC content, especially when the number of genomes analyzed is rather limited.
In summary, although the contribution of oxygen requirement, nitrogen-fixing, terrestrial dwelling, and larger genome size to GC content variation has been discussed within a unified scheme, some of the previously identified correlations (Table 1) should be reconsidered, as there is a higher chance for these bacteria to be members of the dnaE1|dnaE2 group. Therefore, taxonomy-based classification should be factored in for this type of analysis when there are sufficient sequenced genomes in the near future.
Is GC content variation intrinsic or driven by environmental factors?
Based on our dnaE-based grouping scheme, we believe that GC content variation is governed by replication and repair mechanisms, but is influenced by environmental factors. As prokaryotes, eubacteria are robust, but have never evolved to be more complex. Such robustness builds upon genome variations that are promulgated by a large population. These variations in genome compositions permit the loss, acquisition, or change in DNA sequences. When such composition dynamics are at work, bacterial GC contents comply with our grouping scheme, regardless of whether they are mutating for the better or are being selected and suffering a bottleneck. For detailed tendencies, specific conditions should be investigated and different mechanisms proposed. Future investigations will comprise more detailed analysis of outliers that either have extreme GC contents, or do not follow the dnaE-based rules. Experiments to construct new organisms whose grouping scheme is disrupted will also be performed. Extreme environmental conditions could be applied to the three bacterial groups separately to enforce selective pressure to determine if they are able to produce the predicted mutation spectrum mirroring that seen in naturally isolated counterparts.
DNA polymerase III α subunit and its isoforms participate either in replication (such as polC) or in SOS mutagenesis/TLS (such as dnaE2), playing a dominant role in producing GC variations that can be classified into three basic spectra: GC variable, high GC, and low GC groups. Mutator genes, especially those that have dominant effects on mutation spectra towards either GC or AT content biases, can also alter GC content in either direction to a certain extent. For example, the presence of dnaE2 is a definite sign of higher GC content. Increased bacterial genome size (gene number) appears to rely on genomic GC content increase. However, it is unclear whether the changes are directly related to certain environmental requirements. Indeed, environmental factors do influence GC content variation, but the correlations are more obvious when analyzed under our dnaE-based grouping scheme. For example, most terrestrial, plant-associated, and nitrogen-fixing bacteria are of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those living in aquatic environments, belong to the dnaE1|polV group.
The non-redundant eubacterial grouping was based on a random selection of a single isolate or strain from the collection in the NCBI (National Center for Biotechnology Information) databases (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/), yielding 364 non-redundant bacterial genomes. We classified them into dnaE1-dnaE1|polV (173 genomes), dnaE1-dnaE1|dnaE2 (115 genomes), and polC-dnaE3|polV (76 genomes) according to the presence of DNA polymerase III alpha subunit and damage-inducible dnaE2 or polV.
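The grouping step can be sketched in Python as a presence/absence rule over each genome's polymerase genes. This is only an illustration under stated assumptions: the function name, the order in which the rules are checked, the "unclassified" fallback, and the toy genomes are all hypothetical, and the paper does not describe how genomes carrying unusual gene combinations were handled.

```python
from collections import Counter
from typing import Iterable


def classify_dnae_group(genes: Iterable[str]) -> str:
    """Assign a genome to a dnaE-based group from its polymerase gene set.

    The rule order below is an assumption made for this sketch.
    """
    present = set(genes)
    if "polC" in present and "dnaE3" in present:
        return "polC-dnaE3|polV"
    if "dnaE1" in present and "dnaE2" in present:
        return "dnaE1-dnaE1|dnaE2"
    if "dnaE1" in present and "polV" in present:
        return "dnaE1-dnaE1|polV"
    return "unclassified"


if __name__ == "__main__":
    # Hypothetical genomes with made-up gene inventories, illustration only.
    toy_genomes = {
        "genome_A": {"dnaE1", "polV"},
        "genome_B": {"dnaE1", "dnaE2"},
        "genome_C": {"polC", "dnaE3", "polV"},
    }
    tally = Counter(classify_dnae_group(g) for g in toy_genomes.values())
    for group, count in sorted(tally.items()):
        print(group, count)
```

Applied to the full dataset, a tally of this kind would be expected to reproduce the group sizes reported above (173, 115, and 76 genomes), provided the gene calls match those used by the authors.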
We collected most of the related information for the 364 non-redundant bacterial dataset from Bergey's Manual of Determinative Bacteriology (9th edition, 1994) and NCBI's Entrez Genome Project database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj).
To avoid the interference of phylogenetic distance with GC content, we selected two special groups, Shewanella and Mycobacterium, where there are sufficient closely related genomes for the analysis, yet they belong to two different dnaE-based groups, dnaE1|polV and dnaE1|dnaE2, respectively. In addition, we constructed an optimal growth temperature (OGT) dataset to analyze the relationship between OGT and GC content. We randomly chose ten thermophiles with definite OGTs across three phyla (Firmicutes, Actinobacteria, and Thermotogae) and in two dnaE-based groups (dnaE1|dnaE2 and dnaE3|polV) for an in-depth analysis. We employed MEGA (version 4.0) to construct all phylogenetic bootstrap trees using the neighbor-joining method based on 16S rRNA sequences.
Identification of mutator genes
To identify mutator genes, we collected 13 experimentally confirmed common mutator genes and used the online BLAST tools (http://blast.ncbi.nlm.nih.gov/Blast.cgi) for in silico identification in all candidate bacterial genomes. Both protein size (the cutoff value > 50%) and sequence homology (E-value 1 × 10⁻⁵) were considered.
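The two cutoffs can be expressed as a simple filter over BLAST hits, sketched below in Python. Several details are assumptions made for illustration: the `Hit` record and its field names are hypothetical, "protein size > 50%" is interpreted here as the aligned length exceeding half the length of the reference mutator protein, and the E-value threshold is treated as an upper bound (at most 1 × 10⁻⁵). The lengths and E-values in the example are invented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit of a reference mutator protein against a genome (hypothetical record)."""
    query_gene: str       # name of the reference mutator gene, e.g. "mutT"
    query_length: int     # length of the reference protein in residues
    aligned_length: int   # number of reference residues covered by the alignment
    evalue: float         # BLAST E-value of the hit


def passes_cutoffs(hit: Hit, min_coverage: float = 0.5, max_evalue: float = 1e-5) -> bool:
    """True if the hit satisfies both the size and the homology criteria."""
    coverage = hit.aligned_length / hit.query_length
    return coverage > min_coverage and hit.evalue <= max_evalue


def mutator_genes_present(hits):
    """Set of mutator gene names with at least one hit passing both cutoffs."""
    return {hit.query_gene for hit in hits if passes_cutoffs(hit)}


if __name__ == "__main__":
    # Invented hits, illustration only.
    toy_hits = [
        Hit("mutT", 129, 120, 1e-30),  # long, significant alignment: kept
        Hit("mutY", 350, 300, 1e-3),   # long but weak homology: rejected
        Hit("mutM", 269, 100, 1e-20),  # significant but covers < 50%: rejected
    ]
    print(sorted(mutator_genes_present(toy_hits)))
```

A gene absent from the resulting set would be recorded as missing from that genome, which is the basis of the gain-and-loss comparisons discussed in the text.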
Nicolas Galtier, CNRS-Université Montpellier II, France
This article revisits the literature about genomic GC-content distribution across bacteria in the light of variations in the structure of the catalytic subunit of DNA polymerase III. Three classes of the dimeric subunit of DNA pol III have been described in bacteria, each influencing the genomic GC-content in a specific way.
This paper confirms/demonstrates that DNA pol III is a major determinant of between-species GC-content variations in bacteria, and pinpoints a couple of previous studies in which inappropriate conclusions were reached by not accounting for this effect.
In my opinion, this manuscript contains two important results, which revive and illuminate long-lasting controversies. The first one is about the relationship between GC-content and aerobiosis. We have known for ten years or so that aerobic bacteria show a higher GC-content than anaerobic ones, on average, and this is paradoxical given that C->T and G->A are generally the most common mutations in oxidative context. This study demonstrates that the relationship is largely, or entirely, explained by the differential usage of DNA pol III subunit between aerobes and anaerobes: aerobes tend to carry the GC-enriching polymerase, and anaerobes the AT-enriching one. The second strong result, in my opinion, is about the relationship between genomic GC-content and optimal growth temperature (OGT), two variables that were found unrelated across prokaryotes [60, 61]. Here it is shown that, within each of the three categories of DNA pol III, GC% and OGT do correlate positively. The reason why this relationship did not come out in all-species analyses is that thermophiles most frequently use the AT-enriching polymerase, and mesophiles or psychrophiles the GC-enriching one. It seems to me that these two results, if confirmed, should have a strong impact on bacterial comparative and environmental genomics, in which GC-content variations are obvious, and so far poorly understood.
That said, I have a number of comments/concerns about the form of the paper, the underlying statistics, and its potential implications, which I hope might help improve the manuscript.
I would suggest introducing the current work as an attempt to account for a confounding factor so far overlooked. Currently the manuscript focuses on the importance of replication genes in GC-content variations, but this very result was previously published (by the same authors), and this study does not add so much to that argument.
Rather, I would suggest developing the two results I outline above: specifically review the relevant bibliography; show the GC%/OGT relationship within DNA pol groups, and globally (similarly to figure 5b); perform two-way ANOVA of GC% on DNA pol category and OGT (on one hand), and on DNA pol category and aerobiosis (on the other hand), and discuss the percentage of variance of GC% explained by these variables; conclude about misinterpretations in existing literature.
By comparison, it seems to me that the analyses of ecological and metabolic features and of genomic gene content (figure 2c, 2d, 5, 6) add less to existing bibliography. I would suggest shortening these sections, and especially the section about gene number, in which separating species by DNA pol III classes does not appear to change much of the prevailing hypotheses.
Authors' response : After analyzing the contribution of OGT and oxygen requirement to GC content variation, based on our dnaE-based group framework, we think that it is necessary for us to perform analysis on the contribution of other related factors, such as several ecological and metabolic features, to provide evidence for the universality of the dnaE-based grouping scheme. For example, plant- and terrestrial-associated bacteria that are reported to have higher GC content are mostly grouped in the dnaE1|dnaE2 group. Therefore, we think that some of the previously described relationships between GC content and environmental factors may also fall into our scheme, but have not been realized. Indeed, from Tables 4 and 5, we observe that there are still not enough data for a meaningful statistical analysis. We hope that we can draw a more significant conclusion in the near future, when more bacterial genome sequences become available. As to the analysis performed on gene number, our major conclusion is that the dnaE2 group bacteria that have a higher GC content tend to have larger genomes, in contrast to the opposite situation in the dnaE3 group bacteria. Therefore, we believe that the positive correlation between genome size (or gene number) and GC content is much more pronounced when analyzed under our dnaE-based grouping scheme.
The manuscript does not explicitly address the problem of phylogenetic independence of the observations. The author might think of using the Independent Contrast method, or any related method, to check further the significance of the relationships they uncover. At any rate, the authors must give an idea of the phylogenetic distribution of the three classes of DNA pol III: are they scattered throughout the bacterial tree, or clustered by phyla/families? This is partly answered by figure 4, in which within-genus variations of DNA pol III class are reported, somewhat suggesting that the phylogenetic inertia on this trait is weak. Confirmation welcome.
Figure 3, figure 4 and many sentences in the manuscript make convincing cases suggesting that changes in DNA pol III affect bacterial GC-content evolution. However, I wonder how representative are these examples: were they specifically selected to illustrate the main pattern reported in this study, or are they more or less random instances? Figure 3: why choosing just ten thermophilic species, and why these ten?
Authors' response : We thank the reviewer for his constructive comments. We wanted to explain the ambiguous relationship between OGT and GC content based on real data. The reasons we choose these 10 bacteria are as follows. First, we needed to select bacteria that have precise OGT information. Second, to exclude the interference of phylogenetic distance with GC content, we need to select several bacteria that have close phylogenetic relationships in each phylum. Third, all the bacteria should fall into the three different dnaE-based groups evenly. Fourth, both their GC content and OGT have to vary significantly.
Figure 4: are Shewanella and Mycobacterium the only genera showing variations in DNA pol III? If not, could you please provide a more global picture, and mention counter-examples if there are some? I have a similar concern about the discussion, in which the focus is presumably put on examples fitting the general theory, not counter-examples.
Along the same lines, the removal of "outliers" (figure 6) does not appear justified to me, even though I agree that horizontal gene transfer presumably perturbs the observed relationship, which is good to mention.
Authors' response : Agreed. We further revised the corresponding description by performing linear regression analysis and removing the "outliers" by more robust upper and lower 90% prediction limits.
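For readers unfamiliar with this kind of outlier screening, the following Python sketch fits an ordinary least-squares line and flags points that fall outside approximate 90% prediction limits. It is a sketch under stated assumptions rather than the authors' procedure: the function name and toy data are hypothetical, and the normal quantile is used in place of the Student t quantile, which is a reasonable approximation only when many genomes are analyzed.

```python
from math import sqrt
from statistics import NormalDist


def prediction_limit_outliers(xs, ys, level=0.90):
    """Fit y = intercept + slope * x and flag points outside the prediction limits.

    Returns (slope, intercept, indices_of_flagged_points).
    Uses a normal quantile as a large-sample stand-in for the t quantile.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom.
    s = sqrt(sum(r * r for r in residuals) / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    flagged = []
    for i, (x, r) in enumerate(zip(xs, residuals)):
        half_width = z * s * sqrt(1 + 1 / n + (x - mean_x) ** 2 / sxx)
        if abs(r) > half_width:
            flagged.append(i)
    return slope, intercept, flagged


if __name__ == "__main__":
    # Toy data: a straight line with one displaced point at index 5.
    xs = list(range(10))
    ys = [2 * x + 1 for x in xs]
    ys[5] += 10
    slope, intercept, flagged = prediction_limit_outliers(xs, ys)
    print(f"slope={slope:.3f} intercept={intercept:.3f} flagged={flagged}")
```

Because the flagged point also inflates the residual standard error, a screen of this kind is conservative; refitting after removal would tighten the limits.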
It seems to me that the surprising report by Foerstner et al. of very different GC-content distributions between distinct environmental samples (despite comparable representation of the bacterial phyla) could reflect a differential usage of the three DNA pol III across environments. This could perhaps be checked by identifying DNA pol III sequences in the corresponding metagenomic data.
Having demonstrated that the DNA pol III subunit plays a major role in GC% variations, it is tempting to ask what determines variations in DNA pol III usage across groups of bacteria. For instance: do aerobic bacteria most frequently use the GC-enriching DNA pol III because it is GC-enriching, or because it is more efficient in aerobic conditions, and incidentally GC-enriching?
Authors' response : The reviewer poses a very interesting and challenging question here. We believe that the four dnaE isoforms diverged at a very early stage of eubacterial evolution and drove the bacteria towards not only different GC contents, but also different evolutionary routes or landscapes, either randomly or under environmental pressures. Over time, bacteria that possess different dnaE isoforms have favored different environments, leading to the current diversity.
The manuscript would strongly benefit from English corrections
Abstract (and introduction, last paragraph):
"The contribution of other environmental or bacteriological factors, such as genome size, temperature, oxygen requirements, and habitats, either indirectly rely on the choice of mutator genes or take the advantage of their fine-tuning effect on the trends determined by other factors." This sentence is unclear to me and probably deserves rephrasing.
The Background section introduces codon usage biases and transcription-coupled mutation/repair, but these two aspects are not addressed in this study. The potential role of OGT, aerobiosis, metabolism and environment are not, or very briefly, introduced.
Table 2 and figure 2a: I suggest grouping "microaerophilic" with "anaerobic" (or "microaerophilic" with "facultative" if you think it is more appropriate). This is because percentages are meaningless in small groups of species, and percentages are very important in this table.
Adam Eyre-Walker, Centre for the Study of Evolution and School of Life Sciences, University of Sussex, Brighton, United Kingdom.
The current paper follows up work the authors have done on the relationship between genomic GC and the presence of various DNA polymerase alpha subunits in eubacterial genomes. They confirm, as in their previous work, that species which use a combination of dnaE3 and polC subunits tend to have lower genomic GC contents than those which use dnaE1 subunits, which have much lower genomic GC contents than those which use a combination of dnaE1 and dnaE2. They argue therefore that mutation biases introduced by the alpha polymerase is a major determinant of genomic GC content in bacteria.
Unfortunately, this conclusion is not justified given that there is a high level of phylogenetic non-independence in their data. If we accept their classification of alpha subunits into the four main familes (dnaE1-3 and polC) then almost all bacteria that have dnaE3 and polC are firmicutes and almost all bacteria with dnaE1 and dnaE2 bacteria are proteobacteria and actinobacteria . Hence it is possible that the association between alpha polymerase subunits and GC content is coincidental, established by a few coincidental evolutionary changes; for example, it might be that the evolution of the dnaE2 subunit happened at the same as another unrelated evolutionary change which caused a shift towards high genomic GC content. If there have been relatively few instances in which the alpha polymerase has evolved then association with GC content may be coincidental.
Authors' response : We thank the reviewer for the critical comments. We have overlooked the molecular mechanisms that govern compositional (sequence) variations, but concentrated on sequence variation itself. A minute change in the conformation of these mutator enzymes may alter the GC content in another direction. Clearly, Figure 4 shows that in the genera Shewanella and Mycobacterium, bacteria in the dnaE1|dnaE2 group generally have higher GC content (by about 10%) than those in the dnaE1|polV group. In addition, we found that all three newly sequenced bacteria in Firmicutes (the dnaE3 group) deposited in the public database have unexpectedly high GC content (>60%), and for two of them (Alicyclobacillus acidocaldarius subsp. acidocaldarius DSM 446 and Symbiobacterium thermophilum IAM 14863) this correlates well with the presence of dnaE2. One bacterium (Candidatus Desulforudis audaxviator MP104C) has been proven to have lost polC, similar to what we found in Pelotomaculum thermopropionicum SI. Furthermore, by analyzing the pattern and distribution of bacterial SSRs (simple sequence repeats), we found that one bacterium, Acidiphilium cryptum JF-5, which was previously identified as a dnaE1|polV group bacterium, has SSR patterns similar to those of dnaE1|dnaE2 group bacteria. Our further genome-wide screening led to the discovery of a single-copy dnaE2 in one of its plasmids (manuscript in preparation). Therefore, we think that the correlation between dnaE polymerases and GC content is a rule rather than a coincidence or an exception, albeit one lacking direct experimental confirmation. Of course, we do not claim that there are no exceptions to the rule, but we predict that they are the minority.
The authors need to conduct a proper comparative analysis by, for example, selecting related pairs of bacteria that differ in their alpha-polymerase subunits. They give some examples at the end of the current paper, but they need to find more examples, and to find these without reference to the genomic GC content. Once they have set the problem within a proper comparative framework, they can start to investigate the relative correlation between GC content and alpha polymerase subunits, genome size, lifestyle, etc.
Authors' response : We have conducted a comparative analysis by selecting bacteria that differ in their alpha-polymerase subunits, as shown in Figure 4. In future investigations, we may be able to show more examples, but what we have now is limited by the availability of the relevant public data.
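One way to set up the paired comparison discussed above is an exact sign test on related pairs of genomes that differ in their alpha-polymerase subunits. The sketch below is illustrative only: the function name `sign_test_p` and the GC values are hypothetical placeholders, not data from this study, and the pairs would need to be chosen without reference to GC content.

```python
from math import comb


def sign_test_p(diffs):
    """Two-sided exact sign test p-value for paired differences.

    Zero differences (ties) are dropped, as is conventional for the sign test.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    # Size of the smaller side of the split; under the null hypothesis each
    # sign is equally likely, so the count follows Binomial(n, 0.5).
    tail = min(positives, n - positives)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


# Hypothetical placeholder pairs (NOT from the paper):
# (GC% of the relative carrying dnaE2, GC% of the relative lacking dnaE2)
pairs = [(65.0, 55.0), (62.0, 58.0), (70.0, 61.0),
         (59.0, 60.0), (66.0, 57.0), (63.0, 52.0)]
diffs = [with_e2 - without_e2 for with_e2, without_e2 in pairs]
p_value = sign_test_p(diffs)
print(f"{len(pairs)} pairs, two-sided sign test p = {p_value:.5f}")
```

Because each pair contributes only the sign of its difference, the test makes no assumption about the distribution of GC content, but its power depends on how many independent pairs can be found.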
As it stands, I do not think there is much evidence to support the authors' hypothesis that GC content evolution is determined by alpha polymerase subunits. Even if this were proven, it is evident from their figure 1 that a large proportion of the variance in genomic GC content is not explained by subunits, since there is a large variance in genomic GC content within each subunit category.
Authors' response : We cited our previous related papers and added several lines of evidence to support our hypothesis. It is true that GC content varies to different extents within each group. What we are emphasizing here are two concepts. One is the fact that there are boundaries or specific spectra in compositional variability. The dnaE1|polV group is the extreme, which appears to have no limit in GC content variation but is regulated by mutator genes. Other groups have boundaries, and they prefer either low-GC or high-GC contents. The other concept is why GC content varies and the complexity required to explain such variability. Large variances within each subunit category reflect the complexity of diverse factors contributing to GC content variation. As exemplified in our manuscript, there are also many other mutator genes (such as mutT, mutY, and mutM), as well as several environmental and bacteriological factors, contributing to GC content variation. Horizontal gene transfer is another major factor that often results in broader GC content variability, both as a mechanism of genetic material exchange and because the transferred material itself often makes significant contributions.
Quality of written English: Needs some language corrections before being published.
Authors' response : We have carefully checked the wording throughout the manuscript and revised the manuscript for clarity.
Eugene Koonin, National Center for Biotechnology Information, NIH, Bethesda, Maryland, United States.
Wu et al. claim to have solved a very old enigma, that of the molecular basis of the GC-content variation in bacteria. They come to the conclusion that the defining factor is the asymmetry of the DNA polymerase III dimer, in particular, the presence of one of the two mutator forms, polC or dnaE2. It is certainly plausible that the structure of the replicative DNA polymerase substantially contributes to mutational biases. Nevertheless, unfortunately, the data presented in the article do not convince me at all that the structure of polymerase III alone determines the GC-content or even contributes to it significantly. Part of the problem is the puzzling lack of statistical analysis in the paper: the authors simply report some base composition preferences in different groups of bacteria without presenting correlation coefficients let alone p-values. More importantly, I think the authors fail to recognize and properly interpret the current status of the study of evolution of nucleotide composition in bacteria and archaea (their references 54-56). By now it appears certain that there is mutational bias toward AT in all prokaryotes, and accordingly, the high GC-content seen in many bacteria and archaea is most likely due to selection pressure. Both the molecular mechanisms underlying the mutational bias and especially the selective factors that offset this bias are of major interest but I am afraid the current article does not significantly contribute to our understanding of this evolutionary conundrum.
Authors' response : We are grateful for the reviewer's critical comments. The conclusions we draw in this study are based on comparative analysis of genomic sequences, in which correlations between GC content and various bacteriological features are examined. We plan to design experiments to test our hypothesis by investigating mutation patterns in reporter genes, or even on a genome-wide level, after introduction or elimination of dnaE2. We hope that we can provide more convincing experimental evidence to answer this question in the near future.
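The statistical support requested above could, as one option, take the form of an exact permutation test on the difference in mean GC content between two dnaE-based groups. This is a minimal sketch under stated assumptions: `exact_permutation_p` is a hypothetical helper, the GC values are placeholders rather than results from the manuscript, and the test treats genomes as independent observations, so it does not correct for the phylogenetic non-independence raised earlier in the review.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means.

    Enumerates every way of relabelling the pooled values into groups of the
    original sizes; only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point round-off.
        if diff >= observed - 1e-9:
            hits += 1
        count += 1
    return hits / count


# Hypothetical placeholder GC% values (NOT from the paper).
low_gc_group = [35.0, 38.0, 41.0, 44.0]
high_gc_group = [60.0, 63.0, 66.0, 69.0]
p_value = exact_permutation_p(low_gc_group, high_gc_group)
print(f"exact permutation p = {p_value:.4f}")
```

For realistic group sizes the full enumeration becomes infeasible, and a random sample of relabellings would be the usual substitute.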
Quality of written English: Needs some language corrections before being published.
Authors' response : We have carefully proofread the manuscript and invited a native English-speaking colleague to edit our revised manuscript.
Reviewer 1: I am still concerned by many of the methodological and conceptual problems raised by the reviewers, which were only partially addressed in this revised version, in my opinion.
Authors' response : This is a fair assessment. We apologize for not being able to meet all expectations from the reviews. It is in the spirit of scientific research that a publication should not easily satisfy a scientific question in a one-on-one fashion but should stimulate deeper thinking and generate even more questions. Nevertheless, we will try to address some of the legitimate concerns in our future work.
Reviewer 3: Unfortunately, the authors do not address the substance of the criticisms in their responses to reviewers. Neither have they made adequate language corrections.
Authors' response : We have added more analysis to the first revision and addressed some of the questions raised by the reviewers but we admit that we were unable to address all the concerns since some of them are obviously subjects for future debates. Only time will tell whether our dnaE-based grouping scheme is correct or not. In addition, for better written English, the final manuscript has been further revised by Edanz group editors.
We thank Mr. Tongwu Zhang and Dr. Hongzhu Qu for helpful discussions and constructive comments. We are grateful for the thoughtful comments, valuable suggestions, and helpful criticisms of the three respected reviewers. The study was supported by grants from the Knowledge Innovation Program of the Chinese Academy of Sciences (KSCX2-EW-R-01-04), the Natural Science Foundation of China (90919024 and 30900831), the Ministry of Science and Technology as the National Science and Technology Key Project (2008ZX10004-013), and the National Basic Research Program (973 Program) from the Ministry of Science and Technology of the People's Republic of China (2011CB944100).
- Sueoka N: On the genetic basis of variation and heterogeneity of DNA base composition. Proc Natl Acad Sci USA. 1962, 48: 582-592. 10.1073/pnas.48.4.582.
- Li W, Graur D: Fundamentals of Molecular Evolution. 1991, Sunderland MA: Sinauer Associates Inc, 1st edition.
- Belozersky AN, Spirin AS: A correlation between the compositions of deoxyribonucleic and ribonucleic acids. Nature. 1958, 182: 111-112. 10.1038/182111a0.
- Bernardi G: Codon usage and genome composition. J Mol Evol. 1985, 22: 363-365. 10.1007/BF02115693.
- Sharp PM, Devine KM: Codon usage and gene expression level in Dictyostelium discoideum: highly expressed genes do 'prefer' optimal codons. Nucleic Acids Res. 1989, 17: 5029-5039. 10.1093/nar/17.13.5029.
- Gouy M, Gautier C: Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 1982, 10: 7055-7074. 10.1093/nar/10.22.7055.
- Bulmer M: Coevolution of codon usage and transfer RNA abundance. Nature. 1987, 325: 728-730. 10.1038/325728a0.
- Knight RD, Freeland SJ, Landweber LF: A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol. 2001, 2: RESEARCH0010.
- Zhang Z, Yu J: Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biol Direct. 2010, 5: 63. 10.1186/1745-6150-5-63.
- Wong GK, Wang J, Tao L, Tan J, Zhang J, Passey DA, Yu J: Compositional gradients in Gramineae genes. Genome Res. 2002, 12: 851-856. 10.1101/gr.189102.
- Green P, Ewing B, Miller W, Thomas PJ, Green ED: Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 2003, 33: 514-517. 10.1038/ng1103.
- Mugal CF, von Grunberg HH, Peifer M: Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol Biol Evol. 2009, 26: 131-142.
- Tyerman J, Havard N, Saxer G, Travisano M, Doebeli M: Unparallel diversification in bacterial microcosms. Proc Biol Sci. 2005, 272: 1393-1398. 10.1098/rspb.2005.3068.
- Rainey PB, Travisano M: Adaptive radiation in a heterogeneous environment. Nature. 1998, 394: 69-72. 10.1038/27900.
- Spencer CC, Tyerman J, Bertrand M, Doebeli M: Adaptation increases the likelihood of diversification in an experimental bacterial lineage. Proc Natl Acad Sci USA. 2008, 105: 1585-1589. 10.1073/pnas.0708504105.
- Gause GF, Dudnik YV, Laiko AV, Netyksa EM: Induction of mutants with altered DNA composition: effect of ultraviolet on Bacterium paracoli 5099. Science. 1967, 157: 1196-1197. 10.1126/science.157.3793.1196.
- Singer CE, Ames BN: Sunlight ultraviolet and bacterial DNA base ratios. Science. 1970, 170: 822-825. 10.1126/science.170.3960.822.
- Kagawa Y, Nojima H, Nukiwa N, Ishizuka M, Nakajima T, Yasuhara T, Tanaka T, Oshima T: High guanine plus cytosine content in the third letter of codons of an extreme thermophile. DNA sequence of the isopropylmalate dehydrogenase of Thermus thermophilus. J Biol Chem. 1984, 259: 2956-2960.
- Musto H, Naya H, Zavala A, Romero H, Alvarez-Valin F, Bernardi G: Correlations between genomic GC levels and optimal growth temperatures in prokaryotes. FEBS Lett. 2004, 573: 73-77. 10.1016/j.febslet.2004.07.056.
- Sueoka N: Directional mutation pressure and neutral molecular evolution. Proc Natl Acad Sci USA. 1988, 85: 2653-2657. 10.1073/pnas.85.8.2653.
- Martin AP: Metabolic rate and directional nucleotide substitution in animal mitochondrial DNA. Mol Biol Evol. 1995, 12: 1124-1131.
- Oliver JL, Marin A: A relationship between GC content and coding-sequence length. J Mol Evol. 1996, 43: 216-223. 10.1007/BF02338829.
- Xia X, Xie Z, Li WH: Effects of GC content and mutational pressure on the lengths of exons and coding sequences. J Mol Evol. 2003, 56: 362-370. 10.1007/s00239-002-2406-1.
- McEwan CE, Gatherer D, McEwan NR: Nitrogen-fixing aerobic bacteria have higher genomic GC content than non-fixing species within the same genus. Hereditas. 1998, 128: 173-178.
- Naya H, Romero H, Zavala A, Alvarez B, Musto H: Aerobiosis increases the genomic guanine plus cytosine content (GC%) in prokaryotes. J Mol Evol. 2002, 55: 260-264. 10.1007/s00239-002-2323-3.
- Foerstner KU, von Mering C, Hooper SD, Bork P: Environments shape the nucleotide composition of genomes. EMBO Rep. 2005, 6: 1208-1213. 10.1038/sj.embor.7400538.
- Musto H, Naya H, Zavala A, Romero H, Alvarez-Valin F, Bernardi G: Genomic GC level, optimal growth temperature, and genome size in prokaryotes. Biochem Biophys Res Commun. 2006, 347: 1-3. 10.1016/j.bbrc.2006.06.054.
- Zhao X, Zhang Z, Yan J, Yu J: GC content variability of eubacteria is governed by the pol III alpha subunit. Biochem Biophys Res Commun. 2007, 356: 20-25. 10.1016/j.bbrc.2007.02.109.
- Zhao XQ, Hu JF, Yu J: Comparative analysis of eubacterial DNA polymerase III alpha subunits. Genomics Proteomics Bioinformatics. 2006, 4: 203-211. 10.1016/S1672-0229(07)60001-1.
- Hu J, Zhao X, Zhang Z, Yu J: Compositional dynamics of guanine and cytosine content in prokaryotic genomes. Res Microbiol. 2007, 158: 363-370. 10.1016/j.resmic.2007.02.007.
- Cox EC: Bacterial mutator genes and the control of spontaneous mutation. Annu Rev Genet. 1976, 10: 135-156. 10.1146/annurev.ge.10.120176.001031.
- Tanaka MM, Bergstrom CT, Levin BR: The evolution of mutator genes in bacterial populations: the roles of environmental change and timing. Genetics. 2003, 164: 843-854.
- Bao Q, Tian Y, Li W, Xu Z, Xuan Z, Hu S, Dong W, Yang J, Chen Y, Xue Y, et al: A complete sequence of the T. tengcongensis genome. Genome Res. 2002, 12: 689-700. 10.1101/gr.219302.
- Kosaka T, Kato S, Shimoyama T, Ishii S, Abe T, Watanabe K: The genome of Pelotomaculum thermopropionicum reveals niche-associated evolution in anaerobic microbiota. Genome Res. 2008, 18: 442-448. 10.1101/gr.7136508.
- Konstantinidis KT, Tiedje JM: Trends between gene content and genome size in prokaryotic species with larger genomes. Proc Natl Acad Sci USA. 2004, 101: 3160-3165. 10.1073/pnas.0308653100.
- Yanofsky C, Cox EC, Horn V: The unusual mutagenic specificity of an E. coli mutator gene. Proc Natl Acad Sci USA. 1966, 55: 274-281. 10.1073/pnas.55.2.274.
- Horst JP, Wu TH, Marinus MG: Escherichia coli mutator genes. Trends Microbiol. 1999, 7: 29-36. 10.1016/S0966-842X(98)01424-3.
- Thompson JN, Woodruff RC: Mutator genes--pacemakers of evolution. Nature. 1978, 274: 317-321. 10.1038/274317a0.
- Slupska MM, Baikalov C, Lloyd R, Miller JH: Mutator tRNAs are encoded by the Escherichia coli mutator genes mutA and mutC: a novel pathway for mutagenesis. Proc Natl Acad Sci USA. 1996, 93: 4380-4385. 10.1073/pnas.93.9.4380.
- Wiegand I, Marr AK, Breidenstein EB, Schurek KN, Taylor P, Hancock RE: Mutator genes giving rise to decreased antibiotic susceptibility in Pseudomonas aeruginosa. Antimicrob Agents Chemother. 2008, 52: 3810-3813. 10.1128/AAC.00233-08.
- Radman M, Taddei F, Matic I: DNA repair systems and bacterial evolution. Cold Spring Harb Symp Quant Biol. 2000, 65: 11-19. 10.1101/sqb.2000.65.11.
- Ochman H, Lawrence JG: Phylogenetics and the amelioration of bacterial genomes. In: Neidhardt FC, et al (eds): Escherichia coli and Salmonella typhimurium: Molecular and Cellular Biology. 2nd edition. 1996, Washington: ASM Publications, 2627-2637.
- Nishida H, Beppu T, Ueda K: Symbiobacterium lost carbonic anhydrase in the course of evolution. J Mol Evol. 2009, 68: 90-96. 10.1007/s00239-008-9191-4.
- Davis EO, Dullaghan EM, Rand L: Definition of the mycobacterial SOS box and use to identify LexA-regulated genes in Mycobacterium tuberculosis. J Bacteriol. 2002, 184: 3287-3295. 10.1128/JB.184.12.3287-3295.2002.
- Boshoff HI, Reed MB, Barry CE, Mizrahi V: DnaE2 polymerase contributes to in vivo survival and the emergence of drug resistance in Mycobacterium tuberculosis. Cell. 2003, 113: 183-193. 10.1016/S0092-8674(03)00270-8.
- Rand L, Hinds J, Springer B, Sander P, Buxton RS, Davis EO: The majority of inducible DNA repair genes in Mycobacterium tuberculosis are induced independently of RecA. Mol Microbiol. 2003, 50: 1031-1042. 10.1046/j.1365-2958.2003.03765.x.
- Galhardo RS, Rocha RP, Marques MV, Menck CF: An SOS-regulated operon involved in damage-inducible mutagenesis in Caulobacter crescentus. Nucleic Acids Res. 2005, 33: 2603-2614. 10.1093/nar/gki551.
- Martins-Pinheiro M, Marques RC, Menck CF: Genome analysis of DNA repair genes in the alpha proteobacterium Caulobacter crescentus. BMC Microbiol. 2007, 7: 17. 10.1186/1471-2180-7-17.
- Erill I, Campoy S, Barbe J: Aeons of distress: an evolutionary perspective on the bacterial SOS response. FEMS Microbiol Rev. 2007, 31: 637-656. 10.1111/j.1574-6976.2007.00082.x.
- Nakamura Y, Nishio Y, Ikeo K, Gojobori T: The genome stability in Corynebacterium species due to lack of the recombinational repair system. Gene. 2003, 317: 149-155.
- Moran NA: Microbial minimalism: genome reduction in bacterial pathogens. Cell. 2002, 108: 583-586. 10.1016/S0092-8674(02)00665-7.
- van Ham RC, Kamerbeek J, Palacios C, Rausell C, Abascal F, Bastolla U, Fernandez JM, Jimenez L, Postigo M, Silva FJ, et al: Reductive genome evolution in Buchnera aphidicola. Proc Natl Acad Sci USA. 2003, 100: 581-586. 10.1073/pnas.0235981100.
- Wernegreen JJ: Genome evolution in bacterial endosymbionts of insects. Nature Rev Genet. 2002, 3: 850-861. 10.1038/nrg931.
- Hildebrand F, Meyer A, Eyre-Walker A: Evidence of Selection upon Genomic GC-Content in Bacteria. PLoS Genet. 2010, 6: e1001107. 10.1371/journal.pgen.1001107.
- Hershberg R, Petrov DA: Evidence That Mutation Is Universally Biased towards AT in Bacteria. PLoS Genet. 2010, 6: e1001115. 10.1371/journal.pgen.1001115.
- Lind PA, Andersson DI: Whole-genome mutational biases in bacteria. Proc Natl Acad Sci USA. 2008, 105: 17878-17883. 10.1073/pnas.0804445105.
- Holt JG: Bergey's Manual of Determinative Bacteriology. 1994, Baltimore: Lippincott Williams and Wilkins, 9th edition.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.
- Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4: 406-425.
- Galtier N, Lobry JR: Relationships between genomic G + C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J Mol Evol. 1997, 44: 632-636. 10.1007/PL00006186.
- Hurst LD, Merchant AR: High guanine-cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes. Proc Biol Sci. 2001, 268: 493-497. 10.1098/rspb.2000.1397.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.