Reviewer report 1 by W. Ford Doolittle (Dalhousie University, Canada)
I have nothing useful to say about the individual methods presented by Beauregard-Racine and colleagues, but one extended comment on the pluralistic approach they together embody. It is worth reminding ourselves that there is very little difference between the various sides in the TOL debate in terms of understanding of the genetic and ecological processes that determine the structures of individual genomes or the evolution of individual genes. There is not even much disagreement about the relative extents of verifiable vertical descent and LGT. What we are arguing about are relative importances and appropriate representations, matters of generalization about which there may be no facts. All that's really out there in the world are these genetic and ecological processes affecting and having affected one gene or one organism at a time over four billion years. So the pluralism endorsed in this contribution may not only be more useful (in suggesting new ways to look for new things), but more true, in that it discourages us from seeking generalizations and thinking of them as laws.
Authors' response: We fully agree with Ford Doolittle, and thank him very much for his major role in extending the research field of evolutionary biology beyond the TOL.
Reviewer report 2 by Tal Pupko (Tel-Aviv University, Israel)
In bacterial evolution, the hypothesis of "one tree to rule them all" is now widely rejected. In other words, there is not a single species tree topology that describes the evolution of all the genes - different gene trees have different topologies. Those different topologies cannot be explained by stochastic noise or phylogenetic artifacts. The lack of one true tree immediately calls for networks as a visualization and analysis tool to study bacterial evolution, be it either a genome network or gene network. In this paper, Eric Bapteste and colleagues clearly explain the need for networks to study bacterial evolution; they survey some network methodologies and apply them to study the genome evolution of E. coli. The paper provides easy exposition to these network tools, and how they can quickly be used to visualize evolutionary dynamics. Given the ever increasing number of bacterial species for which dozens of isolates have their genomic sequences fully determined, the utility of such methods is expected to increase significantly.
Since this is more of a review paper than a research paper, I would have liked to see more discussion about the open questions in the field (computational and biological challenges in the field of network analysis). Furthermore, many of these network analyses provide results that can also be obtained by other methods. I think it is important to mention other methodologies that aim to answer the same questions as those provided by network-based analyses. As a case in point, maximum-likelihood analyses of gene family presence and absence (phyletic pattern analyses) have provided many insights into genomic fluidity within and among bacterial species.
To summarize, this nicely written work clearly demonstrates the need for novel methodologies to analyse bacterial genome dynamics, methods that differ from those used to analyse the TOL. I expect that as more data accumulate, Bayesian and likelihood based inference tools will be used to capture better the peculiar evolutionary processes that cause genome fluidity in bacteria. This paper and others also seem to indicate that the involvement of phages in bacterial fluidity is underestimated and that bacterial genomics is tightly linked to molecular biology and evolution of phages.
Authors' response: We thank the referee very much for his comments. He is absolutely right on all grounds. There are indeed many open questions in the field of network analyses, but this particular issue would certainly deserve to be the focus of a separate paper. In this revised version, we mention some biological open questions associated with network approaches. However, we fully share the referee's interest, and we would like to encourage motivated colleagues to elaborate reviews on the computational and biological challenges in the field of evolutionary network analysis. Some good leads for this useful and timely work could for a start be found in the excellent special issue of 2009: [12, 61]. As methodological pluralists, we can only welcome the development of novel methods (based on maximum likelihood, Bayesian analyses, and specifically accounting for gene family presence and absence).
Reviewer report 3 by Richard M. Burian (Virginia Tech, USA)
During the last half-dozen years or so, Eric Bapteste and numerous colleagues have developed a long-term program of research aimed at providing a pluralistic framework for interpreting (mainly prokaryotic) processes of genomic change and evolutionary patterns in terms of networks of exchanges among genetic units of various sorts. The present manuscript explores lessons that can be gleaned from applying four different methods, two of them network methods, two of them methods for analysing the "forest of life" (FOL), i.e., the forest of (divergent) gene trees, employed on genomic and genetic data for E. coli and various archaea, bacteria, and mobile elements (plasmids and phages). A major purpose of the submission is to show how the application of different methods to large datasets can handle a diverse range of questions by following a variety of evolutionary units that evolve on different scales and in different patterns. In particular, real data in the highly fluid pangenome of E. coli serve as a model for application of this set of tools and methods to capture different sorts of units and different rates and kinds of exchanges that are more helpfully analysed via network and FOL tools than with standard tree-based analyses. The methods applied to the FOL utilize the concepts of clans (created by bipartition of trees of operational taxonomic units, often unrooted) and slices (segments between two cuts in such unrooted trees). These methods provide evidence of lateral gene transfer into and/or out of a clan or a slice; analysis of such transfers proves to be of considerable importance. In addition, a novel method analysing "polychromatic quartets" (involving pairwise comparison of gene trees that contain at least four distinct strains, here, with data for 30 strains of E. coli) allows a finer-grained analysis of lateral transfer. In the E. coli data, this tool was able to demonstrate, for example, the (possibly surprising) result that (except perhaps for genes in the E. coli core) lateral exchange among pathogenic strains of E. coli has occurred more frequently than between pathogenic and non-pathogenic, or among non-pathogenic strains.
As a philosopher of biology who is not equipped to evaluate the methods as such, I concentrate on the results rather than the methods. The results of greatest interest concern the evidence for the extraordinary degree of genetic mosaicism both in recently evolved taxa and in the long-term evolution (and co-evolution) of a wide range of bacteria, archaea, and mobile elements.
To my eye, what is most striking is the fine tuning of adaptation achieved by lateral transfer, which, for archaea, bacteria, and mobile elements, serves something like the role of recombination in eukaryotes. Of particular interest is what this sort of work suggests regarding debates over the units of evolution. The perspective of the authors is firmly pluralist: they view their tools as exploratory, pragmatically accepting as units whatever entities the data show to have relative autonomy over a relevant range of variation within or among a relevant range of genomes. In short, they claim to utilize the data to identify, locate, and pursue different units of evolution, operating on different scales and in different contexts without strong advance commitments about the full-fledged autonomy of the units or the topology of the trees or networks within which they are found. In general, their findings, as I understand them, suggest that both the structure and the selective values of all units of evolution depend on context, including the other units of evolution with which they interact and (for genes and other embedded sequences of DNA) which sorts of entities they are embedded in. Given LGT, there is both intergenic and intragenic recombination across (larger) evolutionary units. The recombination does not respect the standard phylogenetic boundaries; exchanges take place among archaea, bacteria, and mobile elements, though, of course, at widely different rates. Such findings provide empirical support for a pluralist position, according to which the status of units as (locally and functionally) fundamental depends on the contexts considered and the scale of investigation (e.g., the genomic contexts of the units, the processes by which exchange occurs, the relative stabilities of the units among which there is evolutionary competition, and the extent of the environmental and organismal interactions under investigation).
The conceptual issues of greatest interest concern the extent of the effects of "genetic partnerships" between, e.g., mobile elements and cellular genomes, or across cellular genomes. Such entities as "mobile modules of pathogenicity" can be uncovered by the investigative methods developed by the authors (and others) and appear unlikely to be well understood without understanding the lateral transfers that are involved. More generally, the ways in which the units uncovered depend on the questions investigated, the scale of changes examined, and the investigative tools employed, strongly suggest that a pragmatic and pluralist understanding of the units of evolution and of genetic function is appropriate to the ongoing stream of investigations of evolutionary patterns and processes.
This general characterization provides the interpretative framework that I understand (from the present submission and from some previous publications) the authors to employ. I find little to criticize in the general framework, but have some questions at a finer grain. I address these questions directly to the authors.
Authors' response: We thank the referee: he described with very much insight the logic of our (past and present) contributions. It is a real honour from such a great specialist of history and philosophy of biology.
In the abstract, you mention genetic partnerships twice, but that concept never appears directly in the text of the article. It might help to revisit it in some fashion later in this paper, for the evolution of a gene caught up in a genetic partnership will, in general, differ from that of a gene that experiences only vertical inheritance and/or no effects from a symbiotic relationship.
Authors' response: We agree and have added this claim into the revised MS: "the evolution of a gene caught up in a genetic partnership will, in general, differ from that of a gene that experiences only vertical inheritance"
Similarly, although you are clear that methodological pluralism is called for in dealing with different (evolutionary) questions, it is not clear whether you wish to take a strong position about the extent to which the boundaries of evolutionary units drawn or accepted by investigators depend on the questions they are pursuing and the investigative tools that they use. This may not be the appropriate place to address that issue, but it is one that needs to be addressed carefully at some point in following up the lines you have opened up here and elsewhere. Does it deserve a comment in the present context?
Authors' response: Indeed, we wish to take that strong position: the boundaries of evolutionary units we draw depend on our questions and tools. There are so many connections in an evolutionary network, so many interactions and types of interactions, that results of scientific inquiries looking for some structure in this evolutionary web will always stress some privileged connections, for pragmatic and instrumental reasons. However, we (evolutionary biologists) will particularly value the boundaries (and relationships) grounded in a biological process: our tools and questions can also be designed to try to unravel evolutionary groups based on evolutionary processes. By analogy, these groups can be seen as the consequences of "questions" asked not only by investigators, but also "asked" to the evolving entities by their biotic and abiotic environments (e.g. how to survive in a hypersaline environment with reduced organismal diversity, or how to survive in an arms race with a predator), defining some boundaries (e.g. in the sharing of some traits) and introducing some structure to the evolutionary web. When the investigators' questions can be framed in terms of "natural selection" for example, the units identified are easier to interpret and explain in an evolutionary framework, even without a TOL. Some researchers may therefore be willing to attribute a stronger ontological reality to these remarkable units (and their remarkable connections) than to consider them merely as conventional (pragmatically-defined) objects (which of course they are as well). Such units would be in some respect "hard" conventional objects (as opposed to "soft" conventional objects, purely stemming from the focus and interest of human minds): such units would still impact and emerge from the ecological and genetic processes mentioned by Ford Doolittle, even if no human investigator were around to study them. They would constitute aspects of biological reality with their own local causal effects. We would be interested to hear whether this intuitive (likely naïve) philosophy on units seems sound to the referee, and how it could be improved (or replaced).
You claim in the second paragraph of the Background that homologous characters comparable across all life forms are needed in order to reconstruct the TOL. I'm not convinced that this is correct. If there are several major evolutionary transitions (e.g., from a pre-DNA to a DNA-based genetic system, etc.), there may be no reason to expect ANY character to be identical by descent with a sufficiently distant ancestral character. If homology means something approximating identity by descent, your claim seems to require too much of those who seek to reconstruct a single TOL.
Authors' response: The referee is right. If there are several major evolutionary transitions, homology might not be a sufficient guideline to describe early evolution. For such a difficult task, this central notion must be complemented (or replaced) by additional evolutionary concepts. We edited the text accordingly.
In the fourth paragraph of this section, you might want to make a clearer (or stronger?) claim about the difficulty affecting inferences from pattern to process caused by the independent processes impacting the evolutionary histories of genes. This seems crucial both for the support of your pluralism and for your emphasis on the need to work on the impact of multiple processes on pattern in evaluating inferences from pattern to process.
Authors' response: This is a crucial point that certainly justifies pluralism in evolution. Evolutionary patterns (most obviously the most complex ones, i.e. phylogenetic networks) are indeed caused by independent processes impacting the evolutionary histories of genes. From a pluralistic perspective, methods specifically designed to tackle this issue (e.g. that there is often more than one process behind a pattern) must be encouraged, as opposed to attempts to explain all patterns by a single process (e.g. all evolution by a tree-like process of descent). We clarified this in the revised version of the manuscript, see the section "This kind of phylogenetic networks put forward [...] A tree alone is not going to help establish much of this evolutionary complexity."
In the second paragraph of the Results and Discussion, you claim to divide gene networks into temporal slices. Strictly speaking, this seems to be incorrect. As you indicate in a parenthetical comment, 100% identity of certain sequences in the data for the genome of an E. coli strain and a mobile element might be caused by recent exchange or by very strong purifying selection. It is plausible that the data for the 199 mobile elements and the various E. coli strains you examined do not result from purifying selection, but the claim that the data provide temporal slices is the conclusion of an argument, not appropriate as an initial characterization of the slices themselves.
Authors' response: We agree. We removed "temporal" before slices, and only concluded afterwards that the slices we studied at the 100% identity threshold were likely to correspond to recent events of sharing.
Minor query: In the next paragraph, you report that Table 1 shows 41% of the 4361 100%-similarity sequences belong to the L functional category and another 41% belong to the unknown function category. In working through the table to be sure that I understood your results, I found that (1838/4361) = 42.1% and (1832/4361) = 42.0%. So either I misunderstood the calculation or the numbers should read 42%.
Authors' response: Sorry, we fixed that number to 42%.
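The arithmetic in this query is easy to re-check. The short Python sketch below uses only the two counts and the total quoted by the referee; the variable and function names are illustrative and do not come from the manuscript.

```python
# Counts quoted in the referee's query about Table 1.
total_sequences = 4361
category_l = 1838        # sequences in the L functional category
unknown_function = 1832  # sequences in the unknown function category

def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total

pct_l = percent(category_l, total_sequences)
pct_unknown = percent(unknown_function, total_sequences)

# Both shares round to 42%, not 41%.
print(round(pct_l), round(pct_unknown))
```

Rounded to whole percentages, both categories come out at 42%, in line with the correction the authors accepted.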
In paragraph 4, it might be worth adding a sentence or two (if it is correct) to the effect that your analysis suggests that gene networks are more helpful than gene trees in producing plausible inferences from evolutionary patterns to evolutionary processes - at least where lateral transfer is involved and leaves traces that have not yet been erased.
Authors' response: It is to some extent correct, although currently phylogenetics benefits from its history of use and from a rich body of tools to study gene trees, all of which would still need to be developed for gene networks. Yet, gene networks can be seen as more helpful than gene trees for inferences on complex evolutionary processes, since they are more inclusive than gene trees, and allow the investigation of mixed evolutionary processes that include vertical descent as well as recombination, domain fusion, etc. However, gene networks are not polarized like gene trees are, and they harbour no nodes corresponding to hypothetical ancestors. Future developments are likely to produce some improvements on these fronts. We have added a quick sentence in the text to introduce these claims.
In the section on lessons from networks, as part of the discussion of the results, it might be useful (if you think it correct) to suggest that the genes that exhibit LGT (including the ones that hitchhike with replication and repair genes) may well experience independent evolutionary processes (e.g. different selection regimes) while they reside in mobile elements than while they reside in cellular genomes. This exemplifies, as I understand it, a key reason for which direct inference from pattern (in trees) to process is fragile. If you agree, perhaps this would fit best into the last paragraph of this subsection.
Authors' response: We agree entirely. This may very well be an important distinction, worth modeling, that is currently missing in methods trying to reconstruct the TOL, as these mobile elements, and the trajectory of genes in and out of these elements coupled to possible changes in selection regimes, are not modeled in TOL-based approaches. This issue calls for the inclusion of the mobile elements, and their selection regimes, in models of molecular evolution. We have briefly discussed this topic in the revised manuscript.
In the Lessons from the Forest, first paragraph of the section on Clanistic analysis, it would help if the E* index is explained. I have only a first approximation understanding of this index, but it seems unlikely to me that it can serve as a wholly general way of distinguishing intruders from natives in the intended sense. It is, or should be, an empirical question whether sequence partitions into clans and slices present so extensive a mélange that (in some cases) no clear answer derived simply from the sequence data as to what should count as a native is available. Abstractly, at least, insofar as the E* index is concerned, this seems to be an open question, though one that (I suspect) the data will resolve favorably for most of the familiar sorts of cases that have been examined. But as more esoteric sorts of genetic units and more difficult sorts of genetic partnerships are explored, there may be some surprises on this front. In any case, some sort of explanation, if feasible in brief compass, of the E* index would be of use.
Authors' response: The referee is right. It is indeed an empirical question whether the partition in clans or slices will show extensive mélanges of two categories of OTUs. The E* quantifies the extent of this mixing between entities belonging to two categories defined a priori. These categories are for now arbitrarily defined, rather than inferred from the data. Although they are currently called "natives" and "intruders", they could very well have been called "cat1" and "non-cat1". We have added a brief explanation of the E* in the revised version of the MS.
In the next paragraph, what exactly do you mean by the claim that "Mobile genetic elements were present in 10.3% of the wild forest"? My assumption is that in 10.3% of the gene trees in the database, sequences matching some sequence in the sample of mobile elements included in the analysis were present. If that is correct, this result is likely to underrepresent the extent to which sequences derived from mobile elements are present in this set of trees. If it is incorrect, you need to clarify what your claim means. The importance of the sample in determining the fraction of gene families that have been impacted by mobile elements is unclear, but one might suspect that the number of gene families showing such impact might increase as we explore other ways of identifying sequences that have been impacted by LGT.
Authors' response: The referee's first interpretation is correct: the 10.3% depends on the sample of mobile elements included in the analysis, and therefore is very likely to underrepresent the extent to which sequences derived from mobile elements are present in this set of trees, since the diversity of mobile elements is currently undersampled. We have made this point clearer in the revised MS.
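To make the dependence on sampling concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each gene tree is reduced to the set of its sequence labels and that mobile-element sequences are identified by label. The toy data, the labels, and the function name are hypothetical and are not the authors' pipeline.

```python
def fraction_with_mobile(trees, mobile_labels):
    """Fraction of trees containing at least one sampled mobile-element sequence."""
    if not trees:
        return 0.0
    hits = sum(1 for leaves in trees if leaves & mobile_labels)
    return hits / len(trees)

# Toy forest: each tree is the set of its leaf labels (hypothetical data).
forest = [
    {"ecoli_1", "ecoli_2", "phage_A"},
    {"ecoli_1", "ecoli_3", "plasmid_B"},
    {"ecoli_2", "ecoli_3", "archaeon_1"},
    {"ecoli_1", "ecoli_2", "ecoli_3"},
]

small_sample = {"phage_A"}
larger_sample = {"phage_A", "plasmid_B"}

low = fraction_with_mobile(forest, small_sample)    # 1 of 4 trees
high = fraction_with_mobile(forest, larger_sample)  # 2 of 4 trees
print(low, high)
```

For a fixed forest, enlarging the sample of mobile elements can only keep this fraction the same or raise it, which is why a figure obtained from an undersampled set of mobile elements behaves as a lower bound.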
The conclusions do a nice job of summarizing important aspects of the findings of this paper and putting them into perspective. They might perhaps be expanded with a sentence or two about further steps suggested by the material reported on in this paper and/or by the general approach of the group that have contributed to this line of research. For example, two general directions that stand out for me are (1) exploring the variation in the rates of lateral transfer in different gene families (and, perhaps, the need to devise methods to detect lateral transfer in those gene families where such transfers are very rare) and (2) devising ways to determine whether there are differences in selection pressures or the direction of evolution (e.g., in GC content) when genes from a given family are embedded in viral or plasmidial genomes on the one hand, or in cellular genomes on the other hand.
Authors' response: These open questions are indeed important ones; we have introduced them in the revised MS.
Reviewer report 4 by James McInerney (Maynooth University, Ireland)
This manuscript deals with a few different issues relating to how prokaryotic genomes evolve. Of significant interest to many scientists are the methodological developments and the Polychromatic Quartets approach to the analysis of genome fluidity is indeed quite interesting. I have very few issues that I wish to raise and I think this is a useful addition to the literature in this area.
Authors' response: We thank the referee for his comments.
On page 6 in the last paragraph, you say that "[...] these genome networks highlighted that E. coli shared 90-100% identical genes with two pathogenic genomes [...]". Does this mean that it shares -some- sequences that are 90-100% similar? I think this is what it means, but I think this could be clarified a little.
Authors' response: Yes, we clarified this.
Of interest in the group of genes listed as being common to E. coli and Acholeplasma laidlawii is a 30S ribosomal protein S12. This is a slowly evolving gene and so perhaps it is shared through vertical rather than horizontal transfer. Are there any phylogenetic trees suggesting that there is a specific sister-group relationship between E. coli and A. laidlawii?
Authors' response: In fact, it is E. coli and S. putrefaciens that share the 30S ribosomal protein S12. They are both gamma-proteobacteria. In our dataset, if this sharing was only due to vertical descent, two other taxa, also closely related to E. coli (Coxiella burnetii RSA 493 and Psychrobacter arcticus 273-4) may have shared this rps12. We can certainly not rule out that this particular connection for rps12 reflects vertical descent however.
Concerning E. coli and Acholeplasma laidlawii: they are not closely related. Acholeplasma laidlawii is a mollicute. Interestingly, it is known to produce extracellular vesicles packaging genetic material [62]. As this process of vesiculation generally captures random DNA found in a host cell, the shared transposase could very well have been transferred by this mechanism.
Page 8: "The phylogenetic framework helps identifying gene trees compatible with a vertical evolution [...]" needs to be changed.
Authors' response: We changed the sentence.
Page 8: "Either some non-E. coli branch within E. coli: [...]" You probably need to say "Either some non-E. coli -sequences- branch within E. coli [...]"
Authors' response: Yes, we edited the text accordingly.
Page 8: This sentence needs to be clarified: "First, analyses of the two forests showed that E. coli exchanged almost no genes with Archaea that appeared too distantly related."
Authors' response: We clarified the sentence. The revised version reads: "First, analyses of the two forests showed that E. coli exchanged almost no genes with Archaea. These organisms may be phylogenetically too distant for successful LGT. Alternatively, the Archaea of that particular dataset may seldom share the same environments with the E. coli investigated here, and therefore they may not rely on the same shell genes to adapt to the environment. This interpretation would explain this low proportion of exchanges."
Page 10: "The one-complement [...]". Could you say briefly what the one-complement is?
Authors' response: The one-complement corresponds to matrices in which values between 0 and 1 (the relative frequencies of each clan appearing in PQs) have been subtracted from 1.
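As an illustration only, the one-complement can be sketched in a few lines of Python. The matrix values below are made up and the function name is hypothetical; the sketch simply assumes that the input holds relative frequencies between 0 and 1.

```python
def one_complement(matrix):
    """Return a new matrix whose entries are 1 minus the input entries."""
    for row in matrix:
        for value in row:
            if not 0.0 <= value <= 1.0:
                raise ValueError("entries must lie between 0 and 1")
    return [[1.0 - value for value in row] for row in matrix]

# Made-up relative frequencies, for illustration only.
frequencies = [
    [1.0, 0.25],
    [0.25, 1.0],
]
complement = one_complement(frequencies)
print(complement)  # [[0.0, 0.75], [0.75, 0.0]]
```

Under this transformation a high relative frequency becomes a low value and vice versa, while the result stays within the same 0 to 1 range.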
There are quite a few typographical errors and these should be sorted-out before publication - I don't wish to go through each of them one by one.
Authors' response: We edited the article carefully.
Reviewer report 5 by Didier Raoult (La Timone, France)
Thank you for giving me the opportunity to review this paper, which emerges at a time when the theory of the TOL is becoming increasingly unstable and no longer appears likely to be seriously defended. This analysis of the pangenome stimulates some reflections. I think that integrating these elements could lead to a more ecological vision, which could enrich the discussion.
Authors' response: We thank the referee very much. We agree with his views: a more ecological vision could enrich evolutionary studies beyond the TOL. To strengthen this claim, we now explain in the revised manuscript that: "This realization had some impact on phylogenetics, which had historically considered evolution through the lens of systematics rather than ecology. Core genes, often assumed to be vertically inherited, were typically expected to produce a fundamental vertical framework, against which the evolution of traits and lineages was to be interpreted. Such core genes appeared suited to think about "groups within groups", which is a logic consistent with systematics. However, the distribution of shell genes was clearly explained by additional evolutionary processes, involving in particular gene transfers between partners with overlapping lifestyles or environments. Most of gene evolution (that of shell genes) appeared therefore better interpreted in light of an ecological vision."
1. Regarding the exchange of genes, this is very dependent on the lifestyle of the bacteria. Bacteria exchange genes when they live together, and when the species are sympatric. Borrowing Mayr's concept, we recently proposed the use of this definition to differentiate the bacteria which live isolated in an ecosystem (allopatric) from those which live in complex systems comprising many species (sympatric). Human Escherichia coli, which have been much studied, live in complex communities in the digestive tract; a very recent paper [46] shows that the bacteriophage population in the digestive tract is huge, which explains why bacterial species in this ecosystem exchange many genes, given the very significant number of phages and generalized transduction. This basic finding appears very important to me to explain these major genomic repertoire changes [63, 64].
Authors' response: We agree. We now stress more strongly that gene exchange is very dependent on bacterial lifestyles, and we have included in the manuscript the reference to bacteriophage populations in the gut [46], since we now report that our results are "consistent with previous findings [46], highlighting the role of huge viral populations to provide adaptive genes to their cellular hosts in the digestive tract".
2. A second point that could be developed is the impossibility in a certain number of cases of making trees of genes because of the importance of recombination. A recent work published on Legionella shows that recombination among sympatric bacteria reaches a huge level that appears more related to genetic and ecological proximity than to any other factor [65]. This reinforces the idea that sympatric bacteria are all recent mosaics of gene sequences. In addition, recombination introduces the idea that the term LGT is inappropriate and should be replaced by LST, for Lateral Sequence Transfer. The idea of LGT is a functionalist idea which does not have any meaning, since it is only selective purification that is functionalist. The transfer is mechanical and does not have a goal (Court Jester theory). However, this confirms well that phylogenetic proximity is one of the elements allowing easy recombination and the lateral transfer of sequence.
Authors' response: Two really good points. It is absolutely true that in certain cases gene trees do not reflect gene evolution (e.g. due to recombination, domain fusions, unequal evolutionary rates affecting homology detection and excluding fast evolving sequences from phylogenetic alignments). For those very likely common cases, other representations than trees may be better suited to study evolution. It is precisely for that reason that we have started developing gene networks.
It is also absolutely true that what transfers is genetic material (DNA or RNA sequences). Thus LGT is a particular case of LST, when the DNA fragment that was transferred functions as a gene. Some sequences function as genes in multiple genomic contexts, whereas others don't. Gene networks are thus really good tools to study both recombination and LST. We have discussed and clarified these two points in the main text.
A point which appears to me to be an object for future work is to integrate the most pathogenic Escherichia coli: that is, Shigella. Shigella are among Escherichia coli phylogenetically, but they present an extremely reduced genome because of their strict dependence on the host, in contrast to Escherichia coli. Pathogenic E. coli do not show a degree of evolution in pathogenicity comparable to that of Shigella [63].
Escherichia coli remains a very large pangenome, but we have a selection bias because non-human Escherichia coli are not yet sequenced at the same level. It appears that the most important source of Escherichia coli is animal (poultry, pigs, etc.). The level of exchange between pathological species is probably also related to the fact that they have the capacity to meet in the gut, which is greater than for the non-pathogenic species. Finally, besides the core and shell genes, the authors did not analyse the ORFans, which represent the creativity of bacteria. It would be interesting to have at least an idea of the proportion of ORFans in each isolate from the pangenome.
Authors' response: We have added the notion that pathological species may be able to meet in the gut, which would enhance their rate of LGT. The referee is also absolutely correct that future work, beyond the TOL, will need to make real room for ORFans. These sequences pose a great methodological and conceptual challenge for evolutionary studies, since comparative approaches are not in the first instance designed to deal with unique sequences that cannot be compared to any other sequences. We have briefly introduced this problem in the perspectives of the manuscript.
Reviewer report 6 by Yan Boucher (University of Alberta, Canada)
The manuscript presents an ambitious attempt at using novel approaches to investigate large genomic datasets. The methods presented by the authors are able to produce results in agreement with previous findings on the evolution of E. coli genomes: that they are involved in frequent LGT and recombination. They also address more specific questions, such as rates of gene transfer for core and shell genes, mobile elements and genes from pathogens versus non-pathogens. What is unique about the approaches used is that they do not assume a single phylogeny, but can tell a story including multiple phylogenies. It is also easy to isolate specific types of genes or organisms from a more complex dataset, allowing the user to answer specific questions. What is difficult about the approaches used here is that they rely on novel concepts that can be difficult to understand (those linked to clanistics especially) and make the conclusions hard to evaluate for most biologists.
Authors' response: We thank the referee for his comments.
Specific issues to address:
Abstract:
Problems with the grammatical structure in the results section. This needs to be reviewed by a native English speaker. Language is a bit cavalier, using colloquial terms such as "smoking guns", which are not appropriate for an international audience and only understandable by those with a certain cultural background.
Authors' response: A native English speaker kindly reviewed the manuscript (Thanks very much Dick!). We replaced "smoking guns" with "strong evidence".
Casual language: "(but the RNA viruses, maybe)", "In this paper, we use", "whose main interest is not so much in defining the relative branching order of species". This should be avoided.
Authors' response: We removed these sentences/words.
Main text: How were genes determined to be "mobile elements" in their comparison to E. coli genomes? The criteria need to be explained.
Authors' response: We downloaded the genes from plasmids and viruses from the NCBI. Genes from these mobile elements were considered to belong to mobilized or mobilizable gene families.
The authors should include a legend describing specific network terms such as "betweenness", "articulation points", "mélange" or "natives".
Authors' response: We have described these terms in the main text, where required.
The authors need to define terms such as "wild genome forest". I would limit the use of new terms to when they are absolutely required.
Authors' response: Wild genome forest is only the name of one of the two forests we studied, reconstructed using all the genes from E. coli UTI89 (NC007946) as indicated in M&M. It is not a technical term. We have clarified this issue in the main text.
A better description of clanistics has to be provided, as it is a new practice. Perhaps some of the Materials and Methods can be included in the main text.
Authors' response: We have introduced clanistics with some more details in the main text. Readers should also refer to the publications, quoted in the MS.
The authors should use subtitles to clarify results and highlight interesting findings, such as "similar recombination levels between core and shell genes".
Authors' response: We have added or edited subtitles accordingly. New sections are now called: Using genome networks to detect recent LGT in the E. coli pangenome; Massive tinkering in the evolution of restriction-modification endonucleases; High rates of LGT in E. coli; Pathogenic lifestyle affects the evolution of 30% of the E. coli pangenome; Detection of candidate mobile modules of pathogenicity; Polychromatic quartets reveal high recombination/LGT rates in core and shell genes within E. coli; Preferential exchanges of DNA material between pathogenic E. coli.
Table 2 contains too much information and should be presented as graphs or included as supplementary materials.
Authors' response: We have included Table 2 as supplementary materials.