Reviewer's report 1
Eugene V. Koonin (National Institutes of Health, Bethesda, MD, USA)
Reviewer comments
A comprehensive study on the evolutionary dynamics of LTR retroelements, using several network methods in conjunction with standard, alignment-based phylogenies. I generally agree with the authors that phylogenetic trees are not suited to fully describe the modular evolution of mobile elements such as retroelements, so I think the use of all these network analyses and representations is welcome. The conclusion on the existence of "big bang-like" transitions in the evolution of retroelements (if I understood the authors' meaning rightly) is very appealing. The details on evolution of specific groups of retroelements will be of interest to researchers in the field.
Authors' response
We thank this referee for his positive evaluation. Yes, our results suggest an inflationary mode of evolution in the history of LTR retroelements. This supports the idea that LTR retroelement evolution does not proceed as a tree but rather as a network. The history of LTR retroelements can be traced as a network that evolves by alternating gradual and vertical means, with diverse episodes of modular, saltational, and reticulate evolution. Our study gives additional support to: 1) the notion that the history of life is not a tree but a network of networks; 2) the existence of "big bang" patterns in the evolution of biological systems, as modeled by this reviewer in prior studies [52, 53] (see also our answer to Reviewer 3).
Reviewer comments
Despite my overall positive impression of this manuscript, I do perceive several problems at different levels:
1. The repeated and emphatic mention of "adaptive landscape", "functional landscapes", "landscapes of functional strategies" etc. in this article does not refer to fitness landscape as a mathematical concept or even a concrete image but rather as a very general metaphor. Along the same lines, the discussion of the power laws and how these apply to evolutionary networks, I think, needs to be more careful and concrete (or just dropped altogether as it adds precious little to our understanding of the evolution of retroelements, or any other class of genetic elements). I think the manuscript would benefit from toning down this rhetoric that has the potential to confuse the general audience and to annoy the experts.
2. I do not think that the authors' representation of eukaryotic phylogeny is up to date. In particular, the notion of "crown group" is, I believe, obsolete, and should be abandoned. Neither is it appropriate to speak of a "three-way split" between animals, fungi, and plants. We know for a fact now that animals and fungi are sister groups. The appropriate representation of eukaryotic phylogeny at this time is a star of 5 supergroups (see Keeling et al. The tree of eukaryotes. Trends Ecol Evol. 2005 Dec;20(12):670-6). Actually, the representation of the eukaryotic phylogeny in Figure 4 (very appealing figure, indeed) seems to adhere to the supergroup concept and is unobjectionable to me but the text needs to be brought in sync with this.
3. I think the article would be more complete if the authors at least touched upon the prokaryotic roots of the modules that comprise the eukaryotic retroelements, for instance, the origin of protease of these elements from a distinct family of bacterial aspartic proteases (Krylov DM, Koonin EV. Curr Biol. 2001 Aug 7;11(15):R584-7).
Authors' response
1) To avoid excess in both manuscript size and topics, we have removed terms such as "adaptive landscape", "functional landscapes", and "landscapes of functional strategies" from the final manuscript version (see also our response to reviewer 2). However, in our humble opinion the underlying relationship between these terms and the notion of "fitness landscape" merits further attention, as it may be more than metaphorical. That is, we noted that the distinct markers and their states reveal variability in diverse functional features of LTR retroelements. This variability gives universality to the network and represents functional phenotypes which, at the molecular level, offer taxonomical differentiation. Note that one can distinguish a retrovirus from a retrotransposon based on the presence or absence of an env gene, or can differentiate at least two distinct strategies of integration based on the presence or absence of a chromodomain module at INT [87], etc. It is thus reasonable that, if the features we describe contain phylogenetically relevant taxonomic states, they reflect variations in function that are probably related to retroelement speciation. An interesting question is whether "homoplasy" (convergence) explains such universality better than "ancestral homology" (ancient divergence). The acquisition of GFs seems independent, as the original acquisition of such functions is not by inheritance from a common ancestor. This implies that the acquisition of a new mechanism can potentially give a selective advantage to the ancestral lineage that captured the feature. Regarding PAMs, the similarity of their distinct states, the host-dependent hierarchy of these states, and the synchrony among particular states of distinct PAMs suggest that the diversity of the system expands continuously, within and between families.
Taking the Retroviridae as an example, one can see that this family can be divided into a number of genera grouped into three classes. With a few exceptions, we found that the distinct elements classified in each class show the same marker states. If we now take Class 1 and within it the genus gammaretrovirus, we find that each gammaretrovirus sequence has a particular marker combination. We originally described this observation as a "functional landscape"; its relationship to the fitness landscape derives from the fact that markers are lineage-specific and can be assembled into phenotypic combinations. Hence, if there is a phenotypic combination that is representative of gammaretroviruses, that is probably because it is the most "adaptive" phenotypic combination for gammaretroviruses. If so, any variation from the usual combination might represent functional variation in the fitness of a gammaretrovirus, and this can be tested by both empirical and computational methods. On the other hand, the redundancy of marker states observed between certain families is clear evidence of reticulate (network) evolution. This is not trivial for the discussion of power-law properties in the diversity distribution of LTR retroelements. Big-bang means of evolution can be related to power-law patterns (a universal property of real-world networks). Taking this into primary consideration, we have rewritten the manuscript and improved several analyses to clarify this context.
2) Done. We removed expressions such as "crown eukaryotes" or "three way split" from the current manuscript version, updated Figure 4 (which in the final manuscript version is Figure 3), and rewrote the related text following the line traced by this reviewer and by reviewer 2.
3) Done, the new manuscript version includes and discusses the clan AA topic at the prokaryotic-eukaryotic transition.
Reviewer's report 2
Eric Bapteste (Dalhousie University, Halifax, Canada)
Reviewer comments
The present version of the manuscript requires substantial revisions. It is really difficult to read, both because of its lack of focus and the impressive number of typos it contains (too numerous to be listed here!). The English also needs to be significantly improved: the manuscript simply cannot be published as it is. It is a draft. Although there are some interesting ideas in this work (the use of a non-tree-like framework to discuss complex evolutionary scenarios about mobile element evolution), I do not feel its conclusions are solid enough or at least convincingly presented here.
Authors' response
The research has been significantly improved and the manuscript has been carefully re-written to make the language accessible to any reader. We hope that this expert will now find the new manuscript robust and straightforward.
Reviewer comments
Certainly, the polyphyly of Retroviridae within the Ty3/Gypsy bio-distribution is clearly shown. It has been extensively studied by the authors through the definition of polymorphic amino acid markers (PAMs), i.e. some key molecular characteristics used to classify mobile elements. Although I agree that the Retroviridae origin is likely polyphyletic based on these criteria, I did not find that this fact proved the Three Kings Hypothesis (according to which Retroviridae of classes I, II and III trace to three Ty3/Gypsy sources that emerged at different times during evolution, before the split between Protostomes and Deuterostomes). In particular, I wonder whether the various mobile elements discussed here cannot recombine with each other regularly enough, to the point that attempting any temporal polarization of their evolutionary history becomes impossible. To test that, I strongly encourage the authors to perform one more analysis that would add value to their research. They should use their PAMs to reconstruct a global network including Ty3/Gypsy and Retroviridae elements altogether. Such a network (improving over the fairly abstract figure 3) would display all the connections between these mobile elements and it would thus help in deciding whether Retroviridae emerged from relatively well isolated Ty3/Gypsy lineages (or whether this is impossible to tell) and whether these Retroviridae can be suspected of subsequent recombinations (or not). This network reconstruction could be achieved by different means, either by recoding the PAM features in a matrix of characters or by carrying out a SplitsTree analysis (for instance) of the whole alignments of Ty3/Gypsy and Retroviridae sequences. That way, it would become obvious whether their phylogenetic history is too complex to be told or follows the Three Kings hypothesis.
Subsequently, the taxonomical context in which the different elements of this eukaryotic LTR retroelements network are found could be superimposed to it, thus allowing the identification of potential correlations between the taxonomic distribution and the recombination/emergence events in Retroviridae and other LTR retroelements evolution.
Authors' response
Done. The new manuscript version includes phylogenetic analyses and (as this expert recommended) network analyses. In particular, we have inferred the evolutionary history of the five groups addressed in the previous study. The phylogenetic inference is an important component of the manuscript. While the evolutionary history of these families has been fully addressed by prior research based on the RT, there is no previous study considering the LTR retroelement system as a whole, based on pol. This was important to elucidate the distinct markers and perform network analyses. With this aim, we used the tool SplitsTree to analyze the evolutionary history of LTR retroelements as a phylogenetic network. Then we superimposed the phylogeny of LTR retroelements over their distributions. The former figure 3 has been removed from this version, as its information is redundant with Figures 6 and 7. These two are bipartite multigraphs addressing the same topic but in a more elegant way. It is also important to clarify that the main objective of the manuscript was not to test the three kings hypothesis but to provide new insights into the network mechanisms at work in the evolutionary history of LTR retroelements. In this respect, the three kings hypothesis is a set of starting conditions that drew our attention to the possibility of a network history. In fact, while in the present paper we have tested this hypothesis in diverse manners following the reviewer's indications, it was addressed and published in a prior paper [16]. On the other hand, we assumed various mechanisms of evolution, including recombination, but we did not try to test them. This is because their existence is well supported by prior publications investigating recent evolutionary patterns.
We note two interesting examples of recombinant histories in the potential Athila retroviruses of plants [9] and the HIV-like Retroviridae retroviruses [10] of mammals, whereby new forms emerge from the recombination of subtypes. Most ancient cases are difficult to test, probably because LTR retroelements can recombine with each other regularly enough that attempting any ancient temporal polarization of their evolutionary history becomes a daunting task. Testing reticulate events between distant counterparts is extremely difficult because of 1) the different rates of evolution of the distinct genomic regions of the retroelement taxa; 2) the wide divergence between sequences accumulated during evolution; 3) extrinsic constraints; 4) conflicting signals, etc. For instance, based on the RT and RH the Retroviridae are more similar to each other than to any Ty3/Gypsy lineage, but the PR and the gag polyprotein show different perspectives.
Reviewer comments
The authors could also discuss whether the variation of their Diversity Index H may be biased and partly due to unequal taxonomic sampling of mobile elements among Deuterostomes, Protostomes, Viridiplantae and Fungi. In particular, they could establish whether the increase in diversity of mobile elements in Opisthokonts is significant. Presently however, the proposition that "many of these events [at least three independent radiations of mobile elements] are coincident with the major biological transitions and changes in molecular and cellular complexity of eukaryotes", although seductive, does not seem fully justified by the material and analyses presented here. (In my view, this same criticism also holds true for the presumed "ancient radiation of mobile genetic elements that is as old as the transition from prokaryotes to eukaryotes" evoked, yet not tested, by the authors.) To summarize, to justify that the big bang model better fits the data than the tree model, the manuscript should be more focused on Ty3/Gypsy and Retroviridae evolution and should give more weight to the description and analysis of the PAMs.
Authors' response
The Diversity Index H analysis has been removed and replaced by network and relative frequency analyses. In particular, the notion of inflationary means of evolution is shown in Figures 4-9. On the other hand, the set of sequences (268) used in this study is non-redundant and covers the most representative lineages of each family in plants, fungi, animals, and unicellular organisms. There is no bias in this study because it includes the non-redundant diversity of LTR retroelements known to date as well as a number of new sequences introduced in this study (summarized in Table 1). This means that we focus not only on all known lineages of LTR retroelements but also on new sequences and lineages addressed in this study (see Figures 1 and 2). The three biological transitions were calibrated according to the most likely age of the LTR retroelement hosts as phyla. This has been conducted taking into primary consideration previous studies with a focus on molecular estimations and the fossil record. The transitions cover: 1) from the earlier eubacterial fossils and the first traces of unicellular algae eukaryotes until the segregation of crown eukaryotes into plants, fungi and animals; 2) from the split of plants, animals and fungi to the Cambrian explosion, the rise of vertebrates, the emergence of land plants, etc.; 3) the origin of the gymnosperms, the split of amniotes into reptiles and mammals, the massive radiation of winged insects, and the emergence of flowering plants. The perspective of an ancient radiation of LTR retroelements is supported by the wide distributions of all LTR retroelement families and their differential distribution in eukaryotes. Ty3/Gypsy chromoviruses and the Ty1/Copia family are both widely distributed in the genomes of not only red and green algae but also plants, fungi and animals. This leads to the common assumption that these two are the most ancient LTR retroelement patterns in eukaryotes (see [7, 63]).
A similar criterion applies in the relationship between distribution and age of the remaining LTR retroelement families. We think that figure 4 is the test asked by this referee, as it is a graphical description of what the post-genomic era suggests at this point over this topic.
Reviewer comments
Editorial changes to be considered:
The background section is very complex (with lots of information) and it is simply impossible to understand (for a non specialist) without a figure. This section needs extensive rewriting to clarify how the PAMs are defined and their nature. Interestingly, such a figure almost exists in the manuscript - it is figure 1-, and it should be introduced much earlier than page 8 (if possible by mapping on it the classes I, II, and III).
Authors' response
The whole manuscript has been completely rewritten and restructured. It includes a background section that is more accessible to any reader. Much of the information mentioned by the expert has been moved to the corresponding subsections under 'Results and discussion' and complemented by an improved version of Figure 1, which is now Figure 3. The current manuscript is structured so that the reader can follow the background to this figure.
Reviewer's comment to the revised manuscript
Reviewer comments
The paper by Llorens et al. entitled "Mapping the landscaping network principle in explaining the diversity and evolutionary patterns of eukaryotic LTR retroelements" is both too long and too complex (and in places hard to understand). It requires significant revisions. This manuscript should be:
1. shortened
2. clarified
To focus on its main original point, the proposition of a network framework to analyze the evolution of eukaryotic LTR retroelements and to show that, once a "good" combination of characters is obtained, then the LTR element mostly evolve within a lineage.
Authors' response
The final manuscript has been improved and reduced in size (we have removed 13 pages of the last version) to make it shorter and clearer than the former version as suggested by this reviewer.
Reviewer comments
To this end;
a) all the section entitled " Phylogenetic patterns of LTR retroelements based on pol" (p.3-11) could be significantly shortened (eventually removed or introduced as Supp. Mat). According to the authors, this part does not provide any really new perspective on the issue (cf. p.11), and truly the core argument of their paper does not start before p. 11 anyway
b) the text from "Under this scenario, " (p. 12) to 'in turn evolving in a tree-like fashion" (p.13), if summarizing ref. 19 can be removed.
c) almost all their figures, except figures 4 and 5, can be removed. Most of the networks presented here are too complex, in particular those with two kinds of nodes. It is not pedagogical. What the authors should do is reconstruct networks with one type of node (for instance the taxa) and connect two nodes (two taxa) when they share a common property (out of the 8 characteristics listed by them to define an LTR element). Alternatively, they could connect such nodes when they share a given combination of characteristics. This could be the only figure of their paper, and it would make their point. If they want to provide more in-depth information, they could color the edges and the nodes of this single network according to various criteria (taxonomy, number of copies in taxa...), thus mapping any additional information they think is relevant.
d) there are many sentences, starting by the title, that are impossible to understand and should be rewritten. Sentences like:
- "Here new strategies emerged passively and overlapped with prior strategies until configuring a complex network, whereby retroelement lineages co-opted the most adaptive landscape, depending on their molecular dynamics and host distribution." (p.2)
- "Here, it is important to emphasize that the three kings hypothesis does not intend to establish a direct relation between these lineages but that argues three Ty3/Gypsy ancestors in the evolutionary history of the Retroviridae common ancient poly- or paraphyletic scenario of diversity yet to be understood." (p. 18)
- "At the LTR retroelement level, the functional landscape of a LTR retroelement taxon can vary, or be more or less successful, if it gains or looses a marker or if a marker in this taxon evolves from one state into another." (p.24)
- "Thus, new strategies emerged passively in the LTR retroelement system and overlapped prior strategies until configuring a network, whereby retroelement lineages co-opted the most adaptive landscape, depending on their molecular dynamics and host distribution. That is "the landscaping network principle" in explaining the diversity and evolutionary patterns of eukaryotic LTR retroelements." (p.26)
should be clarified or removed from the paper.
e) The notion of "landscape" is confusing and poorly explained. The analogy may fit or it may not. But the use of the word deserves more careful, direct, efficient explanations. The same criticism applies for the notion of "functional strategies". What does it mean? This lack of clarity entails that the notion of "landscaping network principle" is almost impossible to understand.
Authors' response
a) Phylogenetic analyses were performed in order to follow the line traced by reviewer 3, so removing this information would conflict with that request. We have, however, reduced the text as much as possible in line with this reviewer's criticisms, taking particular care in the process. In addition, it is important to highlight that what remains unchanged since 1998 (page 11) is the knowledge of the deep evolutionary history (the ancestral evolutionary relationships among families), and that phylogenetic reconstruction analyses cannot convincingly resolve this part of the LTR retroelement history. Actually, this is the foundational topic of this manuscript. Phylogenetic inference is essential as it is the most robust method to classify the OTU diversity into families and lineages. Our study makes an important update to prior knowledge of LTR retroelement diversity (and taxonomy). This is covered in the section entitled "Phylogenetic patterns of LTR retroelements based on pol", where we investigate the diversity patterns of all families and provide diverse and previously unpublished results.
b) Done in the manuscript.
c) Indeed, there are many methods to construct network models. Those we applied are well documented in graph theory, the area of mathematics that deals with the mathematical foundations of networks. Bipartite multigraphs (those with two types of nodes) are not complex; in fact, one of their properties is their simplicity. What is complex (because of its diversity) is the analyzed LTR retroelement system. Bipartite multigraphs are appropriate to evaluate the history of LTR retroelements based on two or more independent features, as this history does not depend only on a single feature. In this case, we considered markers (node 1), host distributions (node 2), and taxa or lineages (links). The obtained results can be quantitatively and qualitatively interpreted because the density of each set of links gives a clear view of the frequency distribution of each PAM state under two independent scenarios. However, to simplify the final manuscript version (according to the general comment of this reviewer, "the manuscript is too long and too complex"), bipartite multigraphs have been removed from the results and are now provided within Additional file 3. On the other hand, we tested various graph methods. Constructing networks using the distinct taxa as nodes and the distinct states of a single marker as links only offers information regarding that marker. By itself this kind of model is not very informative unless it integrates phylogenetic information. Note, for instance, that should we join "skipper" with "maggy" and "CoDi 7.1" based on a CCHC marker state, we learn nothing in particular except that these sequences share that feature. For this reason it is important to resolve the multiple phylogenetic patterns. Constructing networks using taxa as nodes and all markers as edges gives a very large number of edges among all nodes.
This results in a complete graph because the distinct markers have states that are common not to two retroelements but to a number of them. Moreover, a systemic shortcoming of this model is that, in most cases, multiple distinct links among sequences within lineages mask other relationships which are a priori more interesting (e.g. those between distinct families). In contrast, we found markers of reticulate evolution, which allowed us to overcome this obstacle. The most interesting aspect of these markers is their universality. Using combinations of markers we obtain phenotypes associated with phylogenetic identities. This allows a better interpretation of the network evolutionary dynamics, etc. Taking this into primary consideration and recognizing the contribution of this reviewer to the paper, we agree with him that the work needs a more explanatory network framework addressing the main original point. This point is close to the view of this expert, as expressed in his phrase: "the proposition of a network framework to analyze the evolution of eukaryotic LTR retroelements and to show that, once a "good" combination of characters is obtained, then the LTR element mostly evolve within a lineage". With this aim we have rewritten the manuscript and revised, improved, and adapted undirected graphs to make them more comprehensible, along the line traced by this reviewer.
d) The sentences addressed have been revised and restructured in order to better convey their intended meaning.
e) We have replaced the term "functional landscape" with other terms such as "marker combination" and/or "phenotypic combination" (see also our response to Reviewer 1 regarding a similar question). For the same reason, we have changed the title of the manuscript to a more appropriate one.
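As an illustration of the network constructions discussed under c), the following minimal Python sketch builds a one-mode network in which taxa are nodes and an edge joins two taxa that share at least a chosen number of marker states. The taxon names, the marker states, and the helper functions `shared_state_graph`, `density`, and `combination_groups` are hypothetical and are not taken from the study. The sketch only shows, under these assumptions, how a low sharing threshold tends toward a dense graph, whereas grouping taxa by their full marker combination gives a coarser summary.

```python
from itertools import combinations

# Hypothetical toy data (NOT from the study): taxon -> set of marker states.
marker_states = {
    "taxon_A": {"m1:a", "m2:x"},
    "taxon_B": {"m1:a", "m2:y"},
    "taxon_C": {"m1:a", "m2:y"},
    "taxon_D": {"m1:b", "m2:x"},
}


def shared_state_graph(states, min_shared=1):
    """Return {(u, v): shared_states} for taxa sharing >= min_shared states."""
    edges = {}
    for u, v in combinations(sorted(states), 2):
        shared = states[u] & states[v]
        if len(shared) >= min_shared:
            edges[(u, v)] = shared
    return edges


def density(edges, n_nodes):
    """Fraction of all possible undirected edges that are present."""
    return len(edges) / (n_nodes * (n_nodes - 1) / 2)


def combination_groups(states):
    """Group taxa that carry exactly the same combination of marker states."""
    groups = {}
    for taxon, combo in states.items():
        groups.setdefault(frozenset(combo), []).append(taxon)
    return groups
```

Calling `shared_state_graph(marker_states)` links any pair with one state in common, while `shared_state_graph(marker_states, min_shared=2)` keeps only pairs with identical two-marker combinations in this toy set. Comparing `density` for the two thresholds gives a rough sense of how quickly single shared states saturate the graph.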
Reviewer minor comments
The authors should be more careful in interpreting "absence" if sequences from incomplete genomes are considered in their analyses. The possibility of an undersampling of the LTR retroelements should be discussed.
Authors' response
The study covers all currently known Caulimoviridae and Retroviridae genera recognized by the ICTV, plus a new spumaretrovirus sequence we introduce for the first time in this article. We also consider a number of Ty3/Gypsy, Ty1/Copia and Bel/Pao LTR retroelements. This not only covers all commonly known lineages but also extends knowledge of the phylogenetic diversity of these families (we describe new sequences and new lineages). Another question is whether such a sample is a good approximation of the true diversity of LTR retroelements in eukaryotes. On this point, the study evaluates a number of non-redundant Retroviridae retroviruses ranging from distinct fishes, amphibians and sauropsids to mammals. We equally consider a number of Ty3/Gypsy, Ty1/Copia and Bel/Pao LTR retroelements retrieved from the genomes of distinct protists, plants, algae, fungi, amoebas, platyhelminthes, nematodes, cnidarians, crustaceans, winged and non-winged insects, echinoderms, urochordates and vertebrates. We believe that this gives sufficient information to extrapolate conclusions and perspectives on both LTR retroelement evolution and macroevolution, even if some of the host genomes used are incomplete to date. We have shown that plants have particular lineages of LTR retroelements, and so do fungi, protostomes and deuterostomes, etc. Indeed, if one were to investigate LTR retroelements in the genome of a new flowering plant, a priori anything might be found, but what any expert in the topic expects to find are chromoviruses, Athila/Tat Ty3/Gypsy elements and Ty1/Copia elements. That is, it is really difficult to imagine finding a Retroviridae or a Bel/Pao sequence in such a genome. However, should such a rare case happen, the most probable explanation is first, lab contamination, and second, a very rare horizontal transfer, an exciting exception meriting an important publication (despite all, nothing is impossible).
With this manuscript we update prior perspectives, but we are sure that further availability of data will help to improve and calibrate the introduced framework. Our conclusions are therefore not problematic or biased; they are simply a point at which we recapitulate prior knowledge, contrast current information, and offer a new framework for further evaluation (see also our response to reviewer 3 regarding the same question). Another question is the presence of evolutionary gaps due to the loss of lineages associated with eukaryotic extinctions. Such a bias is plausible, but we think that it is not an insurmountable obstacle to studying and modeling the evolution of LTR retroelements or that of their hosts. In fact, this is another important argument supporting the idea of using marker combinations instead of taxa. Networks based on combinations are not biased by the sampling or by the number of sequences used for each lineage. Even more interestingly, they give some clues about putative extinct forms or extant uncharacterized ones, as we have illustrated in current Figure 5.
Reviewer minor comments
The authors indicate that their pol data contained multiple distinct splits, and that these splits can be explained by different modular, reticulate, and saltational evolutionary events (p. 11). They could also be due to "mutational saturation", which should be discussed.
Authors' response
We cannot dismiss mutational saturation in diverse traits of a retroelement genome. However, the sequences used in this study correspond to coding sequences from which we extracted the most conserved parts (cores) to perform alignments. All these cores show lineage-specific phylogenetic signal. That is, all sequences within a lineage are more similar to each other than to other sequences; in other words, we did not observe multiple substitutions in these traits that could lead us to think that their signal is random (mutational saturation). However, the discussion of this possibility is important and has been addressed in the results to emphasize the distribution of PAMs in both phylogeny and host distribution, which cannot be explained by random patterns.
Reviewer minor comments
The authors should use a better phylogeny of reference (see Simpson and Roger in Curr. Biol.): their knowledge of the protist taxonomy is a bit too vague (see for instance strange notions such as the supposed "three way split of plants, animals and fungi" (p.15), the odd branching of diatoms that the authors said are "informally classified as protists" (p. 18)...)
Authors' response
Done. For simplicity's sake we used reference [41] suggested by Reviewer 1 (see the comments of reviewer 1), as it is compatible and a bit more recent than the one suggested by this reviewer. In a similar manner, the expression "three way split" has been removed from the manuscript. Regarding diatoms, as far as we know, they are considered as chromalveolates. The root of chromalveolates in former Figure 4 (now Figure 3) delineates a star tree together with the remaining supergroups. That is correct on the basis of the investigated hosts and their LTR retroelement sequences we classify. Bear in mind that our study focuses on the evolutionary history of LTR retroelements, not on the most accurate tree of life topology.
Reviewer minor comments
Likewise, what are "crown" eukaryotes (p. 15)? Or rather, what eukaryotes are not crown eukaryotes? This notion is problematic. And so is the notion of reptiles (a grade, not a clade) for a phylogenetic-based interpretation (p.17). Strictly speaking, sentences like "In tetrapods, for instance, the Ty3/Gypsy and the Retroviridae distributions overlap until reptiles, but there is no evidence of functional Ty3/Gypsy elements in other amniotes (neither Ty1/Copia nor Bel/Pao)." (p.17) are problematic (and meaningless) for many biologists trained as cladists. (This can be of concern since the authors seem to embrace the cladistic logic when they reject protists since these taxa are a paraphyletic group, p. 18).
Authors' response
The manuscript has been amended to offer a commonly accepted view. In particular, we removed the terms "tetrapods" and "reptiles" from the text and now use terms such as sauropsids or synapsids. With regard to protists, we do not reject them; we just noted that as a group they are not considered monophyletic. We have, however, adapted Figure 3 according to the most recent views on the tree of life. We hope this referee will find the new topology appropriate.
Reviewer minor comments
Are the first traces of unicellular algae really as ancient as 3,500 Mya? (p.14)
Authors' response
The text did not exactly claim this. What we said is that the first transition covers "from the earlier eubacterial fossils and the first traces of unicellular algae eukaryotes to the segregation of crown eukaryotes into plants, fungi and animals" (that is, from 3,500 to 1,500-1,330 Mya). We have rewritten the text to clarify this and have replaced "crown eukaryotes" with a more appropriate term.
Reviewer minor comments
Some claims should be slightly toned down, such as "The distribution of Ty3/Gypsy chromoviruses not only in algae [21, 68, 69] but also in land plants, amoeba [32], fungi and animals [20, 21] indicate that these constitute the oldest branch of Ty3/Gypsy LTR retrotransposons." (p.14) It does not "indicate" this, it "suggests" it at best. What if these elements were moving over large taxonomical distances? Broad distribution, in presence of lateral transfers, is hard to interpret as ancient common ancestry. The same comment applies to the sentence "It is thus unclear which of the two represents the oldest phylogenetic pattern in the Ty1/Copia family, but the wide distribution of both branches indicates that the Ty1/Copia family co-existed with chromoviruses before the segregation of the crown group into plants, fungi and animals (1,550 Mya according to molecular dates [71, 72])" (p.14), and p.19.
Authors' response
Done. The above points have been rewritten with the required care, following the reviewer's recommendation. However, it is important to emphasize that, when evaluating LTR retroelements, broad distribution in the presence of lateral processes is not hard to interpret, as these lateral processes happen between organisms of the same phylum. Caulimoviruses infect plants, retroviruses of insects normally infect insects, those of vertebrates infect vertebrates, etc. This means that there are biological barriers imposed on caulimoviruses and retroviruses and that these barriers have an evolutionary meaning. On the other hand, while there are some examples of LTR retrotransposons believed to be horizontally transmitted, the usual means of LTR retrotransposon transmission is the germ line. This is commonly accepted and indicates that the wide distribution of the LTR retroelement diversity is a sign of deep ancestry. This is so even if we characterize events of lateral transfer, which usually occur only among closely related biological species. Detecting ancient events of lateral transfer is a daunting task, but in certain cases they can be mapped with reservations (note the example of chromoviruses in fungi and vertebrates discussed in this paper). In one way or another, this suggests that LTR retroelements can be used as evolutionary markers of their host evolution.
Reviewer minor comments
I was unable to visualize the figures of network in the supp. mat.
Authors' response
As indicated in methods, these files are not figures. Such material is provided as Mathematica notebooks, so that any other author can reproduce our analyses using the tool "Mathematica". To facilitate visualization and management of this material, the final manuscript version joins Table 2 and all these notebooks in a mini web-site provided as Additional file 4 (for more details, see Methods).
Reviewer minor comments
There are additional sentences for which the meaning is unclear:
"These markers indicate that the diversity patterns of LTR retroelements are not casually distributed." p. 2 What does casually mean?
Authors' response
The intended word is "randomly", changed in the manuscript.
Reviewer minor comments
"The model finds support in the phylogeny of LTR retroelements superimposed over their distributions." What distributions? Taxonomical ones?
Authors' response
Distributions refer to the distinct host distributions of LTR retroelements. We have clarified this in the manuscript.
Reviewer minor comments
"The evolutionary history of LTR retroelements is not a tree but a networking system evolving in a tree-like fashion" (p.11) is a bit confusing. It should be improved somehow. Same thing for "Under the assumption that functional landscapes have not evolutionary meaning, what we would expect is no redundancy among them." (p.23) and "As this number is much closer to the number of LTR retroelement lineages elucidated than to the number of LTR retroelement taxa we can conclude that the landscapes are lineage specific equally to the markers constituting it. " (p. 23)
Authors' response
Again, these points have been rewritten. We hope that the revised text is now appropriately worded.
Reviewer minor comments
The English is much improved (there are still a few oddities: "fossil register" should be "fossil record").
Authors' response
Changed in the manuscript.
Reviewer minor comments
How can biological transitions be edges on the multigraph network? (p.31)
Authors' response
Transitions constitute links; a link is a relationship derived from the distinct LTR retroelement hosts. In any case, to make the manuscript easier to understand, we replaced the model based on transitions with a more appropriate one, in line with the previous points raised by this reviewer.
Reviewer's report 3
Emmanuelle Lerat (Université de Lyon, Villeurbanne, France)
Reviewer comments
In this paper, Lloréns and Moya, using the occurrence of protein markers in different products of retroelement and retroviridae sequences, have matched the combination of signatures according to the host species in order to determine the link between elements. This approach is to some extent similar to other methods consisting of phylogenetic reconstructions based on presence/absence of genes. The approach is quite interesting and gives a different view of the classical phylogenetic representations.
Authors' response
We thank this expert for her help and positive feedback on the background of the manuscript. This new manuscript version is a great improvement over the former version. It is co-authored with three other researchers and includes more sequences, not only of the Ty3/Gypsy and Retroviridae families but also of the Bel/Pao, Ty1/Copia, and Caulimoviridae families. Based on this material we have performed additional and new analyses along the lines suggested by this and the two other referees. We hope this referee will find this new manuscript version improved and useful.
Reviewer comments
I have however some criticisms upon different points and speculations made in this paper. The first problem can seem trivial but I am really concerned by the fact that all the article is based on another manuscript by the same authors currently under submission in another journal. It is completely unusual. Generally journals do not allow reference to any publication unless it is at least accepted. What if the other manuscript is never accepted? What if reviewers point particular problems that would completely change the conclusions concerning the existence of the protein markers? That would be a complete paradox. I don't know what is the position of Biology Direct upon such a problem, and in doubt I will assume the other submitted paper as accepted even if it is absolutely not satisfactory for me.
Authors' response
In this regard, we think that the policy of Biology Direct is similar to that of other journals (on the basis of our experience with this journal). Our aim when submitting the first version of this manuscript to Biology Direct was simply to have it reviewed in advance, but we were aware that this publication would have to wait until the first manuscript was published (it is now available as reference [16]).
Reviewer comments
In the introduction, the authors present the different families of Retroviridae (alpha, beta etc.), and the different classes I, II, III. I think that a clearer explanation about the relationship between the two classifications and also where the Ty3/gypsy are positioned in this system would greatly help readers that are not familiar with such classification. About classification, a reminder of what contains the Ty3/gypsy group seems essential (with a table for example) and would facilitate the comprehension of some parts in the paper. For example, p5 in introduction, it is said that GANG architecture of class I is similar to several Ty3/gypsy and that GIGG of class II is similar to Ty3/gypsy lineages like micropia/mdg3 and others. That means that we find the signature of the two classes in different members of Ty3/gypsy. Maybe you should use the term metaviridae to name the global group Ty3/gypsy to avoid confusion? I would also point out that chromoviruses are classified as members of the Ty3/gypsy group (Gorinsek et al. 2004). The references of "phyla", "sample", "lineage", "clade" are also confusing. The authors should homogenize the paper on that point.
Authors' response
The text in the section "Background" referred to by this reviewer has been moved to "Results", where we describe all the evolutionary markers used. The new manuscript version presents new phylogenetic and network analyses. In particular, we performed a comprehensive phylogenetic analysis of all families based on the pol polyprotein (see Methods). In this new version, we deal with a non-redundant set of 268 sequences belonging to the five aforementioned families, many of which are introduced for the first time in this manuscript (see Table 1). This new manuscript version also presents and discusses a phylogeny for each family and another based on all LTR retroelements (provided in Additional file 1). This will give readers a perspective on the multiple distinct phylogenetic patterns in the LTR retroelement system, which in turn helps to clarify why the different markers are lineage-specific within families or redundant among families. This is the basis of the network, which, as this referee indicates, means that we find two or more states of the same signature in different lineages of the Ty3/Gypsy family and all other investigated families. Regarding chromoviruses, in the previous manuscript version we did not intend to separate this branch from the Ty3/Gypsy family. We apologize if such a perspective was implied in any way. In fact, prior to the work of Gorinsek et al. cited by the referee, the corresponding author, together with another researcher (Dr. Marin), described chromoviruses as Metaviridae Ty3/Gypsy LTR retrotransposons [28]. Earlier, Wright et al. had originally described chromoviruses as a Ty3/Gypsy class [79] and later, shortly before the work of Marin and Llorens, Malik and Eickbush described the chromodomain at INT of these Ty3/Gypsy LTR retrotransposons [4].
The contribution of Gorinsek and Kordis et al. in [63] was important and overdue, as they showed (among other pieces of evidence) that chromoviruses are the most ancient branch of Ty3/Gypsy LTR retroelements (an important evolutionary perspective supporting this paper). In any case, we have rewritten the manuscript to clarify that chromoviruses are Ty3/Gypsy LTR retroelements. It is difficult, however, to refer to chromoviruses simply as Metaviridae elements because, as we show, they constitute the largest phylogenetic branch in the Ty3/Gypsy family, while the remaining Ty3/Gypsy retroviruses and LTR retrotransposons (including the Errantiviridae) fall in another branch. Of course, chromoviruses are Metaviridae Ty3/Gypsy LTR retrotransposons, but perhaps what the current LTR retroelement classification needs is more taxonomical levels such as groups, orders, families, classes, genera, clades, etc. The establishment of these levels is a daunting task because they should be consistent with 1) the phylogenetic patterns of each family; 2) the diversity patterns common to all families, which in turn are polyphyletic. We think that the network shown in this paper may help in this regard. However, while the current Metaviridae classification is under discussion, we would appreciate it if this kind reviewer would allow us to simply describe chromoviruses as Ty3/Gypsy LTR retrotransposons in order to avoid further confusion (note for instance that the Errantiviridae are Ty3/Gypsy elements, but they are not Metaviridae elements). Finally, the three classes pointed out by the reviewer are specific to the classification of Retroviridae retroviruses. This means that no Ty3/Gypsy lineage has a place (to date) in such a classification (see [16, 35] and references therein). As suggested by this reviewer, we have rewritten the manuscript to avoid confusion within and between LTR retroelement families and between these and the eukaryotic taxonomies.
We use the terms "genus" and "class" when referring to retroviruses and viruses such as Retroviridae and Caulimoviridae, according to ICTV and the most recent taxonomical approaches (see [16, 35] and references therein); we use the terms "genus" and "clade" to describe the Ty3/Gypsy, the Bel/Pao and the Ty1/Copia lineages supported by bootstrap, which are in agreement with the current ICTV classification or are commonly accepted in the field; and we use the term "branch" to describe (for purely descriptive purposes) the deep clustering elucidated in this study for the Ty1/Copia, the Ty3/Gypsy, and the Bel/Pao LTR retroelements. We hope this expert will now find the topic cleaner and clearer for any reader and the manuscript straightforward.
Reviewer comments
The material and methods part is not clear. You don't need to present all the content of the GyDB database as there is already a publication on it. It would be better to know exactly what element you have analyzed from which species.
Authors' response
Done. We have rewritten the whole manuscript, taking particular care to remove all references to the GyDB except where strictly necessary (citation, URLs, etc). The new manuscript version includes an inferred phylogeny of all sequences used (Additional file 1) with the names, hosts, and GenBank accessions of all sequences. The file is provided in HTML format so any reader can directly access more information and/or the GenBank accession of each sequence by clicking its name or acronym in Additional file 1.
Reviewer comments
The authors propose to compute an index variability for each element based on the different possible combinations of signatures. The results are shown on table 1. It is probably an interesting way to have a rapid look at the diversity present in species. The trouble is that the authors never really refer to this table and they don't either explain what they are expecting from the values. Especially how the H value is supposed to vary? I don't really see the point of computing this index as it is almost not interpreted and used in the paper.
Authors' response
In the new manuscript version, the index H is redundant with the set of network analyses performed and has therefore been removed. See also our response to Reviewer 2. In particular, the network analyses shown in Additional file 3B-C offer perspectives similar to those of the previous index H, but they are more informative and presented in a more elegant way.
Reviewer comments
A particular assumption is made concerning the observations of the authors and a theory emitted by E. Koonin concerning evolution. I think this is a clear overstatement. First of all, the observations made by the authors cannot be taken as proof that each increase in diversity is coincident with major biological transitions (as stated p22 in discussion). There are no time scales and too many missing data concerning the representation of species and elements to allow such a conclusion. Concerning the adequacy with the theory of Koonin, this is also quite speculative. Again, the scales are not the same. Transposable elements and viruses are known to be able to quickly evolve by gene acquisition and genome recombination between elements. When a new element is formed it does not mean that a new host species is born. Even if in some cases the link between mobile elements and speciation is possible, I encourage the authors to be more prudent with their conclusions.
Authors' response
Done. We toned down the speculation and refer to Koonin's model only where it is strictly necessary. It is worth highlighting that, while the scales are different, we describe a similar evolutionary mode at the molecular level. The evolutionary principle is invariant, and while several points of Koonin's model might be debated (for instance, are the evolutionary phases of Koonin's model really fast and lower steps or are they simply transitions?), our study gives support to the notion of an inflationary mode of evolution. This is exactly the perspective derived from Figure 3, which is an evolutionary map based on the diversity and distribution of LTR retroelements in eukaryotes. We therefore think that our study gives support to the commonly assumed notion that viruses and mobile genetic elements are indicators of the evolutionary history of their hosts and vice versa. In the previous manuscript, our interpretation when relating the evolutionary history of LTR retroelements to that of eukaryotes was intended to point out the role of mobile genetic elements as evolutionary vectors in the evolution of eukaryotes towards complexity (as suggested in [1] and other studies), not to claim that the origin of a new LTR retroelement lineage gives rise to the birth of a new biological species. To clarify this whole framework we have carefully rewritten the manuscript, where we have toned down the speculation and dated the different LTR retroelement host distributions according to prior molecular estimates and data derived from the fossil record. We hope this expert will now find this scenario better presented and supported by previous research.
Reviewer comments
The authors propose the hypothesis of the 3 kings as originators of Retroviridae and Ty3/gypsy elements. But they temper their position in the discussion p26 saying that there could be more than three classes. They also propose that adding the endogenous retroviruses could possibly change this view. The thing I don't understand is why not having added endogenous retroviruses in this analysis. I don't think that a particular analysis dedicated to these elements makes sense. Moreover, I am wondering if the observed diversity mainly in vertebrates concerning Retroviridae is not biased by different effects like a bias in species sequencing. It seems that the high diversity observed in deuterostomia is mainly due to the retroviruses. I am not convinced that there are less diversity in arthropods or even in plants. You can also imagine that other viruses have been more successful in plants for example that are not represented here because they are not member of the retrovirus classes.
Authors' response
The main focus of this research is the network history not only of the Ty3/Gypsy and Retroviridae families but also of all other LTR retroelements and caulimoviruses. Again, we apologize for the way we prepared the first manuscript version, where the Ty3/Gypsy and Retroviridae families seemed to be the main focus of research. The Ty3/Gypsy and Retroviridae network and the three kings hypothesis were published in [16]. In this paper they are the starting conditions that motivated the study. To clarify this, we have given the manuscript a more appropriate title. Regarding the question raised by this reviewer, the new manuscript version combines "Results and Discussion" in a single section. Here, we test the three kings hypothesis in diverse ways (see also our answer to Referee 2), but the main focus of the manuscript is the network principle. In fact, the most precise way to investigate the Ty3/Gypsy and Retroviridae network is to study it on the basis of all LTR retroelement families known to date. On the other hand, the three kings hypothesis does not argue that all Ty3/Gypsy and Retroviridae LTR retroelements evolve from three common ancestors but that the three Retroviridae classes delineate Ty3/Gypsy ancestors in the evolutionary history of Retroviridae retroviruses. This sounds similar but it is not the same. The three kings hypothesis proposes neither that Class 1 directly evolves from Athila/Tat elements, nor that Classes 2 and 3 evolve from the Micropia/Mdg3 clade and errantiviruses, but that all of them evolve from a common ancient poly- or paraphyletic scenario with different times of emergence. Our study (in both versions) includes a number of endogenous Retroviridae retroviruses. In particular, the two Retroviridae genera - gammaretrovirus and betaretrovirus - are rich in both endogenous and exogenous retroviruses (for more information on this topic see also [35]), so we certainly include a large number of endogenous Retroviridae retroviruses in our study.
There is no significant bias in the different Retroviridae species used. This can be seen in Additional file 1 and in Figure 3, which show how the multiple distinct Retroviridae sequences we evaluate cover a wide host range, from fishes and amphibians to reptiles and mammals. The range covered by this framework carries sufficient information to formulate hypotheses and draw conclusions because this is what the post-genomic era suggests at this point. As this reviewer has interestingly pointed out, the different gaps in this biological history, which derive from bias in the availability of sequenced genomes, will calibrate this map in the future. However, this is not a bias inherent in the corpus of available data. Our study is a first step of current and further research value, and in agreement with previous arguments [38], "surveillance for emerging diseases should extend to sampling and characterization of the entire panoply of viruses, which are circulating not only in people but also in animals" (and all other organisms). Obviously, we temper our position regarding the three kings hypothesis because further sequencing projects might reveal in the future that the diversity of Retroviridae retroviruses needs more genera and taxonomical classes to be explained. However, if this occurs, it will not be due to a biased hypothesis, but due to an update based on more and better information. This will not change the fact that while Ty1/Copia, Bel/Pao, and Ty3/Gypsy LTR retroelements are spread across both protostomes and deuterostomes, the Retroviridae are only distributed among diverse deuterostomes. This by itself shows that the diversity of LTR retroelements in deuterostomes is greater than that of protostomes and plants, even if we include caulimoviruses of plants (as we have done) in the study. The genomes of plants and fungi known to date show only Ty3/Gypsy and Ty1/Copia elements, and it is difficult to think that further sequencing data will change this scenario.
Moreover, this study includes caulimoviruses because of their retroelement-like gag-pol component, which is close in similarity to that of the Ty3/Gypsy family. While caulimoviruses remain in a separate system of classification because they are DNA viruses, their diversity based on evolutionary markers in their gag-pol component is lower than that of Retroviridae retroviruses. At this point, it is important to highlight that this study focuses only on LTR retroelements (we included caulimoviruses because they evolve from this system). We do not know whether plants, fungi, protostomes and deuterostomes are more or less diverse in other types of RNA or DNA viruses and mobile genetic elements, because the topic addressed in this study is only the particular system of LTR retroelements and related caulimoviruses.
Reviewer's comment to the revised manuscript
I have read the new manuscript and the responses made by the authors. They have indeed made a lot of modifications that make the manuscript very interesting and much more clear than the first version. I don't see any new comments and I think the article can be published.