Distinct groups of repetitive families preserved in mammals correspond to different periods of regulatory innovations in vertebrates

Background Mammalian genomes are repositories of repetitive DNA sequences derived from transposable elements (TEs). Typically, TEs generate multiple, mostly inactive copies of themselves, commonly known as repetitive families or families of repeats. Recently, we proposed that families of TEs originate in small populations by genetic drift and that the origin of small subpopulations from larger populations can be fueled by biological innovations. Results We report three distinct groups of repetitive families preserved in the human genome that expanded and declined during the three previously described periods of regulatory innovations in vertebrate genomes. The first group originated prior to the evolutionary separation of the mammalian and bird lineages and the second one during subsequent diversification of the mammalian lineages prior to the origin of eutherian lineages. The third group of families is primate-specific. Conclusions The observed correlation implies a relationship between regulatory innovations and the origin of repetitive families. Consistent with our previous hypothesis, it is proposed that regulatory innovations fueled the origin of new subpopulations in which new repetitive families became fixed by genetic drift. Reviewers Eugene Koonin, I. King Jordan, Jürgen Brosius.


Background
Transposable elements (TEs) are DNA segments that are capable of reproducing and inserting themselves into genomic DNA using a number of different mechanisms. Typically, some TEs remain active for a limited period of time during which they produce mostly inactive copies of themselves known as families of interspersed repetitive elements, or repeats. The TE families are preserved in eukaryotic genomes over an extended period of time but the number of elements per family declines with time due to attrition of non-functional DNA, also referred to as "junk DNA". Some of the TE-derived repeats or repeat fragments originally fixed in the population may become recruited as functional components of the non-coding genomic DNA [1][2][3][4][5][6][7][8], and those can be identified in DNA segments conserved across species [9,10]. Thus, the overall number of repetitive elements in the conserved genomic regions, relative to the total number of repeats in the entire genome, is expected to grow over time, primarily due to attrition of the repetitive DNA that did not assume any functional role in the genome. We use this process to identify ancient repetitive families of different ages.

Analysis of ancient families of TEs
We selected 381 consensus sequences from Repbase [11], which represent repetitive families present in all mammalian species sequenced to date. We also report an additional 152 families of conserved repetitive elements (Additional file 1: Table S1, column F; consensus sequences to be released in Repbase), which were identified and reconstructed using the human genome sequence data (see Methods). The total set of 533 consensus sequences present in all mammals (listed in Additional file 1: Table S1, column C) was screened against the human genome, using Censor [12], to determine the genomic count of repetitive elements for each family. The same set was screened separately against the previously published human genomic regions conserved among different vertebrates including mammals and birds, and representing ~5% of the human genome [10]. Using binomial and chi-square statistics we identified 266 families that are significantly overrepresented in the conserved regions relative to the rest of the human genome, which are hereafter referred to as families of "conserved repeats". Most of the 266 consensus sequences, with a median length of ~238 bp, are likely fragments of ancient repetitive elements. To date, ~44% of them have been classified as belonging to a particular group of transposable elements based on the presence of TE-specific features or similarity to known TEs.
For each of the 266 families, we divided their numbers of repeats present in the conserved regions (cTE) by their total numbers in the human genome (gTE). Figure 1 presents the cumulative numbers of families corresponding to different cTE/gTE ratios for the entire set of 266 families. The families cluster around two major peaks separated by a minimum around 0.35. The first peak corresponds mostly to younger repetitive families present in mammals only (Figure 1, red), and the second one represents more ancient families shared by mammals and chicken (Figure 1, green). The clustering is similar for families positively identified as derived from TEs (Figure 1A) and for yet unclassified repetitive families (Figure 1B). Thus, the cTE/gTE ratios can be used to approximately rank the conserved repetitive families from the youngest to the oldest ones.
Figure 1 Frequency distribution of 266 conserved repetitive families based on their repeat counts in the conserved regions (cTE) and in the entire human genome (gTE). Each family is counted in the interval (bin) with its corresponding cTE/gTE ratio (vertical axis). The red bars indicate families present in mammals only, and the green bars represent families shared by mammals and chicken. A. Frequency distribution of 119 classified families of conserved TEs, analogous to that in Figure 1. B. Frequency distribution of 147 unclassified families of conserved TEs, analogous to that in Figure 1.
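The ratio and binning described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the bin width and the example counts are hypothetical, and only the definition of the ratio (cTE divided by gTE) comes from the text.

```python
from collections import Counter


def cte_gte_ratio(cte: int, gte: int) -> float:
    """Fraction of a family's genomic copies that lie in conserved regions."""
    if gte <= 0:
        raise ValueError("gTE must be positive")
    if not 0 <= cte <= gte:
        raise ValueError("cTE must lie between 0 and gTE")
    return cte / gte


def bin_ratios(ratios, bin_width=0.05):
    """Count families per ratio interval; keys are the lower bin edges.

    The bin width is an assumption; the text does not state the one used.
    """
    n_bins = round(1 / bin_width)
    counts = Counter()
    for r in ratios:
        # The small epsilon keeps values sitting on a bin edge (e.g. 0.3)
        # from dropping into the lower bin because of float rounding.
        idx = min(int(r / bin_width + 1e-9), n_bins - 1)
        counts[round(idx * bin_width, 10)] += 1
    return dict(sorted(counts.items()))


# Hypothetical counts (family -> (cTE, gTE)), for illustration only.
families = {"famA": (17, 100), "famB": (30, 100), "famC": (55, 100)}
ratios = {name: cte_gte_ratio(c, g) for name, (c, g) in families.items()}
histogram = bin_ratios(ratios.values(), bin_width=0.1)
```

Sorting families by the values in `ratios` gives the approximate youngest-to-oldest ranking mentioned in the text, and `histogram` corresponds to the kind of frequency distribution plotted in Figure 1.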
A comparable clustering can be obtained using a different approach based on the ratio of repeat densities per 1 Mb of the mouse (mdTE) and of the human genome (hdTE). The ratio reflects a faster attrition of non-essential DNA in mouse than in human, driven by differences in the mutation rates between the two [13]. In Figure 2, the 266 families of conserved repeats are also separated into a younger group present in mammals only (red bars), and the older one shared by mammals and chicken (green bars), based on the mdTE/hdTE ratio. The grey bars represent most of the 533 mammalian-wide families (for details see Materials and Methods). The two clusters of conserved families presented in Figure 1 and Figure 2 are likely to be remnants of at least two major outbursts of repeat amplification: one predating the separation of the mammalian and bird lineages and the other one corresponding to subsequent diversification of the ancient mammals prior to the origin of eutherians.
Clustering of repetitive families is not only found among the ancient families of conserved repeats, but also among mammalian repeats that are species- and lineage-specific. In Figure 3 a separate, younger cluster (blue bars) is represented by the "human-specific" families, partially overlapping with the cluster of the entire set of 533 mammalian-wide families (red bars), based on sequence differences between individual repeats and their family consensus sequences (horizontal axis). The "human-specific" families of TEs are defined as those present in humans and in primates ancestral to humans (listed in Additional file 1: Table S1, col. A). In Figure 3, the mammalian-wide families form a single distribution rather than the bimodal distribution presented in Figures 1 and 2 because the separation of the more ancient families of TEs cannot be accomplished simply based on their sequence identity to consensus. Noticeably, the decline in the number of mammalian-wide families in Figure 3 coincides with the outburst of the human-specific families. A strikingly similar distribution pattern can be seen for a different set of TEs from the mouse genome (Figure 4, blue bars). The number of mouse/rodent-specific families also began growing rapidly during the decline in the number of mammalian-wide families (Figure 4, red bars).
Remarkably, the clusters shown in Figures 1-4 also largely correspond to independently identified periods of regulatory innovations [14]. The most ancient period, prior to the separation of mammalian and bird lineages, was dominated by regulatory innovations near transcription factors controlling the key developmental (trans-dev) genes [14]. The second period, approximately between 300 and 100 million years ago (Ma), was characterized by a gradual decline in changes near trans-dev genes accompanied by a high frequency of regulatory innovations near receptors of extracellular signals. A possible third period of regulatory innovations can also be seen in placental mammals [14].

Repetitive families, population structure and speciation
Outbreaks of new TE families are often lineage- and species-specific [15][16][17][18][19][20][21][22][23][24], which suggests a potential involvement of TEs in speciation. Such an involvement has been repeatedly explored since the 1980s but no general explanation has emerged that would address the observed correlation between the origin of repetitive families and speciation (reviewed in [24,25]). The subject is attracting renewed attention thanks to the growing avalanche of genomic data on repetitive families. A recently published TE-Thrust hypothesis [26] correlates a massive activity of TEs with speciation, which indirectly links the origin of repetitive families and the origin of species. Specifically, it states that: ". . .there is differential survival of those lineages that contain or can acquire suitable germline repertoires of TEs, as these lineages can more readily adapt to environmental or ecological changes, and can potentially undergo, mostly intermittently, fecund radiations." The critical applicable questions are: (1) what is the suitable repertoire of TEs, and (2) why do entire sets of TEs expand into multiple families only at certain times in the history of a lineage and then decline? Our data indicate that major expansions of repetitive families were triggered and inactivated at least several times during the history of the mammalian lineage. In the original version of their hypothesis, the authors attribute the explosion of new repetitive families to "waves of TE infestation" [27]. Even if a substantial acquisition of active TEs by a population were taking place, there is no compelling evidence that their activities alone would automatically lead to a massive fixation of new families and new adaptations.
Our alternative hypothesis, also referred to as the carrier subpopulation (CASP) hypothesis [24], emphasizes the fundamental role of the population structure as the key factor driving the expansion of repetitive families and their impact on genome evolution. It specifically states that most populations carry random repertoires of active TEs, which can have mutagenic impact due to fixation of multiple copies in the population [24]. It further states that small populations are the primary incubators of repetitive families, due to fixation of TEs by genetic drift. Therefore, the fixation of TEs depends strongly on factors affecting the population structure of the host. The factors include major biological breakthroughs or cumulative innovations that can lead to new adaptations, or environmental changes that can open new niches due to extinctions of competitors. Such factors are likely to promote the increase in the number of small sub-populations exploring and adapting to new niches. The hypothesis implies that TEs are mere multiplying opportunists that may contribute to evolutionary changes mostly due to their mutagenic activities in small sub-populations. Interestingly, their highest mutational impact coincides with explorations of new adaptations by small subpopulations.
TE-generated mutations are essentially stochastic in nature, and many of them will have a negative impact on survival. Small populations are much more vulnerable to extinction than large ones [28], but those which survive are likely to become repositories of multiple pre-screened mutations that can be carried back to the meta-population or exchanged between different subpopulations by crossbreeding. Thus, multiple extinctions of small sub-populations may be an important evolutionary path of weeding out harmful mutations. Some of the surviving subpopulations may continue to diverge due to accumulation of TEs until they become founder populations for new species. A successful crossbreeding between individuals from diverse subpopulations may also lead to an abrupt emergence of new species [29].

Biological innovations and the diversity of repetitive families
The results presented in this paper illustrate a recurrent rise and fall in the number of new TE-derived repetitive families during the history of the mammalian lineage, which coincides with evolutionary periods when different types of regulatory innovations began and ended. According to the CASP hypothesis [24], the rise and fall in the number of the repetitive families was caused by the rise and fall in the number of viable subpopulations that originated during that period. The origin of new subpopulations was probably fueled by adaptive advantages introduced by changes in gene regulation. Population subdivisions facilitate the fixation of TE insertions by genetic drift. The fixed TEs may become the raw material for biological innovations. The recruitment or elimination/attrition of inserted TEs is on a timescale that far exceeds the lifespan of any subpopulation. Moreover, regulatory innovations do not necessarily coincide with TE fixation. Once a regulatory innovation takes place, the population is likely to explore new niches, and to become subdivided again.
Based on this scenario, the mammalian genomic DNA combines families of TEs incubated in countless subpopulations over hundreds of millions of years. Most of the TE copies deposited in the genome are presumed neutral, and they are lost over time by attrition. A minority of TEs became exapted and survived as conserved DNA. The attrition of TEs continues at a relatively steady rate as indicated by the time-dependent increase in the fraction of conserved TEs, relative to the total number in the genome.
It remains to be elucidated why a particular pattern of regulatory changes continued for hundreds of millions of years. Overall, the data imply that each pattern of regulatory innovations yielded diminishing adaptive advantages over time and eventually became exhausted. The cycle was repeated as soon as new productive types of regulatory innovations came into play.
In summary, the observed outbursts of ancient families of TEs might have been associated with the origin of small subpopulations exploring the regulatory innovations in different ecological contexts. Crossbreeding between individuals from different subpopulations increased the number of different TE families in the surviving populations. The populations likely went through multiple subdivisions and crossbreeding during the exploration of regulatory innovations. After each extended period of innovations ran its course, the number of viable subpopulations declined and fixation of new repetitive families declined with it. The process was resumed when new types of regulatory innovations came into play during subsequent evolutionary periods.

Conclusions
Ancient families of conserved repeats in mammalian genomes belong to distinct groups of repetitive families. The origin of different groups of families coincides with different periods of regulatory innovations [14]. Based on a recent hypothesis describing the origin of repetitive families [24], it is proposed that regulatory innovations increased the number of small viable subpopulations of individuals with new biological adaptations. New families of TEs became fixed in these subpopulations by genetic drift and preserved in the lineage leading to modern mammals.

Analysis of repetitive families in the human genome
Using Censor [12], we masked the entire human genome sequence [30] (version 18, March 2006) by running it against the library of the human and ancestral mammalian repeats extracted from Repbase [11]. We also used Censor to identify new repetitive sequences in the human genome by systematically screening different sequence portions of the masked genome against each other. The multiple copies of the newly identified sequences were clustered into families based on sequence similarity and used to generate consensus sequences. All newly generated consensus sequences were screened again against the entire masked human genome and the next generation of consensus sequences was obtained from the new sets of identified repeats with incrementally expanded flanking regions. The process was repeated until screening of consensus sequences against the genome produced consensus sequences of stable lengths. All consensus sequences were analyzed manually for the presence of characteristic hallmarks such as terminal inverted repeats (TIRs), open reading frames and similarity to known families of repeats. The annotated sequences of the newly identified 152 families will be released in Repbase upon publication of the article.
The newly identified and refined consensus sequences were combined with the remaining human and ancestral mammalian-wide consensus sequences from Repbase, and the entire set was screened against the original (unmasked) version of the human genome, using both Censor and RepeatMasker [31], to determine the number of repeats for each family. The same set was screened against the previously published conserved fraction of the human genome [10].

Statistical evaluation
The entire output was analyzed statistically to determine families with repeat numbers significantly overrepresented in the conserved fractions representing ~4.8% of the human genome, relative to the remaining portion of the genome (95.2%). We compared the observed repeat count in the conserved and non-conserved regions to the expected repeat count based on a uniform distribution. We identified 266 families with repeat numbers significantly higher than expected in the conserved regions (P<0.05, see Additional file 2: Table S2). An additional 267 families present in all mammals are not significantly overrepresented in the conserved portion of the human genome used in this analysis.
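A one-sided binomial test of this kind can be sketched in Python as shown below. The sketch assumes that, under a uniform distribution, each copy of a family falls in the conserved fraction independently with probability 0.048. The exact test procedure, any chi-square step and any multiple-testing correction used in the study are not specified in the text and are not reproduced here; all function names are hypothetical.

```python
import math

CONSERVED_FRACTION = 0.048  # ~4.8% of the human genome, as stated in the text
ALPHA = 0.05                # significance threshold stated in the text


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k_obs: int, n: int, p: float = CONSERVED_FRACTION) -> float:
    """P(X >= k_obs) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k_obs <= 0:
        return 1.0
    # Sum the tail in log space to avoid underflow for large copy numbers.
    terms = [log_binom_pmf(k, n, p) for k in range(k_obs, n + 1)]
    peak = max(terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in terms))


def expected_conserved(gte: int, p: float = CONSERVED_FRACTION) -> float:
    """Expected number of copies in conserved regions under uniformity."""
    return gte * p


def is_overrepresented(cte: int, gte: int, alpha: float = ALPHA) -> bool:
    """True if cTE is significantly higher than expected (one-sided test)."""
    return binom_upper_tail(cte, gte) < alpha
```

For example, a hypothetical family with 100 genomic copies has an expected 4.8 copies in the conserved regions; `is_overrepresented(30, 100)` would flag it, whereas `is_overrepresented(5, 100)` would not.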

Comparison of repeat densities in mouse and human
All 533 mammalian-wide repeats (Additional file 1: Table S1, column C), and mouse-specific sequences from Repbase (listed in Additional file 1: Table S1, column B), were combined. The entire set was screened against the mouse genome [32] (version 9, July 2007), using Censor, to determine the number of repeats for each mammalian-wide family. We calculated the number of TEs from each family per 1 Mb of the mouse genome (mdTE). Analogous densities were determined for the mammalian-wide repeats present in the human genome (hdTE). In general, the mdTE/hdTE ratio is expected to range from 0 to 1 due to a faster elimination rate of non-essential DNA in mouse than in human, driven by a higher mutation rate in rodents than in human [13]. However, some of the mammalian-wide families might have continued to expand in the rodent lineage but not in the primate lineage after the two separated, which would bias the mdTE/hdTE ratios towards higher values. Of the 533 mammalian-wide families, 26 were eliminated based on an mdTE/hdTE ratio >1. The remaining ones were used in the analysis presented in Figure 2.
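The density ratio and the filtering step can be sketched in Python as follows. This is an illustration only: the genome sizes and repeat counts are round placeholder values rather than figures from the study, and the function names are hypothetical.

```python
def density_per_mb(count: int, genome_size_bp: int) -> float:
    """Number of repeats per 1 Mb of genome."""
    return count / (genome_size_bp / 1_000_000)


def md_hd_ratio(mouse_count: int, mouse_bp: int,
                human_count: int, human_bp: int) -> float:
    """mdTE/hdTE: mouse repeat density divided by human repeat density."""
    hd_te = density_per_mb(human_count, human_bp)
    if hd_te == 0:
        raise ValueError("family has no copies in the human genome")
    return density_per_mb(mouse_count, mouse_bp) / hd_te


def drop_expanding_families(ratios: dict, max_ratio: float = 1.0) -> dict:
    """Keep families with mdTE/hdTE <= max_ratio, as described in the text."""
    return {name: r for name, r in ratios.items() if r <= max_ratio}


# Placeholder genome sizes and counts, for illustration only.
MOUSE_BP = 2_500_000_000
HUMAN_BP = 3_000_000_000
counts = {"famA": (500, 3000), "famB": (5000, 3000)}  # (mouse, human)

ratios = {name: md_hd_ratio(m, MOUSE_BP, h, HUMAN_BP)
          for name, (m, h) in counts.items()}
kept = drop_expanding_families(ratios)
```

In this toy input, `famB` has a higher density in mouse than in human and is excluded, mirroring the removal of families with a ratio above 1.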

Additional files
Additional file 1: Table S1.

Competing interests
None.

Authors' contributions
JJ designed the study and wrote the manuscript. All authors contributed to data analysis. WB and KKK contributed both to data analysis and revisions of the manuscript. All authors read and approved the final manuscript.
Reviewers' comments
Reviewer 1: Eugene V. Koonin, NCBI, NLM, NIH, Bethesda, MD 20894, USA
This is a very straightforward paper that links the fixation of different families of TEs in mammalian genomes with major evolutionary transitions (here described as periods of regulatory innovations). I think the following conclusion that closes the article "it is proposed that regulatory innovations increased the number of small viable subpopulations of individuals with new biological adaptations. New families of TEs became fixed in these subpopulations by genetic drift and preserved in lineages leading to modern mammals" is quite correct. The proposed chain of causation here is: regulatory innovations -> subdivision of the ancestral populations into small subpopulations -> fixation of propagating TEs via drift. I wonder if an alternative interpretation could be at least as plausible: population bottleneck -> regulatory innovations + fixation of TEs via drift? Regardless, the title of the article and some language, especially in the abstract, seem to imply direct contribution of TEs to the co-temporal regulatory innovations. However, one should avoid falling prey to 'post hoc ergo propter hoc': both regulatory innovations and bursts of TEs could be triggered by the same population-genetic factors, and these need not be adaptive.
Author's response: Indeed, regulatory innovations either due to fixation of TEs or other mutations can occur in small (lucky) populations. We elaborated on this point in the revised version (see "Biological innovations and the diversity of repetitive families"). There is also a broader point inspired by this and the last review that deserves closer attention. Specifically, small subpopulations are more likely to go extinct due to harmful mutations than to benefit from advantageous mutations. Nevertheless, the subpopulations that survive can become the treasure troves of pre-screened mutations (beneficial, neutral and slightly harmful), which can be transferred back to the meta-population by crossbreeding. Therefore, the existence of multiple co-temporal sub-populations may be critical for successfully weeding out harmful mutations and passing the remaining ones for further testing by natural selection (see "Repetitive families, population structure and speciation" in the main text).

Reviewer 2: King Jordan, Georgia Institute of Technology, USA
Jurka and colleagues present an analysis of repeat sequences, many of which have been deeply conserved across mammalian evolution. Detailed analysis of such sequences allowed the authors to identify distinct clusters of such repeats, three groups of which show marked expansions/contractions corresponding to previously characterized periods of regulatory innovation. Similar observations have been reported previously and have been taken to suggest that repeat families, transposable element (TE)-derived sequences in particular, may have provided raw material (e.g. novel regulatory sequences) that facilitated regulatory innovations during such periods of evolutionary transition (Oliver and Greene 2009 Bioessays 31:703; Oliver and Greene 2011 Mobile DNA 2: 8). Interestingly, the authors of this work favor a very different explanation for the expansion of repeats at such times. They hold that these periods of regulatory innovation were likely to be characterized by elevated levels of population subdivision, which in turn allowed for the fixation of numerous TE families by genetic drift. In this sense, the abundance of TE families can be considered to be a by-product of the population dynamics of the species whose genomes bear the elements rather than a driving force of evolutionary innovation (Jurka et al. Biol Direct 2011 6:44). This manuscript is therefore in some sense a continuation of an ongoing debate in the literature regarding the global role of TE family expansions and contractions in regulatory innovation, speciation and evolution. The new data the authors bring to bear on the question is a welcome addition to the debate. One thing that is missing from the discussion of the results, however, is a consideration of the evidence in favor of their very different world view compared to the groups who favor a more active role for TEs in driving regulatory innovation and evolution. 
Below, I pose a series of questions regarding the findings and how they relate to this debate. As an aside it is worth noting that this paper reports a major addition of consensus sequences for newly identified human and mammalian conserved repeats, all of which have been deposited in Repbase. These consensus sequences represent an important resource for the research community and a new source of information and annotations for the human genome.
Major Points
1. The work of Oliver and Greene on the genomic drive or TE-thrust hypothesis for the role of TEs and regulatory innovation, which has direct relevance to this manuscript as discussed above, is not cited or discussed. Given that this work and the TE-thrust hypothesis articulate two clearly distinct perspectives on the role of TEs, in particular whether they are active agents or passive bystanders, in the processes of regulatory innovation, speciation and evolution, these papers should be cited and discussed in the context of the current work (Oliver and Greene 2009 Bioessays 31:703; Oliver and Greene 2011 Mobile DNA 2: 8). For instance, if the authors disagree with the previous perspectives on an active role for TEs in driving regulatory innovation and speciation, they should state why. Or perhaps, if they feel these two world views are not actually mutually exclusive this could be discussed. But to ignore the works reporting a conflicting perspective is a mistake in my view.
Author's response: In the original version of the manuscript we mentioned speciation only in passing as it was discussed in our hypothesis published a year earlier. Clearly, the issue continues to be of substantial interest because it was brought up in this and the last review. We appreciate the comments, and we elaborated on this point in the revised version (see "Repetitive families, population structure and speciation"). TEs have an evolutionary impact due to their mutagenic activities, but they are not the "drivers" of evolution. The fixation of any mutations, including the TE-generated ones, depends to a large extent on the population structure, which in turn is driven by biological innovations and ecological factors. The original "genomic drive hypothesis" and the followup paper lack the population genetics perspective.
2. The distributions of conserved repeat densities reported here, e.g. cTE/gTE seen in Figure 1, are uneven and lead the authors to conclude that there have been expansions and contractions of TE families that correspond to "three previously described periods of regulatory innovations in vertebrate genomes." I wonder if these apparent uneven distributions could be an artifact of the methods used whereby two extreme sets of repetitive sequences were used in the analysis: ancient repeats found in all mammals and human-specific repeats. What about families of repeats that are found in some, but not all, mammalian species including humans? Would inclusion of such families in the analysis change the distributions seen?
Author's response: None of the 266 conserved repeats listed in Additional file 2: Table S2 are human-specific. Eutrep families are present only in eutherian mammals and the cTE/gTE ratio for this group is ~0.17. The cTE/gTE ratio for platypus-specific families is ~0.3. Typically, the cTE/gTE ratio increases with the age of the family. The peaks in Figure 1 correspond to multiple families of similar age. Figure 1 and Figure 2 describe essentially the same uneven distributions obtained by two different approaches. We don't see any room for artifacts.
3. The majority of "conserved repeats" identified here are small fragments that cannot be related to any particular TE family. Do the authors feel that these are likely to be TE-derived as well? If they are not TE-derived how would that impact the conclusions relating the observations reported here to their previous work on the population dynamics of TEs? In other words, is it possible that many of regulatory innovations provided by the repetitive families observed here were not seeded via the fixation of TEs by drift in small populations, but rather by some other process?
Author's response: The vast majority, if not all, of these fragments are likely to be TE-derived as well. After the paper was submitted, we classified over a dozen conserved repeats as fragments of known TEs preserved in other vertebrates. The successful classification continues as new genomes become available (see also the response to the last reviewer). Even if some of the conserved repeats are not derived from TEs, they are unlikely to end up in our dataset, which was repeatedly filtered over the last five years or so.
Author's response: We added Figure 1A and Figure 1B showing the distribution of the classified and unclassified repeats. The two peaks continue to be distinguishable. In Figure 1A the first peak is larger than the second one. This simply illustrates that younger families of TEs are easier to classify than the older ones.
Minor Points
5. The authors state that "Using binomial and chi square statistics we identified families composed of repetitive elements significantly overrepresented in the conserved regions relative to the rest of the genome." and the analyses reported in the paper are based on those 266 conserved families. However, no data or information on this statistical analysis is provided in the Methods or Results sections. For instance, how significant are the overrepresentations seen? Do they vary widely? Are they different for TE-derived versus non TE-derived families (see points #3 and #4 above)?
Author's response: Typically, older repeats are more significantly overrepresented in the conserved regions than the younger ones, due to continuing attrition of the corresponding non-conserved copies over time. We don't see any other obvious patterns.
6. Is it known whether populations of mammals were indeed more subdivided between periods of regulatory innovation as suggested by the authors on page 9-10?
Author's response: The data suggest a more subdivided population during the evolutionary innovation periods and a less subdivided one when the periods end (i.e. between the periods). There is no direct way to determine the number of subpopulations that emerged and vanished during the evolutionary history of vertebrates. Based on our original hypothesis [24], a surge in the number of subpopulations translates into an increase in the number of repetitive families in the genomic fossil record of a particular lineage. The surge is also likely to trigger a parallel surge in the number of new species and lineages, consistent with the punctuated equilibria theory. The analysis of the corresponding speciation patterns is possible based on the geological fossil record, but this goes beyond the scope of the paper.
Author's response: Corrected. The supplemental Materials and Methods are in the main text.

Reviewer 3: Jürgen Brosius, University of Muenster, Germany
This manuscript describes detection of novel transposed elements in vertebrate, mammalian and primate lineages and correlates their estimated times of expansion with regulatory innovations in vertebrate genomes. The authors detected ~150 additional "families of TEs" overrepresented in conserved regions of the genome. Some of these elements have similarities to known repeats, but most of them remained unidentified, thus far. The majority of newly described elements are present in genomes in fewer than 150 copies. The "abundance" of some is less than 10 copies. It might be hard to discriminate between bona fide TEs and sequences that amplified via segmental duplications. Here, the zinc-finger proteins or domains come to mind. This should be discussed and perhaps more information given as to why those elements are not derived from frequent segmental duplications or are merely retropseudogenes of, e.g., tRNAs or other small RNAs.
Author's response: Based on our experience, the elements that are homologous to known pseudogenes, functional motifs and known TEs are relatively easy to identify. More difficult is to classify some of the LTRs and nonautonomous elements, which are often very diverse even in a single species. However, as indicated in Figures 1A and 1B, the observed patterns are quite similar for classified and unclassified families. As stated in the response to the second reviewer, the unclassified pile of conserved repeats continues to shrink as sequences from more vertebrate genomes become available.
Perhaps the authors should spend more effort to bridge the gap between events that lead to the formation of sub-populations, which could happen over as little as a few hundred years or less, and time frames of the proposed regulatory innovations during separation of the mammalian and bird lineages, the diversification of the mammalian lineages prior to the origin of eutherian lineages or even the diversification of mammals. Here, time spans of a few million up to tens of millions of years are probably involved. It is likely that very few, if any, of the TE inserts are chock-full of pre-assembled, ready-to-use functional gene modules, such as enhancer sequences (enhancers usually consist of arrays of transcription factor binding sites that, at ~10 bp, are relatively short). In analogy to grape juice requiring further steps to generate wine, initially TEs are the raw material for innovation. As we have shown for the exaptation of (parts of) SINE elements as novel protein domains, it can take tens or even 100 million years for such raw material to fortuitously acquire mutations that pave the way to functionality and exaptation.
Author's response: We expanded on this point mostly in the section "Biological innovations and the diversity of repetitive families." The pattern of regulatory changes during a particular period of vertebrate evolution probably drove new biological adaptations. In fact, similar changes took place independently in different lineages [14]. Based on our original hypothesis [24], we propose that the origin of new subpopulations was fueled by the new adaptations. Most subpopulations probably originated and became extinct in a relatively short time. The surviving subpopulations either diverged into separate species and lineages or channeled their TE-produced mutations back to the metapopulation by crossbreeding.
Therefore, the genomic DNA of the mammalian lineage combines mutations from countless subpopulations that existed at some point of time during the history of the lineage. These mutations likely contributed to the "raw material" fueling the genomic changes (see also the response to the first reviewer).
In nature almost anything that is possible does happen and one cannot rule out that such TE-induced "regulatory hopeful monsters", resurrecting ideas of Richard Goldschmidt (see below), occasionally arise. However, one can imagine that a novel TE family with high copy number, that readily alters expression of most genes in whose vicinity it integrates, would cause havoc and would not bode well for the fitness of small populations.