How many antiviral small interfering RNAs may be encoded by the mammalian genomes?

Background: The discovery of the RNA interference (RNAi) phenomenon and the understanding of its mechanisms have revolutionized our views on many molecular processes in the living cell. Among other roles, RNAi is involved in silencing transposable elements and in inhibiting virus infection in various eukaryotic organisms. Recent experimental studies have demonstrated a few cases of suppression of viral replication via complementary interactions between mammalian small RNAs and viral transcripts.

Presentation of the hypothesis: It was found that >50% of the human genome is transcribed in different cell types and that these transcripts are mostly not associated with known protein-coding genes, but represent non-coding RNAs of unknown function. We propose a hypothesis that mammalian DNAs encode thousands of RNA motifs that may serve for antiviral protection. We also presume that the evolutionary success of some groups of genomic repeats and, in particular, of transposable elements (TEs) may be due to their ability to provide antiviral RNA motifs to the host organism. Intense propagation of genomic repeats in the genome would inevitably cause bidirectional transcription of these sequences, and the resulting double-stranded RNAs may be recognized and processed by the RNA interference enzymatic machinery. Provided that these processed target motifs are complementary to viral transcripts, fixation of the repeats in the host genome may be of considerable benefit to the host. This fits with our bioinformatic data, which reveal thousands of 21-28 bp long motifs identical between human DNA and human-pathogenic adenoviral and herpesviral genomes. Many of these motifs are transcribed in human cells, and the transcribed fraction increases with motif length. Many such motifs are included in human TEs. For example, one 23 nt-long motif that is part of the abundant human Alu retrotransposon shares sequence identity with eight human adenoviral genomes.

Testing the hypothesis: This hypothesis could be tested on various mammalian species and on viruses infecting mammalian cells.

Implications of the hypothesis: This hypothesis proposes that mammalian organisms may use their own genomes as sources of thousands of putative interfering RNA motifs that can be recruited to repress intracellular pathogens such as proliferating viruses.

Reviewers: This article was reviewed by Eugene V. Koonin, Valerian V. Dolja and Yuri V. Shpakovski.


Background
The discovery of RNA interference (RNAi) and the understanding of its mechanisms have opened a new era in molecular genetics. It is now clear that very small complementary RNAs may modulate expression of large genes. RNAi and related mechanisms may interfere with gene expression at the stages of transcription, mRNA processing and translation [1,2]. They may alter transcript stability and may even cause methylation of extended genomic loci [3,4]. RNAi is a highly conserved mechanism that is likely to be active in almost all eukaryotic taxa. This phenomenon attracts growing attention, and the study of RNA interference is probably one of the most rapidly developing fields of modern science. Known RNAi pathways are very diverse [2] and many new mechanisms probably remain to be discovered [5]. A fundamental step of RNAi is the base pairing of the two interacting RNAs. The resulting duplexes, which may be perfectly or imperfectly matched, are then recognized by the cellular RNAi machinery, which may result in silencing of the source gene for one of the interacting RNAs. The complementary motifs in RNA may be only 21 nucleotides long or even shorter [6,7]. At present, RNAi is known to control genes involved in all aspects of cell function, including proliferation, growth, differentiation and cell death [1,8,9]. There are also some "exotic" functions, such as the control of genomic transposable elements. Transposable elements (TEs) are "selfish" fragments of genomic DNA able to self-reproduce and to insert into new locations in the host genome. TEs occupy a large fraction of eukaryotic DNA; for example, they account for at least 50% of the human genome [10] and 50-90% of the genomes of many plant species [11]. Different TE families may be represented by very different copy numbers, from tens, hundreds or thousands to even millions of TE copies per genome.
Although TE copies fixed in the genome are most frequently neutral or even advantageous to the host organism, their uncontrolled proliferation and insertional activity may cause multiple genetic and developmental deleterious effects [12][13][14][15][16].
In at least several species, TE proliferation is controlled by RNAi mechanisms [17,18]. For example, in the DNA of the fruit fly Drosophila melanogaster there are several conserved loci that do not encode any functional genes, but instead contain mutated copies of some TEs [19]. These loci are transcribed in both sense and antisense orientations, which results in the generation of large double-stranded RNAs (dsRNAs) that include TE sequences. Such dsRNAs are recognized by the cellular RNAi machinery, which then represses expression of all genomic TEs identical to those located in these loci. When these loci are highly transcriptionally active, protection against TE expression is strong; when they are silent or hardly transcribed, the protection is weak [20,21].

Presentation of the hypothesis
In the DNAs of some prokaryotes (e.g., E. coli), there are ~50 bp long sequence motifs identical to fragments of bacteriophage genomes. It was shown that the presence of these motifs in the bacterial DNA protects the bacteria from bacteriophage infection, although the mechanism of this protection (probably not RNAi) remains a mystery [22]. Also, it was recently hypothesized that there may be a specific mechanism in crustaceans that provides reverse transcription of viral transcripts and subsequent insertion of the resulting cDNAs into the genome. Further transcription of these cDNAs in the antisense orientation may help the host organisms to resist future viral infections by acting as a specific intracellular immune system [23]. There is a growing number of instances of acquisition of RNA virus sequences by eukaryotic genomes that were proposed to function in antiviral defense [24]. However, at present, the mechanism of RNAi-based antiviral protection in eukaryotic organisms has not been sufficiently investigated [25]. Theoretically, RNAi could well serve as an intracellular "immune system" by repressing transcription not only of intra-genomic parasites like TEs, but also of external ones like viruses. In plants and in invertebrates, many cases of viral gene suppression by small interfering RNAs originating from viral double-stranded RNAs have been documented to date [26]. Furthermore, several examples of virus-encoded small interfering RNAs that may regulate host gene expression have recently become available [27]. Finally, in at least four cases mammalian siRNAs are thought to interfere with viral transcripts, thus preventing efficient virus replication [28][29][30][31].
The genome growth occurs by virtue of various processes the most important of which are random DNA duplication and propagation of transposable elements [34,35]. In the large eukaryotic DNAs, protein coding sequences occupy a rather modest fraction (a few percent or lower), whereas significantly bigger parts of these genomes appear to be transcribed. For example, according to the published data, more than 50% of the human DNA is transcribed in different cell types [36]. These transcripts are frequently not associated with any known genes, but represent non-coding RNAs of unknown functions [37]. At least part of these transcripts is likely to participate in RNAi-mediated regulation of gene expression.

The hypothesis
We propose a hypothesis that virus suppression mediated by self-encoded small interfering RNAs is not an exception but rather a general case for mammalian genomes and, probably, for other relatively large eukaryotic genomes. These genomes may include relatively short motifs sharing high sequence identity with viral genes, and, when transcribed, these motifs may function in antiviral host cell defense. In this light, the enlargement of genome size may be beneficial to the host organism as a source of novel putative interfering RNA motifs that can be recruited to repress intracellular pathogens such as proliferating viruses. An enlarged genome would provide "empty space" for the evolution of different genetic elements, including noncoding DNA. Chance combinations of nucleotides in the new part of the genome might create new DNA motifs that, after being transcribed, could theoretically be used by the host organism as a tool for recognizing and targeting intracellular pathogen transcripts. Novel transcribed DNA motifs that target host genes would be eliminated from the genome, whereas those that are complementary to pathogen RNAs would be positively selected. Neutral motifs could be "stored" in the genome as ordinary non-coding DNA.
Mechanisms of genome size increase might include DNA duplications, expansion of satellite sequences, emergence of polyploid chromosomes, and insertions of transposable elements [38]. Initially, the newly amplified part of the genome is identical or very close to the "progenitor" genomic DNA. However, neutral or non-neutral mutation pressure may form a novel DNA landscape within the amplified fragments during genome evolution. In case of being transcribed, these loci might significantly increase the repertoire of cellular interfering RNAs.
We also presume that the evolutionary success of some groups of genomic repeats and, in particular, of transposable elements (TEs) may be due at least partly to their ability to provide antiviral RNA motifs to the host organism. Intense propagation of repetitive elements in the genome would necessarily cause bidirectional transcription of these sequences, and the resulting double-stranded RNAs may be recognized and processed by the RNA interference enzymatic machinery. Provided that these processed target motifs are complementary to viral transcripts, fixation of the repeats in the host genome may be of considerable benefit to the host.
We performed a bioinformatic assay aimed at quantifying sequence motifs in human DNA that perfectly match 26 published adenoviral genomes (Figure 1). Human adenoviral genomes have similar lengths of ~34-36 kb and each encodes approximately 35 viral genes [39]. Only distinct nucleotide motifs were taken into account: motifs repeated several times in the human or adenoviral genome were counted as a single motif. Each motif was counted only once at its full length, e.g. a 25 nt-long sequence was registered only as one 25 nt-long motif, and not also as 24-, 23-, 22- and 21 nt-long motifs.
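The counting rule described above can be illustrated with a minimal sketch. It is not the pipeline used in this study (which relied on BLAST); it is a plain k-mer seed-and-extend search, and the function names `maximal_shared_motifs` and `length_histogram` are hypothetical. It assumes short in-memory strings, scans the forward strand only, and registers each distinct motif once at its maximal length, discarding motifs that are contained in a longer registered motif.

```python
from collections import Counter, defaultdict


def maximal_shared_motifs(host, virus, min_len=21):
    """Return distinct maximal exact matches (>= min_len) shared by host and virus.

    Illustrative sketch only: forward strand, exact matches, small inputs.
    """
    # Index every min_len-mer of the host sequence by its start position.
    seeds = defaultdict(list)
    for i in range(len(host) - min_len + 1):
        seeds[host[i:i + min_len]].append(i)

    motifs = set()
    for j in range(len(virus) - min_len + 1):
        for i in seeds.get(virus[j:j + min_len], ()):
            # Skip seeds that can be extended to the left: the same match
            # was already registered from its leftmost position.
            if i > 0 and j > 0 and host[i - 1] == virus[j - 1]:
                continue
            # Extend the match to the right as far as both sequences agree.
            n = min_len
            while (i + n < len(host) and j + n < len(virus)
                   and host[i + n] == virus[j + n]):
                n += 1
            motifs.add(virus[j:j + n])

    # A motif contained in a longer registered motif is not counted again.
    return {m for m in motifs if not any(m != o and m in o for o in motifs)}


def length_histogram(motifs):
    """Number of distinct motifs per motif length."""
    return Counter(len(m) for m in motifs)
```

Because the result is a set of sequences, a motif that occurs many times in the host (for example inside a repeat) contributes a single entry, which mirrors the "counted as a single motif" rule.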
For different human adenoviruses, we identified 47-106 perfectly matched 21 nt-long motifs, 16-44 22 nt-long, 4-19 23 nt-long, 0-8 24 nt-long, 0-6 25 nt-long, 0-6 26 nt-long and 0-1 27-, 28- and 29 nt-long motifs per genome. The overall number of such motifs varied from 85 to 161. Given that more than 50% of human DNA is transcribed, and that this transcription may proceed in both directions, we may expect that more than a quarter of the above complementary motifs (about half of the transcribed fraction) are transcribed within RNA molecules in the antisense orientation relative to the direction of adenoviral gene transcription. At least theoretically, these motifs might be involved in downregulation of viral genes.
Similar data were obtained when comparing human DNA with 10 human-pathogenic herpesvirus genomes. We further compared the relative occurrences of 21-29 nt-long hits among adenoviral, herpesviral and bacteriophage genomes (a list of the investigated viral genomes can be found in Table 1). To this end, the number of identified BLAST hits was normalized to 1 kb of each virus genome sequence. The resulting figure clearly shows an approximately 3-fold greater average number of hits for adenoviral and herpesviral genomes than for bacteriophages in all size ranges (Figure 2). The excess of hits in human-pathogenic virus genomes compared to bacteriophages was statistically significant, with p-values < 0.01 for 21-25 nt-long hits (p-values are shown in Table 1).
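The per-kilobase normalization mentioned above is simple arithmetic, sketched below for clarity. The helper names `hits_per_kb` and `mean_hits_per_kb` are hypothetical and not taken from the original analysis; any numbers passed to them in an example are placeholders, not values from this study.

```python
def hits_per_kb(n_hits, genome_length_bp):
    """Number of hits normalized to 1 kb of viral genome sequence."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return n_hits * 1000.0 / genome_length_bp


def mean_hits_per_kb(genomes):
    """Average normalized hit density over (n_hits, genome_length_bp) pairs."""
    values = [hits_per_kb(n, length) for n, length in genomes]
    return sum(values) / len(values)
```

Normalizing before averaging keeps long genomes (such as herpesviruses) from dominating the comparison with shorter ones (such as adenoviruses or many bacteriophages).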
Alternatively, we compared human-virus sequence identities using a panel of randomly generated genomes. Using 2nd- and 5th-order Markov models, we generated random sequences by shuffling the actual adenoviral, herpesviral and bacteriophage genomes. Under this approach, 1000 random sequences were generated separately for each investigated viral genome. We next compared the numbers of BLAST hits for the existing viral genomes and for the in silico-generated ones. The observed BLAST hits (the total number of BLAST hits was determined for each genome) were statistically analyzed (Table 2), and the following was found: (i) the number of hits for an existing adenoviral or herpesviral genome was mostly higher than the 95th percentile for the set of corresponding in silico-generated sequences (in 81% or 89% of the cases, respectively, for the 5th-order Markov model); (ii) for the bacteriophages, this number was mostly below the 95th percentile and exceeded it in only 38% of the cases. Again, these data confirm that human-pathogenic viruses share significantly greater structural identity with the human genome than do the bacteriophage genomes. Importantly, this also implies that there was a kind of positive selection for either the "virus-like" sequences in the human DNA, or for the "humanized" DNA in the human-pathogenic viruses, or both.
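The random-genome comparison can be sketched as follows. This is not the authors' script; it is a minimal illustration under stated assumptions: a k-th order Markov chain is trained on the real genome and sampled to produce sequences of the same length (a generator of this kind preserves k-mer statistics only approximately, unlike an exact shuffle), and the observed hit count is compared with the nearest-rank 95th percentile of the null counts. All function names are hypothetical, and `count_hits` stands for whatever hit-counting procedure is used (BLAST in the original analysis).

```python
import random
from collections import defaultdict


def train_markov(seq, order):
    """Count which base follows each context of length `order` in seq."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - order):
        counts[seq[i:i + order]][seq[i + order]] += 1
    return counts


def generate_random_genome(seq, order, rng):
    """Sample a sequence of len(seq) from an order-k Markov chain trained on seq."""
    model = train_markov(seq, order)
    start = rng.randrange(len(seq) - order + 1)
    out = list(seq[start:start + order])  # seed with a context taken from seq
    while len(out) < len(seq):
        context = "".join(out[len(out) - order:])
        followers = model.get(context)
        if followers:
            bases, weights = zip(*followers.items())
            out.append(rng.choices(bases, weights=weights)[0])
        else:
            # Dead end (context seen only at the very end of seq):
            # fall back to the overall base composition.
            out.append(rng.choice(seq))
    return "".join(out)


def null_hit_counts(seq, order, count_hits, n=1000, seed=0):
    """Hit counts for n random genomes; count_hits is supplied by the caller."""
    rng = random.Random(seed)
    return [count_hits(generate_random_genome(seq, order, rng)) for _ in range(n)]


def exceeds_percentile(observed, null_counts, pct=95):
    """True if observed is above the nearest-rank pct-th percentile of null_counts."""
    ranked = sorted(null_counts)
    rank = (pct * len(ranked) + 99) // 100  # integer ceil(pct * n / 100)
    return observed > ranked[max(rank, 1) - 1]
```

With 1000 random sequences per genome, the 95th percentile is the 950th smallest null count, so a real genome that exceeds it lies in the top 5% of its own null distribution.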
Our further studies revealed that many human-virus BLAST hits appear to be transcribed in human cells, as judged from analysis of the human EST database (Figure 3). As before, the numbers of transcribed hits for herpes- and adenoviral genomes were far greater than those for the phages. Notably, there was a clear-cut tendency towards a greater representation of longer herpes- and adenoviral hits in the human EST database, compared to the absence of relatively "long" transcribed hits for the phages (Figure 3). In several cases the number of "transcribed" hits was even higher than the number of hits matching the human genomic DNA database (adenoviral genomes, 26 nt-long motifs). Detailed analysis of those sequences revealed that this was due to transcript processing features, such as splicing and polyadenylation, that increase the variability of the transcribed part of the human genome.
Therefore, the results of this pilot assay point to the accumulation and functional relevance of infectious virus-like sequences in the human genome.
Many such identical motifs were parts of human transposable elements. For example, one 23 nt-long motif (CGTACTTCAGCCTGGGCAACAAG), which shared perfect sequence identity with three adenoviral genomes and considerable identity with five other adenoviral genomes, was included in a variant of a human transposable element of the AluS family and was represented in the genome by multiple copies. We further investigated whether consensus sequences of other human TE families include sequence motifs perfectly matching human adenovirus or human herpesvirus genomes. We registered only hits displaying perfectly matched motifs 16 nt long or longer (Figure 4). Such hits were found for 23 out of 51 human TE families, and the distribution of hits among them was not uniform. The highest relative numbers of identities per 1 kb of TE consensus sequence were detected for different subfamilies of the human retrotransposon Alu, which is known to be the most successful human TE in terms of propagation of its copies (over 1 million copies per genome). Interestingly, it has been reported previously that adenoviral infection results in a dramatic increase in Alu transcription in human cells [40]. Our hypothesis might at least partly explain this phenomenon.

We further screened the available analogous mouse viral genomes (Murine adenovirus A and Murid herpesvirus 1) against human and mouse genomic and EST databases. For both the mouse adenoviral and herpesviral genomes, the number of 21-28 nt-long hits was higher when searched against the mouse genomic and EST databases than against the human ones (Figure 5).
Among the identified virus-like hits present in both human and mouse DNAs, three sequences were simple repeats represented by multiple copies in both genomes (motifs TGCTGATGCTGATGCTGATGCTGATG, CATCCATCCATCCATCCATCC and ATTCTTTCATTCTTTCATTCTTT). Importantly, their copy numbers were very different in the mouse and human DNAs (mouse/human): 1216/194, 20384/13893 and 1120/192, respectively. Thus, a kind of positive selection for simple repetitive elements matching the genomes of viruses with the respective tropism may theoretically take place in this case.
Finally, in addition to antiviral adaptations, the above identities between host and viral genomes may also represent an adjustment of the virus to the host, aimed at regulating host gene expression in ways that may facilitate progression of the viral life cycle (reviewed in [27]). Both lines of co-evolution are possible, and detailed experimental studies will be necessary to explore each case of host-virus sequence coincidence.

Objects
Objects for testing this hypothesis could be various mammalian species and viruses infecting mammalian cells.

Table notes: The total number of BLAST hits found for the original viral genome. The default "Nucleotide BLAST" search criteria of "BLAST at NCBI" were used. The data obtained for the sets of random genomes were generated using the 2nd- and 5th-order Markov chain algorithm. 1,000 random sequences were generated for every existing viral genome. Expected numbers of BLAST hits for the 5th, 50th and 95th percentiles are given, accordingly. The data were generated using a specially designed script. The script "BioVictor-Python1" is available upon request to the authors.

Experiments
Apart from investigating susceptibility to viral infections, many other types of experiments can be proposed to test this hypothesis. For example, host and viral DNAs can be compared in order to identify homologous nucleotide motifs. It can then be investigated (e.g., using Northern blot or microarray hybridization) whether such motifs are transcribed in the antisense orientation relative to the direction of viral gene transcription. For those transcribed in the antisense orientation relative to viral gene expression, the complete primary structures of the host RNAs can be established (e.g., using the 5'- and 3'-RACE technique [41]). These RNAs may then be assayed in functional tests to determine whether they interfere with viral gene expression and with the progression of ongoing infection, using multiple in vitro and in vivo approaches (e.g., by assessing the effects of overproducing the RNAs of interest on viral infection or viral gene expression).
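As an in silico complement to such experiments, the orientation of a shared motif within a host transcript can be checked computationally. The sketch below is a hypothetical helper, not part of the original analysis. It assumes that the motif is supplied in the orientation of the viral mRNA (sense strand) and that transcripts are given as DNA-alphabet sequences (as in EST databases); a transcript containing the reverse complement of the motif is then antisense to the viral transcript.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_orientation(transcript, viral_sense_motif):
    """Report how a host transcript carries a motif relative to the viral mRNA.

    Returns "sense", "antisense", "both", or None if the motif is absent.
    """
    sense = viral_sense_motif in transcript
    antisense = reverse_complement(viral_sense_motif) in transcript
    if sense and antisense:
        return "both"
    if antisense:
        return "antisense"
    if sense:
        return "sense"
    return None
```

Only transcripts reported as "antisense" (or "both") would be candidates for base pairing with the viral mRNA and thus for the functional tests described above.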

Implications of the hypothesis
It is proposed here that non-protein-coding parts of mammalian transcriptomes include thousands of nucleotide motifs that can be employed to suppress viral gene expression. We hypothesize that the evolutionary success of some families of mammalian transposable elements may be due at least partly to their ability to provide substantial amounts of antiviral RNAs. It could also be generalized that, theoretically, species with increased genome sizes may resist various viral infections more strongly than related organisms with more compact genomes. A practical implication might be that introducing artificially engineered TE sequences encoding antiviral RNAs could be advantageous for creating strains and breeds of eukaryotic organisms with complex genomes that would be more resistant to intracellular parasites, e.g. for the needs of plant bioengineering. However, in this case enlargement of genomic DNA must be followed by artificially accelerated mutation processes in order to increase genome diversity and, therefore, to create additional structural motifs potentially interfering with viral expression. The latter goal could be achieved using a wide range of available physical or molecular methods, such as gamma-irradiation and treatment with various mutagenic chemicals [42,43]. Moreover, the rapid evolution of pathogenic viral genomes may be somewhat compensated by the accumulation of mutations in multiple TE copies, which might significantly strengthen the antiviral response.

Figure 4. Normalized content of the DNA motifs identical between the consensus sequences of human transposable elements and human herpesviral and adenoviral DNAs. Bar heights are proportional to the relative content of perfectly matched BLAST hits per 1 kb of the respective TE group consensus sequence. Human TE consensus sequences were taken from the Repbase Update database [45].

Conclusions
It is proposed that mammalian genomes contain thousands of relatively short sequence motifs that may be beneficial to the host organisms as a source of putative interfering RNA molecules that can be recruited to repress intracellular pathogens like viruses. We identified a large number of short sequences (21-29 bp long) in the human genome that are identical to sequences of different types of human adenoviruses and herpesviruses. Many such motifs are transcribed and may be involved in RNAi-mediated defense against viral infection. In this case, RNAi could serve as an intracellular "immune system" by repressing transcription of intra-genomic parasites like active viruses. We hypothesize here that the evolutionary success of some types of mammalian genomic repeats and, in particular, of some TE families may be due at least partly to their ability to provide substantial amounts of antiviral RNAs.

Reviewer's comments
Reviewer's report 1 Eugene V. Koonin (The National Center for Biotechnology Information, NLM, NIH, Bethesda, USA)

Reviewer comments
Zabolotneva and Buzdin speculate that animal genomes expand under selective pressure for generation of antiviral siRNAs. They illustrate the hypothesis by identifying multiple 20-25 bp sequences identical to sequences in adenovirus genome and also claim but do not show similar findings for herpesvirus genomes.
I have major comments on both the conceptual and technical levels. Conceptually, I am confident that the only answer to the question posed in the title of the paper is: No, and the question itself makes little sense. Enlargement of the genome cannot be a mechanism at any rate but, regardless of the semantics, to claim that it is an adaptation, even in the general sense, let alone specifically for antiviral defense, is an obvious fallacy. To be more specific, this idea assigns to evolution the kind of foresight it can never possess. There is no good reason to question the population-genetic explanation of the major increase in genome size seen in vertebrates, namely, that the small effective population size of animals results in inefficient purifying selection and so allows fixation of even slightly deleterious features. The genome growth is a manifestation of this fundamental phenomenon and occurs by virtue of various processes, the most important of which are random DNA duplication and propagation of transposable elements. It is a completely different matter that much (we currently do not have a clear idea just how much) of the junk DNA is co-opted for various functions including control of selfish elements, both transposons and viruses, which is indeed crucial.
The above is not an unqualified condemnation of this manuscript in its entirety. In principle, it could be salvaged by reformulating the hypothesis to "Are antiviral small interfering RNAs encoded in animal genomes?" When discussing this question, the authors should be clear about the major known mechanism of generation of antiviral siRNAs, namely, production from dsRNA through the action of the RISC complex.

Author's response
We agree. In the revised version, we reformulated the title of the manuscript, which now reads: "How many antiviral small interfering RNAs may be encoded by the mammalian genomes?", and made numerous changes to the text to avoid the conceptual problems mentioned by the referee. Milestone references mentioned by Dr. Koonin were added to the manuscript and discussed in the text.

Reviewer comments
However, not all viruses produce dsRNA, and in any case, it would be quite interesting if animal genomes indeed encoded siRNA against viruses, in addition to known ones against transposable elements. In my view, to substantiate such a hypothesis, several types of analysis are necessary: (i) expand the analysis of virus-specific sequences (at least, include the herpesvirus data but better additional families of viruses), (ii) compare the occurrence of virus-specific sequences to the random expectation and calculate p-values, (iii) examine the available transcriptome data for the presence of these sequences in transcripts, (iv) investigate the distribution of these sequences in the genome: are they found primarily in introns or in intergenic regions or randomly? If the results of such analyses point to functional relevance, this could become a stimulating hypothesis.

Author's response
We are extremely grateful for this advice from the referee. In the revised version, we expanded the analysis to an additional 10 human herpesviral and 13 different bacteriophage genomes and statistically tested the data. We also compared complementary motif occurrences in viral DNA with four randomly generated 50 kb-long "genomes". Furthermore, we analyzed the distribution of the EST hits among the different viral entries and obtained data that hopefully somewhat support our hypothesis.

Reviewer's comments on the revised version

Reviewer comments
In the revised version of their manuscript, Zabolotneva and coworkers eliminated the major misconceptions of their original manuscript and added some computational analyses that aim at demonstrating the plausibility of their idea that large mammalian genomes encode numerous antiviral microRNAs. The removal of the "teleological" aspects of the original article certainly makes the new version more palatable, and an attempt to incorporate more detailed sequence analysis is in itself laudable. However, unfortunately, problems remain. The bioinformatic analysis included in the paper is not professional. The authors give no p-values for the excess of the virus-specific motifs that they discover and do not explain the method they use to generate their random sequences. Accordingly, all the analysis is out of context and has no real meaning, as there is no indication of whether the excess of hits in viral genomes compared to random sequences and phages is statistically significant.

Author's response
In the present version of the paper, the excess of the virus-specific motifs is shown to be statistically significant. The calculated p-values are given in the separate table (Table 1).

Reviewer comments
It would be advisable to calculate p-values both analytically and by comparison with random sequences that would have to be generated by shuffling the actual viral genomes (preferably, trying Markov models of different orders). Under this approach, it is necessary to generate many (at least 1000, and preferably more) random sequences separately for each viral genome and determine where in the distribution of the number of hits the real genome lies. It also would be curious to reproduce this procedure with bacteriophage genomes (very strangely, in the current version, the authors do not specify which phage genomes they used).

Author's response
In the new version, we specify the adenoviral, herpesviral and phage genomes in Tables 2 and 3.
As suggested by the referee, we compared human-virus sequence identities using a panel of randomly generated genomes. Using 2nd- and 5th-order Markov models, we generated random sequences by shuffling the actual adenoviral, herpesviral and bacteriophage genomes. Under this approach, 1000 random sequences were generated separately for each investigated viral genome. We next compared the numbers of BLAST hits for the existing viral genomes and for the in silico-generated ones. The observed BLAST hits (the total number of BLAST hits was determined for each genome) were statistically analyzed (Table 2), and the following was found: (i) the number of hits for an existing adenoviral or herpesviral genome was mostly higher than the 95th percentile for the set of corresponding in silico-generated sequences (in 81% or 89% of the cases, respectively, for the 5th-order Markov model); (ii) for the bacteriophages, this number was mostly below the 95th percentile and exceeded it in only 38% of the cases. Again, these data confirm that human-pathogenic viruses share significantly greater structural identity with the human genome than do the bacteriophage genomes. Importantly, this also implies that there was a kind of positive selection for either the "virus-like" sequences in the human DNA, or for the "humanized" DNA in the human-pathogenic viruses, or both.

Reviewer comments
Beyond these technical issues, the article still involves some conceptual vagueness. What is the authors' hypothesis on the origin of the putative antiviral RNA? Do they think that these sequences were acquired by insertion of virus-specific DNA or have they just emerged by chance? Both possibilities appear realistic, and the choice of the best interpretation, to a large extent, depends on the results of the statistical analysis outlined above. Regardless, it is highly desirable to be clear about the mechanistic aspects of the hypothesis.

Author's response
In the new version, we state that: "there was a kind of positive selection for either the "virus-like" sequences in the human DNA, or for the "humanized" DNA in the human-pathogenic viruses, or both." At present, we cannot be more certain about what flow in human-virus DNA interchange is the most important.

Reviewer comments
The language of the manuscript remains quite poor. The text might have been seen by a professional translator who might have removed some of the errors but many of these remain along with the overall poor style.

Author's response
A native English-speaking colleague edited the text.

Reviewer's report 2
Valerian V. Dolja (Department of Botany and Plant Pathology, Oregon State University, Corvallis, USA)

Reviewer comments
This Hypothesis article by Anastasia Zabolotneva and Anton Buzdin advances a concept according to which the eukaryotes with large (e.g., polyploid) genomes take advantage of a large supply of genetic material that is not a subject of purifying selection to evolve antiviral RNA transcripts. It is further proposed that such transcripts could activate RNAi machinery and therefore suppress the infection. This concept is well in line with the recent striking findings of the bacterial anti-phage CRISPR defense system and growing number of instances of acquisition of RNA virus sequences by eukaryotic genomes that were also proposed to function in antiviral defense (see a succinct commentary by Eugene Koonin, 2010). Although a welcome generalization, current Hypothesis appears to be rather thin on supporting evidence and lacking in specifics as it comes to the involved molecular mechanisms. Below is a laundry list of comments addressing which could, in my opinion, strengthen the case made by the authors.
1. The only bioinformatics support for the hypothesis comes from finding a substantial number of short sequences identical to human adenoviruses in the human genome. It is not clear, however, if there is any positive selection/enrichment for such sequences or, if their occurrence is purely incidental. It seems that a simple in silico experiment could provide an important insight into this issue. If the genomes of DNA phages similar in size to adenoviruses are used as a query, will there be a similar or significantly lower number of hits? Since phages do not infect humans, the latter outcome would be supportive of positive selection for the retention of human virus related sequences rather than for mere stochastic occurrence of the irrelevant sequences.

Author's response
In the revised version, this was done exactly as suggested by the referee (see also our reply #2 to reviewer 1).
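
The control experiment discussed in this exchange can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy sequences and the choice of exact k-mer matching on both strands are assumptions made purely for illustration. A real screen would run against full genomic and EST databases.

```python
# Illustrative sketch only: count exact k-mers shared between a query genome
# (a virus, or a phage used as a negative control) and a host sequence.
# All sequences below are made-up toy data, not real genomes.

def reverse_complement(seq):
    """Return the reverse complement; unknown bases map to N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))

def kmers(seq, k):
    """Return the set of all k-long substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(query, target, k=21):
    """Query k-mers that occur on either strand of the target."""
    target_set = kmers(target, k) | kmers(reverse_complement(target), k)
    return kmers(query, k) & target_set

# Hypothetical 25-nt fragment placed in both the toy "virus" and the toy "host".
fragment = "ATGGCGTACCTGAAGCTTGACCGTA"
host = "ACGTTGCA" * 5 + fragment + "TTGACCAG" * 5
virus = "GGCATCGA" * 4 + fragment + "CCTAGGTA" * 4
phage = "GATTACAG" * 10  # control query with no planted similarity

virus_hits = shared_kmers(virus, host, k=21)
phage_hits = shared_kmers(phage, host, k=21)
print("virus vs host:", len(virus_hits), "shared 21-mers")
print("phage vs host:", len(phage_hits), "shared 21-mers")
```

In this toy setting the virus query shares the planted motif with the host, while the phage control shares nothing. Comparing the two hit counts is the kind of contrast the referee asks for, although any conclusion about selection would also need size-matched queries and a statistical test.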

Reviewer comments
2. Along similar lines, it is not specified whether there are any pathways in addition to positive selection (which in itself could be insufficient) that allow for selective retention of antiviral sequences as opposed to those affecting the human's own genes. Again, a simple search for, e.g., sequences identical to ribosomal RNAs (outside the rRNA genes proper) could provide relevant insight.

Author's response
We omitted the type of analysis suggested by the referee because mammalian genomes (including the human genome) contain huge numbers of rRNA pseudogenes, which would significantly bias interpretation of the data.

Reviewer comments
3. The proposed hypothesis testing via comparing viral susceptibility of closely related organisms with contrasting genome sizes appears to be conceptually problematic. The case in point is plant species that underwent evolutionarily recent polyploidy transitions. Such plants tend to be more rapidly growing and vigorous than their diploid kin. Consequently, if the former are found to be more virus-resistant than the latter, it could be attributed to their overall vigor (and/or an increased complement of innate and acquired immunity genes) rather than to the acquisition of additional antiviral sequences.

Author's response
We agree. The paper has been substantially revised and the confusing part was removed from the manuscript.

Reviewer comments
4. It is not clear why the RNAi-based antiviral response invoked in the paper is habitually called 'intercellular' immune system. Even though cell-to-cell and long-distance spread of RNAi signaling is described for plants and C. elegans, by and large, the RNAi machinery is cell-autonomous, that is, is expressed in each cell.

Author's response
Corrected throughout the manuscript.

Reviewer comments
The paper needs to be heavily edited against numerous typos (e.g. 'specie' throughout the text), as well as grammatical and stylistic errors (e.g., 'Theoretically, a practical implication could be...', on p. 8; an oxymoron).

Author's response
A professional interpreter edited the revised version.

Author's response
We are very thankful to the referee for his valuable criticism. Indeed, the initial version of the manuscript overlapped greatly with the abovementioned papers. In the present version, an attempt has been made to avoid ambiguous sentences, e.g. concerning polyploid organisms. As to the novelty, to meet the referee's suggestion we revised the major concept of the manuscript. We propose a hypothesis that mammalian DNAs and, in particular, the human genome, encode thousands of RNA motifs that may serve for antiviral protection. We also presume that the evolutionary success of some groups of genomic repeats and, in particular, transposable elements (TEs) may be due to their ability to provide antiviral RNA motifs to the host organism. Intense propagation of genomic repeats into the genome inevitably causes bidirectional transcription of these sequences, and the resulting double-stranded RNAs may be recognized and processed by the RNA interference enzymatic machinery. Provided that these processed target motifs may be complementary to viral transcripts, fixation of the repeats into the genome may be of considerable benefit to the host.

Reviewer comments
I also have some comments concerning the computational data presented in the new manuscript submission. The comparison of the occurrence of virus-specific sequences to the random expectation or to bacteriophage sequences has very limited scientific value (if any); of course, genomes interacting in the course of evolution have more in common than evolutionarily unrelated or artificially chosen sequences. More relevant to the case presented could be testing by bioinformatics means the virus-host specificity of the discussed short RNA motifs present in different mammalian species: was there any co-evolution of the viral and host-acquired sequences or not? A positive correlation could probably strengthen the case. In particular, this kind of viral-host sequence comparison could be done using as queries the genomes of viruses that are already known to use the RNAi machinery in their interaction with the hosts: human and mouse cytomegaloviruses (hCMV and mCMV), human, simian and murine rhadinoviruses (KSHV, RRV, MHV68), human and rhesus lymphocryptoviruses (EBV & rLCV).

Author's response
We added the results of some additional bioinformatical tests to the present version. We extracted the available mouse adenovirus and herpesvirus genomes (Murine adenovirus A and Murid herpesvirus 1) from genomic databases and screened them against human and mouse genomic and EST databases. For both the mouse adenoviral and herpesviral genomes, the number of 21-28 nt-long hits was higher when searching the mouse genomic and EST databases than when searching the human databases. Among the identified virus-like hits present in both human and mouse DNAs, three sequences were simple repeats represented by multiple copies in both genomes (motifs TGCTGATGCTGATGCTGATGCTGATG, CATCCATCCATCCATCCATCC and ATTCTTTCATTCTTTCATTCTTT). Importantly, their copy numbers were very different in the mouse and human DNAs (mouse/human): 1216/194, 20384/13893 and 1120/192, respectively. Thus, a kind of positive selection for simple repetitive elements matching the genomes of viruses with the respective tropism may theoretically take place in this case. Overall, these data support the general concept of this manuscript. We have also tested for the presence of adeno- and herpesvirus-like motifs in the consensus sequences of human transposable elements and found that abundant genomic Alu repeats are enriched in such elements. We thank the referee for recommending a strategy for further studies that would include subsequent comparisons of various herpesviral genomes with the DNAs of their hosts and vice versa. However, these studies go beyond the scope of this hypothesis paper and will be a matter for our further research projects, which would also include a detailed analysis of the coevolution of genomic repeats, viruses and their hosts for various mammalian organisms.
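
To make the copy-number comparison above easier to read, the following sketch computes the mouse/human fold difference for the three simple-repeat motifs from the figures quoted in this response. The helper `count_occurrences` and the toy sequence are hypothetical and only show how overlapping motif copies might be counted in a sequence. They do not reproduce the authors' database search.

```python
# Illustrative sketch: fold differences from the copy numbers quoted in the
# response, plus a hypothetical helper for counting overlapping motif copies.

# (mouse copies, human copies) as reported in the text; motifs are written
# as continuous strings.
reported_copies = {
    "TGCTGATGCTGATGCTGATGCTGATG": (1216, 194),
    "CATCCATCCATCCATCCATCC": (20384, 13893),
    "ATTCTTTCATTCTTTCATTCTTT": (1120, 192),
}

def fold_difference(mouse, human):
    """Ratio of mouse to human copy number."""
    return mouse / human

def count_occurrences(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

folds = {m: fold_difference(*c) for m, c in reported_copies.items()}
for motif, fold in folds.items():
    print(f"{motif}: mouse/human = {fold:.2f}")

# Toy example of overlapping counting on a made-up sequence.
toy_count = count_occurrences("CATCCATCCATCC", "CATCC")
```

The fold differences alone do not establish selection, since simple repeats can expand for unrelated reasons. They only restate the reported contrast in a comparable form.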