Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements
© Makarova et al; licensee BioMed Central Ltd. 2009
Received: 21 August 2009
Accepted: 25 August 2009
Published: 25 August 2009
In eukaryotes, RNA interference (RNAi) is a major mechanism of defense against viruses and transposable elements as well as of regulating translation of endogenous mRNAs. The RNAi systems recognize the target RNA molecules via small guide RNAs that are completely or partially complementary to a region of the target. Key components of the RNAi systems are proteins of the Argonaute-PIWI family, some of which function as slicers, the nucleases that cleave the target RNA that is base-paired to a guide RNA. Numerous prokaryotes possess the CRISPR-associated system (CASS) of defense against phages and plasmids that is, in part, mechanistically analogous but not homologous to eukaryotic RNAi systems. Many prokaryotes also encode homologs of Argonaute-PIWI proteins but their functions remain unknown.
We present a detailed analysis of Argonaute-PIWI protein sequences and the genomic neighborhoods of the respective genes in prokaryotes. Whereas eukaryotic Ago/PIWI proteins always contain PAZ (oligonucleotide binding) and PIWI (active or inactivated nuclease) domains, the prokaryotic Argonaute homologs (pAgos) fall into two major groups in which the PAZ domain is either present or absent. The monophyly of each group is supported by a phylogenetic analysis of the conserved PIWI-domains. Almost all pAgos that lack a PAZ domain appear to be inactivated, and the respective genes are associated with a variety of predicted nucleases in putative operons. An additional, uncharacterized domain that is fused to various nucleases appears to be a unique signature of operons encoding the short (lacking PAZ) pAgo form. By contrast, almost all PAZ-domain containing pAgos are predicted to be active nucleases. Some proteins of this group (e.g., that from Aquifex aeolicus) have been experimentally shown to possess nuclease activity, and are not typically associated with genes for other (putative) nucleases. Given these observations, the apparent extensive horizontal transfer of pAgo genes, and their common, statistically significant over-representation in genomic neighborhoods enriched in genes encoding proteins involved in the defense against phages and/or plasmids, we hypothesize that pAgos are key components of a novel class of defense systems. The PAZ-domain containing pAgos are predicted to directly destroy virus or plasmid nucleic acids via their nuclease activity, whereas the apparently inactivated, PAZ-lacking pAgos could be structural subunits of protein complexes that contain, as active moieties, the putative nucleases that we predict to be co-expressed with these pAgos. All these nucleases are predicted to be DNA endonucleases, so it seems most probable that the putative novel phage/plasmid-defense system targets phage DNA rather than mRNAs. 
Given that in eukaryotic RNAi systems, the PAZ domain binds a guide RNA and positions it on the complementary region of the target, we further speculate that pAgos function on a similar principle (the guide being either DNA or RNA), and that the uncharacterized domain found in putative operons with the short forms of pAgos is a functional substitute for the PAZ domain.
The hypothesis that pAgos are key components of a novel prokaryotic immune system that employs guide RNA or DNA molecules to degrade nucleic acids of invading mobile elements implies a functional analogy with the prokaryotic CASS and a direct evolutionary connection with eukaryotic RNAi. The predictions of the hypothesis including both the activities of pAgos and those of the associated endonucleases are readily amenable to experimental tests.
This article was reviewed by Daniel Haft, Martijn Huynen, and Chris Ponting.
The discovery of elaborate and versatile systems of RNA-mediated gene silencing in eukaryotes is one of the pivotal advances in biology of the last decade [1–5]. There are three major, distinct forms of regulatory small RNAs involved in eukaryotic gene silencing: small interfering (si) RNAs, micro (mi) RNAs, and PIWI-associated (pi) RNAs (previously referred to as rasiRNAs). The siRNAs are derived from double-stranded RNAs of viruses and transposable elements, which are processed by Dicer, one of the essential components of the RNA-Induced Silencing Complexes (RISCs) [7–11]. Dicer cleaves long dsRNA molecules into short, 21–22 nucleotide duplexes, which are subsequently unwound, and the guide strand is loaded on another crucial component of RISC, the Argonaute (Ago) slicer nuclease. The Ago-siRNA complex then binds to the target mRNA, which is cleaved by the PIWI domain of Ago, after which the mRNA fragments are released and the RISC-siRNA catalytic complex is recycled [9, 12–14].
Variant, paralogous Dicers and Argonautes are involved in the mechanisms of the other classes of small RNA such as miRNA and piRNA. Unlike the siRNAs, 21–25 nt-long miRNAs are encoded in eukaryotic genomes and are either perfectly (in plants) or imperfectly (in animals) complementary to sequences in the 3'-untranslated regions of specific endogenous mRNAs. Base-pairing of miRNAs with the target mRNAs, which is mediated by a distinct form of RISC, results either in RNA cleavage or in down-regulation of translation without cleavage. Evidence is rapidly accumulating that numerous miRNAs in animals and plants are major players in developmental regulation and chromatin remodeling.
Dicer and Argonaute are the core components of RISCs. Dicer is a multi-domain protein that typically consists of a DEXD/H-type helicase domain fused with an RNA-binding PAZ domain, two RNAse III domains, and in some cases a dsRNA-binding domain. The Argonaute protein is composed of four domains including the PAZ RNA-binding domain and the PIWI family endonuclease, and performs the slicer function [9, 12, 13]. Both Dicer and Argonaute are represented by variable numbers of paralogs in eukaryotes, and different paralogs are included in RISCs with distinct functions [9, 12, 13].
Prokaryotes possess apparent functional counterparts to the miRNA system, that is, regulation of bacterial gene expression by small antisense RNAs. The best characterized of these pathways employ the RNA-binding protein Hfq for small RNA presentation and RNAse E for target degradation [15–17]. Escherichia coli appears to encode ~60 microRNA genes [18, 19], and comparable numbers of expressed, small antisense RNAs have been detected in the archaea Archaeoglobus fulgidus and Sulfolobus solfataricus, suggesting an important role of this regulatory mechanism in prokaryotic physiology. In addition, small antisense RNAs have been shown to regulate plasmid replication and to kill plasmid-free bacterial cells by silencing specific plasmid genes.
The recently discovered major prokaryotic phage/plasmid defense system, the CRISPR-associated system (CASS) [22, 23] (Waters, 2009), also relies on guide RNA that apparently targets invader DNA. The hallmark of the CASS is that this system encompasses a still poorly understood mechanism for integrating fragments of bacteriophage DNA into a specific site within the CRISPR repeat cassette; at least in part, integration of these fragments is probably mediated by the Cas1 protein, which has been predicted [22, 25] and more recently experimentally demonstrated to possess DNAse activity. The unique, phage/plasmid-specific CRISPR inserts are then transcribed and processed to guide RNAs that are directed to the target DNA by the Cascade complex, which (in Escherichia coli K12) consists of 5 Cas proteins and seems to be a functional analog of the RISC. Despite general functional analogies, the molecular mechanisms of CASS and eukaryotic RNAi are distinct, and the protein components of the two systems are not homologous [22, 28].
Many archaea and bacteria do encode homologs of the major protein components of eukaryotic RNAi, in particular, Argonaute-PIWI family proteins, and the helicase and RNAse III domains of Dicer, although the fusion of these domains in a single protein appears to be a eukaryotic signature. The crystal structures of Argonaute homologs from two thermophilic bacteria [30, 31] and two archaea [32, 33] have been solved, and the structures appear to be very similar to those of eukaryotic Argonautes. However, the functions of the prokaryotic Argonaute homologs (hereinafter pAgo) remain obscure, despite the in vitro demonstration of the RNAse H-like ribonuclease activity (cleavage of RNA in a DNA/RNA duplex) of the pAgos from the bacteria Aquifex aeolicus and Thermus thermophilus.
Here, we apply comparative genomics and in-depth computational analysis of Argonaute-PIWI family proteins and other proteins that are typically encoded in their genomic neighborhoods to predict the biological functions of pAgo. We present a hypothesis that the prokaryotic Argonautes are key components of a novel class of virus/plasmid defense systems.
Results and Discussion
Prokaryotic Argonaute homologs belong to two major groups based on the presence or absence of the PAZ domain
To identify all prokaryotic Argonaute homologs, we performed a PSI-BLAST search against the NCBI non-redundant protein sequence database using the sequence of the PIWI domain (the most highly conserved domain in the Argonaute family proteins) from the Thermus thermophilus HB27 pAgo (TT_P0026, pdb: 3DLB; the PIWI domain spans amino acid positions 415–685). The search was run until convergence (after the 3rd iteration) and resulted in the identification of 100 sequences, some of which were fragmented or truncated proteins; additional searches started with some of the detected proteins showed that this sequence set represents the full complement of PIWI-domain proteins (pAgo) encoded in currently available prokaryotic genomes. For more detailed analysis, we selected 85 sequences from 80 genomes (the genomes of the bacterium Parvularcula bermudensis HTCC2503 and the archaeon Halorubrum lacusprofundi ATCC 49239 encode three pAgo proteins each, and the genome of Acidobacterium capsulatum ATCC 51196 encodes two pAgos) (see Additional File 1).
PIWI domain is inactivated in numerous pAgos
The AfAgo protein, which does not contain a PAZ domain, also lacks the catalytic aspartates but has been shown to bind dsRNA [32, 40]. Structural analysis of AfAgo complexed with an siRNA-like duplex showed that in this protein a Cd2+ ion bound to the carboxy-terminal carboxylate and several amino acid residues in the middle (MID) domain are involved in the recognition of the unpaired 5' nucleotide of siRNA [32, 40]. In contrast, a structural and biochemical study of AaAgo, which contains the PAZ domain and the conserved catalytic residues, showed that this protein is an active RNAse H with a preference for a DNA/RNA hybrid as a substrate, suggesting that some pAgos employ small guide DNA molecules to cleave mRNA. The detailed study of the Thermus thermophilus pAgo corroborated the findings on AaAgo by revealing the details of interactions with the 5'-phosphorylated 21-base DNA guide strand and the DNA-guided RNA cleavage by this protein [31, 36].
Phylogenetic analysis of the Argonaute family suggests extensive horizontal gene transfer in prokaryotes
The pAgos are contextually linked to at least three distinct families of predicted nucleases
We further examined the genomic context of the pAgo genes; analysis of genomic context has been established as a powerful approach for prediction of the biological functions of prokaryotic genes using the "guilt by association" principle [41–43]. In many cases, these genes form potential operons with a variety of genes encoding uncharacterized proteins (neighbor genes were predicted to be encoded in a potential operon with pAgos if they were located upstream or downstream of the respective pAgo gene on the same DNA strand and if the intergenic distances in such an array of co-directional genes were shorter than 100 nt; see Additional File 1). We performed an in-depth analysis of the sequences of the proteins encoded in the genes co-localized with pAgos using PSI-BLAST, HHpred and CDD search (see Methods). This analysis resulted in the identification of four protein families that are predicted to be co-expressed and thus functionally linked with the pAgos.
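The operon criterion stated above (co-directional neighbors with intergenic distances shorter than 100 nt) can be illustrated with a minimal Python sketch. The `Gene` record, the function name `putative_operon`, and the coordinate convention (1-based, inclusive) are assumptions made for this illustration and are not taken from the original analysis pipeline.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # inclusive
    strand: str  # "+" or "-"


def putative_operon(genes, anchor, max_gap=100):
    """Return names of genes forming a co-directional array around `anchor`.

    A neighbor is added while it lies on the same strand as the anchor and
    the intergenic distance to the array is shorter than `max_gap` nt.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.name == anchor)
    strand = ordered[idx].strand
    lo = hi = idx
    # Extend towards lower coordinates.
    while lo > 0:
        prev, cur = ordered[lo - 1], ordered[lo]
        if prev.strand != strand or cur.start - prev.end - 1 >= max_gap:
            break
        lo -= 1
    # Extend towards higher coordinates.
    while hi < len(ordered) - 1:
        cur, nxt = ordered[hi], ordered[hi + 1]
        if nxt.strand != strand or nxt.start - cur.end - 1 >= max_gap:
            break
        hi += 1
    return [g.name for g in ordered[lo:hi + 1]]


# Hypothetical neighborhood: geneA is on the opposite strand and geneD is
# separated by 199 nt, so neither joins the array around pAgo.
neighborhood = [
    Gene("geneA", 1, 300, "-"),
    Gene("nuc", 350, 1200, "+"),
    Gene("pAgo", 1250, 3000, "+"),
    Gene("geneC", 3050, 3500, "+"),
    Gene("geneD", 3700, 4000, "+"),
]
print(putative_operon(neighborhood, "pAgo"))
```

Overlapping genes yield a negative distance in this sketch and are therefore kept in the array, which seems the natural reading of the criterion.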
For the C-terminal domain of xccb100_3097, we failed to detect any statistically significant similarities to known domains using CDD search or HHpred. However, a PSI-BLAST search with xccb100_3097 used as the query revealed many homologs with similar domain architectures, all of which are associated with pAgos in putative operons; moreover, several multidomain proteins (e.g., GIs: 91783256, 218130589, 229435559) comprise fusions of xccb100_3097-like and PIWI domains (see the alignment of this domain in Additional File 3).
The second family of PIWI-associated proteins is typified by the mlr6203 (GI: 13475182) protein from Mesorhizobium loti. The HHpred search convincingly shows that the N-terminal domain of these proteins belongs to the Mrr family of restriction endonucleases, with the hallmark (D/E)-(D/E)XK active site [52, 53] (for example, the best hit is to pdb: 2ost, homing endonuclease from Synechocystis sp., E-value = 0.04; followed by a hit to pfam04471, Restriction endonuclease, E-value = 0.04). All experimentally characterized superfamily representatives are site-specific endonucleases that cleave dsDNA and possess an enormous variety of recognition sites [52–54]. The active site residues are conserved in all mlr6203 homologs (Figure 4B), so this domain probably is an active DNA endonuclease. As with the xccb100_3097 family proteins, no similarity to the C-terminal domain of the mlr6203 was detected in CDD and HHpred searches. However, the PSI-BLAST search identified 17 homologous proteins with the same domain architecture and predicted operon organization (see Additional File 1).
A typical representative of the third family is RHECIAT_PB0000019 (GI: 190894000) from Rhizobium etli. This protein contains an N-terminal TIR domain that was easily detected by HHpred (the best hit is to pdb: 2js7, TIR domain of myeloid differentiation primary response protein MYD88 from human, E-value of 1.1 × 10⁻³⁰). The TIR domain mediates protein-protein interactions and belongs to the STIR superfamily that includes mostly eukaryotic proteins involved in diverse signaling pathways as well as a variety of poorly characterized multidomain proteins from bacteria and archaea with large genomes (that also have been implicated in transcription regulation and signaling [55–57]). Notably, TIR domains play important roles in disease and stress resistance in plants. Similarly, in mammals, TIR-domains are key components of the immune system-based antimicrobial and antiviral response, and the programmed cell death (PCD) system [59, 60]. Analysis of domain architectures led to the hypothesis that prokaryotic TIR-domain proteins also could be involved in PCD. All closely related homologs of the RHECIAT_PB0000019 protein contain the TIR domain (see Additional File 3), whereas several proteins in this family (e.g. GI: 162145848) also contain an additional N-terminal domain that belongs to the PD-(D/E)XK nuclease superfamily (a vast assemblage of nucleases that includes, among others, the restriction endonucleases) with all catalytic residues typically conserved (Figure 4B). The C-terminal domain of these proteins is not similar to any known domain, but does show a weak sequence similarity (with statistical significance difficult to demonstrate) to the C-terminal domain of the mlr6203-like family.
Considering the similar sizes of the corresponding domains in both families and, most importantly, the genomic association with predicted nucleases and pAgos, we strongly suspect that these domains are homologous; examination of their multiple alignment indeed shows several distinct, conserved motifs (see Additional File 3). The predicted secondary structure indicates that this is a globular domain; however, the pattern of amino acid residue conservation does not seem to suggest an enzymatic function. Given that the proteins containing this domain are found exclusively in the same neighborhoods with pAgos that lack the PAZ domain, it is tempting to speculate that this uncharacterized domain is functionally analogous to the PAZ domain, that is, involved in binding a guide nucleic acid molecule (hereinafter we refer to this domain as APAZ, after Analog of PAZ).
The fourth family of pAgo-associated proteins is linked to full-size, PAZ-domain-containing Argonaute homologs and can be typified by the protein PTH_0722 (GI: 147677057) from Pelotomaculum thermopropionicum. This protein contains a C-terminal domain that belongs to the PD-(D/E)XK nuclease superfamily (HHpred detects similarity to SfsA: Sugar fermentation stimulation protein, which contains a PD-(D/E)XK nuclease domain, with E-value = 0.022) and contains all the catalytic residues (Figure 4B); this putative nuclease is clearly distinct from and only very distantly related to the restriction endonuclease domain of the mlr6203-like family proteins. The N-terminal domain of this protein does not show similarity to any characterized domains, has a predicted predominantly α-helical structure and is present only in close homologs of PTH_0722 (see Additional File 4). In the GobsU_24486 protein of Gemmata obscuriglobus, the nuclease domain is replaced by the apparently functionally unrelated SEFIR domain of the STIR superfamily, which is only distantly related to the TIR domain but is also involved in various signaling pathways.
Several other genomic neighbors of pAgos are worth mentioning (Figure 3). Two genes that encode PAZ-domain-containing but, apparently, inactivated pAgos (in the bacteria Pedobacter heparinus and Spirosoma linguale) are associated with predicted Sir2 family nucleases (Figure 4A). Furthermore, three long forms of pAgos (one inactivated, in the bacterium Dehalococcoides sp., and two apparently active ones in Microcystis aeruginosa and Clostridium bartlettii) are associated with PD-(D/E)XK nucleases of a distinct subfamily related to Cas4 (COG1468), which is mostly represented within CASS. Most conspicuously, as noticed previously, in the archaeon Methanopyrus kandleri, the pAgo is encoded within an operon that otherwise encodes components of the CASS.
A potentially important pattern revealed by this analysis of the genomic context of prokaryotic PIWI-domain proteins is that, almost without exception, pAgos with an apparently inactivated catalytic PIWI domain are associated with a predicted nuclease in a putative operon (Figures 2, 3 and see Additional File 1). This observation suggests the possibility of functional complementarity between the nuclease activity of PIWI domains of pAgos and other nucleases, in particular, homologs of restriction endonucleases (see discussion below).
Statistical analysis of the genomic neighborhoods of pAgos reveals a significant link to phage resistance systems
Table 1. Results of the Fisher Omnibus test for the genomic association of pAgo genes with four classes of phage defense/stress response systems
P-values (one per class): 5.1 × 10⁻⁷; 2.9 × 10⁻¹³; 5.8 × 10⁻¹⁰; 4.6 × 10⁻¹⁶
Hypothesis: pAgo is a key component of a novel prokaryotic immune system in which it functions either as a nuclease or as a structural subunit of nuclease complexes that utilize guide RNAs or DNAs to degrade virus/plasmid genomes
Several convergent lines of evidence point to defense against invading mobile elements as the primary function of pAgos. (1). The analogy to eukaryotic Argonautes many of which are dedicated to the defense against viruses and transposable elements. (2). The guide-DNA-dependent nuclease activity of AaAgo and TtAgo. (3). Extensive HGT of pAgos which is best compatible with a stress-response related function. (4). Preferential location of pAgo genes in genomic neighborhoods significantly enriched in known phage-defense genes. (5). Co-localization of PIWI-domain protein genes with genes encoding other (predicted) nucleases. (6). The near perfect complementarity between the predicted nuclease and guide-binding activities of pAgos and co-localization with other putative nucleases: the inactivated pAgos that lack the PAZ domain are associated with genes encoding predicted nucleases whereas the apparently active, PAZ-containing pAgos are not (Figure 3). The latter observation suggests that pAgos function within nuclease complexes, in some cases as their catalytic subunits, and in other cases, as structural subunits interacting with the actual nucleases.
The PD-(D/E)XK superfamily nucleases, to which the predicted nucleases associated with the majority of pAgos are homologous, so far have been shown to cleave exclusively dsDNA. Thus, it seems most likely that the predicted pAgo-based defense systems directly target invader dsDNA genomes rather than mRNAs (Figure 5). On the other hand, as stated above, in vitro analyses have revealed that AaAgo and TtAgo are most active as DNA-guided ribonucleases, suggesting that RNA may be a target as well [35, 36]. The guide molecule could be either a small RNA (with the implication that the respective nuclease cleaves an RNA-DNA hybrid) or a small DNA as suggested by the study of AaAgo [63, 64] and TtAgo [31, 36].
The proposed model for the pAgo-based phage defense shows functional analogies to both CASS and the eukaryotic RNAi (Figure 5). Given the phylogenetic affinity of a distinct family of apparently active archaeal pAgos and eukaryotic Argonautes (Figure 3), this hypothetical defense system is the probable evolutionary progenitor of the eukaryotic RNAi. The spread of RNA viruses in eukaryotes that was accompanied by the displacement of the majority of DNA viruses could have been the driving force behind the switch of the specificity of this defense system from DNA to RNA.
The functions of the pAgos to some extent have been characterized in vitro (Yuan 2005) [31, 36] but remain to be determined in vivo. The convergence of several lines of evidence discussed here seems to strongly support the hypothesis that pAgos are key components of a novel class of immune systems that employ guide DNA or RNA molecules to destroy virus and plasmid DNA (or mRNA). These proposed mechanisms of action suggest functional parallels between the predicted pAgo-based defense systems and CASS, and a direct evolutionary link between the former and eukaryotic RNAi. The predictions of the hypothesis, in particular, the nuclease activity catalyzed by PAZ-domain-containing but not by PAZ-domain-lacking pAgos, the complementary activities of associated putative nucleases, and guide DNA or RNA binding by the APAZ domains, are amenable to straightforward experimental validation.
All analyzed sequences were from the non-redundant protein sequence database at the NCBI. Database searches were performed using PSI-BLAST, typically with the inclusion threshold E = 0.01 and no composition-based statistics or low complexity filtering, or the HH search program available through the HHpred server. Multiple alignments of protein sequences were constructed by combining the results obtained with the PROMALS program and the MUSCLE program, followed by a minimal manual correction on the basis of local alignments obtained using PSI-BLAST. Protein secondary structure was predicted using the PSIPRED program.
Maximum likelihood (ML) phylogenetic trees were constructed from the alignment of PIWI domain region (only positions with less than 30% gaps were used for reconstruction – 258 altogether), by using the MOLPHY program  with the JTT substitution matrix to perform local rearrangement of an original Fitch tree . The MOLPHY program was also used to compute RELL bootstrap values.
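The column selection step mentioned above (only alignment positions with less than 30% gaps are retained) can be sketched as follows. This is an illustrative Python fragment, not the procedure actually used; the function name `filter_gappy_columns` and the choice of `-` and `.` as gap characters are assumptions.

```python
def filter_gappy_columns(aligned_seqs, max_gap_fraction=0.30, gap_chars="-."):
    """Keep only alignment columns whose gap fraction is below the threshold.

    `aligned_seqs` is a list of equal-length strings (one per sequence).
    Returns the trimmed sequences in the same order.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(aligned_seqs)
    keep = []
    for col in range(length):
        gaps = sum(1 for s in aligned_seqs if s[col] in gap_chars)
        # Strictly "less than 30% gaps", as stated in the text.
        if gaps / n < max_gap_fraction:
            keep.append(col)
    return ["".join(s[c] for c in keep) for s in aligned_seqs]


# Toy alignment: the third column has 3 gaps out of 4 and is dropped.
toy = ["AC-GT", "A--GT", "ACTG-", "AC-GT"]
print(filter_gappy_columns(toy))
```

The trimmed alignment would then be passed to the tree-building program; that step is not shown here.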
Fisher Omnibus test
Only 45 completely sequenced genomes were used for this analysis; the complete genome information was obtained from the FTP site of the RefSeq database (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/). Proteins in these genomes were assigned to COGs using a modified COGNITOR program . The target sets of phage defense proteins were obtained from the following sources: restriction-modification (RM) system-related proteins from REBASE ; abortive infection (ABI)-related genes from the Chopin et al. review ; CRISPR system-related genes from  and toxin-antitoxin-related genes from . Proteins of the RM and ABI systems were assigned to COGs as indicated above, and for other systems, COG numbers have been already reported in the aforementioned papers (see the complete list of these COGs in Additional File 5).
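For orientation, Fisher's omnibus (combined probability) test merges k independent P-values into one statistic, X² = −2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The text does not specify how the per-genome P-values were computed, so the Python sketch below covers only the combination step and assumes that the individual P-values are independent; the function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent P-values with Fisher's method.

    Uses the closed form of the chi-square survival function for an even
    number of degrees of freedom (2k):
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not pvalues:
        raise ValueError("at least one P-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X^2 / 2
    term = 1.0   # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two hypothetical per-genome P-values of 0.05 each.
print(fisher_combined_pvalue([0.05, 0.05]))
```

For two P-values the result reduces to p1·p2·(1 − ln(p1·p2)), which offers a quick sanity check on the implementation.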
Daniel Haft, The J. Craig Venter Institute
Draft Public Comments
"Emerging evidence about prokaryotic homologs of Argonaute (pAgo) makes it clear that these proteins are related to their eukaryotic counterparts not just in sequence and structure, but also in molecular function. They might be related as well in terms of biological process, perhaps with many or most serving a primary function of phage resistance rather than of host gene transcriptional regulation. The case made in this manuscript, as argued by the interpretation of protein domain architecture, is highly suggestive. However, the statistical test for genomic association of pAgo with other phage resistance systems is currently unconvincing in the absence of a negative control. Other possible roles for pAgos seem equally consistent with available data."
A negative control, namely, a test of the possible association of pAgos with mobile genes that are not involved in phage defense, is included in the revised manuscript (see Additional File 5). As the result of this test was indeed negative, we find the statistical evidence as convincing as it can be although the final proof, of course, can only be experimental.
"One alternate possibility is that most pAgos serve as machinery for boutique host regulatory systems. Anti-sense RNA expression in bacteria has been underappreciated; its prevalence likely is still underestimated. Some antisense RNA is cis-acting, through a mechanism of transcriptional interference, but some is trans-acting, through mechanisms of dsRNA formation. Since the trans-acting antisense RNAs themselves have won only a limiting understanding, it stands to reason that mechanisms acting downstream of dsRNA formation also are incompletely understood. A role for many pAgo proteins in the control of host gene expression seems quite likely."
The possibility that some pAgos are also involved in regulation of bacterial genes is certainly interesting and not implausible. However, the data presented in this paper suggest to us that the functions in defense against mobile elements are primary.
"A second possibility for these systems, supported by their apparent high degree of lateral transfer, is that most are selfish genetic elements. By analogy to transposons, homing endonucleases encoded within inteins, and temperate phage, these systems may carry out nuclease reactions simply to mediate their own spread. Some incidental benefit to host genomes is possible; any endogenous nuclease, it may be assumed, has some potential to cleave phage DNA or RNA, as in the example of ribonuclease HIII vs. RNA phage. But that level of phage resistance capability could be regarded as secondary."
All prokaryotic defense and stress response systems are to a large extent selfish as discussed in detail for restriction-modification and toxin-antitoxin systems. We strongly suspect that this is indeed the case for the putative pAgo-centered system as well.
"The extreme selective pressures of phage/host warfare make it quite likely that the proposed role for pAgos in phage resistance in prokaryotes is at least occasionally true. The greater question is whether pAgo proteins represent a new, major player in prokaryotic resistance to phage attack, and whether most pAgo proteins have host defense as a primary role. This is a mirror to the question of whether CRISPR arrays might be co-opted to perform regulatory functions, given their extreme plasticity and their transcription into small RNAs – one might examine repeat arrays after phage-free serial passage of selected strains under extreme selection."
Cooperation of pAgo with the CRISPR system cannot be ruled out but appears unlikely. Of the 780 bacterial and archaeal genomes that we analyzed for the presence of CRISPR and pAgo, 291 encoded CRISPR and 51 encoded pAgo, with the overlap of only 28 genomes. Of course, the localization of the pAgo gene within the Cas gene array in Methanopyrus kandleri is suggestive but so far this remains the only genome that shows such an association.
"Restriction enzyme systems, especially restriction/modification systems, discriminate self vs. non-self by recognizing short sequence signatures in phage that are either masked or missing in the host. CRISPR systems discriminate self from non-self by capture and expression of samples of exogenous DNA. Both abortive infection systems and toxin-antitoxin systems have the potential to shut down the host cell, in response to stress from phage infection, in order to block the phage life cycle. Each of these schemes provides a clear model of how defense mechanisms are triggered. The trickiest part of the model for pAgos in phage defense concerns the source of guide DNA or RNA. Is it DNA encoded on the host chromosome? Will it have a promoter and a terminator? It seems at least theoretically possible that CRISPR arrays themselves might be a source. If a typical CRISPR system targets phage DNA according to exact matches to spacer sequences, one might postulate a backup system in which the same small RNAs, with some tolerance for mismatches, silence phage mRNA. It therefore makes sense to ask – what fraction of pAgo-containing genomes have CRISPR systems, and is the prevalence significantly higher for any subgroup of pAgos?"
It is indeed true that we do not have any inkling of the source of the putative guide DNA or RNA that is employed by pAgo. The idea that pAgo might share the guide molecules with CRISPR is very interesting. The problem is that, as indicated above, there is no clear sign of cooperation between pAgo and CRISPR, and what is most damning for this provocative idea, is that the majority of the genomes that encode pAgo possess no CRISPR.
We attempted to search for sequence conservation and repetitive elements in the upstream and downstream regions of pAgo operons but failed to find anything suggestive. When more closely related genomes encoding pAgo become available, it will be necessary to repeat this attempt.
A reasonable view of genome organization is that some regions of a genome are more plastic than others. The more plastic regions would be expected to accumulate prophages, transposons, integrated plasmids, conjugation regions, pseudogenes, and "fitness factors" such as CASS, antibiotic resistance genes, virulence genes, and capsular polysaccharide genes, all in close proximity. In this view, genes encoding restriction systems and CRISPR systems likely would occur close to each other because the region tolerates insertion, not because both systems mediate host defense. The statistical argument, therefore, does not currently allow one to discriminate phage defense from other possible functions for these systems. If the statistical association with RM and CASS is not replicated by associations with secretion systems, pilus proteins, integrases and recombinases, plasmid partition proteins, capsular polysaccharide biosynthesis genes, etc., then it may become somewhat more convincing.
We appreciate this suggestion and sought to test the hypothesis that co-localization of pAgo genes with those for other systems of antiphage defence is a trivial consequence of the occurrence of all these genes in highly plastic regions of prokaryotic genomes. To this end, we examined the potential association of pAgo genes with typical components of the mobilome such as transposases, integrases, and various genes of apparent phage origin. As indicated in the revised text of the article and presented in detail in the Additional Files 5 and 6, there was no significant association between pAgo and the elements of the mobilome. Thus we believe that the most parsimonious interpretation of the data is that there are indeed phage defence islands in prokaryotic genomes and pAgo genes show a strong association with these islands.
Martijn Huynen, Radboud University, Nijmegen Medical Centre
The manuscript by Makarova and co-workers provides a compelling argument for the functional link between Bacterial and Archaeal Argonaute proteins and proteins that are involved in defense against "foreign" DNA.
I only have a few comments:
Studies on the value of the genomic association of genes for the prediction of functional links between proteins have gone to great lengths to benchmark at which level of genomic association it not only becomes statistically significant, but also functionally meaningful in terms of predicting that proteins are actually involved in the same pathway. I cannot judge the level of "functional relevance" of the P-values provided in Table 1.
Along the same lines: can the authors give simple numbers of how often the four protein families were discovered in the vicinity of the 100 pAgo genes?
This information is now available in the new Additional File 6 for the set of 45 genomes that were analyzed using the Fisher Omnibus test.
I take it that all genomes that were included in the significance study were phylogenetically distant enough to assure that gene order conservation was not trivial?
No, we did this analysis for all available genomes, since even in some closely related genomes the location of the pAgo operons is different. In response to these concerns, we have redone the analysis for distantly related genomes only. The results have not substantially changed; actually, even more significant p-values were obtained (see the new Additional File 6).
"This analysis resulted": I cannot find how this analysis was done. The Fisher Omnibus test mentioned in the Methods does not require genes to be part of the same potential operon, and "predicted to be co-expressed" can thus not be concluded from it.
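For readers unfamiliar with the Fisher Omnibus test, the idea is to combine k independent p-values into a single statistic, X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The Python sketch below is a minimal illustration of that textbook formula, not the code used for the analysis; the function name and the example p-values are hypothetical. It uses the closed-form chi-square tail for even degrees of freedom, so only the standard library is needed.

```python
import math


def fisher_omnibus(p_values):
    """Combine independent p-values with Fisher's method.

    Returns the upper-tail probability of X = -2 * sum(ln p_i) under a
    chi-square distribution with 2k degrees of freedom, k = len(p_values).
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2

    # For even degrees of freedom 2k the chi-square tail has a closed form:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    return math.exp(-half_stat) * series


# Hypothetical per-genome p-values for one class of defence genes.
combined = fisher_omnibus([0.04, 0.20, 0.01])
```

The method assumes that the combined p-values are independent, which is one reason why restricting the analysis to distantly related genomes, as described above, is a sensible check.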
In the revised manuscript, the criteria for calling potential operons are given explicitly.
Chris Ponting, Oxford University
Makarova et al. have undertaken a thorough and illuminating analysis of prokaryotic Argonaute homologs. Their analysis consists first of detailed sequence analysis of PIWI domain homologs followed by investigation of putative operons. The manuscript ends with a nice demonstration that pAgo genomic regions are significantly enriched for phage defense genes. This allows them to pose an important and testable hypothesis which provides the major contribution of this paper. The manuscript is well written and its analyses are sound.
KSM, YIW and EVK are supported by intramural funds of the DHHS (National Library of Medicine, National Institutes of Health).
- Denli AM, Hannon GJ: RNAi: an ever-growing puzzle. Trends Biochem Sci. 2003, 28 (4): 196-201. 10.1016/S0968-0004(03)00058-6.PubMedView Article
- Hannon GJ: RNA interference. Nature. 2002, 418 (6894): 244-251. 10.1038/418244a.PubMedView Article
- Zamore PD, Haley B: Ribo-gnome: the big world of small RNAs. Science. 2005, 309 (5740): 1519-1524. 10.1126/science.1111444.PubMedView Article
- Siomi H, Siomi MC: On the road to reading the RNA-interference code. Nature. 2009, 457 (7228): 396-404. 10.1038/nature07754.PubMedView Article
- Ghildiyal M, Zamore PD: Small silencing RNAs: an expanding universe. Nat Rev Genet. 2009, 10 (2): 94-108. 10.1038/nrg2504.PubMedPubMed CentralView Article
- Moazed D: Small RNAs in transcriptional gene silencing and genome defence. Nature. 2009, 457 (7228): 413-420. 10.1038/nature07756.PubMedPubMed CentralView Article
- Filipowicz W: RNAi: the nuts and bolts of the RISC machine. Cell. 2005, 122 (1): 17-20. 10.1016/j.cell.2005.06.023.PubMedView Article
- Tang G: siRNA and miRNA: an insight into RISCs. Trends Biochem Sci. 2005, 30 (2): 106-114. 10.1016/j.tibs.2004.12.007.PubMedView Article
- Sontheimer EJ: Assembly and function of RNA silencing complexes. Nat Rev Mol Cell Biol. 2005, 6 (2): 127-138. 10.1038/nrm1568.PubMedView Article
- Umbach JL, Cullen BR: The role of RNAi and microRNAs in animal virus replication and antiviral immunity. Genes Dev. 2009, 23 (10): 1151-1164. 10.1101/gad.1793309.PubMedPubMed CentralView Article
- Cullen BR: Viral and cellular messenger RNA targets of viral microRNAs. Nature. 2009, 457 (7228): 421-425. 10.1038/nature07757.PubMedPubMed CentralView Article
- Carthew RW, Sontheimer EJ: Origins and Mechanisms of miRNAs and siRNAs. Cell. 2009, 136 (4): 642-655. 10.1016/j.cell.2009.01.035.PubMedPubMed CentralView Article
- Miyoshi K, Tsukumo H, Nagami T, Siomi H, Siomi MC: Slicer function of Drosophila Argonautes and its involvement in RISC formation. Genes Dev. 2005, 19 (23): 2837-2848. 10.1101/gad.1370605.PubMedPubMed CentralView Article
- Jinek M, Doudna JA: A three-dimensional view of the molecular machinery of RNA interference. Nature. 2009, 457 (7228): 405-412. 10.1038/nature07755.PubMedView Article
- Gottesman S: Micros for microbes: non-coding regulatory RNAs in bacteria. Trends Genet. 2005, 21 (7): 399-404. 10.1016/j.tig.2005.05.008.PubMedView Article
- Majdalani N, Vanderpool CK, Gottesman S: Bacterial small RNA regulators. Crit Rev Biochem Mol Biol. 2005, 40 (2): 93-113. 10.1080/10409230590918702.PubMedView Article
- Waters LS, Storz G: Regulatory RNAs in bacteria. Cell. 2009, 136 (4): 615-628. 10.1016/j.cell.2009.01.043.PubMedPubMed CentralView Article
- Zhang A, Wassarman KM, Rosenow C, Tjaden BC, Storz G, Gottesman S: Global analysis of small RNA and mRNA targets of Hfq. Mol Microbiol. 2003, 50 (4): 1111-1124. 10.1046/j.1365-2958.2003.03734.x.PubMedView Article
- Sittka A, Lucchini S, Papenfort K, Sharma CM, Rolle K, Binnewies TT, Hinton JC, Vogel J: Deep sequencing analysis of small noncoding RNA and mRNA targets of the global post-transcriptional regulator, Hfq. PLoS Genet. 2008, 4 (8): e1000163-10.1371/journal.pgen.1000163.PubMedPubMed CentralView Article
- Tang TH, Bachellerie JP, Rozhdestvensky T, Bortolin ML, Huber H, Drungowski M, Elge T, Brosius J, Huttenhofer A: Identification of 86 candidates for small non-messenger RNAs from the archaeon Archaeoglobus fulgidus. Proc Natl Acad Sci USA. 2002, 99 (11): 7536-7541. 10.1073/pnas.112047299.PubMedPubMed CentralView Article
- Gerdes K, Wagner EG: RNA antitoxins. Curr Opin Microbiol. 2007, 10 (2): 117-124. 10.1016/j.mib.2007.03.003.PubMedView Article
- Makarova KS, Grishin NV, Shabalina SA, Wolf YI, Koonin EV: A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol Direct. 2006, 1 (1): 7-10.1186/1745-6150-1-7.PubMedPubMed CentralView Article
- Sorek R, Kunin V, Hugenholtz P: CRISPR – a widespread system that provides acquired resistance against phages in bacteria and archaea. Nat Rev Microbiol. 2008, 6 (3): 181-186. 10.1038/nrmicro1793.PubMedView Article
- Marraffini LA, Sontheimer EJ: CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA. Science. 2008, 322 (5909): 1843-1845. 10.1126/science.1165771.PubMedPubMed CentralView Article
- Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV: A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res. 2002, 30 (2): 482-496. 10.1093/nar/30.2.482.PubMedPubMed CentralView Article
- Wiedenheft B, Zhou K, Jinek M, Coyle SM, Ma W, Doudna JA: Structural Basis for DNase Activity of a Conserved Protein Implicated in CRISPR-Mediated Genome Defense. Structure. 2009, 17 (6): 904-912. 10.1016/j.str.2009.03.019.PubMedView Article
- Brouns SJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJ, Snijders AP, Dickman MJ, Makarova KS, Koonin EV, Oost van der J: Small CRISPR RNAs guide antiviral defense in prokaryotes. Science. 2008, 321 (5891): 960-964. 10.1126/science.1159689.PubMedView Article
- Oost van der J, Jore MM, Westra ER, Lundgren M, Brouns SJJ: CRISPR-based adaptive and heritable immunity in prokaryotes. Trends Biochem Sci. 2009,
- Shabalina SA, Koonin EV: Origins and evolution of eukaryotic RNA interference. Trends Ecol Evol. 2008, 23 (10): 578-587. 10.1016/j.tree.2008.06.005.PubMedPubMed CentralView Article
- Rashid UJ, Paterok D, Koglin A, Gohlke H, Piehler J, Chen JC: Structure of Aquifex aeolicus argonaute highlights conformational flexibility of the PAZ domain as a potential regulator of RNA-induced silencing complex function. J Biol Chem. 2007, 282 (18): 13824-13832. 10.1074/jbc.M608619200.PubMedView Article
- Wang Y, Sheng G, Juranek S, Tuschl T, Patel DJ: Structure of the guide-strand-containing argonaute silencing complex. Nature. 2008, 456 (7219): 209-213. 10.1038/nature07315.PubMedPubMed CentralView Article
- Ma JB, Yuan YR, Meister G, Pei Y, Tuschl T, Patel DJ: Structural basis for 5'-end-specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature. 2005, 434 (7033): 666-670. 10.1038/nature03514.PubMedPubMed CentralView Article
- Song JJ, Smith SK, Hannon GJ, Joshua-Tor L: Crystal structure of Argonaute and its implications for RISC slicer activity. Science. 2004, 305 (5689): 1434-1437. 10.1126/science.1102514.PubMedView Article
- Joshua-Tor L: The Argonautes. Cold Spring Harb Symp Quant Biol. 2006, 71: 67-72. 10.1101/sqb.2006.71.048.PubMedView Article
- Yuan YR, Pei Y, Ma JB, Kuryavyi V, Zhadina M, Meister G, Chen HY, Dauter Z, Tuschl T, Patel DJ: Crystal structure of A. aeolicus argonaute, a site-specific DNA-guided endoribonuclease, provides insights into RISC-mediated mRNA cleavage. Mol Cell. 2005, 19 (3): 405-419. 10.1016/j.molcel.2005.07.011.PubMedPubMed CentralView Article
- Wang Y, Juranek S, Li H, Sheng G, Tuschl T, Patel DJ: Structure of an argonaute silencing complex with a seed-containing guide DNA and target RNA duplex. Nature. 2008, 456 (7224): 921-926. 10.1038/nature07666.PubMedPubMed CentralView Article
- Parker JS, Roe SM, Barford D: Crystal structure of a PIWI protein suggests mechanisms for siRNA recognition and slicer activity. Embo J. 2004, 23 (24): 4727-4737. 10.1038/sj.emboj.7600488.PubMedPubMed CentralView Article
- Yang W, Steitz TA: Recombining the structures of HIV integrase, RuvC and RNase H. Structure. 1995, 3 (2): 131-134. 10.1016/S0969-2126(01)00142-3.PubMedView Article
- Tolia NH, Joshua-Tor L: Slicer and the argonautes. Nat Chem Biol. 2007, 3 (1): 36-43. 10.1038/nchembio848.PubMedView Article
- Parker JS, Roe SM, Barford D: Structural insights into mRNA recognition from a PIWI domain-siRNA guide complex. Nature. 2005, 434 (7033): 663-666. 10.1038/nature03462.PubMedPubMed CentralView Article
- Aravind L: Guilt by association: contextual information in genome analysis. Genome Res. 2000, 10 (8): 1074-1077. 10.1101/gr.10.8.1074.PubMedView Article
- Galperin MY, Koonin EV: Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol. 2000, 18 (6): 609-613. 10.1038/76443.PubMedView Article
- Huynen M, Snel B, Lathe W, Bork P: Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 2000, 10 (8): 1204-1210. 10.1101/gr.10.8.1204.PubMedPubMed CentralView Article
- Imai S, Johnson FB, Marciniak RA, McVey M, Park PU, Guarente L: Sir2: an NAD-dependent histone deacetylase that connects chromatin silencing, metabolism, and aging. Cold Spring Harb Symp Quant Biol. 2000, 65: 297-302. 10.1101/sqb.2000.65.297.PubMedView Article
- North BJ, Verdin E: Sirtuins: Sir2-related NAD-dependent protein deacetylases. Genome Biol. 2004, 5 (5): 224-10.1186/gb-2004-5-5-224.PubMedPubMed CentralView Article
- Mantel C, Broxmeyer HE: Sirtuin 1, stem cells, aging, and stem cell aging. Curr Opin Hematol. 2008, 15 (4): 326-331. 10.1097/MOH.0b013e3283043819.PubMedPubMed CentralView Article
- Schwer B, Verdin E: Conserved metabolic regulatory functions of sirtuins. Cell Metab. 2008, 7 (2): 104-112. 10.1016/j.cmet.2007.11.006.PubMedView Article
- Cosgrove MS, Bever K, Avalos JL, Muhammad S, Zhang X, Wolberger C: The structural basis of sirtuin substrate affinity. Biochemistry. 2006, 45 (24): 7511-7521. 10.1021/bi0526332.PubMedView Article
- Zhao K, Chai X, Marmorstein R: Structure and substrate binding properties of cobB, a Sir2 homolog protein deacetylase from Escherichia coli. J Mol Biol. 2004, 337 (3): 731-741. 10.1016/j.jmb.2004.01.060.PubMedView Article
- Iyer LM, Makarova KS, Koonin EV, Aravind L: Comparative genomics of the FtsK-HerA superfamily of pumping ATPases: implications for the origins of chromosome segregation, cell division and viral capsid packaging. Nucleic Acids Res. 2004, 32 (17): 5260-5279. 10.1093/nar/gkh828.PubMedPubMed CentralView Article
- Finnin MS, Donigian JR, Pavletich NP: Structure of the histone deacetylase SIRT2. Nat Struct Biol. 2001, 8 (7): 621-625. 10.1038/89668.PubMedView Article
- Kinch LN, Ginalski K, Rychlewski L, Grishin NV: Identification of novel restriction endonuclease-like fold families among hypothetical proteins. Nucleic Acids Res. 2005, 33 (11): 3598-3605. 10.1093/nar/gki676.PubMedPubMed CentralView Article
- Knizewski L, Kinch LN, Grishin NV, Rychlewski L, Ginalski K: Realm of PD-(D/E)XK nuclease superfamily revisited: detection of novel families with modified transitive meta profile searches. BMC Struct Biol. 2007, 7: 40-10.1186/1472-6807-7-40.PubMedPubMed CentralView Article
- Williams RJ: Restriction endonucleases: classification, properties, and applications. Mol Biotechnol. 2003, 23 (3): 225-243. 10.1385/MB:23:3:225.PubMedView Article
- Aravind L, Dixit VM, Koonin EV: The domains of death: evolution of the apoptosis machinery. Trends Biochem Sci. 1999, 24 (2): 47-53. 10.1016/S0968-0004(98)01341-3.PubMedView Article
- Koonin EV, Aravind L: Origin and evolution of eukaryotic apoptosis: the bacterial connection. Cell Death Differ. 2002, 9 (4): 394-404. 10.1038/sj.cdd.4400991.PubMedView Article
- Novatchkova M, Leibbrandt A, Werzowa J, Neubuser A, Eisenhaber F: The STIR-domain superfamily in signal transduction, development and immunity. Trends Biochem Sci. 2003, 28 (5): 226-229. 10.1016/S0968-0004(03)00067-7.PubMedView Article
- Brikos C, O'Neill LA: Signalling of toll-like receptors. Handb Exp Pharmacol. 2008, 21-50. full_text. 183
- Palsson-McDermott EM, O'Neill LA: Building an immune system from nine domains. Biochem Soc Trans. 2007, 35 (Pt 6): 1437-1444. 10.1042/BST0351437.PubMedView Article
- Burch-Smith TM, Dinesh-Kumar SP: The functions of plant TIR domains. Sci STKE. 2007, 2007 (401): pe46-10.1126/stke.4012007pe46.PubMedView Article
- Aravind L, Koonin EV: DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res. 1999, 27 (23): 4658-4670. 10.1093/nar/27.23.4658.PubMedPubMed CentralView Article
- Noto MJ, Kreiswirth BN, Monk AB, Archer GL: Gene acquisition at the insertion site for SCCmec, the genomic island conferring methicillin resistance in Staphylococcus aureus. J Bacteriol. 2008, 190 (4): 1276-1283. 10.1128/JB.01128-07.PubMedPubMed CentralView Article
- Bailey TL, Gribskov M: Combining evidence using p-values: application to sequence homology searches. Bioinformatics. 1998, 14 (1): 48-54. 10.1093/bioinformatics/14.1.48.PubMedView Article
- Hols P, Hancy F, Fontaine L, Grossiord B, Prozzi D, Leblond-Bourget N, Decaris B, Bolotin A, Delorme C, Dusko Ehrlich S, et al: New insights in the molecular biology and physiology of Streptococcus thermophilus revealed by comparative genomics. FEMS Microbiol Rev. 2005, 29 (3): 435-463. 10.1016/j.femsre.2005.04.008.PubMed
- Koonin EV, Senkevich TG, Dolja VV: The ancient Virus World and evolution of cells. Biol Direct. 2006, 1: 29-10.1186/1745-6150-1-29.PubMedPubMed CentralView Article
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.PubMedPubMed CentralView Article
- Soding J, Biegert A, Lupas AN: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005, W244-248. 10.1093/nar/gki408. 33 Web Server
- Pei J, Kim BH, Grishin NV: PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res. 2008, 36 (7): 2295-2300. 10.1093/nar/gkn072.PubMedPubMed CentralView Article
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.PubMedPubMed CentralView Article
- McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics. 2000, 16 (4): 404-405. 10.1093/bioinformatics/16.4.404.PubMedView Article
- Adachi J, Hasegawa M: MOLPHY: Programs for molecular phylogenetics. Computer Science Monographs 27. 1992, Tokyo: Institute of Statistical Mathematics
- Felsenstein J: Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. 1996, 266: 418-427. full_text.PubMedView Article
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007, D61-65. 10.1093/nar/gkl842. 35 Database
- Makarova KS, Sorokin AV, Novichkov PS, Wolf YI, Koonin EV: Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea. Biol Direct. 2007, 2: 33-10.1186/1745-6150-2-33.PubMedPubMed CentralView Article
- Roberts RJ, Vincze T, Posfai J, Macelis D: REBASE – enzymes and genes for DNA restriction and modification. Nucleic Acids Res. 2007, D269-270. 10.1093/nar/gkl891. 35 Database
- Chopin MC, Chopin A, Bidnenko E: Phage abortive infection in lactococci: variations on a theme. Curr Opin Microbiol. 2005, 8 (4): 473-479. 10.1016/j.mib.2005.06.006.PubMedView Article
- Makarova KS, Wolf YI, Koonin EV: Comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems and related mobile stress response systems in prokaryotes. Biol Direct. 2009, 4 (1): 19-10.1186/1745-6150-4-29.PubMedPubMed CentralView Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.