Reviewer's report 1
Eric Bapteste, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
The scientific quality of this paper and its methodology is certain. There has been a lot of good and interesting work done here. As indicated by its title, it provides multiple pieces of information and one hypothesis about RNA-interference in prokaryotes. Because of this broad scope, the manuscript is quite large. In fact, several of its parts could be read on their own, depending on the reader's specific interest; this is notably the case for the part dealing with the hypothesis of an RNAi prokaryotic immune system. I would thus suggest that a shorter version of the paper, centered around this very interesting hypothesis, could be proposed online (with the first part turning into Supp. Mat.), because I feel that this part is going to receive more attention anyway, and it would be unfortunate if some readers did not consider this aspect because they are scared by the overall size of the paper. But this is simply a suggestion, and the authors are more than welcome to disregard my opinion.
Author response: While we fully understand the sentiment and agree that the hypothesis on prokaryotic RNAi is of greater general interest than the detailed presentation of the protein sequence-structure analysis, we strongly feel that the latter provides a badly needed foundation for the hypothesis as well as important information in its own right. Furthermore, we tend to believe that the general spirit of online publication is to present complete results of a study (of course, there are exceptions). The reader can easily navigate between sections, so the length of a paper does not represent a particularly severe problem. In addition, we made certain modifications to the protein analysis part in response to similar but more specific comments of Martijn Huynen; in particular, we introduced additional subheadings which, hopefully, make this part of the paper more reader-friendly.
The seductive hypothesis of an RNAi-based immune system is presented as an analogy with the eukaryotic RNAi system. The use of analogy is potentially challenging: on the one hand, it allows a powerful and elegant presentation of many complex genomic results, but on the other hand, it is questionable, since the analogy may impose an a priori model to interpret biological features, and if this model is incorrect, if the analogy does not hold, there is a risk that the genomics data receive a fairly biased interpretation. In this respect, it would be interesting if the authors discussed whether a homologous immune system could have been possible in eukaryotes and in prokaryotes, and why it is not found. Indeed, such a homologous system would be a more natural reference to interpret the data than an analogy.
Author response: We understand the epistemological concerns regarding the role of analogy in this study. However, as indicated by the reviewer himself, the analogy is strong. Moreover, this analogy is manifest at two different levels: i) the presence of inserts homologous to phage and plasmid genes in CRISPR units and ii) presence of predicted activities compatible with a siRNA-like system among cas gene products, in particular, the dicer analog. Had the analogy been false, there would be no reason whatever for this congruence. We briefly comment to that effect in the revised manuscript. The idea of homologous immune systems in prokaryotes and eukaryotes seems a little far-fetched. Nevertheless, this comment prompted us to incorporate a brief comparison of the evolutionary histories of RNAi and classical immune systems.
This being said, I do not feel that the use of the analogy was a problem here, as it is convincingly presented and argued by the authors. We could eventually question more whether all the so-called CASS genes are really involved in the prokaryotic immune system and deserve their label: some may just be present in the genomic proximity of CRISPR, yet have nothing to do with RNA interference. They might just be mobile «travelling» genes. A further study of the genomic distribution of the homologs of the CASS genes in bacterial genomes may help to clarify which genes are strongly and exclusively CRISPR related and which ones can also be found in alternative locations in different genomes. Then, perhaps the «striking diversity of still poorly characterized CASS components» described on page 16 would appear less striking if some CASS categories are simply not relevantly defined, and include unrelated proteins, since maybe the use of the analogy would have led to too relaxed definitions of CASS. In another situation, by contrast, the use of the analogy could perhaps be too strict. On page 18, the authors wonder how to identify «the slicer counterpart (p-slicer)» in prokaryotes. They explain that this identification is «less straightforward because of the diversity of predicted nucleases within the CASS». But, after all, why should there be only one p-slicer, as in eukaryotes? It is possible that prokaryotes have multiple «p-slicers».
Author response: Yes, we agree, the possibility of multiple slicers exists, and we modified the text to acknowledge this. With regard to the rest of this comment, however, we feel that the current diversity of prokaryotic genomes is already sufficient to make conclusions on the strength of the association of individual genes with CASS, and the genes are classified here accordingly, as true CASS components and loosely associated "satellites". As far as the latter are concerned, inferences on involvement with CASS functions are made only for those genes whose activities appear clearly relevant, like the RT or Argonaute.
Finally, the strong suspicion about the analogous and intricate functions of COG1518 and COG1343 (cf. page 21) could be similarly toned down. Maybe these genes do play the essential analogous role of CRISPR integrase/recombinase consistently with the analogy, but maybe they fulfill several different tasks. Perhaps the authors would like to comment more on some of these minor points.
Author response: Actually, this prediction is not based on analogy with eukaryotic RNAi systems but rather on the mutualistic association of these genes to CRISPR and the features of the proteins themselves. We dropped the "strong" suspicion but, generally, we strongly believe that this is the best possible prediction. Reference to multitasking might not be particularly productive unless there are good ideas regarding what these multiple functions might be (this is different from the above possibility that there are multiple slicers which is, indeed, compatible with the data).
Also, to go back to the CASS gene evolution, the authors mention, page 8, «the extraordinary evolutionary mobility of CASS». It is unclear to me how this statement has been tested, and how the authors have established that CASS genes are more mobile than any average gene randomly picked in the same collection of prokaryotic genomes. For this reason, I am not sure if, as claimed by the authors on page 14, «the pol-cassette comprises a distinct evolutionary unit that is often transferred horizontally independently of the CASS-core». Does the CASS-core really have an established vertical mode of inheritance or, as the authors stated before, a «non-uniform» distribution (cf. page 8)? This might be more strongly argued.
Author response: Several distinct issues are addressed here. With regard to the 'extraordinary mobility' of CASS, this is demonstrated by the trees in Fig. 2 (more trees have been published previously in our own 2002 paper and by Haft et al.), but even more convincingly, by the persistent pattern of presence-absence of CASS in closely related species and even strains of bacteria. We consolidated the argument such that this becomes clear the first time "extraordinary" mobility comes up. It is true that we did not compare the mobility of the CASS components with that of garden-variety prokaryotic genes in a rigorous, quantitative manner. While this is doable, in principle, all methods we are aware of are open to debate, and we feel that the exercise is beyond the scope of this paper. Given the above argument, we believe that, qualitatively, it is clear that CASS is unusual in this respect. With regard to the pol-cassette, we believe that the discrepancy between the topologies of the two trees in Fig. 2 is quite sufficient for the statement on independent HGT. As for the "vertical mode of inheritance" of CASS, there seems to be a semantic issue here. We do not really claim vertical inheritance for CASS but neither is such a pattern necessary to detect horizontal mobility. What is required is a predominant pattern of vertical inheritance among other genes that allows us to use a species tree to detect HGT. Of course, we realize that there are substantial arguments for abandoning "tree thinking" altogether but, on balance, we still believe that a species tree conceptualized as a central trend in the evolution of gene ensembles is, at least, a useful tool for analysis of genome evolution.
These few questions show a strength of the present work, which interestingly opens perspectives and suggests that some additional analyses should now be conducted, because the topic deserves consideration. Maybe the authors would feel like addressing some of the points below in a revised version of the current paper, or in future analyses.
Author response: These are, indeed, very interesting questions, we appreciate them. Some are for future studies but we can provide certain answers now.
Further study could include the following:
- Do other genomic regions harboring concentrations of nucleases comparable to the ones around CRISPR exist elsewhere in the genomes?
Author response: Hardly. As indicated in the paper, the CRISPR neighborhood is the second most prominent (i.e., the one that ranks second in the number of genes) neighborhood in prokaryotic genomes after the ribosomal superoperons, so it is quite outstanding. However, there are other, considerably smaller constellations of nucleases, such as the classical recBCD operon encoding repair proteins, some restriction-modification systems, and, perhaps, others that are still poorly understood and deserve investigation.
- If yes, is there more than one prokaryotic immune system definable on this analogous ground? Notably, did bacteria without CRISPR evolve a totally different immune system?
Author response: There is no evidence of that. Furthermore, as repeatedly emphasized in this paper, CASS shows extreme evolutionary volatility, apparently being lost quite easily, in a short time, on evolutionary scale. It is hardly imaginable that these bacteria evolved a distinct immune system in the short time elapsed since the loss of CASS. Of course, purely hypothetically, one could perceive the possibility that another immune system is disseminated horizontally, like CASS, and prokaryotes having both, could differentially lose one of them. However, we are unaware of any support for such a scenario. Another prominent prokaryotic defense mechanism is restriction-modification; it would be interesting to examine the relationship between RM systems and CASS, that could be a subject for a future study.
- How did the psiRNA pathway arise in thermophiles (cf. page 20)? Does it result from a transfer? Was it ancestral?
Author response: Very interesting, fundamental questions, indeed. In response, we expanded the discussion of these and other aspects of evolution of CASS. The specific preponderance of CASS in thermophiles, noticed already in the 2002 paper, when we thought that this was a thermophile-specific repair system, remains a mystery. Whatever the nature of this association, it seems likely that CASS is ancestral in thermophiles (at least in hyperthermophiles).
- Could we imagine that multiple promoters exist, both sense and antisense, which would activate the transcription of CRISPR, generating even more RNAi (cf. p. 24)?
Author response: In principle, existence of multiple promoters cannot be ruled out. However, the leader sequence seems to be the only natural candidate for the promoter function. The rest of the CRISPR cassette is homogeneous (repetitive), so it is unclear where an alternative promoter would be located. Further, in the two archaeal systems that have been studied experimentally (Archaeoglobus and Sulfolobus), all transcription of CRISPR loci appears to be unidirectional.
- Finally, it might be challenging, though interesting, to test in vitro on bacterial cultures whether, as proposed by the authors, the presence of CRISPR and CASS really has an impact on the fitness of prokaryotes in the presence of viruses.
Author response: We certainly hope that the computational analyses and predictions described in this paper stimulate a lot of experimentation aimed at elucidation of the biological functions of CASS and roles of its individual components.
We greatly appreciate these insightful and stimulating comments.
On page 7: «functionally analogous» is redundant.
Author response: We see the point but do not really agree. The word "functionally" seems to add clarity.
On page 8: the sentence «the distribution of COG1518 and, by implication, CASS among prokaryotic lineages...» is too «bold» for me: even if the conclusion is correct, I am not sure one can generalize as suggested here from the case of one protein only.
Author response: Indeed, we can. Rephrased to clarify and emphasize this.
On page 15: «several other CASS gene families remain mysterious» is a mysterious sentence. I am not sure what this really means.
Author response: That there is no clue as to the possible functions of these proteins; modified to clarify.
On page 21: I miss the idea of the sentence starting with «In addition, and probably, more relevantly etc.» and ending with «retroviral genomes». Could you rephrase it to make it a little more explicit?
Author response: Rephrased – hopefully, to clarify.
On page 23: what is the criterion retained for homology between the plasmid genes, fragments of phages and the CRISPR sequences?
Author response: The following quote from the Methods addresses this issue:
"Nucleotide sequences of inter-CRISPR spacers were used as a query in MEGABLAST searches (word size 11; e-value threshold 0.01) against GenBank; hits to virus or plasmid sequences and to distantly related prokaryotes were counted separately for each source organism."
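The counting step described in this quote can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the search results are available as 12-column tab-separated BLAST tabular rows (e-value in column 11), and the two lookup tables, the category labels, and the example rows are all hypothetical and made up for illustration. Only the 0.01 e-value threshold comes from the Methods.

```python
from collections import Counter

E_VALUE_THRESHOLD = 0.01  # threshold quoted in the Methods


def tally_spacer_hits(blast_rows, subject_category, query_organism):
    """Count significant hits per (source organism, hit category).

    blast_rows       -- tab-separated rows in the 12-column BLAST tabular
                        layout (query ID, subject ID, ..., e-value, bit score)
    subject_category -- hypothetical dict: subject ID -> category label
    query_organism   -- hypothetical dict: spacer ID -> source organism
    """
    counts = Counter()
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        query_id, subject_id = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > E_VALUE_THRESHOLD:
            continue  # discard hits weaker than the threshold
        category = subject_category.get(subject_id, "other")
        counts[(query_organism[query_id], category)] += 1
    return counts


# Made-up example rows; none of these identifiers come from the paper.
example_rows = [
    "spacer1\tphageA\t100.0\t30\t0\t0\t1\t30\t5\t34\t1e-5\t56.0",
    "spacer1\tplasmidB\t90.0\t20\t2\t0\t1\t20\t7\t26\t0.5\t20.0",
    "spacer2\tgenomeC\t96.0\t28\t1\t0\t1\t28\t9\t36\t0.001\t40.0",
]
example_categories = {
    "phageA": "virus_or_plasmid",
    "plasmidB": "virus_or_plasmid",
    "genomeC": "distant_prokaryote",
}
example_organisms = {"spacer1": "OrganismX", "spacer2": "OrganismY"}

example_counts = tally_spacer_hits(
    example_rows, example_categories, example_organisms
)
```

This sketch counts every significant hit; whether a spacer with several hits in one category should count once or several times is not stated in the quote and is left open here.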
On page 48: To me, the multiple positive correlations evoke multiple causalities and the possibility of some hidden correlations. Would you say that all the relevant combinations have been considered here?
Author response: No, we won't claim that. More complex multiple regression analysis would be required to separate correlations that reflect true causality; for the purposes of this paper, we felt it was sufficient to note the strongest correlations.
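The reviewer's concern about hidden correlations can be made concrete with a small sketch. It uses a first-order partial correlation as a simpler stand-in for the multiple regression mentioned in the response; it is not an analysis from the paper. The variables `x`, `y`, `z` and their values are synthetic and hypothetical, constructed so that `x` and `y` each track `z` but are otherwise unrelated.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Synthetic data: x and y both follow z, plus unrelated perturbations.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [zi + e for zi, e in zip(z, [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5])]
y = [zi + e for zi, e in zip(z, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5])]

raw_r = pearson(x, y)                     # strong raw correlation
partial_r = partial_correlation(x, y, z)  # essentially vanishes given z
```

In this constructed case the raw correlation between `x` and `y` is high, yet it disappears once `z` is controlled for, which is the kind of hidden dependence a multivariate analysis would be needed to rule out.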
Reviewer's report 2
Patrick Forterre, Biologie Moléculaire du Gène chez les Extrêmophiles (BMGE) Institut de Génétique et Microbiologie (IGM), Université Paris-Sud, Centre d'Orsay, 91405 Orsay Cedex, France, and Biologie Moléculaire du Gène chez les Extrêmophiles (BMGE), Département de Microbiologie Fondamentale et Médicale, Institut Pasteur, Paris, France
In this very important paper, Makarova and coworkers propose a detailed mechanism for a putative procaryotic antiviral immunity system mediated by CRISPR sequences and their associated Cas proteins (the CAS system, CASS sensu the authors). Their model is based on the hypothesis that these elements represent a prokaryotic-specific antiviral mechanism analogous to the eukaryotic RNAi system. In procaryotes (Bacteria and Archaea) there are no homologs of the proteins involved in the eucaryotic RNAi system. Until recently, it was therefore widely believed that restriction-modification mechanisms were the only defense available to procaryotes to fight viral infections. However, it was proposed last year by several groups that procaryotic CASS could also play a significant role in fighting viral aggression in archaea and bacteria (Mojica et al., 2005; Pourcel et al., 2005; Bolotin et al., 2005). CRISPR sequences, which are transcribed but non-coding, are formed by the tandem repetition of units containing both a conserved element (similar all along a given CRISPR) and a variable element, the spacer, different from one unit to the other. Strikingly, the spacer sequences have no homologous sequences in databases, except for viral or plasmid sequences. Both Mojica et al. (2005) and Bolotin et al. (2005) have suggested that transcription of CRISPR sequences produces anti-sense RNA that can inhibit transcription of incoming viral (plasmid) sequences, and Mojica et al. (2005) mentioned the analogy of such a system with eukaryotic RNAi. However, these authors did not elaborate on the specific mechanism involved and how the cas proteins could be involved in the processing of viral RNA.
In this work Makarova and co-workers have first performed an updated analysis of cas proteins using genomic context analysis and sensitive methods (iteration approaches) to detect low levels of similarity and to classify cas proteins in families and superfamilies. They were able to identify several new putative cas proteins and to define 25 superfamilies of cas proteins and 7 different types of CASS organization (named CASS1 to 7). They have also analyzed all available CRISPR repeated sequences and their putative secondary structures. More importantly, they try to predict the biological function of the cas proteins and their mechanism of action in the framework of the RNAi hypothesis. Previously, it has been suggested that cas proteins were involved in the formation and spreading of the CRISPR. For instance, Bolotin et al. have predicted that cas proteins act at the DNA level by promoting cleavage, recombination and ligation. Makarova et al. are the first to suggest that several cas proteins should instead interact at the RNA level, by promoting RNA degradation and RNA-RNA hybridization. They specifically suggest the existence of procaryotic homologs of eucaryotic dicer (helicase-nuclease) and slicer (nuclease). They also propose that a previously suspected DNA polymerase could be an RNA-dependent RNA polymerase used to stabilize RNA/RNA hybrids by extending iRNA hybridized to their viral mRNA target. They also suggest the involvement of a reverse transcriptase in the formation of the linker sequences from viral (plasmidic) mRNA. In my opinion, all these proposals are reasonable and very convincing. Another prediction is that RAMP proteins recognize linker sequences of different sizes. This is supported by a correlation between the number of linker sequences and the number of RAMP-encoding genes (Fig. 6).
In that case, it's not clear to me why this could not be due to the binding of RAMPs to the repeated units, since these units exhibit conserved sequences and their number (identical to the number of linkers) should be also correlated with the number of RAMPs.
Author response: That RAMPs discriminate, one way or another, between CRISPR inserts, is strongly suggested by the extreme sequence divergence of RAMPs which is hardly compatible with recognition of identical repeats. To be explicit about it, we added a clarification at the end of this section.
The search for specific secondary structure associated with the repeated units did not give convincing results and suggests to me that the dyad symmetry observed in many repeat units could be due to the binding of proteins with repeated structure (possibly the duplicated ferredoxin-like fold present in RAMPs) and not the formation of secondary structures in the transcribed repeats.
Author response: It is hard to see how one excludes the other: it stands to reason that CRISPR do form distinct secondary structures which bind to symmetrical proteins.
The model proposed (including possible variation) thus implies many predictions that could be experimentally tested. Surprisingly, to my knowledge, only one cas protein has been studied at the bench up to now (ref 70 in the manuscript). This protein turns out to have DNAse activity in vitro, but I suspect that the authors have not tested a possible RNAse activity. This is surprising because the importance of these proteins was already highlighted in 2002 by two in silico papers that in one case suggested their participation in a "mysterious" DNA repair system and in the other described their association with CRISPR sequences. The present paper, with much more specific predictions, should hopefully strongly stimulate biochemists and molecular biologists to jump onto this really exciting story. As noticed by the authors in their conclusion, if their hypothesis turned out to be correct, this prokaryotic RNAi system could be exploited to silence any gene in organisms that encode CASS. Furthermore, the experimental study of this system should help us to get new critical insights on the dynamic relationships between viruses and archaeal/bacterial populations in nature.
Finally, I would like to know if the authors have some idea about the origin of this CAS system. Why is it present in all archaeal genomes sequenced so far? Is it possible that this system originated in Archaea and was later on introduced in bacteria by LGT?
Author response: Given the horizontal mobility of CASS, we can only speculate on the point of its origin. We expand such speculation in the revised conclusion including the possibility of archaeal origin.
– In some cases, the authors should be more cautious in their statements. For instance, when they talk about the pol-cassette, it might lead some readers to believe that the polymerase activity of the COG1353 protein has been experimentally validated, which is not the case.
Author response: We added a few more "predicted". However, we did not want to abandon the term 'pol-cassette' as it is descriptive and succinct.
Reviewer's report 3
Martijn Huynen, Nijmegen Center for Molecular Life Sciences University Medical Center St. Radboud p/a Center for Molecular and Biomolecular Informatics, Nijmegen, Netherlands
This paper provides a highly interesting and well documented hypothesis about a cluster of genes that E. Koonin and co-workers discovered some time ago. By combining biological knowledge with bioinformatics methods and creative thinking, the authors propose that Archaea and, to a somewhat lesser extent, Bacteria possess an RNA-interference-based immune system involving CRISPR and cas genes that is analogous to the eukaryotic RNA interference systems. Although aspects of this hypothesis have been published before, specifically with respect to CRISPR, this paper is, as far as I can tell, the first that makes the analogy between the cas genes and the RNA interference system. The idea that prokaryotic genomes would internalize pieces of foreign DNA in order to be able to defend themselves against it, thus having an immune system with a memory, would be an interesting example of Lamarckian evolution.
I do have some questions and editorial comments that I think should be addressed.
1) Do the authors have any idea why this system has the phylogenetic distribution that it does, being present in such a small genome as the nanoarchaeon, but not in, e.g., the majority of Firmicutes?
Author response: No mechanistic idea, unfortunate as this might be. We added some additional discussion of the ultimate origin of CASS (see the response to Patrick Forterre).
2) Concerning the feasibility of the system proposed by the authors: Is there anything known about how many fiendly DNAs a prokaryote encounters in daily life, and how does that compare to the number of different elements in a CRISPR?
Author response: Not enough for this particular comparison. However, it is well known that phages are extremely abundant, much more so than bacteria or archaea, and in the revised manuscript, we refer to this more specifically, with the corresponding references.
3) Regarding the Lamarckian scheme: That the unique elements of the CRISPR correspond to highly conserved, essential elements of phage genes suggests that selection on genetic variation also plays a role here. So the scheme would be partly Lamarckian.
Author response: Probably so. The way we state it in the text, "CASS seems to come closest to a true Lamarckian mode of evolution among all known systems of heredity", is compatible with this view.
4) I am not so convinced by the argument on page 3 that the results imply that even among closely related prokaryotes the most commonly encountered phages are different. First of all, it is more a corollary of the hypothesis, but second, it could also reflect the high turnover of phages over time, rather than niche.
Author response: This is a very good idea, we now mention this possibility both in the Abstract and in the Discussion.
5) I am puzzled by the involvement of more or less randomly selected pieces of DNA from foreign DNA/RNA in exactly the same location in the secondary structure of the psiRNA (the top of the hairpin). Does this pattern occur more often?
Author response: The situation when the insert forms a stem of varying stability with parts of the repeats is common but not universal. The positions of the inserts are not exactly the same although they are, indeed, very similar, and the stems in which the inserts are involved are imperfect. Of course, the exciting possibility exists that the CRISPR inserts are specifically selected for their ability to base-pair with the repeats; however, we do not have enough data to make that claim.