Two novel PIWI families: roles in inter-genomic conflicts in bacteria and Mediator-dependent modulation of transcription in eukaryotes
Biology Direct volume 8, Article number: 13 (2013)
The PIWI module, found in the PIWI/AGO superfamily of proteins, is a critical component of several cellular pathways including germline maintenance, chromatin organization, regulation of splicing, RNA interference, and virus suppression. It binds a guide strand, which helps it target complementary nucleic acid strands.
Here we report the discovery of two divergent, novel families of PIWI modules, the first such to be described since the initial discovery of the PIWI/AGO superfamily over a decade ago. Both families display conservation patterns consistent with the binding of oligonucleotide guide strands. The first family is bacterial in distribution and is typically encoded by a distinctive three-gene operon alongside genes for a restriction endonuclease fold enzyme and a helicase of the DinG family. The second family is found only in eukaryotes. It is the core conserved module of the Med13 protein, a subunit of the CDK8 subcomplex of the transcription regulatory Mediator complex.
Based on the presence of the DinG family helicase, which specifically acts on R-loops, we infer that the first family of PIWI modules is part of a novel RNA-dependent restriction system which could target invasive DNA from phages, plasmids or conjugative transposons. It is predicted to facilitate restriction of actively transcribed invading DNA by utilizing RNA guides. The PIWI family found in the eukaryotic Med13 proteins throws new light on the regulatory switch through which the CDK8 subcomplex modulates transcription at Mediator-bound promoters of highly transcribed genes. We propose that this involves recognition of small RNAs by the PIWI module in Med13 resulting in a conformational switch that propagates through the Mediator complex.
This article was reviewed by Sandor Pongor, Frank Eisenhaber and Balaji Santhanam.
The PIWI module, found in the PIWI/AGO superfamily of proteins, is a common functional denominator for a wide range of biological processes in eukaryotes. These include, but are not limited to, germline maintenance, post-transcriptional gene silencing/RNA interference (RNAi), chromatin dynamics, regulation of transcription [3, 4], regulation of alternative splicing, DNA elimination in ciliates [6, 7] and suppression of viral infection. It acts by binding a double-stranded RNA duplex, typically consisting of a targeting RNA strand, referred to as the “guide strand”, and the targeted RNA strand complementary to the guide strand. Binding of the guide strand to the target strand results in either the silencing of specific RNA transcripts, as in the case of transposon silencing during germline maintenance [1, 7] and mRNA silencing during RNAi, or is thought to localize crucial factors for regulating processes like transcription and alternative splicing. The PIWI module contains an RNase H fold domain with a conserved triad of residues required for nuclease activity that might participate both in processing the guide strand precursor as well as in cleaving target RNAs complementary to the guide strand [9–16]. On several independent occasions the PIWI module has lost the RNase H fold catalytic residues; these inactive versions are still capable of silencing activity by interfering with translation or facilitating degradation of guide strand-bound mRNAs by other nucleases.
While the PIWI/AGO superfamily was initially discovered in eukaryotes, orthologs were also identified in a wide range of prokaryotes spanning both the archaeal and bacterial superkingdoms [18, 19]. Despite extensive characterization of these proteins in eukaryotes, the roles of the prokaryotic PIWI (pPIWI) proteins and the nature of their potential double-stranded nucleotide targets have remained murky. Recent analysis detected association with genes encoding several distinct, predicted nucleases, and a general preference for pPIWI genes to be localized in genomic neighborhoods containing genes belonging to known phage-defense systems. This led to a proposal advocating a role for pPIWI proteins as components of novel prokaryotic systems involved in defense against invasive mobile elements . Earlier structural studies observed a tighter binding propensity for single-stranded DNA relative to single-stranded RNA guide strands in pPIWI proteins [21, 22]. They also found, in stark contrast to the eukaryotic PIWI protein, the favored double-stranded substrate for the pPIWI domains to be a DNA-RNA hybrid. These observations suggested that pPIWI proteins might act on DNA-RNA hybrids.
Given the recent increase in available genome data, we surveyed the complete scope of eukaryotic and prokaryotic PIWI domains to gain a better understanding of their relationship. Here we report the discovery of two distinctive PIWI families resulting from this survey, the first novel PIWI families to be discovered in well over ten years. One of these is a previously unrecognized bacterial family predicted to be a key component of an RNA-dependent restriction system. The second family is found in the eukaryotic Med13 protein, one of four protein components of the repressive CDK8 subcomplex of the multi-subunit, transcription regulatory Mediator complex. Identification of a PIWI module in Med13 generates a new testable hypothesis regarding the transcription modulatory role of the CDK8 subcomplex.
Results and discussion
Discovery of two novel PIWI families
The PIWI module as presently defined in the Pfam database  consists of two distinct but functionally tightly coupled domains: an N-terminal three-layered α/β sandwich of the Rossmannoid type, with a four-stranded central β-sheet reminiscent of the TOPRIM domain and the β-sheet crossover occurring after the first β-strand  (see Figure 1A). This domain contributes crucial residues that bind the 5′ end of the small RNA guide strand [21, 22, 25–30]. The second domain is the core RNase H domain, which contributes additional, critical residues for guide strand-binding and when preserving the nuclease active site also cleaves the target strand. Prior structural studies on the PIWI module have labeled these two domains as the “MID” and “PIWI” domains, respectively [9, 31]; a convention we adopt henceforth.
We performed profile–profile comparisons using the HHpred program, initiated with both single sequences and an HMM derived from a multiple alignment of complete PIWI modules as queries against the complete set of HMMs found in the Pfam and Interpro databases. Interestingly, we observed statistically significant relationships between the PIWI module and two distinct protein families defined by the models “domain of unknown function” DUF3893 and Med13_C (corresponding to a conserved region in the eukaryotic Mediator complex Med13 proteins) from the Pfam database. For instance, a search initiated with a pPIWI module from Mycobacterium sp. KMS (gi: 119855142) recovers the DUF3893 profile with p-value = 7×10⁻⁶ (probability 94%) and the Med13_C profile with p-value = 3.4×10⁻⁴ (probability 90%). To further investigate this relationship, we systematically collected all proteins corresponding to the DUF3893 and Med13_C models using iterative PSI-BLAST searches. The DUF3893-containing proteins were sporadically distributed across a wide range of bacterial lineages including firmicutes, actinobacteria, α/β/γ-proteobacteria, cyanobacteria, and chloroflexi. The Med13 proteins are widely distributed across eukaryotes including most plants, fungi, animals, slime molds, and stramenopiles as well as basal eukaryotes such as the parabasalid Trichomonas vaginalis and the heterolobosean Naegleria gruberi (see Additional file 1). In certain lineages additional Med13 paralogs were identified, including those resulting from a duplication event that occurred early in vertebrates.
We then constructed multiple sequence alignments of the proteins matching these modules, used them to predict secondary structure, and checked for congruence with existing structures of PIWI modules to determine the precise boundaries of the MID and PIWI domains. This showed that the DUF3893 and Med13_C models currently present in Pfam imprecisely define the domain architectures and boundaries within these proteins, notably excluding regions from both the MID and PIWI domains. Accordingly, we emended the domain boundaries of the DUF3893 and Med13_C models to completely match the predicted structural elements of the two constituent domains (see Figure 1A). Reciprocal HHpred searches confirmed the relationships with the PIWI domain. These searches were initiated with both single sequences and HMMs derived from the above alignments, and were run against a database of HMMs constructed from multiple alignments built using Protein Data Bank (PDB) chains as seeds. For example, an emended representative version of the module matching Pfam DUF3893 (gi: 228927677 from Bacillus thuringiensis) recovers the PIWI module from Archaeoglobus fulgidus (PDB: 2W42, p-value = 6.7×10⁻⁵, probability 90%). Iterative sequence searches with PSI-BLAST further confirmed this relationship: e.g. a search with an emended representative of the module matching Pfam DUF3893 (gi: 269125748 from Thermomonospora curvatae) recovers a classical pPIWI domain (gi: 295689105 from Caulobacter segnis; e-value = 9×10⁻¹⁵, iteration 4). Similarly, a representative of the emended Med13 module (gi: 393215315 from Fomitiporia mediterranea) recovers a classical pPIWI module from Pyrococcus furiosus in an HHpred search (PDB: 1U04, p-value = 2.1×10⁻⁴; probability 87%).
Characterization of the novel bacterial PIWI family
Structural and architectural features
The above-identified bacterial family which overlaps with the Pfam DUF3893 model displayed two unique, absolutely conserved residues: an arginine and a glutamate (see Figure 2A). Hence, we refer to this family as the pPIWI-RE family (prokaryotic PIWI with conserved R and E residues). Secondary structure predictions indicated that the pPIWI-RE family is distinguished from all previously known PIWI domains by the presence of an additional α-helical element following the initial three-stranded beta-meander characteristic of the RNase H fold (see Figures 1A, 2A). We mapped all strongly-conserved residues found in the pPIWI-RE family onto available structures of classical PIWI modules and compared those positions to those required for RNase activity or nucleic acid binding in the latter modules (see Figures 1B-C, 2A). This showed that the conserved residues in the PIWI and MID domains of the pPIWI-RE family corresponded well to the positions known to be critical for nucleic acid-binding in the cognate domains of classical PIWI modules (see Figures 1, 2A). In particular, the conserved positions in the MID domain were all clustered in the cleft that specifically binds the 5′ end of the guide strand. This suggests that, like classical PIWI domains , the pPIWI-RE is likely to recognize small guide strands by anchoring them via the 5′ end. The arginine and glutamate characteristic of the pPIWI-RE family mapped to the β-sheet extension, which is unique to the PIWI-like clade (PIWI and Endonuclease V) of the RNase H fold (see Figures 1A, 2A). We predict that these two residues form a salt bridge across this β-sheet, which probably stabilizes its tertiary structure, and maintains a conformation specific to this family that is required to recognize the guide strand. The RNase catalytic residues are retained only in a subset of the pPIWI-RE family, suggesting that similar to the classical PIWI family they include both active and inactive versions.
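The idea of picking out absolutely conserved positions from a multiple sequence alignment can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name `conserved_columns` and the toy sequences are invented for illustration, and a real analysis would work on the full pPIWI-RE alignment and would usually tolerate conservative substitutions.

```python
def conserved_columns(alignment, gap="-"):
    """Return {column_index: residue} for columns in which every aligned
    sequence carries the same non-gap residue (0-based column indices)."""
    if not alignment:
        return {}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = {}
    for i in range(length):
        column = {seq[i] for seq in alignment}
        # A column is absolutely conserved if it holds one residue and no gap.
        if len(column) == 1 and gap not in column:
            conserved[i] = next(iter(column))
    return conserved


# Toy alignment: invented sequences, NOT real pPIWI-RE data.
toy_alignment = [
    "MKRLA-EVT",
    "MQRIAGELS",
    "LKRVSGEIT",
]

print(conserved_columns(toy_alignment))
```

In this toy case only the arginine and glutamate columns are invariant, which mirrors, in a highly simplified way, how the two family-defining residues stand out in the alignment.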
The classical PIWI modules are typically fused to several N-terminal RNA-binding domains. In eukaryotic PIWI proteins, in order from the N-terminus, these include the so-called “N-term” domain, implicated in unwinding of the double-stranded guide and passenger strands and also guide-target duplexes, and the single-stranded RNA-binding PAZ domain, which interacts with 3′ ends of guide strands. Certain classical PIWI family proteins from kinetoplastids show an OB fold domain instead of the “N-term” domain. Previously studied prokaryotic PIWI proteins display a distinct architecture: in lieu of a PAZ domain they feature the so-called APAZ (Analogous to PAZ) domain, suggesting analogous functions for the two domains. Additionally, a few pPIWI domains may contain extreme N-terminal fusions to predicted Sir2 domains. The large N-terminal region of the pPIWI-RE family contains a distinct, conserved globular domain that partly overlaps with the Pfam DUF3962 model. Secondary structure predictions indicate that it is likely to adopt a β-strand-rich fold. It neither showed strong congruence with the secondary structural elements of the PAZ or APAZ domain nor did it display the well-conserved sequence motifs characteristic of the PAZ or APAZ domains (see Additional file 1). Furthermore, profile-profile searches did not point to any relationship between the N-terminal region of the pPIWI-RE family and these domains. Hence, this N-terminal region is likely to contain at least one distinct globular domain, which might nevertheless function analogously to the N-terminal domains in the classical PIWI proteins in mediating additional nucleic acid contacts (see Figure 2B).
Contextual associations of the pPIWI-RE module
Given the value of contextual information in gleaning insight into the functions of genes [35, 36], we systematically collected conserved gene neighborhoods and domain fusions for the pPIWI-RE domains. Consequently, we observed two distinct genomic contexts for the pPIWI-RE genes with mutually exclusive phyletic patterns (see Figure 2B): (1) occurrence as a standalone gene (restricted to several Bacillus species, proteobacteria Magnetospirillum gryphiswaldense, Pseudomonas putida and Azotobacter vinelandii, and actinobacteria from the genera Streptomyces and Thermomonospora; Additional file 1). On rare occasions, this version of the pPIWI-RE module might occur fused to an N-terminal Zincin-like metallopeptidase domain. (2) Occurrence as part of a widely distributed three-gene neighborhood. Of the two genes that co-occur with the pPIWI-RE gene we found the first to encode a protein with a conserved restriction endonuclease (REase) fold domain by using profile-profile comparisons with the HHpred program (probability 94% using gi: 158336201 from Acaryochloris as a query). These proteins also contain a helical domain with a conserved arginine and Zinc ribbon (ZnR) domain at the N-terminus of the REase domain (see Figure 2B). Moreover, on at least four different occasions these proteins have also acquired further N-terminal HTH domains belonging to the LexA, TetR, MerR and a previously uncharacterized clade  (see Figure 2B). The second gene codes for a Superfamily II (SF-II) DNA helicase. Within SF-II it can be confidently assigned to the DinG-like clade on the basis of two unique structural features that typify them: namely, an iron-binding cysteine-rich region found after strand-2 of the helicase domain [38, 39] and a large helical region inserted between conserved helix-4 and strand-5 which precede the C-terminal P-loop NTPase fold repeat unit characteristic of helicases [40, 41]. 
The former domain apparently acts as an intracellular sensor of redox potential to regulate activity of the DinG helicase domains . The gene order within this triad is strictly conserved with the REase gene coming first followed by the DinG SF-II helicase and pPIWI-RE genes (see Figure 2B and Additional file 1). Furthermore, the three genes have either overlapping or very closely spaced termini suggesting that they are transcribed as a single polycistronic message.
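The two criteria described above, a fixed gene order and overlapping or closely spaced termini, can be expressed as a small check on gene coordinates. The following Python sketch is only an illustration under stated assumptions: the `Gene` record, the function `looks_like_triad`, the 50-nucleotide gap threshold and all coordinates are hypothetical and are not taken from the analysis reported here.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str    # predicted product label, e.g. "REase"
    start: int   # leftmost coordinate on the reference strand
    end: int     # rightmost coordinate on the reference strand
    strand: str  # "+" or "-"


EXPECTED_ORDER = ["REase", "DinG", "pPIWI-RE"]


def looks_like_triad(genes, max_gap=50):
    """True if three same-strand genes follow the REase -> DinG -> pPIWI-RE
    order in the direction of transcription and neighbours overlap or lie
    within max_gap nucleotides (max_gap is an assumed, arbitrary cutoff)."""
    if len(genes) != 3 or len({g.strand for g in genes}) != 1:
        return False
    strand = genes[0].strand
    # On the minus strand, transcription runs from high to low coordinates.
    ordered = sorted(genes, key=lambda g: g.start, reverse=(strand == "-"))
    if [g.name for g in ordered] != EXPECTED_ORDER:
        return False
    for up, down in zip(ordered, ordered[1:]):
        if strand == "+":
            gap = down.start - up.end - 1
        else:
            gap = up.start - down.end - 1
        if gap > max_gap:  # a negative gap means the genes overlap
            return False
    return True


# Hypothetical coordinates for illustration only.
example = [
    Gene("REase", 100, 1000, "+"),
    Gene("DinG", 997, 3200, "+"),      # overlaps the REase terminus
    Gene("pPIWI-RE", 3210, 5900, "+"),
]
print(looks_like_triad(example))
```

A check of this kind only flags candidate neighborhoods; whether the genes are actually co-transcribed remains an inference from spacing and orientation.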
Functional implications of pPIWI-RE coding systems: A novel RNA-dependent restriction system
The widespread but patchy distribution of the above-described pPIWI-RE containing gene-triads across numerous phylogenetically distant bacteria (Additional file 1) is consistent with this system being disseminated by horizontal gene transfer (HGT). This pattern is reminiscent of bacteriophage restriction systems that confer a selective advantage on recipients due to their role in countering bacteriophage infections. The presence of a gene coding for an REase protein without an associated methylase gene in the pPIWI-RE containing gene-triads is reminiscent of restriction systems such as the Mcr systems that target modified invading DNA. The fact that the REase gene is always the first gene in the operon implies that it would be made before any of the other products and be available to cleave DNA. Hence, like the REases from the Mcr systems, it should have some means of specifically targeting non-self DNA rather than suicidally cleaving the cellular genome upon production. DinG serves as a helicase partner for multiple nuclease domains such as the RNase T-like and RNase D-like nuclease domains (both of which belong to the RNase H fold) [45–47]. Hence, it could function as a helicase partner for either the REase or pPIWI-RE or both. Given that these gene triads are parallel to type I and type III restriction-modification (R-M) systems in that they combine REase with helicase genes [48, 49], it is conceivable that the DinG helicase plays a role comparable to the helicases that translocate the target DNA in those R-M systems. However, recent studies on DinG-like helicases, which show that they act on RNA-DNA duplexes in vitro and R-loops (bubble-like structures forming via displacement of one strand of a DNA double helix by a complementary RNA strand) in vivo, point to further functional complexities. DinG-like helicases are specifically involved in unwinding of R-loops during replication across active transcriptional units.
Interestingly, DinG-like helicases have also been found to be components of Type-U CRISPR/Cas systems , supporting their action in the context of DNA-RNA hybrid duplexes.
Taken together, these observations allow us to propose a model that can account for the most likely activities of all three products of these gene triads (see Figure 3A). On the basis of the DinG helicase we posit that the initiating signal recognized by these systems is likely to be a DNA-RNA hybrid structure. These are known to primarily form during transcription and replication of phages  or plasmids [55, 56] and relatively infrequently during transcription of the endogenous genome . Therefore, specifically targeting these structures could provide an effective means of restricting transcriptionally active and replicating invasive genomes and their transcripts. In this system the pPIWI-RE module is likely to be deployed as a sensor for the DNA-RNA hybrid, in a manner comparable to the classical pPIWI domain for which there is accumulating evidence for preferential binding to DNA/RNA hybrids [20, 22, 29]. The catalytically active pPIWI-RE modules might additionally cleave the RNA strand of such hybrid duplexes. Recognition of the DNA-RNA hybrid by the pPIWI-RE module is likely to recruit the DinG helicase for the unwinding and/or the translocation of R-loops, which could further provide a suitable dsDNA substrate for cleavage by the REase domain. Importantly, this hypothesis of DNA-RNA hybrid-directed restriction can explain why the REase protein, which is the first to be transcribed and translated, is unlikely to act on self DNA upon its production. The diverse HTH domains, which are occasionally fused to the N-termini of the REase proteins, could either function as autoregulators of transcription of the gene triad or in providing sequence specificity during restriction.
In the case of pPIWI-RE genes occurring independently of the above-described three gene restriction system we found no evidence for the presence of related REase or DinG genes in the same genomes. A simple interpretation would be that these pPIWI-RE modules function similarly to the aforementioned versions, but instead of recruiting restriction machinery they function by themselves. It is possible in these cases they modulate gene expression by cleaving transcripts, physically interfering with transcription (an echo of the action of eukaryotic PIWI proteins), or blocking the release of transcripts from the template DNA [3, 57].
The PIWI module in eukaryotic Med13
Structural and architectural features of the MedPIWI module
Given the presence of this PIWI module in the Med13 subunit of the Mediator complex, we hereafter refer to it as the MedPIWI module. An inspection of the multiple sequence alignment of the novel eukaryotic family revealed extensive conservation at the positions crucial for nucleic acid-binding in the classical PIWI module including residues interacting with the 5′ end of the guide strand in the MID domain (see Figures 1, 4A). However, this family shows certain distinctive features: 1) absence of the first catalytic aspartate/glutamate found near the C-terminus of strand 1 of the RNase H fold’s core β-sheet. 2) The second conserved residue of the catalytic triad, located at the C-terminus of strand-4 of the RNase H fold, is absent with no identifiable compensatory residues. 3) Another charged residue contributing directly to the active site from the C-terminal segment of the final helix of the RNase H fold is also absent (see Figure 4A). 4) Its RNase H fold shows a reasonably well-conserved aspartate in the loop between strand-1 and strand-2, which is suitably positioned to contact the bound nucleic acid, based on comparisons to classical PIWI domains . 5) The MedPIWI RNase H fold also shows a near-absolutely conserved aspartate at the C-terminus of strand 2 (see Figure 4A) that is unlikely to have any role in nucleic acid substrate recognition. Taken together, these observations suggest that none of the MedPIWI modules might be catalytically active. However, they are likely to bind double-stranded nucleic acid substrates, just as the classical PIWI modules.
The MedPIWI modules are distinguished from all other PIWI modules by the presence of extensive disordered regions, often occurring as lineage-specific inserts within both the MID and PIWI domains and also in between the two (indicated by numbers in Figure 4A). This family is also distinguished by a small domain consisting of a predicted beta-hairpin followed by a single alpha-helix, which is located immediately N-terminal to the MID domain and might be compared to the small “linker” domains observed in classical PIWI families. Beyond this domain is the Med13-N module corresponding to the Pfam model Med13_N (see Figure 4B). The conserved core of this region is predicted to adopt an α + β structure with a prominent stretch of 6–7 contiguous β strands which could adopt a barrel or sandwich-like fold (Additional file 1). This module is present in all eukaryotic Med13s except those from Entamoebidae, where it appears to have been displaced or has degenerated. Thus, the Med13-N module was likely associated with the MedPIWI even in the stem eukaryotes, and is comparable in its location, though not necessarily in function, to the N-terminal domains, such as PAZ, APAZ and that found in the pPIWI-RE family (see above). Some additional lineage-specific globular domains might be present along with an extensive disordered region in the linker connecting the Med13-N module to the rest of the protein. These include a potential Zn-binding domain with two CxC motifs (where “C” is a cysteine residue and “x” is any residue) in animals and other unrelated modules in plants and fungi (see Figure 4B, Additional file 1). The size and frequency of the lineage-specific inserts and disordered regions roughly correspond to the total number of units comprising the Mediator complex in a given lineage. Thus, they might represent secondary adaptations for increased inter-subunit contacts within the Mediator complex.
Partners and physical interactions of Med13: functional implications for the MedPIWI module in eukaryotic transcription regulation
The Mediator complex, along with several basal or general transcription factors, is part of the Preinitiation Complex (PIC), which is needed for transcription at promoters of genes transcribed by RNA polymerase II (pol II) in eukaryotes [59, 60]. The Mediator complex has two basic forms (see Figure 3B): 1) the core Mediator complex, which is a strong transcriptional coactivator and occupies promoters across the genome [62, 63] and 2) the Mediator-CDK8 complex, which usually has a negative regulatory role and, while found to transiently associate across all promoters, associates strongly with only a subset of genes that typically show higher expression levels [62–66]. The latter complex is characterized by the addition of a four-subunit subcomplex, CDK8, which, in addition to the MedPIWI-containing Med13, also contains Med12, cyclin C, and the CDK8 kinase. Negative regulation by the CDK8 subcomplex appears to utilize multiple independent, but apparently synergistic, actions of its distinct subunits (see Figure 3B). The cyclin/kinase pair of the subcomplex phosphorylates the pol II C-terminal tail, disrupting the association between pol II and the core Mediator complex. It might also phosphorylate cyclin H in the TFIIH complex and inhibit activation of transcription by the latter complex. However, previous studies have shown that negative regulation of transcription by the CDK8 subcomplex also occurs independently of the CDK8 kinase activity: the interaction between the CDK8 subcomplex and the core Mediator acts as a modulatory “switch” that allosterically affects the core Mediator-pol II interaction [69, 70] and determines the shift between transient and stable CDK8 subcomplex promoter occupancy. This switch is believed to be dependent on Med12 and Med13 [70, 71], although the exact mode of their action remains murky.
In this regard, recent studies utilizing an in vitro chromatin-based transcriptional system demonstrated that Med13 is critical for physically linking the CDK8 subcomplex to the core Mediator complex and is specifically required to repress previously activated promoters by barring re-association of a pol II enzyme with the PIC .
Given these studies, our discovery of a PIWI module in Med13 provides a previously unexplored vista to investigate the mechanism of transcriptional modulation by the CDK8 subcomplex (see Figure 3B). As the MedPIWI module displays the conserved features related to binding double-stranded substrates (see above, Figures 1B-C, 4A), we posit that this activity is central to the molecular switch that modulates the core Mediator-pol II interactions. We predict two plausible candidates for the substrate oligonucleotide bound by the MedPIWI modules that are consistent with published laboratory studies: 1) It is conceivable that the MedPIWI module retained the ancestral ability to bind DNA-RNA hybrid duplexes, a feature that the ancestral eukaryotic PIWI modules would have presumably possessed when they were acquired from the prokaryotic progenitors. DNA-small RNA hybrids could form close to the transcription start site (TSS) from the small RNA byproducts of polymerase stalling or backtracking [72, 73]. Indeed, such small transcripts (commonly referred to as TSSa or tiRNA transcripts) have been detected in several global deep-sequencing datasets across a range of animal species and even in association with classical PIWI domains. These could either re-associate with DNA opened as part of the transcriptional bubble formed during re-initiation events or remain associated with open DNA in the wake of repeated pol II passages. This proposal has the attractive feature of explaining the preferential association of Med13 with highly transcribed genes [62–66, 70] because such genes are known to be enriched in small TSS-associated transcripts, thereby increasing the chances of formation of DNA-RNA hybrid substrates for the MedPIWI module.
The observation that the CDK8 subcomplex association occurs only after initiation of at least a single round of transcription by pol II following PIC assembly  also suggests that its association might require the availability of previously-transcribed RNA byproducts. Another potential source for small RNAs that could form DNA-RNA hybrids is the small processed antisense transcripts that have been found to be associated with the promoter sites of transcriptionally active genes . 2) Alternatively, like most characterized eukaryotic PIWI modules, the MedPIWI module might bind dsRNA substrates. In this case its action can be compared to the classical eukaryotic PIWI protein AGO2, which has been shown to regulate the positioning of pol II while binding sense-antisense RNA duplexes derived from transcriptionally active genes . Interestingly, these antisense small RNA-AGO2 complexes increase in abundance concomitant with transcriptional activation upon stimuli such as heat shock . It is possible that the MedPIWI module acts in a comparable manner to associate with such promoter-derived small RNAs that could form dsRNA duplexes during active transcription.
In conclusion, we hypothesize that the modulatory switch mediated by the CDK8 subcomplex probably depends on the ability of the MedPIWI module to recognize small transcripts associated with active promoters that form either DNA-RNA or dsRNA duplexes. This binding induces a conformational change that propagates through the rest of the complex to allosterically impact the interaction of the Mediator with pol II. Binding of duplexes by the MedPIWI module might also influence the deployment of the additional layers of control that depend on the CDK8 subcomplex, such as the activity of the CDK8 kinase [67, 68] and Med12-mediated histone H3K9 SET domain methyltransferase (G9a) recruitment  (see Figure 3B). Intriguingly, in a small number of cases, association of the CDK8 subcomplex with the core Mediator results in Med13- and Med12- dependent transcriptional activation rather than repression [78, 79]. While this manuscript was under review, a study was published demonstrating the role of enhancer-associated long non-coding RNAs (lncRNAs) in facilitating this process of activation of transcription by the CDK8 subcomplex along with the core Mediator  (see Figure 3B). It was demonstrated that in animals these activating lncRNAs interact with the Med12 subunit of the CDK8 complex and cause it to catalyze Histone H3 serine 10 phosphorylation rather than the above-mentioned negative regulatory phosphorylations of Cyclin H and the RNA polymerase C-terminal tail. H3S10 phosphorylation has a positive regulatory role probably by inhibiting the repressive H3K9 methylation among other actions. We suspect that interaction with these enhancer-derived lncRNAs is unlikely to be the primary function of the MedPIWI module because it is conserved across eukaryotes and appears to be required for actions of the CDK8 complex beyond activated transcription. 
However, we cannot rule out that the lncRNA might interact with processed small RNAs to form duplexes that might be recognized by the MedPIWI module to regulate transcription in certain conditions.
The new PIWI families reported here also offer an opportunity to reassess the natural history of the PIWI/AGO superfamily. The pPIWI-RE family shows a relatively smaller spread across the prokaryotic tree (see Additional file 1) compared to the classical pPIWI proteins. Hence, it is possible that pPIWI-RE descended from an RNase-active classical pPIWI module in bacteria and was subsequently dispersed to diverse lineages via HGT. The multiple independent losses of the RNase H fold catalytic residues in the pPIWI-RE family are comparable to those in the classical PIWI modules. Thus, not just active processing of RNA, but also non-catalytic binding of duplexes containing RNA appears to have been widely used across the PIWI/AGO superfamily. Indeed, this function appears to have been the dominant theme in the case of the MedPIWI family. The phyletic patterns of Med13 are closely correlated with those of the three other subunits of the CDK8 complex. They are present in several basal eukaryotes and are widespread across the eukaryotic tree, strongly supporting the presence of a complete CDK8 complex in the last eukaryotic common ancestor (LECA). Thus, the CDK8 subcomplex and an ancestral version of the core Mediator complex appear to have been in place by the LECA, suggesting that antagonistic regulatory interactions of these complexes were a feature of transcription regulation in the common ancestor of extant eukaryotes.
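One simple way to quantify how closely two phyletic patterns track each other is the Jaccard index over the sets of lineages in which each protein is detected. The Python sketch below is a hedged illustration rather than the method used here: the function `jaccard`, the lineage labels and the presence/absence sets are all placeholders invented for the example.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two phyletic profiles, each given as an
    iterable of lineages in which the protein was detected."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty profiles
    return len(a & b) / len(union)


# Placeholder presence/absence data: lineage labels and memberships are
# invented and do NOT reflect the real distribution of these proteins.
profiles = {
    "Med13": {"lineage_A", "lineage_B", "lineage_C", "lineage_D"},
    "Med12": {"lineage_A", "lineage_B", "lineage_C", "lineage_D"},
    "CycC": {"lineage_A", "lineage_B", "lineage_C"},
    "unrelated": {"lineage_E"},
}

for name in ("Med12", "CycC", "unrelated"):
    print(name, jaccard(profiles["Med13"], profiles[name]))
```

Values near 1 indicate nearly identical distributions, as would be expected for subunits that are gained and lost together, while values near 0 indicate unrelated patterns.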
Earlier studies had indicated that at least one member of the classical PIWI family was already present in the LECA. Prior to LECA, in the eukaryotic stem lineage, this PIWI protein appears to have undergone a duplication giving rise to a version with a dedicated role in transcription regulation and a second version primarily involved as a standalone protein in diverse processes involving small non-coding RNAs. The former version appears to have functionally associated with the other emerging subunits of the CDK8 complex with a corresponding rapid divergence in sequence. At least in the latter version there appears to have been a specificity shift towards dsRNA from the likely ancestral pPIWI preference for binding DNA/RNA hybrid duplexes [20, 22, 29]. The classical PIWI family is also widely conserved across archaea, suggesting that the stem eukaryotes could have possibly inherited the ancestral PIWI protein directly from their archaeal progenitor. Given the functional connections now known or inferred across the PIWI/AGO superfamily (each of the two families discussed here and the classical PIWI proteins) to regulation of transcription, it is conceivable that even in archaea (and possibly other prokaryotes) PIWI proteins function in transcription regulation, beyond the proposed role in defense against genomic parasites. If this were the case, then the two primary eukaryotic versions merely reflect partitioning of the ancestral roles into distinct proteins. Thus, our identification of a novel eukaryotic PIWI family could also have implications for the functions of the prokaryotic PIWI domains.
The two novel families of PIWI modules described here are the first such discoveries since the initial characterization of the PIWI/Argonaute family in eukaryotes and their close prokaryotic counterparts over a decade ago [18, 82, 83]. While considerably divergent from these earlier-characterized versions, both families are predicted to bind double-stranded substrates based on the strong conservation of residues at positions corresponding to nucleic acid binding sites in the classical PIWI modules (see Figures 1, 2, and 4). Moreover, their predicted functions fit within the spectrum of previously observed functional roles for different members of the PIWI superfamily. Thus, despite the considerable divergence from the classical PIWI family at the sequence level, the new families appear to have maintained the characteristic ability of this clade of RNase H fold proteins to operate on RNA-containing duplexes. Nevertheless, the predicted functions of the two newly described families present some previously unobserved features. The pPIWI-RE family offers the first example of a potential RNA-dependent restriction system in prokaryotes that is distinct from the previously characterized CRISPR/Cas-type systems. In particular, it presents some parallels to the Type-II CRISPR/Cas systems, which combine an RNase H fold nuclease with an HNH endoDNase that is also found in several restriction systems. Thus, it emerges as the first clear example of a PIWI family member directing and coordinating a DNA- and RNA-based defensive response against genomic parasites in bacteria. This system could potentially be developed as a reagent to cleave target DNA using an RNA guide.
Our prediction implicating the MedPIWI family in recognition of RNA-containing duplexes offers an entirely new mechanism for the action of the CDK8 subcomplex. It accounts for the modulation of transcription at the promoters of highly expressed genes and provides the first delineation of the criterion underlying the transition from transient CDK8 subcomplex co-occupancy at sites of core Mediator occupancy to sustained CDK8 subcomplex association resulting in repressive activity (see Figure 3B). This research also further fuels the broader emerging theme implicating ncRNAs in modulation of transcription at sites of initiation [3, 80]. This hypothesis could be investigated via a combination of ChIP-seq experiments on CDK8 subcomplex members and MedPIWI module immunoprecipitation-sequencing.
Iterative profile searches with the PSI-BLAST and JACKHMMER programs were used to retrieve homologous sequences in the protein non-redundant (NR) database at the National Center for Biotechnology Information (NCBI). For most searches a cut-off e-value of 0.01 was used to assess significance. In each iteration, the newly detected sequences that had e-values lower than the cut-off were examined for conserved motifs to detect potential homologs in the twilight zone. Similarity-based clustering was performed using the BLASTCLUST program (http://ftp.ncbi.nih.gov/blast/documents/blastclust.html) to cluster sequences at different thresholds. Multiple sequence alignments were built using the Kalign and MUSCLE programs, followed by manual adjustments based on profile-profile alignment, secondary structure prediction and structural alignments. Consensus secondary structures were predicted using the JPred program. Remote sequence similarity searches were performed using profile-profile comparisons with the HHpred program. Gene neighborhoods were extracted and analyzed using a custom PERL script that operates on the Genbank genome or whole genome shotgun files. The protein sequences of all neighbors were clustered using the BLASTCLUST program to identify related sequences in gene neighborhoods. Each cluster of homologous proteins was then assigned an annotation based on the domain architecture or shared conserved domain. A complete list of Genbank gene identifiers for proteins investigated in this study is provided in Additional file 1. Structure similarity searches were conducted using the DALIlite program and structural alignments were generated by means of the MUSTANG program.
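The e-value filtering and gene-neighborhood steps described above can be illustrated with a minimal Python sketch. It is not the custom PERL script used in this study, and it does not parse Genbank files: the `Gene` record, the function names `significant_hits` and `gene_neighborhood`, the `flank` window size and the example data are all hypothetical and introduced only for illustration. The only value taken from the text is the 0.01 e-value cut-off.

```python
from collections import namedtuple

# Hypothetical minimal gene record; real Genbank features carry far more detail.
Gene = namedtuple("Gene", ["gene_id", "contig", "start", "end", "strand"])


def significant_hits(hits, evalue_cutoff=0.01):
    """Return the ids of (gene_id, evalue) hits whose e-value is below the cut-off."""
    return [gene_id for gene_id, evalue in hits if evalue < evalue_cutoff]


def gene_neighborhood(genes, query_id, flank=2):
    """Return the query gene plus up to `flank` genes on either side of it.

    Only genes on the same contig are considered, ordered by start coordinate.
    The window size is an assumption of this sketch, not a value from the study.
    """
    query = next((g for g in genes if g.gene_id == query_id), None)
    if query is None:
        raise KeyError(query_id)
    contig_genes = sorted(
        (g for g in genes if g.contig == query.contig), key=lambda g: g.start
    )
    idx = contig_genes.index(query)
    return contig_genes[max(0, idx - flank): idx + flank + 1]


# Made-up example data: four genes on one contig, one gene on another.
genes = [
    Gene("g1", "contigA", 100, 900, "+"),
    Gene("g2", "contigA", 950, 2100, "+"),
    Gene("g3", "contigA", 2150, 4000, "+"),
    Gene("g4", "contigA", 4100, 5200, "-"),
    Gene("g5", "contigB", 50, 700, "+"),
]
hits = [("g2", 1e-30), ("g5", 0.5)]  # (gene_id, e-value) pairs from a search

neighborhoods = {
    query_id: [g.gene_id for g in gene_neighborhood(genes, query_id, flank=1)]
    for query_id in significant_hits(hits)
}
print(neighborhoods)
```

In a real analysis the protein sequences of the neighbors collected this way would then be clustered (the study used BLASTCLUST) so that recurring gene partners, such as a helicase or restriction endonuclease fold gene, stand out across genomes.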
Reviewer 1: Prof. Sandor Pongor, International Centre for Genetic Engineering and Biotechnology (ICGEB), Italy
The PIWI domain plays a role in dsRNA-guided hydrolysis of ssRNA in a variety of cellular pathways involved in binding and cleaving of RNA. Since its discovery in the PIWI/ARGO superfamily, the PIWI module has been identified in a growing number of cellular pathways such as RNA interference, chromatin organization, and germline maintenance, and was found to bind different classes of small noncoding RNAs that guide Argonaute proteins to their targets. Based on profile-profile comparisons, Burroughs and coworkers describe two new subfamilies of PIWI, both showing a residue conservation pattern characteristic of guide-strand binding but not those of catalytic activity. One of the subfamilies, PIWI-RE, is found in bacteria, and the conservation is supported by similar chromosomal contexts, which leads the authors to suggest that it plays a part in a novel RNA-dependent restriction system. The other subfamily, MedPIWI, is found in the Med13 subunit of the Mediator complex in eukaryotes. MedPIWI shows distinctive residue conservation patterns that indicate an involvement in ds nucleic acid binding but no catalytic activity. The authors hypothesize that MedPIWI’s role may be an ssRNA-mediated activation of the conformational switch through which the CDK8 subcomplex modulates transcription at Mediator-bound promoters. Both subfamilies are widely distributed: PIWI-RE is found in firmicutes, actinobacteria, α/β/γ-proteobacteria, cyanobacteria, and chloroflexi, while MedPIWI is found in plants, fungi, animals, slime molds, and stramenopiles as well as basal eukaryotes.
I find the analysis straightforward and highly convincing, and the conclusions, even though daring and imaginative, are well within the expected limits of scientific interpretation. The structure of the manuscript is logical, even though the description of two subfamilies within one article may somewhat divide the attention of the reader. In conclusion, I recommend publication of this manuscript without further changes.
Authors’ response: We appreciate the positive evaluation of our work. While the two disparate functional themes might indeed divide the reader’s attention, we sought to present them in one article due to the common theme provided by the previously known functional features of the PIWI superfamily itself.
Reviewer 2: Dr. Frank Eisenhaber, Bioinformatics Institute, Singapore
This work is a pretty nice continuation of the series of articles by Aravind et al.; gene function hypotheses/discoveries are presented through a meticulous combination of sequence-analytic findings and hints from the experimental biological literature. Starting with the serendipitous observation of two PFAM domains with unknown functions showing some HHpred-derived similarity to the PIWI/AGO model, the authors show that two divergent subfamilies of PIWI/AGO in the bacterial world and among eukaryotes do exist. Lots of additional information with regard to 3D structural details, binding properties, domain evolution, etc. is derived with the classical sequence-analytic procedures and many of these conclusions can be validated experimentally.
Given the very nicely written main text, the summary reads like an unloved extra, apparently composed after the authors were tired from putting together text and figures. I suggest to go carefully through the text and complement the summary with all the detailed conclusions about the two new subfamilies.
Authors’ response: We appreciate the positive evaluation of the work presented in this article. We have now revised the summary to better incorporate more of the conclusions reached in the text. Moreover, in response to a similar suggestion by Reviewer #3, we have added a figure that provides a one-stop pictorial summary of the predicted functional roles of the two families.
Further, the authors mention some “KM” who analyzed the data (in “Authors’ contributions”); yet, this person is not listed among the authors.
Authors’ response: We have removed this inadvertently included initial from the contributions list.
Reviewer 3: Dr. Santhanam Balaji, MRC Laboratory of Molecular Biology, United Kingdom
Burroughs et al. report the computational discovery of two novel families of PIWI modules: the first family (pPIWI) is sporadic in phyletic distribution and restricted to the bacterial superkingdom, while the second (MedPIWI) is found only in eukaryotes. pPIWI is prominently encoded by an operon that also contains genes encoding a restriction endonuclease-like enzyme and a DinG helicase. Based on these observations, the authors propose that pPIWI is likely to act on genomic parasites such as invasive phages and selfish replication elements. MedPIWI is found as the core conserved module of Med13, part of the CDK8 subcomplex. The CDK8 subcomplex is a known negative regulator of transcription. Identification of a PIWI family in it suggests a possible mode of action of CDK8 by targeting small RNAs in the vicinity of Mediator complex-bound promoters of highly transcribed genes. Hence, the discovery of MedPIWI sheds light on the mechanism of transcription modulation mediated by the CDK8 subcomplex. There are also detailed mechanistic models proposed by the authors for each of the two families. This work reports important discoveries that have potentially wide implications, from genomic conflicts in bacterial systems to transcription in eukaryotes. The discovery of a PIWI family in Med13 is particularly interesting; it raises a wider, intriguing question: are there many more (yet to be identified) RNA-binding families hidden in Mediator complex subunits or associated proteins? I fully support the publication of this manuscript in Biology Direct.
Authors’ response: We appreciate the positive evaluation and detailed review of the work presented here. It is becoming increasingly clear that ncRNA binding plays a role in Mediator function. In light of this, it is quite possible that more RNA-interacting domains will be identified in the coming years as new structural studies on Mediator are published. However, given that the majority of Mediator complex components are rife with regions of low-complexity sequence, other obvious RNA-binding domains remain difficult to detect at this point. The Med8C/18/20 submodule has been shown to contain a version of the CYTH domain (LM Iyer, L Aravind BMC genomics 3 (1), 33; PMID: 12456267), which is also found in the mRNA triphosphatase. Whether these CYTH domains might have a role in RNA interaction remains unclear.
My specific points/comments:
Did authors find any more detail (in terms of functions or interactions) about the α-helical element following the three-stranded β-meander of the RNAse H fold in pPIWI-RE?
Authors’ response: This is certainly an interesting feature of the pPIWI-RE family: mapping of this helical element to existing structures reveals that it could be positioned reasonably close to the nucleotide binding/catalytic active site. At the same time, it lacks any strongly conserved residues outside of a well-conserved tryptophan residue immediately N-terminal to the helix; hence, we refrain from any detailed functional speculation.
The observation of a Zincin-like metalloprotease fused with pPIWI-RE is interesting, although it is not seen in many instances. Is it possible that in these cases the metalloprotease domain could directly aid pPIWI-RE to target RNAs that are securely lodged in ribonucleoprotein complexes?
Authors’ response: Given the infrequency of the fusion, we are not sure if it is a gene annotation artifact of some kind; hence, we have not speculated in the manuscript on any concrete functional role. If this fusion were to be recovered in more genome sequences in the future, a role as suggested by the referee is not impossible.
It appears a bit ironic that the CDK8 subcomplex is a negative regulator of transcription but is found at mediator complex bound promoters that correspond to highly transcribed genes. Does this mean CDK8 has a direct role through MedPIWI in determining overall level of the transcripts emerging from these regions?
Authors’ response: Yes, this is generally the idea we hoped to express, which is consistent with the prevailing view of the CDK8 subcomplex as more of a negative modulator of transcription and not an absolute repressor of transcription. We have updated several areas of the text to try to clarify our position on this point.
There is genome-wide binding data for CDK8 (PMID: 16630888); it may be useful to look at the data to propose some broader functional context for MedPIWI.
Authors’ response: We have examined the ChIP-chip data from Saccharomyces cerevisiae and compared it with a publicly available promoter-mapping small RNA data set in yeast. While we observe some interesting differences in the small RNA content which maps to promoter regions occupied by different components of the Mediator complex, at this point we are unable to conclusively identify any trends that might inform the relationship between the CDK8 subcomplex and small RNA derived from promoter regions. Several issues constrain the efficacy of this analysis; chief among these are 1) the multiple levels of regulation which appear to contribute to the decision of the loaded Mediator complex to move between RNA polII active and inactive states, many of which could influence small RNA content at the promoter (see Figure 3B), and 2) the MedPIWI “switch” between activity and inactivity is likely to be subtle: instead of the binary presence/absence of small RNA at a promoter, it is likely to be the presence of “enough” small RNA which triggers the switch. Additionally, the required concentration of small RNA could depend on several promoter-specific contextual factors including genome sequence, local DNA structure, or presence/absence of ancillary protein domains. Addressing some of the following additional issues could bring clarity to such an analysis: 1) ChIP-chip does not identify the precise location of the binding of Mediator components on the genome sequence; to gauge the location of the Mediator complex (and thus the sites from which potential small RNA are generated), ChIP-seq experiments would be of considerable value. 2) The existing ChIP and small RNA-seq data were generated under different growth conditions and at different time points; uniformity in such conditions would remove considerable noise. 3) Recent advances in sequencing technology would yield a deeper small RNA data set than what is currently available. This is particularly important given that the absolute number of small RNAs derived from any single promoter region tends to be quite low, particularly in relation to other classes of small RNA.
Potential molecular mechanism models of pPIWI in the section “Functional implications of pPIWI-RE coding systems: A novel RNA-dependent restriction system” and the information in the last two paragraphs of “Partners and physical interactions of Med13: functional implications for the MedPIWI module in eukaryotic transcription regulation” could be synthesized into schematic figures, and this I believe would help the reading very much.
Authors’ response: We have added a figure (Figure 3) summarizing the implications of these findings.
In the “introduction” section there seems to be abrupt transition from last paragraph in the first page to first paragraph in the next page just above “results and discussion” i.e. from background information on pPIWI to reporting novel family of PIWI modules.
Authors’ response: We have added a few additional lines to the introduction in an attempt to smoothen the transition.
Cox DN, Chao A, Baker J, Chang L, Qiao D, Lin H: A novel class of evolutionarily conserved genes defined by piwi are essential for stem cell self-renewal. Genes Dev. 1998, 12: 3715-3727. 10.1101/gad.12.23.3715.
Murchison EP, Hannon GJ: miRNAs on the move: miRNA biogenesis and the RNAi machinery. Curr Opinion Cell Biol. 2004, 16: 223-229. 10.1016/j.ceb.2004.04.003.
Cernilogar FM, Onorati MC, Kothe GO, Burroughs AM, Parsi KM, Breiling A, Lo Sardo F, Saxena A, Miyoshi K, Siomi H, et al: Chromatin-associated RNA interference components contribute to transcriptional regulation in Drosophila. Nature. 2011, 480: 391-395. 10.1038/nature10492.
Halic M, Moazed D: Dicer-independent primal RNAs trigger RNAi and heterochromatin formation. Cell. 2010, 140: 504-516. 10.1016/j.cell.2010.01.019.
Ameyar-Zazoua M, Rachez C, Souidi M, Robin P, Fritsch L, Young R, Morozova N, Fenouil R, Descostes N, Andrau JC, et al: Argonaute proteins couple chromatin silencing to alternative splicing. Nat Struct Mol Biol. 2012, 19: 998-1004. 10.1038/nsmb.2373.
Mochizuki K: RNA-directed epigenetic regulation of DNA rearrangements. Essays Biochem. 2010, 48: 89-100. 10.1042/bse0480089.
Chalker DL, Yao MC: DNA elimination in ciliates: transposon domestication and genome surveillance. Ann Rev Gen. 2011, 45: 227-246. 10.1146/annurev-genet-110410-132432.
Aliyari R, Ding SW: RNA-based viral immunity initiated by the Dicer family of host immune receptors. Immunol Rev. 2009, 227: 176-188. 10.1111/j.1600-065X.2008.00722.x.
Song JJ, Smith SK, Hannon GJ, Joshua-Tor L: Crystal structure of Argonaute and its implications for RISC slicer activity. Science. 2004, 305: 1434-1437. 10.1126/science.1102514.
Rand TA, Petersen S, Du F, Wang X: Argonaute2 cleaves the anti-guide strand of siRNA during RISC activation. Cell. 2005, 123: 621-629. 10.1016/j.cell.2005.10.020.
Matranga C, Tomari Y, Shin C, Bartel DP, Zamore PD: Passenger-strand cleavage facilitates assembly of siRNA into Ago2-containing RNAi enzyme complexes. Cell. 2005, 123: 607-620. 10.1016/j.cell.2005.08.044.
Miyoshi K, Tsukumo H, Nagami T, Siomi H, Siomi MC: Slicer function of drosophila argonautes and its involvement in RISC formation. Genes Dev. 2005, 19: 2837-2848. 10.1101/gad.1370605.
Leuschner PJ, Ameres SL, Kueng S, Martinez J: Cleavage of the siRNA passenger strand during RISC assembly in human cells. EMBO Rep. 2006, 7: 314-320. 10.1038/sj.embor.7400637.
Brennecke J, Aravin AA, Stark A, Dus M, Kellis M, Sachidanandam R, Hannon GJ: Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell. 2007, 128: 1089-1103. 10.1016/j.cell.2007.01.043.
Gunawardane LS, Saito K, Nishida KM, Miyoshi K, Kawamura Y, Nagami T, Siomi H, Siomi MC: A slicer-mediated mechanism for repeat-associated siRNA 5′ end formation in Drosophila. Science. 2007, 315: 1587-1590. 10.1126/science.1140494.
Yang JS, Lai EC: Dicer-independent, Ago2-mediated microRNA biogenesis in vertebrates. Cell Cycle. 2010, 9: 4455-4460. 10.4161/cc.9.22.13958.
Djuranovic S, Nahvi A, Green R: A parsimonious model for gene regulation by miRNAs. Science. 2011, 331: 550-553. 10.1126/science.1191138.
Cerutti L, Mian N, Bateman A: Domains in gene silencing and cell differentiation proteins: the novel PAZ domain and redefinition of the Piwi domain. Trends Biochem Sci. 2000, 25: 481-482. 10.1016/S0968-0004(00)01641-8.
Aravind L, Koonin EV: Eukaryote-specific domains in translation initiation factors: implications for translation regulation and evolution of the translation system. Gen Res. 2000, 10: 1172-1184. 10.1101/gr.10.8.1172.
Makarova KS, Wolf YI, van der Oost J, Koonin EV: Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements. Biol Direct. 2009, 4: 29. 10.1186/1745-6150-4-29.
Ma JB, Yuan YR, Meister G, Pei Y, Tuschl T, Patel DJ: Structural basis for 5′-end-specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature. 2005, 434: 666-670. 10.1038/nature03514.
Yuan YR, Pei Y, Ma JB, Kuryavyi V, Zhadina M, Meister G, Chen HY, Dauter Z, Tuschl T, Patel DJ: Crystal structure of A. aeolicus argonaute, a site-specific DNA-guided endoribonuclease, provides insights into RISC-mediated mRNA cleavage. Mol Cell. 2005, 19: 405-419. 10.1016/j.molcel.2005.07.011.
Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, et al: The Pfam protein families database. Nucleic Acids Res. 2010, 38: D211-D222. 10.1093/nar/gkp985.
Aravind L, Leipe DD, Koonin EV: Toprim–a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids Res. 1998, 26: 4205-4213. 10.1093/nar/26.18.4205.
Frank F, Sonenberg N, Nagar B: Structural basis for 5′-nucleotide base-specific recognition of guide RNA by human AGO2. Nature. 2010, 465: 818-822. 10.1038/nature09039.
Boland A, Tritschler F, Heimstadt S, Izaurralde E, Weichenrieder O: Crystal structure and ligand binding of the MID domain of a eukaryotic Argonaute protein. EMBO Rep. 2010, 11: 522-527. 10.1038/embor.2010.81.
Parker JS, Roe SM, Barford D: Structural insights into mRNA recognition from a PIWI domain-siRNA guide complex. Nature. 2005, 434: 663-666. 10.1038/nature03462.
Wang Y, Sheng G, Juranek S, Tuschl T, Patel DJ: Structure of the guide-strand-containing argonaute silencing complex. Nature. 2008, 456: 209-213. 10.1038/nature07315.
Wang Y, Juranek S, Li H, Sheng G, Tuschl T, Patel DJ: Structure of an argonaute silencing complex with a seed-containing guide DNA and target RNA duplex. Nature. 2008, 456: 921-926. 10.1038/nature07666.
Wang Y, Juranek S, Li H, Sheng G, Wardle GS, Tuschl T, Patel DJ: Nucleation, propagation and cleavage of target RNAs in Ago silencing complexes. Nature. 2009, 461: 754-761. 10.1038/nature08434.
Parker JS, Roe SM, Barford D: Crystal structure of a PIWI protein suggests mechanisms for siRNA recognition and slicer activity. EMBO J. 2004, 23: 4727-4737. 10.1038/sj.emboj.7600488.
Bourbon HM: Comparative genomics supports a deep evolutionary origin for the large, four-module transcriptional mediator complex. Nucleic Acids Res. 2008, 36: 3993-4008. 10.1093/nar/gkn349.
Boland A, Huntzinger E, Schmidt S, Izaurralde E, Weichenrieder O: Crystal structure of the MID-PIWI lobe of a eukaryotic Argonaute protein. Proc Nat Acad Sci USA. 2011, 108: 10466-10471. 10.1073/pnas.1103946108.
Kwak PB, Tomari Y: The N domain of Argonaute drives duplex unwinding during RISC assembly. Nat Struct Mol Biol. 2012, 19: 145-151. 10.1038/nsmb.2232.
Aravind L: Guilt by association: contextual information in genome analysis. Gen Res. 2000, 10: 1074-1077. 10.1101/gr.10.8.1074.
Huynen M, Snel B, Lathe W, Bork P: Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Gen Res. 2000, 10: 1204-1210. 10.1101/gr.10.8.1204.
Aravind L, Anantharaman V, Balaji S, Babu MM, Iyer LM: The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol Rev. 2005, 29: 231-262.
Pugh RA, Honda M, Leesley H, Thomas A, Lin Y, Nilges MJ, Cann IK, Spies M: The iron-containing domain is essential in Rad3 helicases for coupling of ATP hydrolysis to DNA translocation and for targeting the helicase to the single-stranded DNA-double-stranded DNA junction. J Biol Chem. 2008, 283: 1732-1743.
Rudolf J, Makrantoni V, Ingledew WJ, Stark MJ, White MF: The DNA repair helicases XPD and FancJ have essential iron-sulfur domains. Mol Cell. 2006, 23: 801-808. 10.1016/j.molcel.2006.07.019.
Singleton MR, Dillingham MS, Wigley DB: Structure and mechanism of helicases and nucleic acid translocases. Ann Rev Biochem. 2007, 76: 23-50. 10.1146/annurev.biochem.76.052305.115300.
Fairman-Williams ME, Guenther UP, Jankowsky E: SF1 and SF2 helicases: family matters. Curr Opinion Struct Biol. 2010, 20: 313-324. 10.1016/j.sbi.2010.03.011.
Ren B, Duan X, Ding H: Redox control of the DNA damage-inducible protein DinG helicase activity via its iron-sulfur cluster. J Biol Chem. 2009, 284: 4829-4835.
Aravind L, Anantharaman V, Zhang D, de Souza RF, Iyer LM: Gene flow and biological conflict systems in the origin and evolution of eukaryotes. Front Cell Infect Microbiol. 2012, 2: 89.
Bickle TA, Kruger DH: Biology of DNA restriction. Microbiol Rev. 1993, 57: 434-450.
Bourniquel AA, Bickle TA: Complex restriction enzymes: NTP-driven molecular motors. Biochimie. 2002, 84: 1047-1059. 10.1016/S0300-9084(02)00020-2.
McRobbie AM, Meyer B, Rouillon C, Petrovic-Stojanovska B, Liu H, White MF: Staphylococcus aureus DinG, a helicase that has evolved into a nuclease. Biochem J. 2012, 442: 77-84. 10.1042/BJ20111903.
Bukowy Z, Harrigan JA, Ramsden DA, Tudek B, Bohr VA, Stevnsner T: WRN Exonuclease activity is blocked by specific oxidatively induced base lesions positioned in either DNA strand. Nucleic Acids Res. 2008, 36: 4975-4987. 10.1093/nar/gkn468.
Murray NE: Type I restriction systems: sophisticated molecular machines (a legacy of Bertani and Weigle). MMBR. 2000, 64: 412-434.
Raghavendra NK, Bheemanaik S, Rao DN: Mechanistic insights into type III restriction enzymes. Front Biosci J Virt Library. 2012, 17: 1094-1107. 10.2741/3975.
Voloshin ON, Camerini-Otero RD: The DinG protein from Escherichia coli is a structure-specific helicase. J Biol Chem. 2007, 282: 18437-18447. 10.1074/jbc.M700376200.
Aguilera A, Garcia-Muse T: R loops: from transcription byproducts to threats to genome stability. Mol Cell. 2012, 46: 115-124. 10.1016/j.molcel.2012.04.009.
Boubakri H, de Septenville AL, Viguera E, Michel B: The helicases DinG, Rep and UvrD cooperate to promote replication across transcription units in vivo. EMBO J. 2010, 29: 145-157. 10.1038/emboj.2009.308.
Makarova KS, Aravind L, Wolf YI, Koonin EV: Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems. Biol Direct. 2011, 6: 38. 10.1186/1745-6150-6-38.
Kreuzer KN, Brister JR: Initiation of bacteriophage T4 DNA replication and replication fork dynamics: a review in the Virology Journal series on bacteriophage T4 and its relatives. Virol J. 2010, 7: 358. 10.1186/1743-422X-7-358.
Itoh T, Tomizawa J: Formation of an RNA primer for initiation of replication of ColE1 DNA by ribonuclease H. Proc Nat Acad Sci USA. 1980, 77: 2450-2454. 10.1073/pnas.77.5.2450.
Kogoma T: Stable DNA replication: interplay between DNA replication, homologous recombination, and transcription. MMBR. 1997, 61: 212-238.
Grewal SI, Elgin SC: Transcription and RNA interference in the formation of heterochromatin. Nature. 2007, 447: 399-406. 10.1038/nature05914.
Parker JS, Parizotto EA, Wang M, Roe SM, Barford D: Enhancement of the seed-target recognition step in RNA silencing by a PIWI/MID domain protein. Mol Cell. 2009, 33: 204-214. 10.1016/j.molcel.2008.12.012.
Conaway RC, Sato S, Tomomori-Sato C, Yao T, Conaway JW: The mammalian Mediator complex and its role in transcriptional regulation. Trends Biochem Sci. 2005, 30: 250-255. 10.1016/j.tibs.2005.03.002.
Malik S, Roeder RG: Dynamic regulation of pol II transcription by the mammalian Mediator complex. Trends Biochem Sci. 2005, 30: 256-263. 10.1016/j.tibs.2005.03.009.
Conaway RC, Conaway JW: Function and regulation of the Mediator complex. Curr Opinion Gen Dev. 2011, 21: 225-230. 10.1016/j.gde.2011.01.013.
Andrau JC, van de Pasch L, Lijnzaad P, Bijma T, Koerkamp MG, van de Peppel J, Werner M, Holstege FC: Genome-wide location of the coactivator mediator: binding without activation and transient Cdk8 interaction on DNA. Mol Cell. 2006, 22: 179-192. 10.1016/j.molcel.2006.03.023.
Zhu X, Wiren M, Sinha I, Rasmussen NN, Linder T, Holmberg S, Ekwall K, Gustafsson CM: Genome-wide occupancy profile of mediator and the Srb8-11 module reveals interactions with coding regions. Mol Cell. 2006, 22: 169-178. 10.1016/j.molcel.2006.03.032.
Samuelsen CO, Baraznenok V, Khorosjutina O, Spahr H, Kieselbach T, Holmberg S, Gustafsson CM: TRAP230/ARC240 and TRAP240/ARC250 Mediator subunits are functionally conserved through evolution. Proc Nat Acad Sci USA. 2003, 100: 6422-6427. 10.1073/pnas.1030497100.
Kuchin S, Yeghiayan P, Carlson M: Cyclin-dependent protein kinase and cyclin homologs SSN3 and SSN8 contribute to transcriptional control in yeast. Proc Nat Acad Sci USA. 1995, 92: 4006-4010. 10.1073/pnas.92.9.4006.
Gillmor CS, Park MY, Smith MR, Pepitone R, Kerstetter RA, Poethig RS: The MED12-MED13 module of Mediator regulates the timing of embryo patterning in Arabidopsis. Development. 2010, 137: 113-122. 10.1242/dev.043174.
Hengartner CJ, Myer VE, Liao SM, Wilson CJ, Koh SS, Young RA: Temporal regulation of RNA polymerase II by Srb10 and Kin28 cyclin-dependent kinases. Mol Cell. 1998, 2: 43-53. 10.1016/S1097-2765(00)80112-4.
Akoulitchev S, Chuikov S, Reinberg D: TFIIH is negatively regulated by cdk8-containing mediator complexes. Nature. 2000, 407: 102-106. 10.1038/35024111.
Elmlund H, Baraznenok V, Lindahl M, Samuelsen CO, Koeck PJ, Holmberg S, Hebert H, Gustafsson CM: The cyclin-dependent kinase 8 module sterically blocks Mediator interactions with RNA polymerase II. Proc Nat Acad Sci USA. 2006, 103: 15788-15793. 10.1073/pnas.0607483103.
Knuesel MT, Meyer KD, Bernecky C, Taatjes DJ: The human CDK8 subcomplex is a molecular switch that controls Mediator coactivator function. Genes Dev. 2009, 23: 439-451. 10.1101/gad.1767009.
Ding N, Zhou H, Esteve PO, Chin HG, Kim S, Xu X, Joseph SM, Friez MJ, Schwartz CE, Pradhan S, Boyer TG: Mediator links epigenetic silencing of neuronal gene expression with x-linked mental retardation. Mol Cell. 2008, 31: 347-359. 10.1016/j.molcel.2008.05.023.
Taft RJ, Kaplan CD, Simons C, Mattick JS: Evolution, biogenesis and function of promoter-associated RNAs. Cell Cycle. 2009, 8: 2332-2338. 10.4161/cc.8.15.9154.
Valen E, Preker P, Andersen PR, Zhao X, Chen Y, Ender C, Dueck A, Meister G, Sandelin A, Jensen TH: Biogenic mechanisms and utilization of small RNAs derived from human protein-coding genes. Nat Struct Mol Biol. 2011, 18: 1075-1082. 10.1038/nsmb.2091.
Seila AC, Core LJ, Lis JT, Sharp PA: Divergent transcription: a new feature of active promoters. Cell Cycle. 2009, 8: 2557-2564. 10.4161/cc.8.16.9305.
Taft RJ, Glazov EA, Cloonan N, Simons C, Stephen S, Faulkner GJ, Lassmann T, Forrest AR, Grimmond SM, Schroder K, et al: Tiny RNAs associated with transcription start sites in animals. Nat Gen. 2009, 41: 572-578. 10.1038/ng.312.
Taft RJ, Simons C, Nahkuri S, Oey H, Korbie DJ, Mercer TR, Holst J, Ritchie W, Wong JJ, Rasko JE, et al: Nuclear-localized tiny RNAs are associated with transcription initiation and splice sites in metazoans. Nat Struct Mol Biol. 2010, 17: 1030-1034. 10.1038/nsmb.1841.
Burroughs AM, Ando Y, de Hoon MJ, Tomaru Y, Suzuki H, Hayashizaki Y, Daub CO: Deep-sequencing of human argonaute-associated small RNAs provides insight into miRNA sorting and reveals argonaute association with RNA fragments of diverse origin. RNA Biol. 2011, 8: 158-177. 10.4161/rna.8.1.14300.
Carrera I, Janody F, Leeds N, Duveau F, Treisman JE: Pygopus activates Wingless target gene transcription through the mediator complex subunits Med12 and Med13. Proc Nat Acad Sci USA. 2008, 105: 6644-6649. 10.1073/pnas.0709749105.
Gobert V, Osman D, Bras S, Auge B, Boube M, Bourbon HM, Horn T, Boutros M, Haenlin M, Waltzer L: A genome-wide RNA interference screen identifies a differential role of the mediator CDK8 module subunits for GATA/RUNX-activated transcription in Drosophila. Mol Cell Biol. 2010, 30: 2837-2848. 10.1128/MCB.01625-09.
Lai F, Orom UA, Cesaroni M, Beringer M, Taatjes DJ, Blobel GA, Shiekhattar R: Activating RNAs associate with mediator to enhance chromatin architecture and transcription. Nature. 2013, 494: 497-501. 10.1038/nature11884.
Muljo SA, Kanellopoulou C, Aravind L: MicroRNA targeting in mammalian genomes: genes and mechanisms. Wiley Interdis Rev Syst Biol Med. 2010, 2: 148-161. 10.1002/wsbm.53.
Tabara H, Sarkissian M, Kelly WG, Fleenor J, Grishok A, Timmons L, Fire A, Mello CC: The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell. 1999, 99: 123-132. 10.1016/S0092-8674(00)81644-X.
Cogoni C, Macino G: Isolation of quelling-defective (qde) mutants impaired in posttranscriptional transgene-induced gene silencing in Neurospora crassa. Proc Nat Acad Sci USA. 1997, 94: 10233-10238. 10.1073/pnas.94.19.10233.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Johnson LS, Eddy SR, Portugaly E: Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics. 2010, 11: 431. 10.1186/1471-2105-11-431.
Lassmann T, Frings O, Sonnhammer EL: Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Res. 2009, 37: 858-865. 10.1093/nar/gkn1006.
Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
Cole C, Barber JD, Barton GJ: The Jpred 3 secondary structure prediction server. Nucleic Acids Res. 2008, 36: W197-W201. 10.1093/nar/gkn238.
Remmert M, Biegert A, Hauser A, Soding J: HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012, 9: 173-175.
Holm L, Rosenstrom P: Dali server: conservation mapping in 3D. Nucleic Acids Res. 2010, 38: W545-W549. 10.1093/nar/gkq366.
Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM: MUSTANG: a multiple structural alignment algorithm. Proteins. 2006, 64: 559-574. 10.1002/prot.20921.
The authors’ research is supported by the intramural funds of the US Department of Health and Human Services (National Library of Medicine, NIH).
The authors declare that they have no competing interests.
AMB collected data; AMB, LMI and LA analyzed the data; AMB and LA wrote the manuscript that was read and approved by all authors.