Identification of an ortholog of the eukaryotic RNA polymerase III subunit RPC34 in Crenarchaeota and Thaumarchaeota suggests specialization of RNA polymerases for coding and non-coding RNAs in Archaea

One of the hallmarks of eukaryotic information processing is the co-existence of 3 distinct, multi-subunit RNA polymerase complexes that are dedicated to the transcription of specific classes of coding or non-coding RNAs. Archaea encode only one RNA polymerase that resembles the eukaryotic RNA polymerase II with respect to the subunit composition. Here we identify archaeal orthologs of the eukaryotic RNA polymerase III subunit RPC34. Genome context analysis supports a function of this archaeal protein in the transcription of non-coding RNAs. These findings suggest that functional separation of RNA polymerases for protein-coding genes and non-coding RNAs might predate the origin of the Eukaryotes. Reviewers: This article was reviewed by Andrei Osterman and Patrick Forterre (nominated by Purificación López-García)


Findings
All Eukaryotes possess 3 distinct, multi-subunit RNA polymerases (RNAPs): RNAP I (transcription of 18S, 5.8S and 25S/28S rRNA), RNAP II (transcription of protein-coding mRNAs), and RNAP III (transcription of 5S rRNA, tRNA and some other small non-coding RNAs). Plants have two additional RNAPs involved in the transcription of small interfering RNA [1]. RNAP III has counterparts (either identical or paralogous) to all subunits of RNAP I and RNAP II [2]. In addition, RNAP III possesses the loosely bound RPC82/RPC34/RPC31 sub-complex. This sub-complex is present in all Eukaryotes, although RPC31 is missing in two major eukaryotic lineages (Alveolates and Excavates) [3]. Transcription initiation by RNAP III requires, among others, the TBP and TFIIIB70 proteins. TBP is shared with RNAP II, and the N-terminal region of TFIIIB70 is homologous to the RNAP II factor TFIIB, whereas the C-terminal region is specific for TFIIIB70. The archaeal RNAP (aRNAP) resembles RNAP II in its subunit composition [2]. Furthermore, the aRNAP machinery employs the transcription initiation factors TBP and TFB, which are orthologs and functional counterparts of the eukaryotic TBP and TFIIB/TFIIIB70, respectively [4].
In the RNAP II and aRNAP machineries, TFIIB and TFB are thought to recruit the RNAP directly to the transcription pre-initiation complex. In contrast, RNAP III requires the RPC34 subunit to mediate the interaction between TFIIIB70 and RNAP III [5][6][7]. Both the conserved N-terminal region and the unique C-terminal region of TFIIIB70 contribute to RPC34 binding [6,8]. Given the conservation of RPC34 in all eukaryotes and its central role in the recruitment of RNAP III to the pre-initiation complex, it seems likely that RPC34 played an important role in the evolution of the RNAP III transcription system. To address this possibility, we set out to identify potential archaeal homologs of RPC34.

Identification of archaeal homologs of RPC34
Using a PSI-BLAST search [9] (against the RefSeq database [10], with default parameters) with human RPC34 as the query (GI: 149640989), we detected a hit to a Cenarchaeum symbiosum (strain A) protein [GI: 118575757, with E-value = 5 × 10⁻⁵] after the first iteration. A reciprocal search starting from the Cenarchaeum symbiosum sequence [GI: 118575757; 244-362 aa] identified the first eukaryotic RPC34 homolog [GI: 157138209, with E-value = 4 × 10⁻¹⁰] after the first iteration. All archaeal orthologs can be retrieved after the first iteration in the course of the same search (the complete information is available in Additional File 1). We identified apparent orthologs of RPC34 in all crenarchaeal and thaumarchaeal genomes as well as in several lineages of Euryarchaeota but not in Candidatus Korarchaeum cryptofilum OPF8, the only Korarchaeote sequenced so far (Additional File 1). None of these archaeal sequences are annotated as RPC34 homologs in the RefSeq database. In agreement with the PSI-BLAST results, a Conserved Domain Database search [11] with various crenarchaeal and thaumarchaeal sequences as queries identifies statistically significant similarity (E-value ~0.001) of their C-terminal domain to the profile pfam05158, RNA polymerase RPC34 subunit. A similar result was obtained using an HHPRED search [12]. For the same Cenarchaeum symbiosum A query, pfam05158 (RNA polymerase Rpc34 subunit) was detected with E-value = 6.6 × 10⁻²³; in the same HHPRED search, the sequence corresponding to the structure of the human RPC34 winged helix-turn-helix (wHTH) domain [PDB: 2dk5] was detected with E-value = 2 × 10⁻¹¹. The next most similar family of wHTH-domain-containing proteins was the MarR family of transcriptional regulators (pfam01047, with E-value = 1.2 × 10⁻⁹). The latter observation is also consistent with the results of PSI-BLAST searches with the HTH region of archaeal RPC34 orthologs, in which MarR family sequences were identified as the closest hits.
Most likely, this relationship between RPC34 and MarR is the cause of the misannotation of some of the apparent archaeal orthologs of RPC34 as MarR family transcriptional regulators [e.g. GI:18313992]. Thus, the N-terminal region of archaeal RPC34 orthologs contains a wHTH domain, whereas the C-terminal domain is a distinct Zn-finger domain shared with most eukaryotic RPC34 sequences.
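The reciprocal-search logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used to obtain the results: the function name, the layout of the hit tables and the E-value cutoff of 1e-3 are assumptions introduced here, and only the GI numbers and E-values are taken from the text.

```python
# Illustrative sketch of a reciprocal-search check.
# The function name, table layout and cutoff are hypothetical.

def supports_orthology(forward_hits, reverse_hits, candidate_id,
                       family_ids, max_evalue=1e-3):
    """Return True if the forward search finds the candidate and the reverse
    search (candidate as query) finds any member of the starting family,
    both with E-values at or below max_evalue."""
    forward_ok = forward_hits.get(candidate_id, float("inf")) <= max_evalue
    reverse_ok = any(
        reverse_hits.get(member, float("inf")) <= max_evalue
        for member in family_ids
    )
    return forward_ok and reverse_ok


# Hit tables map subject GI -> E-value; the values are those quoted in the text.
forward = {"118575757": 5e-5}    # query: human RPC34 (GI 149640989)
reverse = {"157138209": 4e-10}   # query: C. symbiosum protein (GI 118575757)
eukaryotic_rpc34 = {"149640989", "157138209"}

print(supports_orthology(forward, reverse, "118575757", eukaryotic_rpc34))
```

The reverse search is checked against a set of family members rather than the original query alone, because the first eukaryotic hit reported above is a different RPC34 homolog from the human query.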
The multiple alignment of the eukaryotic RPC34 sequences and their archaeal orthologs reveals conservation of two regions (Figure 1). In agreement with the above observations, the first region corresponds to the N-terminal wHTH domain (with all structural elements of wHTH, namely, three α-helices and two β-strands, preserved) whereas the second conserved region corresponds to the Zn-finger domain with the unique CxxC-x(3-5)-Cx(4-10)-C signature. There are substantial differences in the Zn-finger domain architectures of the euryarchaeal domains, on the one hand, and the crenarchaeal, thaumarchaeal and eukaryotic domains, on the other hand. In particular, the Zn-finger signature cysteines are not conserved in all sequences from Halobacteriales. All eukaryotic sequences contain a structured insert between the wHTH and the Zn-finger domains (according to PSIPRED [13] secondary structure prediction) that probably represents a distinct domain. Thaumarchaeal proteins contain an extended region of low complexity N-terminal of the wHTH domain.
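As a minimal sketch, the Zn-finger signature can be written as a regular expression and used to scan protein sequences. The reading of the signature as C-x(2)-C-x(3,5)-C-x(4,10)-C (with x standing for any residue) is an assumption made here, and the function name and the toy sequences are hypothetical, not real RPC34 sequences.

```python
import re

# Assumed reading of the signature: C-x(2)-C-x(3,5)-C-x(4,10)-C,
# where x is any residue.
ZN_FINGER = re.compile(r"C.{2}C.{3,5}C.{4,10}C")


def find_zn_finger(sequence):
    """Return (start, end, matched_text) for each non-overlapping match
    of the assumed Zn-finger signature in a one-letter protein sequence."""
    return [
        (m.start(), m.end(), m.group())
        for m in ZN_FINGER.finditer(sequence.upper())
    ]


# Made-up toy sequences for illustration only.
toy_with_motif = "MKTCAACGGGGCDDDDDCLL"
toy_without_motif = "MKTSAASGGGGSDDDDDSLL"

print(find_zn_finger(toy_with_motif))
print(find_zn_finger(toy_without_motif))
```

A pattern of this kind flags only the four signature cysteines and their spacing; it would miss sequences in which these cysteines are not conserved, as noted above for Halobacteriales.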

Phylogenetic analysis of the RPC34 family
We constructed a phylogenetic tree from the alignment of the wHTH and Zn-finger domains of the eukaryotic and archaeal RPC34 orthologs, using the MarR family wHTH domain as an outgroup (Figure 2). Consistent with the apparent synapomorphies in the Zn-finger domain architecture (see above), the phylogenetic analysis shows that eukaryotic proteins group with crenarchaeal and thaumarchaeal sequences with reliable bootstrap support, excluding all euryarchaeal sequences (Figure 2 and Additional File 2). Moreover, the eukaryotic lineage is rooted deeply within the crenarchaeal-thaumarchaeal subtree (Figure 2), suggesting that eukaryotic RPC34 indeed originates from an ancestor that belonged to this group of archaeal proteins.

Analysis of neighborhood of archaeal RPC34 orthologs
To gain insight into possible functions of the archaeal RPC34 orthologs, we analyzed the genomic context of the respective genes. In thaumarchaeal and crenarchaeal genomes, the RPC34 genes co-localize and are predicted to be co-transcribed with several genes for proteins involved in modification or processing of tRNA and rRNA (Figure 3). In the majority of crenarchaea, the RPC34 gene is also potentially co-transcribed with a gene for a TFB paralog (COG1405). Generally, archaeal genomes encode at least two TFB paralogs, so an intriguing possibility is that the crenarchaeal RPC34 ortholog interacts with a specific TFB paralog analogously to the interaction of eukaryotic RPC34 with the TFIIB paralog TFIIIB70. Genes encoding the euryarchaeal RPC34 orthologs, with the exception of those from Halobacteriales, are predicted to be co-transcribed with genes for Sm-like protein paralogs (COG1958). The Archaeoglobus fulgidus homolog Af-Sm2 has been shown to co-immunoprecipitate with RNase P RNA, and Sm-like proteins are generally believed to form ribonucleoprotein complexes [14].

Possible role of the archaeal RPC34 ortholog in transcription
The genomic context of the archaeal RPC34 ortholog, as well as the analogy with the eukaryotic RPC34, suggests that these archaeal proteins might be involved in transcription of rRNA and tRNA genes. It has been shown that, in a reconstituted in vitro transcription system from Sulfolobus shibatae, transcription from rRNA and tRNA promoters could be successfully initiated in the absence of the RPC34 ortholog [15]. Hence, there is no strict RPC34 requirement for recognition of these promoters and recruitment of the aRNAP. Nevertheless, a regulatory role of this protein in the transcription of structural RNAs by aRNAP appears likely. The wHTH motif might mediate protein-DNA interactions, given that the eukaryotic RPC34 was cross-linked to DNA in transcription initiation complexes [16], but, to our knowledge, RPC34 has not been reported to contribute to promoter recognition. Electron microscopy revealed the position of the RPC82/RPC34/RPC31 sub-complex in the core RNAP III close to the "clamp" formed by the N-terminal part of the largest subunit, C1 [17]. The "clamp" domain is conserved in all multi-subunit RNAPs, but an RNAP III-specific region is thought to be important for RPC34 binding specificity [3]. The archaeal RPC34 ortholog might similarly recruit aRNAP to the transcription pre-initiation complex via the "clamp" domain and so enhance the transcription of structural RNAs.
The nature of the archaeal "parent" of eukaryotes is a wide open question [18,19]. Detailed comparison of individual functional systems allows partial reconstruction of the gene repertoire of this elusive entity. With respect to the transcription system, the present findings add to the other recent observations that reveal the existence of RNAP subunits and transcription factors that are specifically shared between eukaryotes and Crenarchaeota, along with either Thaumarchaeota or Korarchaeota [20][21][22][23].

Sequence analysis
The RefSeq database at the NCBI [10] was used for PSI-BLAST searches. Database searches were performed using PSI-BLAST [9] with default parameters. We also used the remote homology identification servers for CDD search [11] and HHPRED search [12]. Multiple alignments of protein sequences were constructed using the MUSCLE program [24], followed by minimal manual correction on the basis of local alignments obtained using PSI-BLAST [9]. Protein secondary structure was predicted using the PSIPRED program [13].
Maximum likelihood (ML) phylogenetic trees were constructed from the alignment of archaeal RPC34 orthologs (the positions used for reconstruction are shown in Figure 1), by using the MOLPHY program [25] with the JTT substitution matrix to perform local rearrangement of an
Figure 2 The archaeal RPC34 orthologs - Phylogenetic analysis of the RPC34 family. The ML tree was rooted using selected representatives of the MarR family as the outgroup (archaeal members of this group are shown in blue). A version of the tree with complete information for all the sequences used for tree construction and RELL bootstrap values is available in Additional File 2.
genes). Both types of transcription would be driven by different TFBs, with the TFB required for the transcription of non-coding genes interacting with RPC34. It is known that transcription of tRNA or rRNA genes in vitro by archaeal RNA polymerase can occur in the absence of this protein.
It will nevertheless be interesting to test this hypothesis by checking the effect of this protein on such a system, for example in competition experiments with different types of promoters and different TBP. It will also be important to test the function of these proteins in vivo using the genetic systems recently developed for Sulfolobus species. The publication of this nice short paper will certainly encourage different labs to perform this kind of experiment. On the other hand, one cannot exclude the possibility that the archaeal RPC34 and related TBP are not involved in transcription per se but play another fundamental role in archaeal RNA metabolism. Interestingly, the archaeal RPC34 homologue is present in crenarchaea and thaumarchaea, but not in korarchaea and euryarchaea, the latter containing a more distantly related homologue. This indicates that RPC34 was present in the last common archaeal ancestor and was later lost in euryarchaea. The authors suggest from their data that the specialization of RNA polymerases between those transcribing coding and non-coding genes might have evolved already in Archaea and was inherited by Eukaryotes from the "archaeal parents". I previously criticized this notion of archaeal parents, noting that we don't descend from Apes. Eugene Koonin correctly pointed out that I was wrong since we are apes indeed! However, are Eukaryotes Archaea? We don't know. Might it be that Archaea are reduced protoeukaryotes? In my opinion, it's still a prejudice to consider that Eukarya evolved from Archaea. I would say that the data presented in this nice paper indicate that the RPC34 protein was present in the last common ancestor of Archaea and Eukarya. As an alternative to the hypothesis proposed by the authors, it could be that this ancestor contained the ancestor of RNA polymerase III and that this protein (but not RPC34) was lost in Archaea (streamlining).