Unraveling the biochemistry and provenance of pupylation: a prokaryotic analog of ubiquitination
Biology Direct volume 3, Article number: 45 (2008)
Recently Mycobacterium tuberculosis was shown to possess a novel protein modification, in which a small protein Pup is conjugated to the epsilon-amino groups of lysines in target proteins. Analogous to ubiquitin modification in eukaryotes, this remarkable modification recruits proteins for degradation via archaeal-type proteasomes found in mycobacteria and allied actinobacteria. While a mycobacterial protein named PafA was found to be required for this conjugation reaction, its biochemical mechanism has not been elucidated. Using sensitive sequence profile comparison methods we establish that the PafA family proteins are related to the γ-glutamyl-cysteine synthetase and glutamine synthetase. Hence, we predict that PafA is the Pup ligase, which catalyzes the ATP-dependent ligation of the terminal γ-carboxylate of glutamate to lysines, similar to the above enzymes. We further discovered that an ortholog of the eukaryotic PAC2 (e.g. cg2106) is often present in the vicinity of the actinobacterial Pup-proteasome gene neighborhoods and is likely to represent the ancestral proteasomal chaperone. Pup-conjugation is sporadically present outside the actinobacteria in certain lineages, such as verrucomicrobia, nitrospirae, deltaproteobacteria and planctomycetes, and in the latter two lineages it might modify membrane proteins.
This article was reviewed by M. Madan Babu and Andrei Osterman
It was recently shown that Mycobacterium tuberculosis contains a small protein, Pup (Rv2111c), that is covalently conjugated to the ε-NH2 groups of lysines on several target proteins (pupylation), such as the malonyl CoA-acyl carrier protein transacylase (FabD). Mycobacterium, like most other actinobacteria, also possesses an archaeal-type proteasome that contains an AAA+ ATPase and two distinct NTN hydrolase-type peptidases. Pupylation of FabD was shown to result in its recruitment to the mycobacterial proteasome and subsequent degradation, analogous to the fate of eukaryotic ubiquitin-conjugated proteins. This remarkable conjugation reaction was found to be dependent upon another mycobacterial protein, the proteasome accessory factor (PafA) [1, 3]. Unlike ubiquitin and related ubiquitin-like proteins (UBLs), which are conjugated to target lysines by means of successive trans-thiolation reactions involving their C-terminal glycine residue, Pup was shown to be conjugated via the γ-carboxylate of the terminal glutamate [1–3]. Based on this, the discoverers of pupylation suggested that the conjugation process might involve a different biochemistry, but did not specify what this reaction might be.
Using sensitive sequence analysis methods we show that PafA, the protein required for pupylation, belongs to the glutamine synthetase fold and predict that it is likely to catalyze an ATP-dependent peptide ligase reaction.
Results and discussion
Phyletic patterns, genome organization and evolutionary relationships of Pup and PafA
To better understand the pupylation process we investigated both Pup and PafA using sensitive sequence profile searches with the PSI-BLAST program and the HMMer package. Pup was previously detected only in actinobacteria. Our searches recovered Pup orthologs in all major actinobacterial lineages, including the basal bifidobacteria, and also sporadically in certain other bacterial lineages, such as nitrospirae, deltaproteobacteria (e.g. Plesiocystis), planctomycetes (e.g. Rhodopirellula) and the verrucomicrobia-chlamydia clade (e.g. Methylacidiphilum). The Pup proteins were all between 50 and 90 residues in length, and a multiple alignment shows that they all contain a conserved motif with a G[EQ] signature at the C-terminus [Additional file 1]. Thus, all of them are suitable for conjugation via the terminal glutamate or the deamidated glutamine (as shown in the case of the Mycobacterium Pup). The conserved globular core of Pup is predicted to form a bihelical unit with the extreme C-terminal 6–7 residues forming a tail in the extended conformation [Additional file 1]. Thus, Pup is structurally unrelated to the ubiquitin fold and has convergently evolved the function of protein modifier. Similar searches with the PafA protein of Mycobacterium showed that it had a phyletic pattern closely mirroring that of Pup, though in several lineages there were two paralogs of PafA (Fig. 1A and [Additional file 1]). PafA homologs (both, if two are present) and Pup are genomic neighbors in all bacterial lineages, with the Pup gene invariably being adjacent to one of the PafA genes (Fig. 1A). With the exception of the deltaproteobacterium Plesiocystis and the planctomycete Rhodopirellula, genes for the three proteasomal subunits are also associated with this conserved gene neighborhood (Fig. 1A). 
This suggests that in most currently available genomes with these genes there is a strong functional linkage between Pup, PafA and the archaeal-type proteasome, recapitulating the experimentally observed situation in M. tuberculosis [1, 3].
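The two sequence features described above for Pup (a length of 50 to 90 residues and a C-terminal G[EQ] signature) can be turned into a crude first-pass filter for candidate small ORFs. The following is only a minimal sketch under those two assumptions; the function name and defaults are hypothetical, and it is not the profile-based search (PSI-BLAST, HMMer, multiple alignment) actually used in this work. Such a filter will pass many unrelated proteins and is meaningful only in combination with genomic context, such as adjacency to a PafA gene.

```python
import re

# Signature taken from the text: a glycine followed by a terminal Glu or Gln.
PUP_CTERM = re.compile(r"G[EQ]$")


def looks_like_pup(seq, min_len=50, max_len=90):
    """Crude filter: length within the reported range and a C-terminal G[EQ].

    `seq` is a one-letter amino acid string; a trailing '*' (stop symbol
    emitted by some translation tools) is ignored.
    """
    seq = seq.strip().upper().rstrip("*")
    return min_len <= len(seq) <= max_len and bool(PUP_CTERM.search(seq))


if __name__ == "__main__":
    # Synthetic toy sequences, not real proteins.
    candidates = {
        "toy_pass": "M" + "A" * 60 + "GQ",
        "toy_wrong_tail": "M" + "A" * 60 + "GK",
        "toy_too_long": "A" * 100 + "GE",
    }
    for name, seq in candidates.items():
        print(name, looks_like_pup(seq))
```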
PafA was earlier reported as a protein with no relationship to known protein domains [1, 3]. A search with the Saccharopolyspora PafA homolog (SACE_2254; gi: 134098823) recovered γ-glutamyl-cysteine synthetase-2 (γ-glutamyl-cysteine ligase-2; GCS2) from Saccharopolyspora with borderline statistical significance (gi: 134100361; expect-value = 0.08). Interestingly, this alignment completely spanned the GhExE signature (where 'h' is a hydrophobic residue and 'x' any residue), which is absolutely conserved in both the PafA and GCS2 families and forms part of the Mg2+ and ATP binding active site of the latter enzymes (Fig. 1B and 2). To further explore the evolutionary affinities of the PafA family we prepared a multiple alignment and used an HMM derived from this alignment for an HHpred profile-profile comparison search against a library of HMMs derived from non-redundant PDB structures as seeds. This search recovered the GCS2 HMM (based on PDB: 1r8g) as the highly significant best hit (p-value = 10⁻⁵), with an alignment spanning the entire length of the GCS2 catalytic domain and matching all key conserved motifs (Fig. 2; see below). Thus, the PafA family appears to be a member of the glutamine synthetase (GS) fold to which GCS2 belongs [4, 5]. While all known members of the GS fold catalyze ATP-dependent phosphotransfer reactions, they belong to either of two distantly related superfamilies: 1) The carboxylate-amine/ammonia ligases, which catalyze a two-step ligase reaction involving phosphorylation of a carboxylate group (usually the γ-carboxylate of glutamate) followed by ligation of the amino group of an amino acid (GCS1 and GCS2) or ammonia (glutamine synthetases) with the formation of an amide linkage (Fig. 1C). 2) The guanido kinases, which phosphorylate the guanido group of arginine or creatine [7, 8]. 
Given that the GhExE is a distinctive signature only seen in the first superfamily, it became clear that PafA is a member of the carboxylate-amine/ammonia ligase superfamily.
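The two-step reaction of the carboxylate-amine/ammonia ligases described above can be written schematically as follows. This is a generic sketch: R stands for the moiety carrying the carboxylate (for example the glutamate side chain) and R' for the amine-bearing substrate (or H in the case of ammonia); the protonation states shown are a conventional choice and not a statement about the actual active-site chemistry.

```latex
\begin{align}
\mathrm{R{-}COO^{-}} + \mathrm{ATP^{4-}}
  &\longrightarrow \mathrm{R{-}CO{-}OPO_{3}^{2-}} + \mathrm{ADP^{3-}}
  && \text{(phosphorylation of the carboxylate)} \\
\mathrm{R{-}CO{-}OPO_{3}^{2-}} + \mathrm{R'{-}NH_{2}}
  &\longrightarrow \mathrm{R{-}CO{-}NH{-}R'} + \mathrm{HPO_{4}^{2-}}
  && \text{(amide bond formation)}
\end{align}
```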
To better understand the affinities of the PafA family within this superfamily and the functional implications of this relationship we first defined the conserved core shared by all carboxylate-amine/ammonia ligases using characterized structures. We generated a structural alignment of the glutamine synthetase, GatB and GatE proteins, which catalyze the in situ synthesis of glutamine or asparagine on Q-tRNA or N-tRNA charged with glutamate and aspartate respectively, and two families of γ-glutamyl-cysteine synthetases (GCS1 and GCS2) using the MUSTANG program. This alignment showed that despite several large family-specific inserts, the entire superfamily shared 6 conserved strands, typically in a 231465 arrangement, with at least two universally conserved helices occurring C-terminal to strands 3 and 6, respectively (Fig. 1B). These strands form a saddle-shaped structure with the active site located on the concave face and the conserved helices packing against the convex face. The structural alignment also revealed that the core strands 1, 2, 3, 4 and 6 contributed key catalytic residues to the active site in all members of this superfamily. The predicted secondary structure of the PafA family revealed the presence of equivalents of all conserved strands of this ligase superfamily (Fig. 1B, 2). Further, a comparison of motifs on equivalent strands showed that (Fig. 1B, 2): 1) the PafA family contains a GhExE on the core strand-1 which is equivalent to the Ex[EH] motif present in the first strand of all characterized superfamily members. 2) PafA shares with the rest of the superfamily conserved acidic residues on core strands 2 and 3, which are involved in contacting Mg2+ and/or ATP. 3) In core strand-4 PafA contains a [HQ]x[NH] motif that is equivalent to the [HD]x[NH] motif that is present in all previously characterized members of this superfamily. 
This motif is critical for interacting with both the phosphate on the intermediate and a metal ion in the active site. 4) In core strand-6 PafA displays a motif of the form [QH]×4D that corresponds to the motif Ex[RK]×2D seen in the equivalent strand of other members of the superfamily. The first conserved polar residue in this motif is located close to the active site metal and ATP. 5) Additionally, the PafA family shares with all carboxylate-amine/ammonia ligases, excluding the GatB and GatE families, a conserved arginine in core strand-5 and another arginine in the long loop N-terminal to this strand (Fig. 1B, 2). These arginines project into the active site surface and are likely to act as "arginine fingers" in stabilizing the hyper-charged intermediate during phosphotransfer or participate in binding one of the substrates. Thus, the PafA family possesses all the features needed to function as an ATP-dependent carboxylate-amine ligase, like other members of this superfamily.
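As an illustration of how the PafA motifs listed above could be checked in a single sequence, the sketch below searches for the strand-1 GhExE, strand-4 [HQ]x[NH] and strand-6 [QH]×4D patterns in N-to-C order. It rests on several assumptions that are not from the text: the residue set used for 'h' (hydrophobic) is one common choice, the function and variable names are hypothetical, and no spacing or secondary-structure constraints are applied. Because the second and third patterns are short and degenerate, chance matches are expected; a check of this kind complements, and cannot replace, the profile-profile and structure-guided alignment described here.

```python
import re

# Assumption: the text only says 'h' is hydrophobic; this set is one common choice.
HYDROPHOBIC = "AVILMFWYC"

# Motifs as described for the PafA family, in N-to-C order of their core strands.
MOTIFS = [
    ("strand-1 GhExE", re.compile(rf"G[{HYDROPHOBIC}]E.E")),
    ("strand-4 [HQ]x[NH]", re.compile(r"[HQ].[NH]")),
    ("strand-6 [QH]x4D", re.compile(r"[QH].{4}D")),
]


def ordered_motif_hits(seq):
    """Return [(label, start)] for the first in-order match of each motif.

    Returns None if any motif cannot be found downstream of the previous one.
    Start positions are 0-based.
    """
    seq = seq.upper()
    pos = 0
    hits = []
    for label, pattern in MOTIFS:
        match = pattern.search(seq, pos)
        if match is None:
            return None
        hits.append((label, match.start()))
        pos = match.end()  # the next motif must lie downstream of this one
    return hits


if __name__ == "__main__":
    # Synthetic toy sequence built to contain the three motifs; not a real protein.
    toy = "MKK" + "GVEAE" + "TTTT" + "HSN" + "PPPP" + "QAAAAD" + "KK"
    print(ordered_motif_hits(toy))
```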
Functional and evolutionary implications of PafA as a carboxylate-amine ligase
The above observation together with the experimental evidence and genomic context strongly imply that PafA is the Pup ligase, and catalyzes the ligation of the γ-carboxylate of the terminal glutamate (or glutamine deamidated to glutamate) of Pup to the ε-NH2 group of a lysine on the target protein (Fig. 1C). Many enzymes of the carboxylate-amine ligase superfamily, including GCS1 and GCS2, function as dimers. Hence, in light of the frequent presence of two PafA paralogs in most organisms, we propose that the Pup ligase is typically a heterodimer. However, in cases like Mycobacterium, with a single PafA gene, it is likely to be a homodimer. In several actinobacteria (e.g. Arthrobacter, Streptomyces) this gene neighborhood also includes two Fkbp-type peptidyl prolyl isomerases and a DeoR-family transcription factor (Fig. 1A). The former association suggests that prolyl isomerases might have an accessory role in pupylation of certain substrates. The associated DeoR transcription factor might regulate expression of the pupylation and protein degradation system by sensing a small molecule. Some actinobacterial Pup-proteasome gene neighborhoods contain another conserved protein typified by Corynebacterium cg2106 (PDB: 2p90), which is also found in archaea, frequently in the neighborhood of the proteasomal ATPase subunit. Most bacteria and archaea encode two cg2106 paralogs and sequence profile searches revealed that they are orthologs of the eukaryotic chaperone PAC2 required for proteasome assembly. Cg2106 forms a trimeric toroid, suggesting that it might provide a scaffold for assembly of proteasomal peptidase subunits. As none of the other eukaryotic proteasomal chaperones have orthologs in archaea or bacteria, this protein is likely to represent the ancestral chaperone of the proteasome (Additional file 1). 
In both Plesiocystis and Rhodopirellula, we find no linkage between Pup/Pup ligase and genes for proteasomal subunits; instead they are linked to a gene for a membrane protein (Fig. 1A). Interestingly, these Pup ligases contain a remarkable insertion of 4 trans-membrane segments immediately C-terminal to the core strand-4 [Additional File 1]. Based on available structures of members of the GS fold these TM helices are predicted to stick out of the core fold without distorting it and are likely to anchor these Pup ligases to the cytoplasmic face of the cell membrane. Hence, in these organisms pupylation of membrane-associated proteins might have a regulatory role.
Given that the best hit for Pup ligases in profile-profile comparisons is the widely distributed GCS2 family, and the fact that the γ-glutamyl-cysteine synthetases catalyze a very similar reaction to pupylation, it is likely that the Pup ligase emerged in the actinobacterial lineage from a GCS2 precursor. We carried out multiple sequence profile searches with different starting points from the carboxylate-amine/ammonia ligase superfamily to identify additional members. As a result we recovered two more previously uncharacterized families of these ligases [Additional file 1]. The first of these families comprises large proteins containing an N-terminal transglutaminase-like papain fold domain fused to a C-terminal domain of the carboxylate-amine/ammonia ligase superfamily (e.g. Mycobacterium tuberculosis Rv2566, gi: 15609703). Proteins of the second family (e.g. Clostridium perfringens CJD_1902, gi: 182624943) are similarly sized to GCS2 and are found in conserved gene neighborhoods encoding a glutamine amidotransferase-like thiol peptidase (in proteobacteria) or an Aig2-family γ-glutamyl cyclotransferase (in firmicutes). In neither of these cases are small, conserved ORFs reminiscent of Pup encoded in their gene neighborhoods. This observation, in conjunction with their domain fusions and gene neighborhoods, suggests that they are likely to mediate peptide formation reactions in the context of synthesis of glutathione or related peptide secondary metabolites rather than conjugating proteins. Hence, pupylation appears to be a rather distinctive reaction, despite the shared biochemistry, that has emerged from a superfamily that otherwise specializes in cofactor (glutathione) or amino acid (glutamine) biosynthesis. In this respect it is reminiscent of the emergence of ubiquitination from precursors likewise involved in cofactor (molybdopterin and thiamine) and amino acid (cysteine) biosynthesis [12–14]. 
Thus, remarkably similar covalent protein modifications by peptides or amino acids appear to have convergently evolved on at least 3 distinct occasions in unrelated folds of enzymes: 1) Ubiquitination in the Rossmannoid E1 fold and the distinct E2 fold; 2) Pupylation in the GS fold and 3) Bacterial and eukaryotic N-end rule arginyl or leucyl ligation in the acetyltransferase fold.
Materials and methods
Gene neighborhoods were determined using a custom script that uses completely sequenced genomes or whole-genome shotgun sequences to derive a table of gene neighbors centered on a query gene. Then the BLASTCLUST program is used to cluster products across the neighborhoods and establish conserved co-occurring genes. These conserved gene neighborhoods are then sorted as per a ranking scheme based on occurrence in at least one other phylogenetically distinct lineage ("phylum" in NCBI Taxonomy database), complete conservation in a particular lineage ("phylum") and physical closeness on the chromosome indicating sharing of regulatory -10 and -35 elements. Profile searches were conducted using the PSI-BLAST program with a default profile inclusion expectation (E) value threshold of 0.01. Profile-profile comparisons were performed using the HHpred program. Multiple alignments were constructed using the Kalign program followed by manual adjustments based on structural alignments generated using MUSTANG. Protein secondary structure was predicted using a multiple alignment as the input for the JPRED program.
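The ranking step of the gene-neighborhood analysis can be illustrated with a small sketch. The custom script itself is not described in detail here, so everything below is an assumption made for illustration: the input layout (one record per genome giving its phylum and the set of protein-cluster identifiers found near the query gene), the function name, and the simplified ranking that counts distinct phyla first and genomes second. The criteria of complete conservation within a lineage and physical closeness on the chromosome are not modeled.

```python
from collections import defaultdict


def rank_conserved_neighbors(neighborhoods):
    """Rank protein clusters by how widely they co-occur with the query gene.

    `neighborhoods` is an iterable of (genome, phylum, cluster_ids) records,
    where cluster_ids are the clusters (e.g. from a BLASTCLUST-like step)
    found in the neighborhood of the query gene in that genome.
    Returns a list of (cluster_id, n_phyla, n_genomes), widest spread first.
    """
    phyla = defaultdict(set)
    genomes = defaultdict(set)
    for genome, phylum, cluster_ids in neighborhoods:
        for cluster in set(cluster_ids):
            phyla[cluster].add(phylum)
            genomes[cluster].add(genome)
    ranked = [(c, len(phyla[c]), len(genomes[c])) for c in phyla]
    # More phyla first, then more genomes, then cluster id for a stable order.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked


if __name__ == "__main__":
    # Toy records with placeholder labels; not derived from any real genome.
    toy = [
        ("genome1", "phylumA", {"c1", "c2"}),
        ("genome2", "phylumA", {"c1", "c3"}),
        ("genome3", "phylumB", {"c1", "c2"}),
    ]
    for row in rank_conserved_neighbors(toy):
        print(row)
```

Counting phyla before genomes reflects the emphasis in the ranking scheme on occurrence in more than one phylogenetically distinct lineage, which guards against neighborhoods that are conserved only because the genomes compared are closely related.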
M. Madan Babu, MRC-LMB, University of Cambridge, Cambridge CB22QH, United Kingdom
In this manuscript, Lakshminarayan Iyer, Maxwell Burroughs and L Aravind report an important study that sheds light on the potential catalytic mechanism of how Pup, a small protein, gets post-translationally added to substrates. In Mycobacterium tuberculosis, it was recently shown that PafA was a factor that was important for such a modification to occur. Though this was known, the catalytic mechanism by which this is achieved remained unknown. In this work, using a combination of comparative genomic analysis, sequence and structure comparisons, the authors reveal that PafA is a distant evolutionary relative of the gamma-glutamyl-cysteine synthetase/glutamine synthetases. By a systematic comparison of available sequences of homologs and their structures, they identify critical residues that are important for function. Using these observations, the authors predict that PafA is likely to catalyze an ATP-dependent ligation of the gamma-carboxylate of glutamate of Pup to lysines of the substrates.
They also show that Pup-conjugation is likely to be present sporadically outside actinobacteria.
In summary, this is an exciting and timely work, reporting a significant finding. Therefore, I would strongly support publication of this work in Biology Direct.
1. In instances where you do not find Pup proteins could it be a gene prediction error? Do you predict short ORFs that may be missed by conventional gene prediction programs?
Authors' response: In all organisms where a PafA gene is found, a gene encoding Pup is found as its neighboring gene. Being a small protein, it has not been annotated in several actinobacteria and Rhodopirellula. We have translated some of these as examples and included them in Additional file 1.
Andrei Osterman, Burnham Institute, La Jolla, CA, United States
The manuscript "Unraveling the biochemistry and provenance of pupylation: a prokaryotic analog of ubiquitination" by L. M. Iyer, A. M. Burroughs and L. Aravind conveys the most spectacular bioinformatics-based discovery. Everything about this short article is truly amazing, starting from its very modest size and as-a-matter-of-fact style of presenting a genuine intellectual breakthrough. Authors brilliantly combined comparative genomics, structural bioinformatics and biochemical reasoning to discover a novel enzymatic mechanism of tremendous biological importance. Although we are already quite spoiled by the scale of insightful functional inferences produced by bioinformatics and comparative genomics, this study constitutes a quantum leap into an entirely new dimension. The predicted mechanism is so obviously elegant that its "reduction to practice", which will inevitably follow very soon, will hardly add much to the story. In addition to establishing a new enzymology, the authors provided solid evidence that the physiological role of pupylation, at least in some bacteria, extends beyond tagging proteins to proteasomal cleansing. This observation opens a new line of studies that will likely follow. Finally, this paper provides an excellent tutorial in advanced bioinformatics and the most compelling illustration of its impact in biological discovery.
1. Do you believe that there is a specific terminal Gln deamidase working in TB? If yes, any candidates?
Authors' response: Gene neighborhoods do not reveal any candidates for glutamine deamidation. Given that the deamidation reaction is related to that proposed to be catalyzed by the Pup ligase, it is possible that, in cases where a terminal glutamine is found, the ligase first deamidates it before proceeding with the ligase reaction. Alternatively, a non-specific amidase might be involved.
Pearce MJ, Mintseris J, Ferreyra J, Gygi SP, Darwin KH: Ubiquitin-Like Protein Involved in the Proteasome Pathway of Mycobacterium tuberculosis. Science. 2008
Pearce MJ, Arora P, Festa RA, Butler-Wu SM, Gokhale RS, Darwin KH: Identification of substrates of the Mycobacterium tuberculosis proteasome. Embo J. 2006, 25: 5423-5432. 10.1038/sj.emboj.7601405.
Festa RA, Pearce MJ, Darwin KH: Characterization of the proteasome accessory factor (paf) operon in Mycobacterium tuberculosis. J Bacteriol. 2007, 189: 3044-3050. 10.1128/JB.01597-06.
Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008, 36: D419-425. 10.1093/nar/gkm993.
Lehmann C, Doseeva V, Pullalarevu S, Krajewski W, Howard A, Herzberg O: YbdK is a carboxylate-amine ligase with a gamma-glutamyl:Cysteine ligase activity: crystal structure and enzymatic assays. Proteins. 2004, 56: 376-383. 10.1002/prot.20103.
Abbott JJ, Pei J, Ford JL, Qi Y, Grishin VN, Pitcher LA, Phillips MA, Grishin NV: Structure prediction and active site analysis of the metal binding determinants in gamma -glutamylcysteine synthetase. J Biol Chem. 2001, 276: 42099-42107. 10.1074/jbc.M104672200.
Fritz-Wolf K, Schnyder T, Wallimann T, Kabsch W: Structure of mitochondrial creatine kinase. Nature. 1996, 381: 341-345. 10.1038/381341a0.
Zhou G, Somasundaram T, Blanc E, Parthasarathy G, Ellington WR, Chapman MS: Transition state structure of arginine kinase: implications for catalysis of bimolecular reactions. Proc Natl Acad Sci USA. 1998, 95: 8449-8454. 10.1073/pnas.95.15.8449.
Ahmadian MR, Stege P, Scheffzek K, Wittinghofer A: Confirmation of the arginine-finger hypothesis for the GAP-stimulated GTP-hydrolysis reaction of Ras. Nat Struct Biol. 1997, 4: 686-689. 10.1038/nsb0997-686.
Ramos PC, Dohmen RJ: PACemakers of proteasome core particle assembly. Structure. 2008, 16: 1296-1304. 10.1016/j.str.2008.07.001.
Oakley AJ, Yamada T, Liu D, Coggan M, Clark AG, Board PG: The identification and structural characterization of C7orf24 as gamma-glutamyl cyclotransferase. An essential enzyme in the gamma-glutamyl cycle. J Biol Chem. 2008, 283: 22031-22042. 10.1074/jbc.M803623200.
Iyer LM, Burroughs AM, Aravind L: The prokaryotic antecedents of the ubiquitin-signaling system and the early evolution of ubiquitin-like beta-grasp domains. Genome Biol. 2006, 7: R60. 10.1186/gb-2006-7-7-r60.
Rudolph MJ, Wuebbens MM, Rajagopalan KV, Schindelin H: Crystal structure of molybdopterin synthase and its evolutionary relationship to ubiquitin activation. Nat Struct Biol. 2001, 8: 42-46. 10.1038/87531.
Xi J, Ge Y, Kinsland C, McLafferty FW, Begley TP: Biosynthesis of the thiazole moiety of thiamin in Escherichia coli: identification of an acyldisulfide-linked protein – protein conjugate that is functionally analogous to the ubiquitin/E1 complex. Proc Natl Acad Sci USA. 2001, 98: 8513-8518. 10.1073/pnas.141226698.
Suto K, Shimizu Y, Watanabe K, Ueda T, Fukai S, Nureki O, Tomita K: Crystal structures of leucyl/phenylalanyl-tRNA-protein transferase and its complex with an aminoacyl-tRNA analog. Embo J. 2006, 25: 5942-5950. 10.1038/sj.emboj.7601433.
BLASTCLUST program. [ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Soding J, Biegert A, Lupas AN: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005, 33: W244-248. 10.1093/nar/gki408.
Lassmann T, Sonnhammer EL: Kalign – an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics. 2005, 6: 298. 10.1186/1471-2105-6-298.
Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM: MUSTANG: a multiple structural alignment algorithm. Proteins. 2006, 64: 559-574. 10.1002/prot.20921.
Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ: JPred: a consensus secondary structure prediction server. Bioinformatics. 1998, 14: 892-893. 10.1093/bioinformatics/14.10.892.
Work by LMI and LA is supported by the intramural funds of the National Library of Medicine at the National Institutes of Health, USA.
The authors declare that they have no competing interests.
LMI and LA were involved in the discovery process and writing the paper. AMB was involved in initiating interest in the project and preparing the alignments. All authors read and approved the final manuscript.