Detection of novel ASRT homologs
The 3D structure of ASRT (PDB: 2II7) shows that the entire protein comprises a single globular domain with an eight-stranded β-sandwich fold [5]. All previously reported homologs of ASRT are similarly sized single-domain proteins (completely congruent to the PFAM alignment DUF1362 [6]) from various prokaryotes. To identify potential previously undetected homologs we initiated a series of sequence profile searches with known representatives of ASRT from different bacteria. A PSI-BLAST search with the Thermotoga maritima ASRT ortholog (PDB: 1NC7) against the NR database detected segments of putative secreted amidases from certain chloroflexi with e-values of borderline significance (e = 0.013) and then converged. These alignments corresponded to regions that did not overlap with the amidase catalytic domain and did not correspond to any previously described domains. These segments in the amidases were also independently detected in a search with the ASRT alignment as an HMM against the NR database using the recently released HMMER3 program (e = 0.024). Despite their borderline e-values, the alignments with the ASRT starting point extended throughout the entire globular domain of the latter and showed conservation of structurally critical hydrophobic residues. To test this relationship we performed a search with the above-detected sequence segments from the amidases against a library of HMMs derived from PDB structures as seeds using HHpred, which performs profile-profile comparisons, and recovered the Anabaena ASRT and its Thermotoga orthologs as the best hits with significant e-values (e = 10^-3). Likewise, a reciprocal PSI-BLAST search initiated with one of these regions detected in the amidases (residues 373-471 of gi:148657162; Roseiflexus sp. RS-1) identified Anabaena ASRT and its Thermotoga orthologs (e-values = 7 × 10^-3, iteration 2) and in subsequent iterations retrieved all known ASRT homologs. 
This search also detected multiple repeats of the segment related to ASRT in the Roseiflexus amidase-domain-containing protein. In further transitive searches using representatives of these additional segments as queries (e.g. gi:148270795, residues 679-789), we retrieved, in addition to the previously identified homologs, sequences from actinobacteria such as Tropheryma whipplei. Reciprocal PSI-BLAST searches with representatives of the segments detected in these actinobacterial sequences (e.g. gi:28493092) retrieved several other actinobacterial sequences together with representatives of sequences found in the initial searches (see Additional file 1 for details). None of these newly found sequences was recovered by the PFAM model DUF1362 or has been previously reported as being related to ASRT. The secondary-structure prediction for the newly detected sequences with a significant relationship to ASRT suggested an all-β fold congruent with that determined for ASRT and its T. maritima ortholog. Taken together, these investigations indicated that ASRT and its cognates from other prokaryotes belong to a larger superfamily of homologous domains, which might occur in more than one copy and in combination with other domains in a polypeptide (see below). Hereinafter, we refer to this domain as the ASRAH (for Anabaena sensory rhodopsin-associated homology) domain.
Classification and structural analysis of ASRAH domains
Based on clustering with BLAST scores and domain architectural features, the ASRAH domains can be divided into three families with distinctive phyletic patterns (see Additional file 1, Table S1 and Fig. 1 for details). The first of these, defined by the stand-alone, intracellular versions of the domain similar to ASRT, is found, in addition to Thermotoga and Anabaena, in some actinobacteria, proteobacteria, firmicutes, Chthoniobacter flavus (verrucomicrobia), the extremely thermophilic Dictyoglomus [7] as well as the archaeon Natrialba magadii. The second family is present in secreted proteins, always fused to an amidase domain, and is predominantly found in photosynthetic bacteria such as Roseiflexus, Chloroflexus and Herpetosiphon. The third family, also secreted, is found mainly in actinobacteria and is defined by unique N- and C-terminal conserved extensions flanking the ASRAH domains. A multiple sequence alignment of the ASRAH domain was generated by editing a preliminary alignment produced by the KALIGN program based on PSI-BLAST HSPs and secondary structure predictions (Fig. 2). Examination of the structures reveals that key structural elements of the Anabaena representative have been mutated to obtain its crystal structure. As a consequence there are major distortions in this structure and it is unlikely to represent the native condition of the ASRAH domain. Hence, we continued further analysis using the Thermotoga ASRT structure determined as part of the Structural Genomics initiative (1NC7). Superposition of the secondary structure of this protein onto the alignment of the ASRAH superfamily shows that the sequence conservation consists predominantly of hydrophobic residues defining the core that stabilizes the β-sandwich. However, there are a few residues that distinguish this fold from other previously characterized β-sandwich folds. These include: 1) a well-conserved tryptophan, usually following a polar residue, present at the start of the first strand. 
This tryptophan appears to be central to a hydrophobic interaction required to hold the two β-sheets of the sandwich together (Fig. 2). 2) A nearly absolutely conserved asparagine located at the end of the second β-strand. The asparagine forms two hydrogen bonds with the backbone carbonyls of the residues two and four positions downstream from it. Further, there are no gaps in this part of the alignment, suggesting that this conserved asparagine helps in stabilizing the characteristic tight turn between strands 2 and 3 of the structure.
An examination of the structure of the ASRAH domain suggests that the β-sandwich is composed of two internal repeats of four strands each - the first, third and fourth strands of each repeat are in one sheet of the β-sandwich while the second is in the opposite sheet. This topology distinguishes the ASRAH domain from other structurally comparable β-sandwich domains such as those found in the β-galactosidase [8] (which shows evidence for independent duplication and lacks the C-terminal strand relative to the ASRAH β-sandwich) and the carbohydrate binding module CMB4 from the thermostable xylanase from Rhodothermus marinus [9] (which has additional N-terminal strands and displays a circular permutation of the last strand relative to the ASRAH β-sandwich; see Fig. 2). The structure shows that the solo ASRAH domain from Thermotoga exists as a tetramer with each unit containing a deep cleft that could potentially form a binding-pocket on one face of the sheet. In two of the monomers in the structure we observed that the pocket contains a 1,2-ethanediol molecule (Fig. 2), but has considerable room for a much larger molecule such as a pentose or a larger sugar. Interestingly, the binding cleft is lined by two residues from the second strand containing the SHEshChhN signature (where "h" is a hydrophobic and "s" a small residue), which is nearly absolutely conserved in the ASRT-like family. The H and the C from this signature are seen to contact the 1,2-ethanediol in the structure, suggesting that they might be key to binding a ligand in this family. While this strand is well-conserved in the other families, these families possess their own distinctive residues instead of the H and C. This suggests that though they are likely to contain a cleft similar to that of the ASRT-like family, they might bind distinct ligands. Further, the walls of this cleft are formed in part by the dimer interface (Fig. 2). 
In line with this, we found that the ASRAH domains in the other families always occur either as dyads (the actinomycete family, Fig. 1) or as tetrads (the amidase-associated versions, Fig. 1), suggesting that they might all have the potential to form homo- or hetero-tetrameric structures.
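The SHEshChhN signature described above can be read as a simple sequence pattern. The following is a minimal illustrative sketch, not the procedure used in this study: it assumes one particular choice of amino acids for the "h" (hydrophobic) and "s" (small) classes, which the text does not specify, and the function names and example sequences are hypothetical.

```python
import re

# Assumed residue classes; the text only names the classes "hydrophobic"
# and "small", so these memberships are illustrative choices.
HYDROPHOBIC = "ACFILMVWY"
SMALL = "AGSCTDNPV"


def signature_to_regex(signature: str) -> re.Pattern:
    """Translate a signature such as 'SHEshChhN' into a compiled regex.

    Upper-case letters are literal residues; lower-case 'h' and 's'
    stand for the hydrophobic and small residue classes respectively.
    """
    parts = []
    for symbol in signature:
        if symbol == "h":
            parts.append(f"[{HYDROPHOBIC}]")
        elif symbol == "s":
            parts.append(f"[{SMALL}]")
        else:
            parts.append(symbol)
    return re.compile("".join(parts))


ASRT_SIGNATURE = signature_to_regex("SHEshChhN")


def find_signature(sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping signature matches."""
    return [m.start() for m in ASRT_SIGNATURE.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical test peptide, not a real ASRT sequence.
    print(find_signature("MKTSHEGVCLIN"))
```

A pattern of this kind only flags candidates for the ASRT-like family; as noted above, the other families carry different residues at the H and C positions and would not be expected to match.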
A ligand-binding pocket on one face of the sheet is particularly common in other β-sandwich domains, especially those involved in carbohydrate binding - e.g. the CMB4 of xylanases and the DOMON domain. However, the ASRAH domain differs from the former in having its predicted ligand-binding pocket on the opposite face relative to the CMB4 domains, and from the latter in being topologically very distinct from the ten-stranded DOMON domains [10].
Domain architectures and gene neighborhoods point to a carbohydrate-binding role for the ASRAH domain
We sought to obtain further evidence for the ligand-binding role of the ASRAH domains based on their domain architectures and predicted operonic associations. The predicted gene neighborhoods of the solo versions of the ASRAH domain show multiple associations with genes encoding various carbohydrate metabolism proteins. One gene neighborhood conserved across many phylogenetically distant bacteria, including Thermotoga, combines a gene for the solo ASRT-like ASRAH proteins with genes encoding a sugar isomerase, a sugar aldolase and a sugar kinase. Another widely distributed bacterial conserved gene neighborhood combines the solo ASRT-like gene with a cluster of genes encoding an ABC transporter system specific for monosaccharides (Fig. 1). ASRT-like genes are also frequently combined in operons with genes encoding other potential sugar metabolism-related enzymes such as amylases (e.g. Sinorhizobium), pyruvate decarboxylase/oxidase, a sugar/sugar acid dehydratase/racemase and a FAD-dependent oxidoreductase related to the glycolate oxidase (Fig. 1). Interestingly, in the actinobacterium Rubrobacter xylanophilus two paralogous copies of ASRT orthologs are each associated with either the isomerase, aldolase and kinase gene neighborhood or the pyruvate decarboxylase/oxidase, dehydratase/racemase neighborhood. Thus, these linkages strongly support the ASRT-like domains regulating distinct sugar metabolism processes in cis. The amidase domain found fused to the ASRAH domain is of the N-acetylmuramoyl-L-alanine amidase type, which is consistent with a potential role for the ASRAH domain in binding elements of peptidoglycan. The cell-surface versions found mainly in the actinomycetes are predicted to be lipoproteins because their N-terminal region contains a conserved signal peptide sequence followed by an absolutely conserved cysteine reminiscent of the "lipobox" [11]. They additionally contain an internal cysteine, just N-terminal to the ASRAH domain (Fig. 2), which could form the site of a second distinct lipid modification. Their predicted gene-neighborhood context, which is conserved in practically all actinomycetes encoding such an ASRAH domain protein, shows a tight linkage to a transcription factor of the WhiB family, a glycosyltransferase and an inactive version of a zincin-like metallopeptidase. In some organisms the glycosyltransferase and the ASRAH domain-containing protein are fused into a single polypeptide (Frankia sp. CcI3), supporting the functional linkage between these proteins. Thus, these cell-surface proteins also display contextual connections that are compatible with them binding sugars. It is likely that they are lipid-modified themselves and form an actinobacteria-specific cell-surface structure by binding a polysaccharide created by the linked glycosyltransferase or recruiting the same enzyme to a precursor polysaccharide.
Thus, the contextual evidence, combined with the structural evidence for ligand-binding, favors the ASRAH domain being a sugar-binding domain, in the manner of many other β-sandwich structures. This is not unexpected for the secreted or cell-surface versions of the ASRAH domain because it is consistent with the function of many other extracellular β-sandwich structures in binding cell-surface polysaccharides, which are parts of the peptidoglycan, glycoprotein and other polymeric carbohydrate layers that decorate prokaryotic cell surfaces. However, the intracellular standalone ASRAH domains are somewhat functionally unexpected because they appear to represent a novel regulatory mechanism - a potential sugar-sensor that might influence the function of other enzymes or sugar transporters encoded by the conserved gene-neighborhood via physical interactions with them depending on sugar concentrations. In light of this, we propose that the unusual linkage to the rhodopsin, which is observed only in the cyanobacterium Nostoc, might represent a novel mechanism in which the behavior of a light sensor (ASR) is influenced via interaction with an intracellular sensor of a sugar or a related metabolite (ASRT). It is plausible that this interaction might have a role in phototaxis in response to intracellular nutrient status.
General discussion
Our investigation demonstrates that ASRT belongs to a superfamily of domains predicted to bind small molecules, most likely sugars, in both extracellular and intracellular locations in various bacterial and a few archaeal proteins. On the one hand, the presence of sugar-binding capability in a β-sandwich scaffold is hardly unprecedented for domains found in extracellular or cell-surface proteins. Nevertheless, the identification of such a potential sugar-binding element in a novel cell-surface lipoprotein is of interest, especially given its presence across actinobacteria, including several pathogens such as Tropheryma, Actinomyces and Thermobifida. On the other hand, the identification of such a function in an intracellular context is of interest especially because the tetramer-forming standalone versions are predicted to regulate a diverse range of sugar metabolism operons or even the light sensory behavior in a cyanobacterium. This observation points to a previously unreported mechanism of regulation by a potential standalone small-molecule sensor that probably occurs at the level of protein-protein interactions rather than via the sensor domain of a one-component system transcription factor. In this respect such a regulatory process is closer to the allosteric regulation by small molecules.
While "sideways" ligand binding via one of the exposed sheets has been observed in several β-sandwich scaffolds (e.g. DOMON domain and other carbohydrate-binding domains [12]), the exact mode of binding is not established for all such folds. The analysis of the ASRAH domain presented here identifies the binding site for one more of these β-sandwich scaffolds. The proposed binding site for this superfamily of domains reinforces an observation that presents an interesting evolutionary conundrum: though several β-sandwich scaffolds bind ligands in a "sideways" fashion, their topologies differ greatly from each other. This leads to the question of whether there was repeated convergent evolution of the "sideways" binding mode in various β-sandwiches or whether it represents an ancestral binding mode for the β-sandwich scaffolds, which was preserved despite the extensive topological rearrangements occurring as a consequence of duplication of internal units or accretion of additional strands. Hence we hope that these observations will also contribute to the more general understanding of the evolution of ligand binding in β-sandwiches.