Polymorphic toxin systems: Comprehensive characterization of trafficking modes, processing, mechanisms of action, immunity and ecology using comparative genomics

Background: Proteinaceous toxins are observed across all levels of inter-organismal and intra-genomic conflicts. These include recently discovered prokaryotic polymorphic toxin systems implicated in intra-specific conflicts. They are characterized by a remarkable diversity of C-terminal toxin domains generated by recombination with standalone toxin-coding cassettes. Prior analysis revealed a striking diversity of nuclease and deaminase domains among the toxin modules. We systematically investigated polymorphic toxin systems using comparative genomics, sequence and structure analysis.

Results: Polymorphic toxin systems are distributed across all major bacterial lineages and are delivered by at least eight distinct secretory systems. In addition to type-II, these include type-V, VI, VII (ESX), and the poorly characterized “Photorhabdus virulence cassettes (PVC)”, PrsW-dependent and MuF phage-capsid-like systems. We present evidence that trafficking of these toxins is often accompanied by autoproteolytic processing catalyzed by HINT, ZU5, PrsW, caspase-like, papain-like, and a novel metallopeptidase associated with the PVC system. We identified over 150 distinct toxin domains in these systems. These span an extraordinary catalytic spectrum to include 23 distinct clades of peptidases, numerous previously unrecognized versions of nucleases and deaminases, ADP-ribosyltransferases, ADP ribosyl cyclases, RelA/SpoT-like nucleotidyltransferases, glycosyltransferases and other enzymes predicted to modify lipids and carbohydrates, and a pore-forming toxin domain. Several of these toxin domains are shared with host-directed effectors of pathogenic bacteria. Over 90 families of immunity proteins might each neutralize anywhere from one to at least 27 distinct types of toxin domains. In some organisms multiple tandem immunity genes or immunity protein domains are organized into polyimmunity loci or polyimmunity proteins. Gene-neighborhood analysis of polymorphic toxin systems predicts the presence of novel trafficking-related components, and also the organizational logic that allows toxin diversification through recombination. Domain architecture and protein-length analysis revealed that these toxins might be deployed as secreted factors, through directed injection, or via inter-cellular contact facilitated by filamentous structures formed by RHS/YD, filamentous hemagglutinin and other repeats. Phyletic pattern and life-style analysis indicate that polymorphic toxins and polyimmunity loci participate in cooperative behavior and facultative ‘cheating’ in several ecosystems such as the human oral cavity and soil. Multiple domains from these systems have also been repeatedly transferred to eukaryotes and their viruses, such as the nucleo-cytoplasmic large DNA viruses.

Conclusions: Along with a comprehensive inventory of toxins and immunity proteins, we present several testable predictions regarding active sites and catalytic mechanisms of toxins, their processing and trafficking and their role in intra-specific and inter-specific interactions between bacteria. These systems provide insights regarding the emergence of key systems at different points in eukaryotic evolution, such as ADP ribosylation, interaction of myosin VI with cargo proteins, mediation of apoptosis, hyphal heteroincompatibility, hedgehog signaling, arthropod toxins, cell-cell interaction molecules like teneurins and different signaling messengers.

Reviewers: This article was reviewed by AM, FE and IZ.


Background
Production and deployment of "chemical armaments" is one of the most common strategies in inter-organismal conflict. Such molecules, namely toxins or antibiotics, are observed at practically every level of biological organization ranging from multicellular organisms like animals and plants, through bacteria, all the way down to intra-genomic selfish elements [1][2][3][4]. These molecules span an entire biochemical spectrum from diffusible small molecules (e.g. antibiotics) to some of the largest proteins in the biological world (secreted bacterial toxins) [5,6]. Beyond their natural roles, these molecules have considerable significance as biotechnological reagents, biodefense agents, therapeutic targets, and therapeutics against numerous disease-causing agents [1, 2,4,6,7]. Traditional toxicology has now been joined by genomics and sequence analysis in uncovering the enormous biochemical diversity across life forms of such molecules and of the systems that synthesize and traffic them. This diversity is seen both in the structure and action of systems involved in synthesis of diffusible antibiotics and proteinaceous toxins [5,6]. It is becoming increasingly clear that proteinaceous toxins are a common feature of biological conflicts at every organizational level [7]: 1) In antagonistic interactions between different multicellular eukaryotes, such as the castor bean ricin, Aspergillus sarcin and various snake venom proteins [2,3,8,9]. 2) Action by multicellular organisms against their pathogens (e.g. anti-microbial peptide toxins and defensive RNases such as RNaseA and RNase L [10][11][12][13]). 3) Action of pathogenic and symbiotic bacteria directed against their hosts (e.g. the cholera toxin and the shiga toxin [4,14]). 4) Interspecific conflict in bacteria [15]. 5) Conflict between bacterial sibling strains of the same species, namely contact dependent inhibition systems and related secreted toxins [16][17][18][19]. 
6) Inter-genomic conflicts between cellular genomes and selfish replicons residing in the same cell (e.g. classical bacteriocins and plasmid addiction toxins [20]). 7) Intra-genomic conflicts between selfish elements and the host genome (restriction-modification systems [21] and genomic toxin-antitoxin systems [22][23][24]).
Studies in the past decade point to certain unifying themes across the proteinaceous toxins deployed in each of these distinct types of biological conflict. The most prominent theme is the use of enzymatic toxins that disrupt the flow of biological information by targeting nucleic acids and proteins [7]. Thus, several toxin domains are nucleases targeting genomic DNA, tRNAs and rRNAs, nucleic acid base glycosylases, nucleic acid-modifying enzymes, peptidases that cleave key protein targets, and protein-modifying enzymes that alter the properties of proteins, such as components of the translation apparatus [4,6,7,17,18,25]. A secondary theme seen across toxins from phylogenetically diverse sources is the presence of domains that disrupt cellular integrity by forming pores in cellular membranes [26,27]. Genomic analysis has also revealed that the richest source of proteinaceous toxins is the bacterial superkingdom, wherein several systems involved in most of the levels of biological conflict enumerated above are encountered [4,6,17,18,21,22,25].
It is also becoming apparent that inter- and intra-specific and inter- and intra-genomic conflicts in prokaryotes have resulted in an intense arms race with respect to proteinaceous toxins. There is evidence for multiple episodes of escalation of the conflict in terms of the evolution of immunity proteins, followed by alterations in the toxins to evade the action of the immunity proteins [15,17,18,24,28]. Another major evolutionary theme seen in secreted proteinaceous toxins is the exploration of several alternative secretory mechanisms for their effective trafficking and delivery to potential targets. In particular, bacteria display at least eight distinct secretory mechanisms over and beyond the ancestral Sec (or Type II) system that is shared with the other branches of life (Table 1). Both the T2SS and alternative secretory mechanisms have been repeatedly coopted for trafficking toxins [15,17,18,29,30]. In addition to the T2SS, examples of other widely utilized secretory pathways that have been frequently coopted for trafficking of toxins include three distinct systems dependent on ATPase pumps: 1) the ABC ATPase-dependent Type I system, which has been adapted for the delivery of the large RTX toxins [31]; 2) the FtsK-like ATPase-dependent type VII (ESX) system of Gram-positive bacteria, which has been recruited for delivering several toxins, including those frequently deployed in intraspecific conflict [17,32,33]; 3) the plasmid conjugation apparatus-derived type IV system [34], which is also dependent on FtsK-related ATPases [33]. On the other hand, some of the other alternative secretory mechanisms appear to be primarily utilized in trafficking toxins rather than any other function: 1) the type III system, based on the flagellar basal body-like apparatus [35]; 2) the two-partner or Type V system, which resembles the porins [36,37]; 3) the type VI system [38,39]; 4) the Photorhabdus virulence cassette (PVC)-type secretory system [40,41]; 5) the TcdB/TcaC-like export pathway [42]; 6) the PrsW-like peptidase-dependent export system [43]. Both the T6SS and the PVC-SS utilize caudate bacteriophage tail-derived proteins as an "injection syringe" and distinct AAA+ ATPases to recycle the injection apparatus in an ATP-dependent manner after a single use [39]. Depending on the secretory pathway, toxins might either be directly injected into target cells (e.g. T6SS delivered toxins) or diffuse into the surrounding medium (e.g. certain T2SS or T7SS toxins) or be anchored on the surface of producing cells to be delivered upon contact with the target cell (e.g. T5SS and certain T2SS, T6SS and T7SS delivered toxins).

Additionally, these prokaryotic toxins might also display further adaptations that allow their processing subsequent to their secretion; these include the presence of "pretoxin domains" that might be sites for proteolytic processing or in-built peptidase domains that cleave off the toxin domain to facilitate its delivery into the target cell [17,20] (Table 1). The selective pressures related to the above-described adaptations for trafficking, processing and delivery appear to have been instrumental in shaping the domain architectures of plasmid-encoded bacteriocins and prokaryotic toxins deployed in inter- and intra-specific conflicts [17,20]. Consequently, most toxin proteins have N-terminal domains involved in secretion and/or cell surface anchorage, central domains involved in adhesion or presentation to target cells and C-terminal domains that bear the actual toxin activity (Figure 1, Table 1). These might be occasionally combined with further processing-peptidase or pre-toxin domains [17,18,20]. These stereotypic architectural features strongly distinguish such toxins from those involved in intra-genomic conflicts, such as those from classical toxin-antitoxin systems and restriction-modification systems, even though certain domains with toxin activity might be common across these different systems [17,22,28]. Hence, domain architectural analysis considerably aids in the detection of new toxins involved in inter-organismal conflicts and the delineation of specific domains associated with each of the above-listed trafficking related roles.

This has led to an exciting discovery in the past two years, namely the identification and characterization of an extremely widespread system of secreted toxins, primarily involved in intra-specific conflict between related strains of prokaryotes [16][17][18][19]. These toxin systems are found in practically all major bacterial lineages and also a small number of archaea. Toxin proteins of these systems are as a rule multi-domain and display a bewildering diversity in terms of domains possessing toxin activity [17,18]. An important feature of these proteins is the tendency to vary their toxin domains through a process of recombination that might replace an existing toxin domain by a distinct one encoded by standalone cassettes, while retaining the rest of the protein's architecture (i.e. parts related to trafficking and delivery) intact. As a consequence these toxins might be termed polymorphic toxins and encompass the so-called contact dependent inhibition (CDI) systems that were recently described in proteobacteria [17,44,45]. Further, these systems typically possess a chromosomally linked immunity protein that helps in protecting cells against their own toxin. These systems might also display several more chromosomally linked or distantly located immunity proteins that could serve as a potential line of defense against toxins delivered by "non-self" strains. The presence of immunity proteins is a key feature that distinguishes the polymorphic toxins from conventional toxins whose primary targets are in distantly related organisms (hence, no "self" immunity is required). Thus, these polymorphic secreted toxins could play a central role in "self versus non-self" or kin recognition in bacteria and thereby have an important role in regulating intraspecific altruistic and cooperative behavior [17,18].

Table 1 (fragment): The toxin is predicted to be packaged into the phage head as in phage transduction systems. Notes: 1: only fused to toxins exported by the SEC-dependent pathway in Amoebophilus asiaticus; 2: only fused to toxins exported by the SEC-dependent pathway in Microscilla marina; 3: only fused to toxins exported by the SEC-dependent pathway in Acetivibrio cellulolyticus; 4: only fused to toxins exported by the SEC-dependent pathway in Caldicellulosiruptor species; 5: in firmicutes, the export pathway is only present in Veillonella and Selenomonas species, also referred to as the Negativicutes species; 6: certain bacterial lineages within the β-, ε- and γ-proteobacteria, planctomycetes, verrucomicrobia, cyanobacteria and bacteroidetes have solo WXG domains that have a distinct YueA-like ATPase with 3 HerA/FtsK domains of which only the middle one is active. These appear to be mobile versions of T7SS.

Figure legend (fragment): … (Table 1). Proteins are denoted by their gene name, species abbreviations and GI (Genbank Index) numbers separated by underscores. (C) General gene-neighborhoods template for polymorphic toxin operons. Individual genes are represented as arrows pointing from the 5′ to the 3′-end of the coding frame. Genes are labeled by their domain architectures. The gene neighborhood is labeled by the gene name, species abbreviation and GI number of the SUKH gene marked with an asterisk. Toxins are colored pink, immunity proteins orange, and other trafficking related proteins grey. For species abbreviations refer to supplementary material.

Figure legend (fragment): The alignment of MCF1-SHE domain is shown with predicted catalytic residues marked with blue asterisks. For all alignments in this study, proteins are denoted by their gene name, species abbreviations and GI (Genbank Index) numbers separated by underscores. Secondary structure assignments are shown above the alignment, where the blue arrow represents the β-strand and the red cylinder the α-helix. Poorly conserved inserts are excluded in the alignment and replaced by the length of the inserts. Columns in the alignment are colored based on their amino acid conservation at consensus shown below the alignment. The coloring scheme and consensus abbreviations are as follows: h, hydrophobic (ACFILMVWY), l, aliphatic (LIV) and a, aromatic (FWY) residues shaded yellow; b, big residues (LIYERFQKMW), shaded gray; s, small residues (AGSVCDN) and u, tiny residues (GAS), shaded green; p, polar residues (STEDKRNQHC) shaded blue; and c, charged residues (DEHKR) shaded magenta. Absolutely conserved residues are shaded red.
Our studies on the toxin domains of these polymorphic toxin systems have uncovered a remarkable array of nucleases and deaminases that are likely to target different cellular nucleic acids [17,18]. Our preliminary investigations also uncovered some other toxin domains in these systems with alternative modes of action, such as protein AMP/UMPylating enzymes, ADP-ribosyltransferases and peptidases. Interestingly, we observed that several of the toxin and processing peptidase domains from polymorphic secreted toxins are also present as toxin domains of conventional toxins deployed in inter-specific conflict, such as against eukaryotic hosts by pathogenic or symbiotic bacteria [46][47][48][49][50][51][52][53][54]. In a similar vein, we observed that both the polymorphic toxins deployed in intra-specific conflicts and toxins used in inter-specific conflict often rely on similar secretory mechanisms, such as the T5SS, T6SS and T7SS [17,18]. These observations suggested that both types of secreted toxins have been "constructed" in course of evolution from a common pool of domains and consequently possess similarities in their domain architectures. We also observed that several domains seen in secreted prokaryotic toxins and their immunity proteins have been transferred to eukaryotes and their viruses, and have contributed to the provenance of major regulatory molecules in the development of multicellular animals, RNA-editing, DNA-mutagenesis and virus-host interactions [17,18]. Thus, the evolutionary and functional significance of domains found in prokaryotic toxin systems extends beyond the mechanisms and dynamics of intraorganismal conflict.
Our previous studies on the polymorphic toxins focused on identifying and characterizing the diversity of toxin domains that operate on nucleic acids, in particular nucleases and deaminases, and characterizing some of the most prevalent immunity proteins, such as those with the SUKH and SuFu domains. We also reported a preliminary characterization of the major secretory systems involved in toxin trafficking and processing peptidases. Here, we build on our previous studies to systematically characterize novel domains in polymorphic toxin systems, with a particular focus on those involved in toxin activity, immunity and maturation of toxins. Consequently, we report herein a greatly expanded repertoire of toxin domains and immunity proteins directed against them. Thus, we also considerably extend their structural and mechanistic diversity to include a diverse array of peptidases, ADP ribosyltransferases, glycosyltransferases, kinases, membrane perforators and domains with several other activities. Even in terms of toxin acting on nucleic acids we report numerous previously unrecognized nucleases and deaminases. This expanded repertoire of toxin domains also helps to better understand the commonalities between the polymorphic toxin systems and the classical secreted toxins deployed against distantly related organisms. This comprehensive characterization also provides a handle to investigate the ecological significance of such secreted toxin systems in prokaryotes. Our analysis also uncovered novel features regarding the secretory systems that traffic these toxins. The detailed analysis of these toxin systems and their immunity proteins further pointed to several additional examples of domains from them being acquired by eukaryotes and their viruses. Thereby we greatly widen the contributions of components of these systems to the evolution of several eukaryotic regulatory systems. 
We present a comprehensive inventory of intra-specific polymorphic toxin systems and related components from toxin systems deployed in inter-specific conflicts. This database is likely to serve as a useful reference for future studies on this enormously significant group of proteins.

Results and discussion
Search strategy to identify new toxins and immunity proteins
In order to identify novel polymorphic toxins we adopted a strategy of matching diagnostic domain-architecture and gene-neighborhood templates, similar to what we had done earlier to identify novel type II toxin-antitoxin systems [22]. In the case of polymorphic toxins the domain architecture template is defined by the presence of multi-domain proteins, wherein the C-terminal-most domain has toxin activity, while the N-terminal-most domains are associated with trafficking (Table 1, Figure 1). The central domains might be involved in adhesion, presentation or processing. One of the most common features of this central region is the presence of RHS (Recombination hot spot)/YD or filamentous hemagglutinin (FilH) repeats which form extended fibrous or filamentous structures that help in displaying the C-terminal toxin domain on the cell-surface [17,18,37,45,55,56]. With the above domain-architecture template (Figure 1), we identified an initial set of exemplars, which were used in sequence similarity searches to identify homologs that were similar over most of their length but differed in their C-terminal-most domains, a hallmark of polymorphic toxins (Figure 1B). This enabled us to precisely define the boundaries of the C-terminal toxin domains and use them as seeds in iterative sequence profile searches with the PSI-BLAST and JACKHMMER programs. These searches allowed us to recover both standalone toxin domain cassettes and examples where they are combined with other types of N-terminal trafficking, presentation and processing domains, distinct from those found in the starting queries. This process was used transitively to detect further toxin domains and full length toxins.
As a result, we were able not only to capture other polymorphic toxins but also to identify cases where these toxin domains might be used as the active domains of other secreted toxins that are deployed against more distantly related organisms (e.g. T3SS or T4SS delivered host-directed toxins). To further understand the sequence and structure affinities of toxin domains, we also used their multiple alignments in profile-profile comparisons with the HHpred program to recover distant homologs and determine their protein fold. Additionally, detailed domain-architecture analysis of the associated domains in the case of the full length toxins allowed us to delineate the domains involved in the other processes mentioned above.
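The core of the domain-architecture template described above (shared N-terminal and central regions, variable C-terminal-most domain) can be illustrated with a minimal sketch. It assumes that domain architectures have already been assigned to each protein; the protein names, the "SignalPeptide" label and the function name are hypothetical and are not part of the pipeline used in this study, which relied on sequence similarity and profile searches rather than exact architecture matching.

```python
from collections import defaultdict

# Hypothetical toy data: each protein is an ordered list of domain labels,
# N-terminus first. Protein names and "SignalPeptide" are illustrative only.
ARCHITECTURES = {
    "protA": ["SignalPeptide", "RHS", "RHS", "HINT", "Tox-HNH"],
    "protB": ["SignalPeptide", "RHS", "RHS", "HINT", "Tox-REase-1"],
    "protC": ["SignalPeptide", "RHS", "RHS", "HINT", "Tox-HNH"],
    "protD": ["WXG", "Tox-AHH"],
}


def polymorphic_groups(architectures):
    """Group proteins by everything except the C-terminal-most domain and keep
    the groups whose members carry more than one distinct C-terminal domain."""
    groups = defaultdict(dict)
    for name, domains in architectures.items():
        if len(domains) < 2:
            continue  # a single-domain protein cannot match the template
        scaffold = tuple(domains[:-1])       # trafficking + central region
        groups[scaffold][name] = domains[-1]  # C-terminal-most domain
    return {
        scaffold: members
        for scaffold, members in groups.items()
        if len(set(members.values())) > 1     # variable C-terminus
    }


if __name__ == "__main__":
    for scaffold, members in polymorphic_groups(ARCHITECTURES).items():
        print("-".join(scaffold), "->", members)
```

In this toy set only the RHS/HINT scaffold is reported, because its members end in two different toxin domains; protD is left out since no second protein shares its scaffold.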
In terms of gene-neighborhood templates (Figure 1), we exploited the fact that the polymorphic toxin genes are accompanied by several solo toxin cassettes and genes for immunity proteins and in some cases genes encoding trafficking components (e.g. T6SS or PVC-SS). Hence, we systematically extracted the genomic neighborhoods for all detected toxin-encoding genes from complete genome sequences or assembled contigs and subjected them to gene-neighborhood analysis. Matches to the above template allowed us to distinguish the classical polymorphic toxins from related toxin systems that are deployed against more distantly related organisms. A combination of the gene-neighborhood analysis with the domain architecture analysis also allowed us to determine the trafficking mechanisms of full-length toxins in the majority of cases. Further, this genomic analysis also led to the recovery of potential immunity proteins associated with the polymorphic toxins. The identification of novel immunity proteins utilized the fact that the immunity protein gene(s) are invariably adjacent to the toxin gene in an operon and typically encode a small single domain protein (Figure 1). We confirmed novel immunity proteins by initiating sequence searches with them and using the newly detected homologs in gene-neighborhood analysis to check if they showed any co-occurrence with toxin genes. The gene-neighborhood analysis of the newly identified immunity proteins also helped recover any loci that might have been missed in the initial toxin-centric analysis and also pointed to certain novel types of loci comprised primarily of multiple immunity genes (see below).
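The adjacency criterion for immunity genes can be sketched as follows. This is only an illustration under stated assumptions: the `Gene` record, the 50-nucleotide gap limit and the 250-residue size limit are hypothetical choices made for the example, and the immunity gene is assumed to lie immediately downstream of the toxin gene on the same strand. None of these values are taken from the analysis reported here.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int          # 1-based genomic coordinates, start < end
    end: int
    strand: str         # "+" or "-"
    is_toxin: bool = False

    @property
    def protein_length(self):
        return (self.end - self.start + 1) // 3


def candidate_immunity_genes(genes, max_gap=50, max_length=250):
    """Return (toxin, neighbor) name pairs where the gene immediately
    downstream of a toxin gene is co-directional, close and small.
    max_gap and max_length are illustrative thresholds (assumptions)."""
    ordered = sorted(genes, key=lambda g: g.start)
    hits = []
    for i, gene in enumerate(ordered):
        if not gene.is_toxin:
            continue
        # "Downstream" depends on the coding strand of the toxin gene.
        j = i + 1 if gene.strand == "+" else i - 1
        if not 0 <= j < len(ordered):
            continue
        neighbor = ordered[j]
        if gene.strand == "+":
            gap = neighbor.start - gene.end
        else:
            gap = gene.start - neighbor.end
        if (neighbor.strand == gene.strand
                and not neighbor.is_toxin
                and gap <= max_gap
                and neighbor.protein_length <= max_length):
            hits.append((gene.name, neighbor.name))
    return hits


if __name__ == "__main__":
    locus = [
        Gene("tox1", 100, 3100, "+", is_toxin=True),
        Gene("imm1", 3110, 3500, "+"),
        Gene("other", 5000, 8000, "+"),
    ]
    print(candidate_immunity_genes(locus))
```

Candidates found this way would still need the confirmation step described in the text, namely checking whether their homologs co-occur with toxin genes in other genomes.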
As a result of the above searches, we were able to assemble a comprehensive inventory of toxins and immunity proteins, which we provide as a resource accompanying this article (Table 2, 3 and Additional File 1). For the sake of systematic nomenclature we adopted the following convention: 1) The toxin domains are labeled 'Tox' followed by the name of the superfamily they belong to. Thus, a toxin domain of the restriction endonuclease (REase) superfamily would be labeled Tox-REase.
2) The domain might be further distinguished by a numeral if there are multiple distinct toxin families within a given superfamily, e.g. Tox-REase-1, Tox-REase-2 and so on. 3) In the case of certain highly divergent families, each with their own structurally distinct features, such as those belonging to the HNH/EndoVII nuclease fold, each family of toxin domains might receive a separate label, e.g., Tox-HNH, Tox-AHH, Tox-LHH or Tox-NucA that identifies the specific family of nucleases. 4) Novel toxins that could not be unified with any previously known superfamily are labeled as 'Ntox' followed by a number, e.g. Ntox1, Ntox2 etc. (we identified a total of 50 such novel, monophyletic toxin groups in this study). 5) The immunity proteins were similarly named according to their superfamily. Thus, immunity proteins of the SUKH, SuFu and LRR superfamilies are respectively labeled as Imm-SUKH, Imm-SUFU or Imm-LRR. 6) Novel immunity proteins that could not be unified with any known superfamily were labeled as Imm followed by a number, e.g. Imm1, Imm2 etc. (we detected 73 such immunity proteins in this work).
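The naming convention above is mechanical enough to be written down as a small helper. The sketch below is an illustration only; the function names and argument layout are hypothetical. Family-specific labels under rule 3 (e.g. Tox-AHH) are produced by passing the family name in place of the superfamily.

```python
def toxin_label(superfamily=None, family_index=None, novel_index=None):
    """Build a toxin-domain label following the stated convention.

    superfamily  -- known superfamily or family name, e.g. "REase" or "AHH"
    family_index -- optional numeral for distinct families in a superfamily
    novel_index  -- number for a toxin not unified with any known superfamily
    """
    if superfamily is None:
        if novel_index is None:
            raise ValueError("give either a superfamily or a novel_index")
        return f"Ntox{novel_index}"
    label = f"Tox-{superfamily}"
    if family_index is not None:
        label += f"-{family_index}"
    return label


def immunity_label(superfamily=None, novel_index=None):
    """Build an immunity-protein label following the stated convention."""
    if superfamily is None:
        if novel_index is None:
            raise ValueError("give either a superfamily or a novel_index")
        return f"Imm{novel_index}"
    return f"Imm-{superfamily}"


if __name__ == "__main__":
    print(toxin_label("REase"))            # Tox-REase
    print(toxin_label("REase", 2))         # Tox-REase-2
    print(toxin_label(novel_index=1))      # Ntox1
    print(immunity_label("SUKH"))          # Imm-SUKH
    print(immunity_label(novel_index=2))   # Imm2
```

The helper keeps the superfamily name exactly as given, so a caller who wants the capitalized form used in the text for the SuFu superfamily (Imm-SUFU) has to pass "SUFU".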
In the initial section we present the results of the above analysis from a domain-centric viewpoint by laying out the main conserved domains we identified in toxins (Table 2), immunity proteins (Table 3) and some novel features associated with trafficking (Table 1). In the course of discussing the conserved domain families, we describe key features relating to their domain architectures and gene-neighborhoods, and present the relevant functional inferences derived from them. In the following sections we explore the general features of the domain architecture and gene-neighborhood networks, phyletic distribution, relationships between various proteinaceous toxin systems, ecological implications and the evolutionary connections between components of these toxin systems and eukaryotic and viral functional systems.

Table 3 (continued). Phyletic distribution and associated toxins of immunity proteins associated with polymorphic toxin systems. Note 2: Each toxin in column 3 that is present in a gene neighborhood along with the corresponding immunity protein in column 1 in the toxin-immunity gene order is marked by a superscript letter, so as to identify the phyletic pattern of this association in column 4.

Peptidase domains in polymorphic toxins and related proteins
Peptidase domains from these systems can be functionally categorized into 1) those that are involved primarily in processing toxin proteins; 2) those that function both in processing and as toxins; 3) those that function mainly as toxins. Autoproteolytic processing by diverse peptidases has long been recognized in classical secreted toxins deployed by pathogenic bacteria against their hosts [49,51,54]. For example, the Vibrio cholerae RTXA peptide ligase toxin, clostridial glucosyltransferase toxins and certain Yersinia toxins are autoproteolytically processed by intrinsic caspase-like thiol peptidase domains, which are induced by small molecules such as GTP and inositol hexakisphosphate in the host cytoplasm [49,52,57]. Similarly, we presented evidence that the HINT autopeptidase domains are likely to be an important player in the autoproteolytic release of several polymorphic toxins (Figure 2A) [17]. In toxins of several pathogens, peptidase domains have also been characterized as bearing the actual toxin activity. Examples include the Yersinia pestis YopT papain-like peptidase domain that triggers actin depolymerization in host cells by cleaving the C-termini of Rho GTPases [50] and the Bacillus anthracis lethal factor that disrupts signaling cascades by cleaving the N-termini of several MAPK kinases [48]. However, to date peptidase domains have not been systematically characterized in classical polymorphic toxin systems. In polymorphic toxins, peptidases acting in any of the above three functional categories can be distinguished mainly based on their location within the polypeptide. Those involved in autoproteolytic processing are mostly located either at the N-terminus or prior to the C-terminal toxin domain in the multi-domain toxin proteins (Figure 1). The toxin versions invariably occur at the C-termini.
Those which might occur at both of these locations can be inferred as functioning as either toxins or processing proteins depending on their position in the polypeptide. In addition to these categories, there are inactive peptidase domains that might serve as peptide-binding modules involved in anchorage and interactions of toxins. We discuss below the previously unrecognized peptidase domains that we identified in polymorphic toxin systems and also discuss their connections to related peptidase domains in other toxin systems (Table 2).
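The positional rule stated above (processing peptidases at the N-terminus or just before the C-terminal toxin domain, toxin peptidases at the C-terminus) can be expressed as a short heuristic. This is a minimal sketch with hypothetical function and domain names, assuming a pre-computed ordered domain list; it cannot recognize the inactive peptidase domains mentioned above, which would require inspecting the catalytic residues.

```python
def peptidase_role(domains, index):
    """Infer the likely role of the peptidase at position `index` in an
    ordered (N- to C-terminal) domain list, using position alone."""
    if not 0 <= index < len(domains):
        raise IndexError("index is outside the domain list")
    last = len(domains) - 1
    if last == 0:
        return "undetermined"   # standalone domain: position is uninformative
    if index == last:
        return "toxin"          # C-terminal-most domain
    if index == 0 or index == last - 1:
        return "processing"     # N-terminal, or just before the toxin domain
    return "undetermined"


if __name__ == "__main__":
    # Hypothetical architectures; "SignalPeptide" and "Papain-like" are
    # illustrative labels, not entries from the inventory.
    print(peptidase_role(["ZU5", "RHS", "RHS", "Tox-HNH"], 0))
    print(peptidase_role(["SignalPeptide", "RHS", "HINT", "Tox-HNH"], 2))
    print(peptidase_role(["SignalPeptide", "RHS", "Papain-like"], 2))
```

A peptidase family that returns "processing" in some proteins and "toxin" in others would fall into the second functional category, those that function both in processing and as toxins.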
Domains identified as being primarily auto-processing peptidases
ZU5 superfamily domains function as processing autopeptidases in toxins
The ZU5 (ZO-1/Unc5) domain was first identified as an autoproteolytic domain in the PIDD protein which forms the core of the PIDDosome, a protein complex in animals providing a platform for recognizing molecular patterns that are associated with loss of genomic integrity and genotoxic stress [58]. It is a major player in p53-induced apoptosis and activation of NF-κB pathway in response to DNA damage and its assembly involves multiple autoproteolytic cleavages mediated by its two ZU5 domains [59]. Our structural comparisons with the DaliLite program and sequence profile searches revealed that the ZU5 domain is homologous to the GPS domain involved in autoproteolytic cleavage of the polycystin-1 and certain G-protein-coupled receptors [60], and the autoproteolytic domain of the nuclear pore Nup96/98 proteins [61]. All these domains are characterized by the presence of a C-terminal CxH motif which forms their thiol autopeptidase active site (Additional File 1). Accordingly, we include all these domains in the ZU5 superfamily. Our iterative sequence searches identified ZU5 domains in several potential polymorphic toxins; they are typically located at the N-terminus of large proteins with central RHS repeats (Figure 2B). In polymorphic toxins, the ZU5 domain is most frequently associated with the SpvB and β-propeller domains, suggesting that it might be functionally coupled to the TcdB/TcaC-like export pathway [42,62]. Its N-terminal location is notably different from the previously observed HINT autopeptidase domains of polymorphic toxins, which are instead found at the C-terminus close to the toxin domain [17] (Figure 2B).
This suggests that the autoproteolytic activities of the two peptidases have distinct functions: the ZU5 autopeptidase most likely cleaves the toxin at the base of the filamentous structure in order to release it at the cell surface during its extrusion by the TcdB/TcaC system. In contrast, the C-terminally located HINT autopeptidase is likely to be critical for the release of just the toxin domain, probably upon contact with the target cell. In the classical polymorphic toxins ZU5 autopeptidases are found in association with a diverse array of nuclease and peptidase toxin domains (Figure 2B). Related ZU5 domains are also found in several other large bacterial cell surface proteins, which additionally contain diverse adhesion modules and other enzymatic domains, such as glycohydrolases, lipases and phosphodiesterases (Additional File 1). Thus, ZU5 autoproteolytic processing might be a more general feature among bacterial surface proteins that are deployed for the degradation or remodeling of extracellular biopolymers and matrices.
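As an illustration of the C-terminal CxH motif that characterizes ZU5 superfamily domains, the following sketch scans a domain sequence for cysteine-any residue-histidine triplets in its C-terminal part. The function name, the choice of "C-terminal half" as the search window and the example sequence are all assumptions made for the example; a bare pattern match of this kind is not evidence of homology or of autopeptidase activity, which in this study was inferred from profile searches and structural comparisons.

```python
import re

# C, any single residue, H. The lookahead lets overlapping matches be found.
_CXH = re.compile(r"(?=C[A-Z]H)")


def cxh_positions(sequence, c_terminal_fraction=0.5):
    """Return 1-based start positions of CxH triplets that begin within the
    last `c_terminal_fraction` of the sequence (an assumed search window)."""
    if not 0 < c_terminal_fraction <= 1:
        raise ValueError("c_terminal_fraction must be in (0, 1]")
    seq = sequence.upper()
    cutoff = int(len(seq) * (1 - c_terminal_fraction))
    return [m.start() + 1 for m in _CXH.finditer(seq) if m.start() >= cutoff]


if __name__ == "__main__":
    # Made-up sequence, not a real ZU5 domain.
    toy = "MKCAHLLLLLGGGGCTHAA"
    print(cxh_positions(toy))        # triplets in the C-terminal half
    print(cxh_positions(toy, 1.0))   # triplets anywhere in the sequence
```

In practice such a scan would only serve to flag the candidate catalytic cysteine and histidine in sequences already assigned to the superfamily.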
PrsW peptidase family defines a novel secretion pathway to release C-terminal toxin domains
The PrsW family of membrane-embedded peptidases is prototyped by the enzyme catalyzing site-1 cleavage of the anti-σW factor RsiW in Bacillus subtilis [43]. Most representatives bear eight transmembrane helices and four conserved motifs (Figure 2C), which show a distant relationship to several other peptidase families such as CPBP and APH-1 [63]. Given that the active site of PrsW is located within the membrane-spanning helices (Figure 2C), it is likely that these peptidases also form a transmembrane conduit for the simultaneous extrusion and processing of the toxin. We first recognized the PrsW domain as being a potential processing peptidase in polymorphic toxins on account of its N-terminal fusion with a novel deaminase toxin domain of the DYW clade (gi: 320532150) [18]. Further analysis revealed that N-terminal PrsW domains are associated with a diverse array of toxin domains, including several distinct versions of the restriction endonuclease superfamily (Figure 2C), mainly in Gram-positive bacteria. These toxin domains are typically connected by a short linker to the core membrane-spanning PrsW domain. However, in certain cases the toxin domain might be connected via a long filamentous structure formed by RHS repeats to the N-terminal PrsW domain (e.g. in a Streptomyces violaceus protein with a novel toxin domain (Ntox9; gi: 307326465)). Thus, the PrsW domain might be used to autoproteolytically process polymorphic toxins both of the soluble secreted type (those with short linkers) and of the filamentous contact-dependent type (with RHS repeats). In archaea (e.g. Pyrococcus horikoshii PH0065) and fungi (e.g. Aspergillus fumigatus; gi: 146324562), the PrsW peptidase domains are respectively fused at their N-termini to another PrsW-like peptidase (DUF2324 in PFAM), or a ceratoplatanin domain that is found in secreted phytotoxic virulence factors of fungal pathogens [64].
It is conceivable that in these examples the PrsW domain has been recruited for the processing of potential N-terminal toxins that are used against more distantly related organisms or plant hosts.
In several bacteria the PrsW domain is fused to intracellular signaling domains such as the PilZ domain which recognizes cyclic diguanylate, cyclic nucleotide binding domains, phosphopeptide-binding FHA domains and Zn-ribbon domains [65] (Additional file 1). These versions can be clearly distinguished both in terms of their sequence relationships and domain architectures from those associated with toxin domains. These are more likely to function as signaling peptidases that cleave proteins in conjunction with signals sensed by the associated domains.
Peptidase domains that function both in auto-processing and as toxins

Caspase-like peptidases
As noted above, peptidases of the caspase-like superfamily [66] (also known as "clan CD" [67]) were originally identified as processing peptidases of diverse host-directed toxins (e.g. RTX toxins) of pathogenic bacteria [49,57]. Likewise, some of these domains were identified in certain large bacterial surface proteins where they might function as autoproteolytic processing domains [52]. Other secreted bacterial members of this fold, such as the clostripains, have been implicated in proteolytic processing of surface proteins, whereas the gingipains act as virulence factors that cleave host proteins [47]. In this study we obtained evidence based on domain architectures and gene neighborhoods that the caspase-like peptidase domains occur both as potential processing peptidases (typically internal domains) and as toxin domains (the C-terminal-most domain) in polymorphic toxins from bacterial lineages such as bacteroidetes, gammaproteobacteria and actinobacteria (Figure 2D). Architectural analysis shows that the caspase domain toxins might be delivered via the T7SS, the PVC-SS and the TcdB/TcaC-like export pathway, in addition to the T2SS (Figure 2D). Versions of the caspase-like domain that are likely to function as processing peptidases of polymorphic toxins usually occur just upstream of a distinct C-terminal toxin domain, in a position similar to the HINT autopeptidase domains in other polymorphic toxins (Figure 2A), suggesting that they might similarly aid in the autoproteolytic release of the toxin domain. Architectural analysis suggests that the caspase-like peptidase might be nearly as prevalent as the HINT peptidase in proteolytic processing of polymorphic toxins (Additional File 1). Certain other toxin proteins have an array of repeats of the caspase-like domain upstream of the C-terminal toxin domain (e.g. a protein from Streptomyces flavogriseus with ADP-ribosyltransferase and MCF peptidase toxin domains; gi: 357410654; see below) (Figure 2D), suggesting that their processing might involve multiple autoproteolytic events to release multiple cleavage products. Some of the caspase domain repeats in these proteins lack the catalytic residues and might merely play a structural or peptide-binding role.
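
The positional reasoning used above (an internal peptidase domain is read as a likely processing peptidase, whereas a peptidase that is the C-terminal-most domain is read as a likely toxin) can be illustrated with a minimal sketch. The domain labels, the `PEPTIDASES` set and the function name are hypothetical and chosen only for illustration; this is not the procedure actually used in the study, which also relied on gene-neighborhood evidence.

```python
from typing import Dict, List

# Hypothetical labels for peptidase domains discussed in the text.
PEPTIDASES = {"Caspase", "HINT", "Tox-PL1", "OTU", "PVC-MPTase"}


def peptidase_roles(architecture: List[str]) -> Dict[int, str]:
    """Assign a predicted role to each peptidase domain in an architecture.

    `architecture` lists domain names in N- to C-terminal order.
    A peptidase that is the C-terminal-most domain is labeled "toxin";
    a peptidase at any other position is labeled "processing".
    """
    roles: Dict[int, str] = {}
    last_index = len(architecture) - 1
    for index, domain in enumerate(architecture):
        if domain not in PEPTIDASES:
            continue  # non-peptidase domains are not classified here
        roles[index] = "toxin" if index == last_index else "processing"
    return roles
```

For example, under these assumptions an architecture such as `["WxG", "RHS", "Caspase", "Tox-ART"]` would label the caspase-like domain as "processing", while `["SpvB", "RHS", "Caspase"]` would label it as "toxin".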

Papain-like peptidases
Papain-like peptidase domains, which constitute the most diverse and widespread superfamily of thiol peptidases, have been previously recorded as the toxin domains of both exotoxins and those delivered into the host cells by various pathogenic bacteria. Examples of the former include the Streptococcus pyogenes exotoxin SpeB, while those of the latter include the Pseudomonas syringae AvrPphB toxin, which cleaves the plant serine/threonine kinase PBS1, and the Pasteurella multocida toxin PMT [68-70]. We found evidence for domains belonging to multiple distinct clades of the papain-like superfamily in polymorphic toxin polypeptides. The first of these, the Tox-PL1 (Tox-papain-like-1) family, was recovered as a previously unknown conserved domain in several predicted polymorphic toxins, usually secreted by way of the T7SS (i.e. with N-terminal WxG domains) and the TcdB/TcaC-like system (N-terminal SpvB domain) in actinobacteria and bacteroidetes. Examination of its multiple alignment revealed a conserved NC-H-DxQ signature (Figure 3A), which is reminiscent of the conservation pattern seen in papain-like peptidases [53,71,72]. This relationship was confirmed via profile-profile comparisons with the HHpred program that significantly recovered papain-like peptidases (p = 10^-5; 95% probability). In a subset of the predicted polymorphic toxins Tox-PL1 is the only catalytic domain, and occurs at the extreme C-terminus of the toxin polypeptide, suggesting that it is the toxin domain (Figure 3C). In other cases it occurs in internal positions in polypeptides bearing a diverse set of toxin domains [18], or in the middle of an array of filament-forming RHS repeats (Figure 3C). In these cases it is likely to function as an autoprocessing peptidase that releases associated toxin domains, comparable to the HINT and caspase-like peptidases [17].
In Shewanella we observed a protein combining a SopD domain [73] with a C-terminal Tox-PL1 domain, which is encoded by a gene embedded within a T3SS operon. Given that Shewanella is known to suppress the growth of competing distantly related bacteria and infect eukaryotic hosts [74], it is possible that this protein might be used as a toxin delivered by the T3SS in such conflicts. In diverse bacteria we observed a distinctive architecture of Tox-PL1, wherein it is fused to the MuF domain (Figure 3C), which we had previously characterized as a DNA-packaging protein of bacteriophages utilizing the portal-terminase system [75]. Gene-neighborhood analysis indicated that these are encoded by prophage remnants that also include the terminase, portal protein and capsid protein genes (Figure 3D). Additionally, several of these neighborhoods might encode proteins with previously noted bona fide toxin domains that operate on nucleic acids (e.g. the HNH nuclease; Figure 3) [17,18]. Hence, we propose that these gene neighborhoods represent a novel phage-derived secretory mechanism, distinct from the previously identified T6SS and PVC-SS, which utilizes a capsid packaging-like mechanism. It is conceivable that in these systems the toxins encoded by associated genes are loaded into a capsid-like structure that is then delivered to target cells. Here, the Tox-PL1 domain might be involved in processing proteins either during the assembly of the secretory structure or the release of toxins into target cells.
The second major family of papain-like peptidases with potential processing as well as toxin functions are those belonging to the OTU family [53,76] (Figure 3E). These enzymes have been studied mainly in eukaryotes, where they function as deubiquitinating enzymes (DUBs) [77]. We found evidence for a diverse set of OTU peptidase domains in potential polymorphic toxins delivered by the T7SS (with N-terminal WxG domains) in actinobacteria and via the T2SS in the Acanthamoeba endosymbiont Odyssella thessalonicensis [78]. In these bacterial lineages they occupy positions suggestive of both processing and toxin functions (Figure 3E). Additionally, we found related OTU-like peptidases in large proteins resembling polymorphic toxins in several endosymbiotic/parasitic bacteria of animals and amoebozoans, such as Amoebophilus, Waddlia and Wolbachia. However, in these organisms their gene-neighborhoods suggest that they are unlikely to be polymorphic toxins used in intraspecific conflicts; rather, they are likely to be used against their host. In several cases, the OTU-like domains of these intracellular bacteria occur at the extreme C-terminus of large proteins with several domains, including repeats forming extended structures such as the Sel1, ankyrin and TPR repeats (Figure 3E). This suggests that they might be deployed similarly to the classical polymorphic toxins, but within the host cell. In other proteins from the same group of bacteria they might occur as internal domains accompanied by several other potential toxin domains (Figure 3E), such as GIMAP GTPase, lipase, latroxin-C and Tox-MCF1-SHE (see below). The preponderance of these OTU-like peptidase domains in intracellular bacteria suggests that they might function as toxins that suppress the Ub-dependent anti-pathogen mechanisms of their eukaryotic hosts through their DUB activity [79,80]. Indeed, a comparable role was originally proposed for the OTU-like peptidases in chlamydiae [53,76].
However, their presence in free-living bacteria (e.g. diverse actinobacteria) indicates that a subset of these OTU-like peptidase proteins might function either as processing peptidases that autoproteolytically process polypeptides or as conventional toxin domains that cleave proteins in rival cells.

PVC secretory system-type metallopeptidase domains
The "Photorhabdus virulence cassette" or PVC-SS was originally identified as a prophage-derived secretory system in Serratia entomophila, where it delivers toxins that confer a strong anti-feeding activity against the infected grass grub beetle larvae [41], and in Photorhabdus, where it extrudes toxins that destroy insect hemocytes by inducing actin condensation [40]. This system is typified by several caudate phage-derived gene products, such as the tail sheath protein and gp19 (these two form the tail tubule), gp25 (forms the baseplate), and a distinct clade of AAA+ ATPases that are related to CDC48 [81]. Thus, the PVC-SS parallels the T6SS in being derived from the tails of prophages, but differs from it in terms of the associated AAA+ ATPase, which in the case of the T6SS is a member of the ClpB clade of AAA+ ATPases (ClpV) [39,81,82]. Hence, these two systems represent independent prophage-based innovations that have recruited distinct sets of AAA+ ATPases to facilitate recycling of the injection apparatus after it has been deployed. We observed in our recent studies that several toxin domains closely related to those found in polymorphic toxins are secreted via the PVC-SS across most major bacterial lineages and certain euryarchaea (Figure 4). Our preliminary analysis of these toxin proteins secreted via the PVC-SS revealed that they contained a conserved metallopeptidase domain that occurred N-terminal to the toxin domain [17,18]. A more detailed analysis in the course of this study indicated that this metallopeptidase domain is a pervasive feature of the PVC-SS and provides an excellent marker to identify novel toxins secreted via this system. Accordingly, we term it the PVC-metallopeptidase (Figure 4). This domain is characterized by a highly conserved HExxHxxQ-E signature, and profile-profile comparisons using HHpred recovered several zincin-like metallopeptidases as the best hits (e.g. PDB: 2vqx, 1u4g, 3cqb; p < 10^-5; >90% probability).
A multiple alignment based on these hits suggests that the PVC-metallopeptidase adopts a similar structure with three beta-strands and three alpha-helices, with the conserved histidines on the second helix and the glutamate on the third helix forming the Zn-dependent active site [83] (Figure 4A, B).
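
As an illustration of how a signature such as HExxHxxQ-E could be used as a crude first-pass marker, the following minimal sketch scans a protein sequence for the HExxHxxQ core followed by a downstream glutamate. The spacing between the core and the glutamate is not specified in the text, so the `max_gap` parameter (and its default of 40 residues) is purely an assumption, as is the function name. A pattern scan of this kind is far less sensitive than the profile-based searches described above and is not the method used in the study.

```python
import re
from typing import List


def find_hexxhxxq_e(sequence: str, max_gap: int = 40) -> List[int]:
    """Return 0-based start positions of HExxHxxQ cores that are followed
    by a glutamate (E) within `max_gap` residues after the Q.

    `max_gap` is an illustrative assumption, not a value from the paper.
    """
    # "." stands for any residue (x); the lookahead requires a downstream E
    # without consuming it.
    pattern = re.compile(r"HE..H..Q(?=.{0,%d}E)" % max_gap)
    return [match.start() for match in pattern.finditer(sequence.upper())]
```

Candidate hits from such a scan would still need to be confirmed by alignment against the conserved domain, since short motifs of this kind occur by chance in unrelated proteins.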
Our analysis of the domain architectures of PVC-metallopeptidase proteins affirmed their general resemblance to the classical polymorphic toxins: the strongly conserved metallopeptidase domain occupied the N-terminal region, followed in each protein by highly variable C-termini, each of which usually corresponded to a different family of toxin domains. Thus, they appear to have evolved through a recombination process comparable to that of the polymorphic toxins, which combined a "constant" N-terminal peptidase with variable C-terminal toxin domains (Figure 4C). This positional polarity of the PVC-metallopeptidase domains with respect to the associated toxin domains resembles that of the HINT, PrsW, caspase-like and papain-like peptidases, indicating that they are likely to act as autoproteolytic domains that release the toxin after or during its export by the PVC-SS [17,18]. The C-terminal toxin domains associated with the PVC-metallopeptidases span an extraordinary diversity and include numerous structurally unrelated nucleases, nucleic acid deaminases, peptidases, pore-forming domains and several other enzymatic domains (Figure 4C). There are multiple toxins with the PVC architecture in several bacteria and archaea (e.g. Halogeometricum borinquense; Additional File 1), with a high diversity of C-terminal toxin domains similar to those found in conventional polymorphic toxins. Our analysis also showed that the PVC toxins are not limited to pathogenic or symbiotic bacteria but are abundant in several free-living bacteria (e.g. the cyanobacterium Microcoleus chthonoplastes and Nitrosococcus oceani) and archaea (e.g. Halogeometricum borinquense). This suggests that the PVC-SS toxins are not exclusively used against hosts but might also be used in inter-bacterial conflicts, just like the T6SS [15,30,39].
However, a notable proportion of the PVC-SS-dependent systems, unlike conventional polymorphic toxin systems, lack adjacent genes encoding immunity proteins (Figure 4D). This might imply that the activity of PVC toxins is primarily directed against distantly related organisms.
In addition to the above cases, we observed instances where a second PVC-metallopeptidase domain occurred at the extreme C-termini of proteins in a position comparable to the toxin domain (Figure 4C). Consistent with this, domain architecture and gene-neighborhood analysis showed that the PVC-metallopeptidase indeed also occurs as a toxin domain of certain polymorphic toxins, preceded by an array of RHS repeats (e.g. a protein from the verrucomicrobium Pedosphaera parvula; gi: 223934413; Figure 4C). Similarly, the PVC-metallopeptidase domain might occur as a C-terminal domain fused to a T6SS phage base-plate/tail polypeptide (e.g. Burkholderia sp.; gi: 78060725) (Figure 4C). These examples suggest that in addition to its predominant role in autoproteolytically processing PVC toxins, this metallopeptidase might take on the role of a peptidase toxin in several cases.
The MCF1-SHE domain: A possible novel serine peptidase shared by polymorphic toxins and secreted effectors?
We initially identified this domain as a conserved region shared by certain predicted polymorphic toxins (e.g. Caci_8529 from the actinobacterium Catenulispora acidiphila) and PVC-SS toxins (e.g. Hoch_1384 from Haliangium ochraceum). Iterative sequence profile searches with the PSI-BLAST program recovered homologous regions in proteins from a diverse group of bacteria and the mimivirus (L389, gi: 311977774) prior to convergence. These proteins include the MCF1 (makes caterpillars floppy) [84] and FitD entomotoxins, respectively from Photorhabdus luminescens and Pseudomonas fluorescens [85-87], and the phytotoxin of Pseudomonas syringae HopT1-1, which is secreted via the T3SS [88,89]. A multiple alignment of this domain revealed that its core comprises two kinked helices, predicted to form a hairpin (Figure 2E). The predicted kinks in the two helices are respectively associated with a conserved serine and a HxxxE motif and are likely to face each other. Accordingly, we named this domain the MCF1-SHE domain for the first characterized protein that bears it and the conserved triad of residues. While this domain does not resemble any previously known domain, the above catalytic triad suggests that it could potentially function as a novel serine peptidase. In several cases its occurrence at the extreme C-termini of polymorphic toxin proteins points to a potential toxin function for the MCF1-SHE domain (Figure 2E). Consistent with this, it is also found in several secreted proteins of both extracellular pathogens such as Edwardsiella and Xenorhabdus, and intracellular bacterial and viral pathogens such as Legionella, Coxiella burnetii, Yersinia pseudotuberculosis and the mimivirus (Figure 2E). In particular it appears to have expanded in legionellae, where up to four distinct MCF1-SHE toxin paralogs might be present per organism.
This phyletic pattern suggests that MCF1-SHE proteins might be both toxins in intra-specific conflict and important effectors that have dispersed through lateral transfer across phylogenetically diverse pathogens. Certain domain architectures of the MCF1-SHE domain are consistent with the predicted peptidase role, although in a different capacity. It often occurs just upstream of several toxin domains, such as the ADP-ribosyltransferase domains related to those found in the Pseudomonas syringae HopU1 phytotoxin (Figure 2E). In these cases, it could function as a potential processing peptidase that releases the C-terminal toxin. Similarly, in actinobacteria, it is embedded in gigantic proteins (>10,000 amino acids in length) with other peptidase domains such as the anthrax lethal factor metallopeptidase, caspase-like and OTU domains (e.g. gis: 345002682, 326780819).
Other peptidases that function predominantly as toxin domains of polymorphic toxin proteins
Besides the above-discussed domains, we uncovered several other peptidase domains that are clearly predicted to function as toxin domains rather than as processing peptidases on the basis of their domain architectures (Table 2). In addition to classical polymorphic toxin systems and PVC-SS delivered toxins, these peptidase toxin domains are also found in several host-directed effectors of pathogenic bacteria. However, it should be noted that outside of these toxin systems, related peptidase domains might perform other unrelated functions.

Papain-like peptidases
Several of the peptidases predicted to function as the toxin domains of classical polymorphic and PVC-SS delivered toxins belong to a number of distinct clades from the papain-like superfamily (Figures 2 and 4): 1) The NlpC/P60 clade: peptidases of this clade were first recognized as enzymes that cleaved peptide bonds in peptidoglycan and are nearly universally distributed across bacteria and also found in several bacteriophages [71]. We recovered such peptidase toxins in proteins such as Hoch_2166 from the myxobacterium Haliangium (gi: 262195395, Figure 4C); by analogy to other members of the NlpC/P60 clade they are predicted to function by degrading cell-walls of target cells. 2) The Tox-transglutaminase domain (Tox-TGase): in addition to toxins from free-living bacteria, this transglutaminase domain is also found in toxins delivered by different secretory systems of parasitic bacteria, where they appear to be directed against the host cells. In particular, it is the toxin domain of T3SS effectors directed against plants, such as AvrPphE from Pseudomonas syringae (gi: 30231092) and related effectors of Ralstonia, Xanthomonas and Acidovorax, and is also found in RTX toxins directed against animal hosts (e.g. Vibrio caribbenthicus RtxA; gi: 312885249) and in a novel secreted effector of Legionella pneumophila (lpg2408; gi: 52842617). These enzymes might either catalyze a conventional thiol peptidase reaction or act as transglutaminases that mediate crosslinking of proteins via a transglutaminase reaction [53]. Alternatively, they could catalyze polyamination of target glutamine, as has been observed in the case of the Bordetella pertussis transglutaminase that modifies the mammalian RhoA GTPase [90].
3) The Tox-PL-C39 domain: these peptidase domains are related to the C39/ComA-like peptidase domains that cleave the leader peptides of certain proteins secreted by ABC transporters such as the bacteriocins (Figure 4C) [91,92]. 4) Papain-like peptidases Tox-PL2 and Tox-PL3: these are novel peptidase domains that we identified in this study; the former is prototyped by the toxin domain of a polymorphic toxin from Sorangium cellulosum (gi: 162456110, Figure 2A) and the latter by a polymorphic toxin from Prevotella sp. (gi: 260911294, Figure 2B). Thus far, such peptidase domains are not found outside of polymorphic toxin systems and are typified by a C-H-D catalytic triad. 5) We also detected a toxin domain with a papain-like peptidase belonging to the classical ubiquitin C-terminal hydrolase (UBCH/UBHYD) clade associated with the PVC-SS in the plant pathogenic bacterium Burkholderia gladioli (gi: 330820326, Figure 4C). Similar UBCH domains are also found in potential toxins secreted by a variety of other bacterial endosymbionts of amoebae such as Simkania negevensis, Waddlia chondrophila, Amoebophilus asiaticus and Protochlamydia amoebophila and giant nucleocytoplasmic DNA viruses that infect them (Additional File 1). These predicted toxins display no associated immunity proteins, suggesting that like the OTU domains of pathogens and endosymbionts, they are likely to function as DUBs that deubiquitinate eukaryotic target proteins [79].

Metallopeptidases
Beyond the toxin versions (as opposed to autoproteolytic processing versions) of the PVC-metallopeptidase domain described above, we recovered several other distinct clades of the zincin-like metallopeptidase superfamily that are predicted to function solely as toxin domains in classical polymorphic and PVC-SS toxin proteins. These include: 1) The anthrax lethal factor-like metallopeptidase (ALF-MPTase) domains [48] that are found primarily among PVC-SS delivered toxins (e.g. Hoch_1736 from Haliangium; gi: 262194969, Figure 4C). 2) The HopH1-like metallopeptidase domain (Figure 5A): this domain is also found in several plant-directed T3SS-delivered effectors, such as Pseudomonas syringae HopH1 (gi: 28867816), and in animal-directed T3SS effectors such as NleD of Citrobacter rodentium and of enteropathogenic and enterohemorrhagic Escherichia coli, which blocks apoptosis of mammalian cells [93,94]. 3) We also identified five smaller families of previously unknown zincin-like metallopeptidases (Tox-MPTase1-5) that are exclusively found in polymorphic toxins from phylogenetically diverse bacteria (Figure 5A). In general terms they are similar in size and distantly related to the Wss1-like desumoylating metallopeptidase of eukaryotes [95]. All of these are typically associated with N-terminal RHS repeats, and at least in the case of a polymorphic toxin with a Tox-MPTase4 domain from E. coli, it might be delivered via the T6SS.

Other miscellaneous peptidases
Beyond these, we also recovered domains in PVC-SS and polymorphic toxins belonging to the L,D-peptidase, pyroglutamyl-peptidase [96] and YabG peptidase families [97]. Of these, the L,D-peptidase domain is a distinct thiol peptidase domain with a β-barrel catalytic domain that is unrelated to the papain-like peptidases (Figure 5B) [98,99]. It has been shown that the classical cell-wall associated L,D-peptidase domain catalyzes a transpeptidase reaction that cleaves the peptide bond between L-Lys3 and D-Ala4 in peptidoglycan while concomitantly forming a crosslinking peptide bond between the COOH group of L-Lys3 and the NH2 group of the D-isoasparagine linked to the ε-NH2 group of Lys3 from an adjacent chain [98]. Cell-wall associated L,D-peptidases are found in most major lineages of bacteria and are likely to play a role in the remodeling of peptidoglycan, especially in the face of antibiotics that inhibit cross-linking. Polymorphic toxins with an L,D-peptidase domain are distinguished from the typical cell-wall associated L,D-peptidases by their distinct architecture with RHS repeats and their genomic organization with linked immunity proteins. It is likely that the toxin L,D-peptidases act by hydrolyzing L-Lys3 crosslinks with D-amino acids, thereby compromising the integrity of the cell-wall.
The bacteriophage APSE of the endosymbiont Hamiltonella defensa, which protects aphids and other sap-feeding insects against parasitoid wasps, encodes several distinct toxins [100,101]. We noted that one of these (APSE305; gi: 211731800) displays an architecture similar to the conventional polymorphic toxins with a potential novel C-terminal toxin domain (Figure 5C). Analysis of this domain revealed that it is widely distributed in several other proteobacteria and is characterized by three motifs respectively bearing a [SGxH] signature, a conserved D or N and an absolutely conserved C (Additional File 1). Secondary structure prediction revealed that this domain is characterized by an α/β fold that is likely to be similar to the Rossmannoid three-layered sandwich adopted by the caspases and the flavodoxin-like fold. The absolutely conserved H, D/N and C are predicted to lie at the ends of the three successive strands of this structure and are likely to comprise the catalytic triad of the peptidase active site. Accordingly, we named this domain Tox-HDC and predict that it might function as a thiol peptidase or a transglutaminase. Proteins bearing this predicted toxin domain are particularly common in both intracellular (e.g. Coxiella burnetii) and extracellular (e.g. Xenorhabdus nematophila and Photorhabdus luminescens) pathogens and typically lack associated genes coding for immunity proteins. Thus, these toxins appear to be primarily directed against distantly related targets such as eukaryotes.
In conclusion, at least 23 distinct clades of peptidases belonging to several structurally unrelated superfamilies have been recruited as toxins, and are often shared between polymorphic toxins and host-directed effectors from diverse plant and animal pathogens. This suggests that several of these peptidase domains have evolved considerable substrate flexibility in targeting both eukaryotic and bacterial proteins.

Inactive transglutaminase domains in polymorphic toxins
In the course of the current study we observed that several polymorphic toxin proteins with several distinct types of C-terminal toxin domains displayed an N-terminal transglutaminase domain (Figure 5D). However, closer examination of the multiple alignment of these transglutaminase domains revealed that one or more of the conserved residues (a C, H, and D), which constitute the catalytic triad of their papain-like peptidase active site, were lost [53] (Figure 5D). This suggests that they lack peptidase activity. Domain architectural analysis showed that these inactive transglutaminase domains are always located immediately after an N-terminal signal peptide or TM helix and are followed by an array of RHS repeats that constitute the filamentous part of the toxin. Occasionally, they might be adjacent to domains of the immunoglobulin superfamily (the so-called "bacterial Ig" type domains; Figure 5D). This position suggests that, unlike the above-described active peptidase domains, these inactive transglutaminases have no role in toxin or processing activity. Instead, they might simply serve in anchoring the toxin on the cell surface by binding peptides.
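
The inference above, that loss of one or more of the C, H and D triad residues implies loss of peptidase activity, can be sketched as a simple check on an aligned sequence. The function name, the idea of passing the triad positions as alignment column indices, and the example column values are all hypothetical; in practice the triad columns would have to be read off a real multiple alignment.

```python
from typing import Sequence, Tuple

# Catalytic triad of the papain-like/transglutaminase active site, in order.
TRIAD: Tuple[str, str, str] = ("C", "H", "D")


def is_predicted_inactive(aligned_seq: str, triad_columns: Sequence[int]) -> bool:
    """Return True if any triad position does not carry the expected residue.

    `aligned_seq` is one row of a multiple alignment (gaps as "-").
    `triad_columns` gives the 0-based alignment columns of the C, H and D
    positions, in that order (hypothetical values supplied by the caller).
    """
    if len(triad_columns) != len(TRIAD):
        raise ValueError("expected exactly three triad columns")
    return any(
        aligned_seq[column].upper() != expected
        for column, expected in zip(triad_columns, TRIAD)
    )
```

A substitution or a gap at any of the three columns is treated as loss of the residue, which mirrors the "one or more of the conserved residues were lost" criterion stated above.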

Identification of further toxin domains in polymorphic toxins and related proteins that operate on nucleic acids
In our earlier study we had shown that the majority of toxin domains in polymorphic toxin systems operate on nucleic acids: nucleases and base deaminases [17,18]. In this study we were able to further extend the diversity of toxin domains that act on nucleic acids via the discovery of additional nucleases and deaminases that were not previously recognized (Figures 6-9). We observed that the divalent cation-dependent nucleases among polymorphic toxins are frequently drawn from ancient nuclease folds, namely the HNH/EndoVII, REase and URI endonuclease folds [102-107]. Additionally, we present evidence below that representatives of a few other potential cation-dependent enzymatic domains might function as nuclease domains in polymorphic toxins. Interestingly, the PIN domains, which are major divalent cation-dependent nucleases in the toxin-antitoxin systems [22,108], do not appear to be utilized in the polymorphic toxins and related systems. Toxin nucleases that utilize divalent cations can catalyze the direct hydrolysis of the phosphodiester bond and as a result attack both DNA and RNA. However, the metal-independent nucleases can only act as RNases, as their endonucleolytic action involves the formation of a cyclic 2'-3' phosphate that does not require metal-dependent direction of a hydrolytic attack [107]. Such RNases belong to many distinct folds, several of which appear to have emerged only in the course of the diversification of toxin domains of polymorphic toxins, bacteriocins and classical toxin-antitoxin systems [17,22,28,107,109,110]. While we were able to unify several of the metal-independent RNases, which were previously considered to be unrelated, into a single monophyletic assemblage, there are still several distinct toxin domains that are likely to represent novel metal-independent RNases (see below; novel toxins).
This structural diversity of metal-independent RNases and the repeated emergence of several such nuclease domains among different toxin systems suggest that there are some fundamental constraints in the evolutionary innovation of nuclease domains. It appears that the independent emergence of multiple residues for metal-chelation and acid-base catalysis to constitute an active site that can support hydrolytic cleavage of nucleic acids is a far less likely event than the emergence of a metal-independent active site that utilizes the innate reactivity of RNA to facilitate an internal attack with the formation of 2'-3' cyclic phosphates. We briefly describe below the newly recovered toxin domains that act on nucleic acids.

Novel toxins with the HNH/EndoVII nuclease domain
In our earlier studies we found nuclease toxin domains belonging to eight distinct clades of the HNH/EndoVII fold among the polymorphic toxin systems [17,18]. Of these, nucleases belonging to the classical HNH and NucA clades widely occur beyond the polymorphic toxins across diverse sub-cellular systems, such as DNA repair/recombination, restriction-modification (R-M) and environmental nucleic acid degradation systems [103,106,111]. In contrast, the GH-E, DHNNK, WHH, LHH and AHH domains appear to have arisen in and remained largely restricted to polymorphic toxin systems. The NGO1392 clade appears to have arisen in the bacterial polymorphic toxin systems, but was transferred to eukaryotes where it might have assumed a role in DNA repair [17]. In this study we recovered six more clades of HNH domain nucleases that appear to have primarily diversified among bacterial polymorphic and related PVC-SS-associated toxins. In keeping with the earlier nomenclatural system, we named five of these novel clades on the basis of the conserved motifs that characterize them as the SHH, HHH, GHH, GHH-2 and EHHH clades of HNH domains (Figure 6). The sixth of these is related to the version of the HNH domain found in the restriction enzyme SphI [112] and the animal CIDE (CAD/DFF40) protein involved in nucleolytic DNA fragmentation during apoptosis [113], and is termed HNH-CIDE (Table 2). Architectural analysis indicated that the novel HNH clades occur both as potential diffusible toxins (mainly in Gram-positive bacteria) and as contact-dependent toxins borne at the tip of long filamentous structures (proteobacteria, bacteroidetes, planctomycetes and certain Gram-positive bacteria; Figure 6). Representatives of the SHH clade have been transferred to crustaceans (e.g. Daphnia; gi: 321474287) and tailed bacteriophages (e.g. Bacillus phage SPbeta; gi: 9630134).
The former transfer is consistent with occurrence of an effector with a SHH nuclease domain in the eukaryotic endosymbiont, Simkania (gi: 338732338).
The CIDE protein was previously known only from metazoans with no known representatives from other eukaryotes; hence, its origin remained mysterious [114]. The identification of the HNH-CIDE toxin domains suggests that this nuclease domain first arose in the context of bacterial conflicts and was laterally transferred to animals early in their evolution. In animals, its innate cytotoxic action appears to have been channelized as an effector of apoptosis. Our searches also showed that the C-terminal domain of teneurin and Odd Oz proteins from the animal lineage (metazoans + choanoflagellates) contains an inactive version of a HNH domain belonging to the GHH clade (Figure 6E). While the presence of RHS repeats in these proteins related to those in bacterial RHS proteins has been previously recognized [115], the relationship of their C-terminal domain to a specific bacterial toxin domain has not been hitherto reported. Teneurin/Odd Oz proteins function as developmental regulators with a potential role in cell-surface adhesion in diverse processes such as cell migration, neuronal path finding and fasciculation, gonad development, and basement membrane integrity [115][116][117]. The region of these proteins spanning the inactive GHH nuclease domain has been described as being cleaved off and amidated at the C-terminus in vertebrates to give rise to a peptide with possible neuromodulatory activity [118]. This region in teneurin-2 is also the ligand for latrophilin-1, which is also the receptor for another molecule, latrotoxin, whose origins also lie among the bacterial toxins (see below) [116]. Hence, it is conceivable that the RHS portion of these proteins participates in cellular adhesion, while the cleaved off inactive GHH domain acts as a diffusible signal. It would be of interest to investigate if this inactive GHH domain might bind nucleic acids upon being taken up by target cells. 
Our detection of the GHH domain in the Teneurin/Odd Oz proteins establishes that they have emerged from the single transfer of a specific type of a complete bacterial polymorphic toxin gene followed by its fusion to EGF repeats of animal provenance (Figure 6E).

Novel restriction endonuclease fold domains in polymorphic toxins
In our earlier study we had identified toxin domains in polymorphic toxins belonging to a previously uncharacterized clade of the REase fold (REase-1) [17]. Further analysis revealed that there are nine additional, previously unknown clades of the REase fold that are present exclusively as toxin domains of a diverse group of polymorphic toxins (Figure 7; numbered serially REase-2-REase-10). Their domain architectures and gene-neighborhoods indicate that they are secreted by means of the T2SS, T5SS, T7SS, TcdB/TcaC and the PrsW-type peptidase-dependent system in different bacterial lineages. Of these, at least four distinct versions, namely REase-2, REase-3, REase-5 and REase-6 are coupled with a PrsW peptidase, suggesting that a notable diversification of these nucleases appears to have happened in the context of these systems (Figure 7). Many of the REase toxins secreted via the other systems have central RHS repeats (e.g. REase-9; Figure 7). These architectures suggest that REases might function both as diffusible and contact-dependent toxins. Tox-REase-8 is primarily found in the arthropod endosymbiont Wolbachia and the Acanthamoeba endosymbiont Amoebophilus and is usually associated with arrays of ankyrin repeats (Figure 7G). These lack associated genes for immunity proteins and are likely to be deployed against targets in the host cells; this represents the first instance of a REase domain effector being used by endosymbionts of eukaryotes. Representatives of Tox-REase-8 are found in the genomes of arthropods, such as the crustacean Daphnia, several mosquitoes, ants and beetles, and the placozoan Trichoplax. This suggests that Tox-REase-8 has been repeatedly transferred to diverse animals from their Wolbachia-like endosymbionts. Beyond conventional polymorphic toxin systems, REase-9 is also found in a Parachlamydia effector (PUV_01770, gi: 338174171) that might target nucleic acids in its host Acanthamoeba. 
All ten clades of REase toxins have an active site that closely conforms to the classical REase active site with a D-[EQ]XK signature in the core strands that constitute the metal-chelating site [103]. The majority of characterized members of this fold act on DNA targets; hence, it is conceivable that these toxins also attack the genome of the target cells through endonucleolytic cleavage.
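As a rough illustration of what screening a sequence for the D-[EQ]XK signature could look like, the following Python sketch searches for an aspartate followed, after a variable gap, by an [EQ]-x-K triplet. The gap bounds (`min_gap`, `max_gap`), the function name and the toy sequence are assumptions introduced here for illustration and are not taken from the analysis above, which relied on multiple alignments and structural context rather than a simple pattern match.

```python
import re

def find_rease_signatures(seq, min_gap=5, max_gap=30):
    """Return (D position, motif position, motif) tuples, 1-based.

    A hit is an aspartate followed, after min_gap..max_gap residues,
    by an [EQ]-x-K triplet. The gap bounds are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []
    # The lookahead lets overlapping [EQ]xK triplets all be reported.
    for m in re.finditer(r"(?=([EQ].K))", seq):
        start = m.start()
        lowest_d = max(0, start - 1 - max_gap)
        # range() is empty when the triplet sits too close to the N-terminus.
        for d in range(lowest_d, start - min_gap):
            if seq[d] == "D":
                hits.append((d + 1, start + 1, m.group(1)))
    return hits

# Hypothetical toy sequence, not a real REase toxin domain.
toy = "MKTADGSLLVAIEVKSGG"
print(find_rease_signatures(toy))
```

A pattern this short will match many unrelated proteins by chance, so such a scan is at best a pre-filter before alignment-based or profile-based confirmation.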

URI domain nuclease toxins
The URI domain was first identified as a conserved metal-dependent endonuclease domain catalyzing the cleavage of the 3′ side of a damaged DNA base during nucleotide excision repair by UvrC, and mediating site-specific insertion of certain introns [102,119]. Similar nuclease domains have also been found in certain REases, such as R. Eco29kI, and the transposase module of Penelope-like non-LTR retroelements [104]. In this work we identified, for the first time, URI domain nucleases in polymorphic toxins that are present in bacteria from most major bacterial lineages (Figure 8A, Table 2) that are usually secreted via T2SS, T5SS, TcdB/TcaC and T6SS. The Tox-URI domains can be divided into two major clades, with the second clade being particularly divergent (Additional File 1). A version of the Tox-URI domain belonging to the first clade has also been transferred to fungi, where it occurs as an intracellular domain fused to an ABC ATPase transporter (e.g. Neurospora crassa NCU06946; gi: 164424641; Additional File 1). Given this architecture, it is conceivable that they function in degradation of nucleic acids taken up by these fungi. Interestingly, certain URI domain toxins belonging to the second clade are present in distantly related intracellular symbionts/pathogens of Acanthamoeba, such as Simkania negevensis (gi: 338731950), Odyssella (gi: 344925485) and Rickettsia bellii (gi: 91206213). Analysis of the gene-neighborhoods of these toxins shows that they have adjacent genes encoding immunity proteins (Additional File 1), suggesting that these toxins are likely to be used in intraspecific conflict rather than being directed against the host. Along with the above-described Otu peptidase toxins from Odyssella, these URI domain toxins represent relatively rare examples of polymorphic toxins deployed in intraspecific conflict by endo-symbiotic/parasitic bacteria. 
Other than the versions from intracellular bacteria, the URI domain toxins are typically associated with filamentous RHS repeats.
All the above metal-dependent nuclease domains are shared by polymorphic toxin systems with R-M systems, but are apparently absent among classical toxin-antitoxin systems [22,28]. However, the versions found in the polymorphic toxins differ from those in classical R-M systems in lacking a complex array of associated DNA-binding domains [120]. Hence, we suspect that the versions of these nuclease domains deployed by the polymorphic toxin systems might have lower target sequence specificity than those deployed in R-M systems. Further, those from the former systems are under selection imposed by the physical interactions with cognate immunity proteins. It appears that these factors might eminently disallow exchange of nuclease domains between polymorphic toxin and R-M systems.

The competence nuclease (ComI) domain
This nuclease domain is prototyped by the secreted 17 kDa competence nuclease ComI of Bacillus subtilis, which is a major determinant of DNA uptake when the bacterium becomes capable of transformation prior to stationary phase [121]. We recovered related nucleases as toxin domains of polymorphic toxins from actinobacteria (e.g. gi: 296130766 from Cellulomonas flavigena) and proteobacteria (e.g. gi: 326318161 from Acidovorax avenae; Figure 8B). This domain could not be unified with any previously known fold observed among nucleases. A multiple alignment of this domain showed that it contained a central dyad of two acidic residues (usually a DE motif) followed by a third conserved acidic residue a few positions downstream (Additional File 1). These residues could potentially form a divalent cation-chelating site, suggesting that the ComI nuclease is likely to be the fourth metal-dependent nuclease superfamily among the toxin domains. Interestingly, the B. subtilis competence nuclease is physically associated with the 18 kDa product of the adjacent ComJ gene, which acts as its inhibitor; the interplay between the ComI nuclease and its inhibitor ComJ has been suggested to be important for optimal digestion of incoming DNA, so as to facilitate transformation [121]. The structure of this operon with a nuclease followed by its inhibitor is reminiscent of the polymorphic toxin systems with the toxin gene followed by the immunity protein. Consistent with this, ComJ homologs occur as immunity proteins for polymorphic toxins with the ComI nuclease domain in several proteobacteria. Hence, it is possible that these key components of the Bacillus DNA uptake system have evolved from a toxin-immunity gene pair.

ParB domain toxins
We recovered several polymorphic toxins with N-terminal filamentous regions formed by RHS or filamentous haemagglutinin repeats and C-terminal ParB toxin domains (Figure 8C). The ParB domain is the subject of much confusion: based on a study, which claimed to demonstrate both endo- and exo-DNase activity in the ParB protein [122], required for maintenance of the plasmid RK2, the domain was labeled as a nuclease domain. However, it should be noted that this study was based on entirely erroneous assumptions that the RK2 ParB domain was related to nucleases such as the staphylococcal nuclease and RuvC [122]. In contrast, other members of the ParB superfamily, such as sulfiredoxin, have been convincingly demonstrated to possess metal-dependent phosphotransferase activity that utilizes ATP to form a phosphoryl ester of sulfinate generated from the active site cysteine of the peroxiredoxins [123]. Through sequence profile searches we were able to demonstrate that DndB is a member of the ParB superfamily. DndB negatively regulates the formation of the unusual DNA phosphorothioate modification, in which the nonbridging oxygen in the phosphodiester linkage of DNA is replaced by a sulfur atom in a sequence-specific manner [124]. Hence, it appears that even this member of the ParB superfamily, comparable to sulfiredoxin, might hydrolyze a phosphoryl ester linked to a sulfur center. The convincingly inferred metal-dependent phosphotransfer activity of the ParB superfamily implies that in principle certain representatives might also be able to catalyze nuclease activity through a comparable hydrolysis of a phosphodiester bond. Hence, it is conceivable that, even though the ParB domain was considered a nuclease for the wrong reasons, this activity might be still valid for some representatives of the superfamily. This is also consonant with the earlier recovery of ParB domains in nucleases encoded by certain R-M like systems [103,125]. 
The predominance of nuclease domains among the toxin domains of polymorphic toxin systems also supports a potential nuclease function for the ParB toxin domains. Examination of the multiple alignment of the ParB domains from polymorphic toxins suggests that they possess a strongly conserved DGHHR motif that is predicted to form part of their highly conserved metal-binding active site (Additional File 1). In addition to the classical ParB toxin domains, we recovered a second large group of toxin domains typified by that found in Neisseria gonorrhoeae NGK_2271 (gi: 194099761), which could be united using profile-profile comparisons with the ParB domain (HHpred probability 93%; p = 2 × 10⁻⁶ match to 1vz0 Thermus ParB). While being rather divergent from the classical ParB domains, they display a motif with a conserved arginine that is equivalent to the DGHHR motif in the former. Additionally, they display a conserved N-terminal serine that is absent in the classical ParB domains. Hence, we termed this distinct family of ParB-related domains as Tox-ParBL1 (Figure 8C). In addition to the bacterial polymorphic toxins, Tox-ParBL1 domains are also found in several eukaryotes such as kinetoplastids, and several metazoans, fungi, plants, stramenopiles and ciliates (Table 2 and Additional File 1). Thus, this example represents an independent acquisition by eukaryotes of a ParB-related domain from the polymorphic toxin systems, distinct from the sulfiredoxins.

The JAB domain
We detected two distinct clades of the JAB domain superfamily as the potential toxin domain of several classical polymorphic toxins (Figure 8D). The JAB domain has been previously shown to be a peptidase that specifically targets the C-termini of ubiquitin-like proteins (UBLs) either as a DUB or as a processing enzyme [126][127][128]. All previously identified prokaryotic JAB domains are intracellular proteins. Most of them are components of systems utilizing UBLs in biosynthetic pathways or protein modification. As these toxin genes are accompanied by immunity proteins they are likely to be used in intraspecific conflict rather than against eukaryotic targets. Hence, the presence of the JAB domain among the toxin modules of classical polymorphic toxins was unexpected, because most of the bacteria in which they are present lack systems with conjugated or processed ubiquitin-like proteins [126]. However, based on contextual information from domain architectural analysis it was recently proposed that a subset of the JAB domains (i.e. those belonging to the RadC clade) are more likely to function as nucleases that cleave DNA, rather than as peptidases [18]. The two clades of JAB domains found among the polymorphic toxins, like RadC, are rather divergent with respect to those that act on UBLs, and do not conserve the residues lining the tunnel that accommodates the UBL tail in the peptidase versions (Additional File 1). This suggests that, as previously proposed for RadC, the toxin JAB domains might function as nucleases rather than as peptidases. Of the two clades, Tox-JAB-1 is found only in the bacteroidetes lineage, associated with N-terminal RHS repeats (Figure 8D). Tox-JAB-2 is more widely distributed across proteobacteria, bacteroidetes and a few firmicutes, and partly overlaps with the "domain of unknown function" DUF4329 from the PFAM database (Figure 8D). 
Versions of Tox-JAB-2 are also present in several NCLDVs, such as iridoviruses, mimiviruses and algal viruses, and Xanthomonas phages (e.g. phage OP1). These latter versions are secreted proteins and could potentially function as phage-encoded virulence factors.

The Het-C hydrolase domain
The Het-C domain was first identified as a major player in the phenomenon of fungal vegetative incompatibility [129], wherein it mediates programmed cell death upon interaction with incompatible hyphae. Subsequently, a version of the Het-C domain encoded by Pseudomonas syringae was shown to be required for the infection of fungal hyphae by this bacterium, by exploiting the mechanism of hetero-incompatibility [130]. In our analysis we recovered Het-C domains in systems related to the polymorphic toxins that utilize PVC-SS (e.g. gi: 148657895 from Roseiflexus; Figure 4C). Profile-profile comparisons using an alignment of the Het-C domain (Figure 8E) revealed hits with borderline significance (p = 0.001; 50% probability) to a group of α-helical hydrolases sharing a common fold, including zinc-dependent phospholipase C [131] and the S1-P1 nucleases [132]. The predicted secondary structure for the Het-C domain was also compatible with the α-helical fold seen in those hydrolases and examination of the multiple alignments revealed that the two possessed a comparable set of conserved active site residues (Figure 8E). This includes four conserved histidines and three acidic residues (D/E), suggesting that the Het-C domain possesses a metal-dependent active site similar to that seen in the phospholipases and S1-P1-like nucleases. Indeed, secreted versions of this domain with both phospholipase and nuclease activity are known from different bacteria [132]. This suggests that the Het-C domain might also possess either metal-dependent nuclease or phospholipase activity, and that this activity is likely to be critical for the apoptotic and toxin action of this domain in fungi and bacteria.

Barnase-EndoU-colicin E5/colicin D-RelE like nuclease fold: A large assemblage of metal-independent RNases
In our earlier study we had recovered the EndoU domain as a metal-independent RNase frequently found in polymorphic toxin systems. We had further shown that the EndoU fold is marked by a potential duplication of a core helix-β-sheet element that constitutes its active site [17]. In another earlier study we had unified the colicin E5 and colicin D RNase domains with the RNase domain of the RelE toxin that is found in classical toxin-antitoxin systems [133]. A comparison showed that the core structural element in EndoU, Colicin E5, colicin D and RelE is a similar strand-β-sheet unit (Figure 9A). Transitive structure-comparison searches using the DALIlite program confirmed that these RNase domains are indeed related as they preferentially recovered each other (with Z > 3.5). Further, these DALIlite searches showed that they could be united with several other metal-independent RNase domains, namely the RNase toxins and other secreted RNases from fungi, such as sarcin, RNaseT and RNase U2, and the bacterial RNases prototyped by barnase (Z > 3.5; Figure 9A; this latter group is described as the microbial RNase fold in the SCOP database [134]). We term the common structural unit shared by all the representatives of the above-unified assemblage the BECR (Barnase-EndoU-Colicin E5/D-RelE) fold. The common structural unit, which constitutes the catalytic domain of the BECR fold RNases, contains an N-terminal helical segment that is followed by a sheet formed by a 4-stranded meander (Figure 9A). In several cases the 4th strand is followed by an additional short 5th strand that is differentially positioned in various versions of this fold. Furthermore, the location of the active site residues is often comparable across these enzymes and our sequence analysis revealed that many of these RNases (including EndoU, colicin E5/D and some clades of RelE) share a conserved alcoholic residue (S/T) in the 4th strand that contributes to the active site (Figure 9A).
In addition to the EndoU clade, our sequence comparisons indicated that several of the newly recovered BECR fold toxin domains from polymorphic toxin systems belong to other previously defined clades in this fold, such as barnase, colicin E5, and colicin D clades (Figure 9B-F). While the classical RelE endoRNase domain is common in type-II toxin-antitoxin systems, we observed only a single instance of it being used as a toxin domain in the polymorphic toxins (gi: 357015358 from Paenibacillus elgii). However, using secondary structure prediction combined with profile-profile comparisons we also discovered distinct, previously unrecognized clades of RNases displaying the BECR fold (Figure 9G): these include the clades 1) Ntox7 (e.g. y1701, gi: 22125595 from Yersinia pestis); 2) Ntox19 (NMW_1482, gi: 254673263 in Neisseria meningitidis); 3) Ntox35 (typified by NGMG_00731; gi: 291044920 from Neisseria gonorrhoeae); 4) Ntox36 (typified by the toxin domain of gll0213; gi: 37519782 from Gloeobacter violaceus); 5) Ntox47 (typified by the toxin of rhs2; gi: 366079994 from Salmonella enterica); 6) Ntox48 (e.g. gi: 251789613 from Dickeya zeae); 7) Ntox49 (gi: 59801914 in Neisseria gonorrhoeae); 8) Ntox50 (gi: 254804532 in Neisseria meningitidis). Together with previously characterized clades, these novel clades are extensively represented among the toxin domains of classical polymorphic toxins and in some cases related toxins delivered by the PVC-SS (Figures 4 and 9). This observation suggests that the BECR fold has supplied one of the most extensive radiations of RNase toxins, which cuts across mechanistically distinct systems: the polymorphic and related secreted toxins and the classical toxin-antitoxin systems. Examination of the predicted active site residues among the newly characterized clades pointed to each clade acquiring its own unique features. 
For example, Ntox35 has acquired two conserved N-terminal histidines in addition to the conserved S/T from the C-terminal strand. Ntox50 and Ntox19 instead have a single N-terminal histidine, similar to one observed in several members of the colicin E5/D clade [110], accompanied by a second C-terminal histidine found at the position usually occupied by the conserved S/T of the BECR fold (Additional File 1). The presence of two histidines in the above three clades is reminiscent, though not equivalent in terms of secondary structure context, to those seen in the EndoU clade, suggesting a comparable reaction mechanism in all these versions of the fold. In contrast, Ntox36 lacks any conserved histidine; instead it displays other clade-specific conserved residues; e.g. an asparagine in the N-terminal region. Most of these enzymes, especially those with two conserved histidines, are likely to utilize a metal-independent mechanism similar to that observed in RNase A (see below) [107]. This is supported by the generation of cleavage products with 2'-3' cyclic phosphate termini in several biochemically characterized members of these RNases (e.g. XendoU). Some members of the EndoU clade have been shown to require Mn2+ for effective catalysis of RNA cleavage [135]; however, given that they still produce 2'-3' cyclic phosphates, it is likely that this metal is required for stabilization of the hypercharged transition state rather than the actual phosphoesterase activity.
Interestingly, we observed that one RNase of the BECR fold related to the colicin E5/D clade is also found consistently associated with the flagellar operon across firmicutes (e.g. gi: 28211324 from Clostridium tetani; Additional file 1). It would be of interest to investigate if this RNase is delivered by the flagellar system or alternatively functions to regulate flagellar gene expression as an RNA-processing enzyme. RNases of the Ntox50 clade have also been acquired by bacteriophages such as Clostridium phage phiC2 (gi: 134287339) and might be used in conflicts with the host or other phages. Likewise Ntox19 has been acquired by the giant Acanthamoeba-infecting mimivirus and is also found in potential effectors secreted by the Acanthamoeba endosymbionts Parachlamydia and Odyssella.

Novel toxin domains which are likely to function as nucleases
Our systematic analysis of the polymorphic toxin systems recovered a total of 50 distinct novel toxin domains that could not be unified with any previously known domain (Table 2; Additional file 1). Only a small minority of these domains contain at least one experimentally characterized member. Their sequence conservation patterns, together with the preponderance of nucleases among polymorphic toxins, suggest that most of these novel toxin domains are likely to be nucleases. Indeed, their conservation patterns suggest that these novel toxin domains include both potential metal-dependent and independent enzymes (Table 2; Additional file 1). The C-terminal toxin domain of the originally characterized contact-dependent inhibitor protein CdiA from Escherichia coli was demonstrated to possess RNase activity [44]. We observed that the E. coli CdiA-C domain is widely distributed across polymorphic toxins from diverse bacteria. We also uncovered this domain in the Photorhabdus PalA protein, which lacks an associated immunity protein but is encoded in a pathogenicity island adjacent to the Mcf gene whose product is a toxin directed against the caterpillar host [87]. In light of this, it is possible that the E. coli CdiA-C domain in PalA might be directed against the host as an accessory toxin. Examination of the E. coli CdiA-C domain shows that it possesses an all-β fold that lacks any conserved residues typical of metal-dependent nucleases. Hence, it is likely to be a metal-independent RNase and probably defines a novel structural theme among them.
We uncovered an uncharacterized toxin domain that is found in polymorphic toxin systems from a wide range of bacteria and several potential effectors delivered by endo-symbiotic/parasitic bacteria (e.g. Wolbachia, Ehrlichia, Odyssella, Rickettsia and Legionella). It is also found at the C-terminus of a group of eukaryotic proteins typified by the plant protein EDA39 and we accordingly call it the Tox-EDA39C domain (Additional File 1). This domain is characterized by two highly conserved histidines, one each in the N- and C-terminal halves of the domain, that are likely to comprise its active site. This conservation pattern is reminiscent of the catalytic residues seen in the RNase A domain [136], and might represent a novel metal-independent RNase that catalyzes a reaction similar to that of RNase A. The presence of this domain in several eukaryotic lineages, such as plants, fungi, oomycetes and Dictyostelium, suggests that it might have been acquired by eukaryotes from bacterial endosymbionts and could have been recruited as a potential RNase used in anti-pathogen defense. Ntox43 is typified by the toxin domain of the recently described RhsT from Pseudomonas aeruginosa, which has been shown to translocate to the host cytoplasm and mediate an inflammatory response [46]. This toxin, like Tox-EDA39C, has two conserved histidines suggesting that it might also function as a RNase A-like metal-independent nuclease (Additional File 1). Hence, we predict that RhsT is likely to activate the inflammasome via cleavage of specific RNAs. Although proteins with Ntox43 display architectures similar to classical polymorphic toxins, none of them are associated with adjacent genes for immunity proteins. This suggests that they are likely to be used primarily against eukaryotic hosts. 
At least four other toxin domains identified by us (Ntox18, Ntox19, Ntox22, Ntox26, Ntox30) are likely to be novel metal-independent endo-RNases that utilize a two histidine-dependent mechanism to catalyze transesterification and formation of a 2'-3' cyclic phosphate like RNase A (Table 2).
We observed that the RES domain (PFAM: PF08808), whose function was previously unknown, is another toxin domain that is found in polymorphic toxin systems. Interestingly, it is also found in classical toxin-antitoxin systems, where it is typically paired with a distinctive antitoxin (previously labeled as a domain of unknown function, DUF2384 in the PFAM database). Hence, we predict that the RES domain is likely to be a novel RNase domain shared by different toxin systems. Examination of the alignment of the RES domain revealed two conserved arginines, a glutamate and a serine; this configuration does not appear likely to support a metal-binding active site; however, these residues are suitable for catalyzing a distinct metal-independent RNase reaction. Ntox24 is characterized by a single conserved histidine, and, like the RES domain, versions of this toxin domain are additionally found in what appear to be novel type-II toxin-antitoxin systems associated with a previously uncharacterized family of antitoxins (e.g. gi: 139439131). The toxin domain from the CdiA protein from Enterobacter cloacae (Ntox21) shows universally conserved residues, including a single histidine and two aspartates, but could not be unified with any other known domain. It is conceivable that Ntox24 and Ntox21 act as metal-independent endoRNases comparable to the Colicin E3 nuclease domain [137], which is also found in polymorphic toxin systems (Tox-ColE3) [17]. Our detection of Tox-ColE3 in these systems also helped in emending the proposed active site of these RNases. Based on structural analysis it was previously proposed that the active site of these enzymes corresponds to D55, H58 and E62 in the structure of colicin E3 (PDB:2xfz) [137]. However, our analysis indicated that H58 is not conserved across all members; instead we found that a second histidine, corresponding to H72 in Colicin E3, is conserved throughout the fold. 
Thus, it is possible that the above types of RNases use a single histidine in conjunction with an acidic residue that initiates cleavage by inducing the 2'OH to attack the phosphodiester backbone of RNA [137]. In contrast, examination of the multiple alignments of the novel toxins revealed potential metal-chelating sites in Ntox29 (conserved histidines and aspartates); hence, it could potentially function as a novel metal-dependent nuclease. For the remaining Ntox domains, while the active site residues could be identified based on conservation, the nature of catalysis remains unclear.
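The conservation argument used above to emend the Tox-ColE3 active site (a histidine that varies among members versus one that is retained throughout) can be illustrated with a minimal Python sketch that scores a single alignment column. The function name and the four-sequence toy alignment are hypothetical and are not the alignment analyzed in this study; columns are 0-based and gaps are ignored.

```python
from collections import Counter

def column_conservation(alignment, col):
    """Return (most common residue, its fraction) for one alignment column.

    Gap characters ('-') are ignored; col is a 0-based column index.
    """
    residues = [s[col].upper() for s in alignment if s[col] != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)

# Hypothetical toy alignment: column 2 holds a variable histidine,
# column 5 a histidine retained in every sequence.
toy_alignment = [
    "DKHAEH",
    "DRNAEH",
    "DKHSEH",
    "DQSAEH",
]

print(column_conservation(toy_alignment, 2))  # variable position
print(column_conservation(toy_alignment, 5))  # invariant position
```

In practice such scores would be computed over a curated multiple alignment of the whole family, with a residue assigned to the active site only when it is both conserved and structurally plausible.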

Deaminases
Other than the nuclease domains, deaminases are the most common toxin domains that operate on nucleic acids in polymorphic toxin systems. As we had extensively characterized the toxin deaminases from these systems in our earlier study [18], we do not consider them in detail here. However, in this study we recovered two additional clades of deaminases that were not previously detected (Figure 10A). The first of these was found in giant proteins with a toxin-like architecture from the alphaproteobacterial endosymbionts of the genus Wolbachia, which reside in the cells of two dipterans, namely Culex (gi: 190571717; WPa_1346) and Drosophila (gi: 42520377, WD0512). These proteins contain two toxins at their C-termini, of which the Latrotoxin-CTD (see below) is the terminal toxin and the deaminase is N-terminal to it (Figure 10). An examination of their gene neighborhoods revealed that they lacked accompanying genes encoding immunity proteins. Hence, it appears that these proteins, while resembling the classical polymorphic toxins, are primarily directed against host nucleic acids. The deaminase domains from these proteins are extremely divergent, but structure prediction based on a multiple alignment with a comprehensive set of deaminase domains showed that they belong to the "Helix-4 division" of the deaminase superfamily in which the intervening 4th helix of the core domain causes strands 4 and 5 to be parallel to each other [18]. Thus, they are united with other deaminases of this division such as TadA/Tad2, ADAR/TAD1 and the AID/APOBEC-like deaminases. However, unlike most members of this division the newly characterized deaminase domains have a CXE signature in their first active site motif, as opposed to the usual HXE seen in this division (Additional File 1). 
These newly detected versions add to the earlier identified deaminases belonging to the Helix-4 division among host-directed toxins of alphaproteobacterial endosymbionts/parasites, such as those from the Wolbachia endosymbiont of the lepidopteran Cadra cautella and from the Orientia and Rickettsia species infecting diverse eukaryotes [18]. This suggests that modification of nucleic acids by these fast-evolving deaminase toxins related to the eukaryotic AID/APOBEC-like proteins might be a widely used strategy by endosymbionts to alter host physiology. In particular, the presence of such highly divergent versions of deaminases in Wolbachia infecting diverse arthropods hints that they could be attractive candidates for mediating failure of paternal chromosome condensation via their mutagenic action [138]. The second novel clade of deaminases are toxin domains of classical polymorphic toxins from proteobacteria and actinobacteria, which might be delivered via diverse secretory mechanisms such as the T2SS, T5SS, T6SS, T7SS and the TcdB/TcaC system (prototyped by gi: 162451789, sce3516 from Sorangium cellulosum; Figure 10A and Additional File 1). These deaminases usually have a HAE signature in their first active site motif but belong to the "C-terminal hairpin" division of the deaminase superfamily, which is characterized by a C-terminal β-hairpin following the 3rd helix of the conserved core. Given their predominance in free-living bacteria, unlike the former deaminases, they are likely to be deployed in intraspecific conflict rather than against eukaryotic hosts.
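The distinction drawn above between the HXE, CXE and HAE variants of the first active site motif can be expressed as a small bookkeeping routine. The Python sketch below assumes that the three-residue window has already been extracted from an alignment; the function name, labels and toy windows are hypothetical and introduced only for illustration. Note that HAE is a special case of HXE and is reported separately here only because the text singles it out.

```python
from collections import Counter

def classify_first_motif(tripeptide):
    """Label a three-residue window from the first deaminase active-site motif."""
    t = tripeptide.upper()
    # The motif variants discussed in the text all end in glutamate.
    if len(t) != 3 or t[2] != "E":
        return "other"
    if t[0] == "C":
        return "CxE"
    if t[0] == "H":
        return "HAE" if t[1] == "A" else "HxE"
    return "other"

# Hypothetical windows, as if read off one block of aligned columns.
toy_windows = ["HAE", "CPE", "HVE", "CAE", "HAD"]
counts = Counter(classify_first_motif(w) for w in toy_windows)
print(counts)
```

The motif alone does not assign a sequence to the Helix-4 or C-terminal hairpin division; as described above, that assignment rests on the predicted secondary structure of the whole domain.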

Other catalytic toxin domains in polymorphic toxin systems
Other than the peptidase and nucleic acid cleaving or modifying toxins, we uncovered several other less frequent catalytic domains that function as toxins in polymorphic and related secreted toxin systems (Table 2). These display a wide range of activities and are likely to elicit their cytotoxic activity by attacking several independent aspects of cellular function. We briefly outline these toxin domains and their possible modes of action.

Domains catalyzing modifications of proteins
The previously characterized DOC domain, which has been observed in several host-directed effectors (e.g. Xanthomonas AvrAC), is found in several polymorphic toxins [22,139,140] (Figure 2D). This is a protein-modifying toxin domain, which transfers AMP or UMP from nucleoside triphosphates to serines or threonines on target proteins [139,140]. Another toxin domain that we recovered in polymorphic-toxin-related systems utilizing the PVC-SS showed a specific relationship to the serine/threonine kinase domain found in lantibiotic synthetases [141] (Figure 4C). The "eukaryote-type" kinase domain in the lantibiotic synthetases phosphorylates serine/threonine residues in the lantibiotic precursors to prime them for the generation of the thioether linkages. Lantibiotic synthetase-type kinase domains have been shown to possess generic S/T kinase activity [142], suggesting that the toxin versions might carry out their action by phosphorylation of proteins on S/T residues in target cells. A comparable protein-modifying toxin domain (gi: 291451822, from Streptomyces albus, Figure 4C) is a glycosyltransferase, related to the Clostridium difficile toxin B, which has been shown to glycosylate the hydroxyl group of threonine 37 in the switch I region of the small GTPase RhoA [143]. Given the conservation of the Mg2+-binding DXD signature, which is critical for catalyzing the transfer of UDP-linked sugars, in versions of this domain found in toxin polypeptides detected in our study, it is likely that it functions in a similar fashion by glycosylating serines or threonines in specific proteins in target cells. In addition to its presence in classical polymorphic toxins with N-terminal RHS repeats and PVC-SS delivered toxins, we observed that related glycosyltransferase domains are also found in effector proteins delivered by various intracellular bacteria.
In the endoparasite Legionella pneumophila it is present in a toxin delivered via the T4SS (gi: 307610704), and in the aphid endosymbiont Hamiltonella defensa (gi: 238899322) it might be deployed as a toxin against the parasitoid wasps that attack the host aphids [144]. A distinct protein-modifying toxin domain is typified by the CNF domain of the uropathogenic E. coli cytotoxic necrotizing factors 1 and 2 and the dermonecrotic toxins of Bordetella. These domains display a 4-layered sandwich fold, with an active site histidine and cysteine, and catalyze the deamidation or transglutamination of a specific active site glutamine in the small GTPases, like RhoA, Rac and CDC42, in the cells of their eukaryotic host [140]. We recovered CNF domains in potential proteobacterial polymorphic toxins (Figure 10B) with N-terminal filamentous regions (Yersinia sp. yenC1, gi: 109391485) as well as those fused to phage-tail VgrG domains of the T6SS (e.g. gi: 345371919 from E. coli).
We also encountered several distinct clades of ADP ribosyltransferases (ARTs) among the toxin domains of polymorphic and related toxin systems (Figure 10C) [145]. The ART superfamily can be divided into two major clades depending on the conservation pattern of the three key active site residues associated with the three conserved motifs, respectively from the N-terminus, central region and C-terminus of the domain. These are the R-S-E clade and the H-Y-E clade, named after their respective conserved active site residues [146][147][148]. Protein-modifying ART domains have been extensively studied in the context of the host-directed toxins of diverse bacteria. Members from the R-S-E clade include the cholera toxin, which modifies a specific arginine in a mammalian Gα subunit, the Bordetella pertussis toxin, which modifies cysteine, the Clostridium botulinum C3 toxin, which modifies asparagine, and the Photorhabdus luminescens toxin, which modifies glutamine in target proteins [145,148]. The H-Y-E clade includes the Corynebacterium diphtheriae diphtheria toxin, the Vibrio cholerae cholix toxin and the Pseudomonas aeruginosa exotoxin A, which modify diphthamide in the translation GTPase eEF-2, and the polyADP ribosyl transferases (PARP/PARTs) [146,149,150]. We found multiple R-S-E clade ART domains in classical polymorphic toxin systems. One type of R-S-E clade ART toxin domain, observed in certain polymorphic toxins (e.g. gi: 221200352 from Burkholderia multivorans), is also seen in the T3SS effectors of Pseudomonas syringae, namely hopO1-1/2/3, a Legionella pneumophila T4SS effector (gi: 307611385), a novel Protochlamydia amoebophila effector (pc1346; gi: 46446980), and Pseudomonas aeruginosa exoT (gi: 347302423). Such ART toxin domains are also found in a remarkable group of giant proteins from actinobacteria (e.g.
gi: 345002682; Streptomyces sp.; Figure 10), which combine several toxin domains such as two anthrax lethal factor-like metallopeptidase, two caspase, three ART and one MCF1-SHE domains (Figure 10). A second distinct type of R-S-E clade ART domain, which is found in similar actinobacterial toxins (e.g., gi: 320008023 from Streptomyces flavogriseus), is closely related to the insecticidal toxin of Bacillus sphaericus and to the lepidopteran ARTs, such as pierisin, which ADP-ribosylates the N2 atom of guanine in DNA to induce apoptosis [151]. Interestingly, the close relationship of the lepidopteran pierisin-like ARTs to the bacterial insecticidal toxins suggests that they were probably a late lateral transfer into these insects from a bacterial symbiont or parasite, followed by their reuse as an apoptotic effector. In this study we found novel toxins of the H-Y-E clade from actinobacteria, which are closely related to the eukaryotic PARPs (Tox-ART-PARP) and are associated with the PVC-SS (e.g. gi: 291451874 from Streptomyces albus). We also identified a related toxin domain among the toxins secreted by the intracellular pathogen Legionella drancourtii (e.g. LDG_5757; gi: 374260808). Additionally, we found three distinct families of toxin ARTs belonging to the H-Y-E clade. The first of these is an extremely divergent version, which is typified by a protein from Shewanella baltica (gi: 152999126) with an architecture similar to a classical polymorphic toxin; it lacks associated immunity proteins and might be directed against eukaryotic hosts. The two other families (Tox-ART-HYD1 and 2, prototyped by gi: 336178949 and gi: 238064042 respectively) are widely distributed in free-living bacteria and are associated with distinct immunity proteins, suggesting that they might be mainly deployed in intraspecific conflict like the classical polymorphic toxins.
Nevertheless, versions of Tox-ART-HYD2 appear to have been transferred to several eukaryotes such as fungi and choanoflagellates (e.g. gi: 331216471 from Puccinia graminis). The above observations suggest that the use of ARTs to modify proteins, and in some cases DNA, is yet another strategy that is common to effectors deployed in both intrabacterial and bacterio-eukaryotic conflicts.

Lipid-modifying toxin domains
Three distinct lipid-modifying enzymes are represented among the toxin domains of classical polymorphic toxins and related PVC-SS-delivered toxins. Two of these, namely the glycerophosphoryldiester phosphodiesterase (GPDase, gi: 218438711 from Cyanothece) and the CDP-alcohol phosphatidyltransferase (CAPTase, gi: 317401091 from Neisseria mucosa) domains, are found exclusively in PVC-SS toxins (Figure 4C). In contrast, phospholipase A2 (PLA2) is found in classical polymorphic toxins with filamentous N-terminal regions (e.g. gi: 118578532 from Pelobacter propionicus), which might be secreted via different mechanisms, including the T6SS (Figure 10D). Of these, the GPDase can catalyze the hydrolysis of glycerophospholipid head groups by releasing alcohols linked to glycerol 3-phosphate via a phosphodiester linkage [152]. On the other hand, phospholipase A2 can hydrolyze lipids by releasing one of the fatty acid tails from glycerol 3-phosphate [153]. Closely related homologs of the Tox-phospholipase A2 domains (Tox-PLA2) are also found in secreted proteins from fungi and oomycetes (Table 2, Additional File 1). More generally, phospholipase A2 domains are also found in animal toxins from reptilian venom and from mammalian immune systems [152], suggesting that the use of this domain as a toxin is a prevalent strategy throughout evolution. Intriguingly, members of the CAPTase superfamily are membrane-embedded enzymes catalyzing the reverse reaction (lipid synthesis) using cytidine-diphosphate-linked alcohols as substrates, e.g. phosphatidylserine, phosphatidylcholine, phosphatidylglycerolphosphate, phosphatidylinositol and cardiolipin synthetases [154]. It is conceivable that a novel lipid synthesized by this toxin domain creates discontinuities in lipid bilayers, as has been observed with cardiolipin [155].
Thus, all three of these enzymes could potentially mediate their cytotoxicity by damaging the cell membrane of target cells, either through hydrolysis of lipids or disruption of the bilayer.
A toxin domain was uncovered in several classical polymorphic toxins (e.g. Tmz1t_2699 from Thauera sp.; gi: 237653364) that partly overlapped with a "domain of unknown function" (DUF2235 in the PFAM database). Sequence profile searches with the PSI-BLAST program recovered significant hits to α/β hydrolases (e = 10⁻⁵ to 10⁻⁷; iteration 3 in a search initiated with the domain from the above Thauera protein). While the α/β hydrolase superfamily encompasses hydrolases with several distinct activities, such as lipases, peptidases and thioesterases, profile-profile comparisons with the HHpred program suggested that these α/β hydrolases (Tox-ABhydrolase-1) are closest to lipases (e.g. the recovery of triacylglycerol lipases; PDB: 1tgl). In most cases this α/β hydrolase domain is either found fused to N-terminal phage baseplate modules (e.g. gi: 77461818 from Pseudomonas fluorescens) or encoded by a gene adjacent to a gene coding for such modules (Figure 10E). This suggests that Tox-ABhydrolase-1 might be a toxin that is mainly delivered via T6SS. These α/β hydrolase domains also appear to have been transferred to fungi prior to the divergence of the ascomycetes and the basidiomycetes and are present in most fungal lineages. We recovered two more distinct, previously uncharacterized α/β hydrolase families that are potential toxin domains associated with numerous classical polymorphic toxins (Tox-ABhydrolase-2 and 3, Figure 10E). Profile-profile searches with ABhydrolase-3 recover the lipases (e.g. PDB: 1lgy; p = 10⁻¹²; probability 95%) as the best hit to the exclusion of other ABhydrolases. Hence, it is conceivable that Tox-ABhydrolase-1 and Tox-ABhydrolase-3 are further toxins that might disrupt cell membranes of target cells via their action on lipids. ABhydrolase-2 is primarily present in proteobacteria and has also been transferred to ascomycete fungi.
It is also found in the endosymbiont Parachlamydia amoebophilus independently of an immunity protein and might be deployed against host molecules. However, Tox-ABhydrolase-2 did not show any specific relationship to previously characterized lipases. Given that the ABhydrolase superfamily includes hydrolases with a very diverse array of activities, it is not clear whether Tox-ABhydrolase-2 also acts on lipids or targets some other cellular component.

Carbohydrate-related toxin domains
We detected two enzymatic domains, which are predicted to act on carbohydrate substrates, as toxin domains of polymorphic and PVC-SS-delivered toxins. The first of these belongs to a superfamily of glycohydrolases, typified by bacterial proteins, such as FlgJ and the N-acetylmuramoyl-L-alanine amidase (gi: 220928985 from Clostridium cellulolyticum), which cleave the glycopeptide linkages in peptidoglycan or endo-glycosidic linkages in oligosaccharides [156,157]. Hence, it is likely that these toxin domains act by hydrolyzing linkages in the peptidoglycan of the target cells. These might be compared to the recently described amidase toxins from Pseudomonas aeruginosa that are believed to act on peptidoglycan [15]. The second toxin domain in this group is an oxidoreductase with a TIM barrel fold catalytic domain (gi: 158339325 from Acaryochloris marina) [158]. Within this superfamily, the toxin domains are most closely related to the aldo-keto reductases, such as 2,5-didehydrogluconate reductase, suggesting that they are likely to act on sugar substrates. However, the exact mode of action of this toxin remains unclear: it could act either on carbohydrates in the peptidoglycan or on carbohydrates within target cells.

Toxin domains related to nucleotide signaling
The RelA/SpoT-like toxin domain is found in classical polymorphic toxins from Gram-positive bacteria delivered by the ESX/T7SS (e.g. gi: 302865491; Micau_0989 from Micromonospora aurantiaca; Figure 10D). A related toxin domain is also found in the T3SS-delivered effectors directed against plant hosts by several plant pathogens, such as Xanthomonas (e.g. gi: 353464269; the XopAD effector), Ralstonia solanacearum and Pseudomonas syringae. These proteins typically contain two copies of the RelA/SpoT domain. Further, in several bacteria (e.g. gi: 149004362 from Streptococcus pneumoniae and gi: 254362874 from Mannheimia haemolytica) the RelA/SpoT toxin domain is found fused to the MuF domain of prophages and is thereby predicted to be delivered via this distinct phage-derived system. The RelA/SpoT domain is a nucleotide-binding domain related to the DNA polymerase β-type nucleotidyltransferase fold [159] that synthesizes the alarmone (p)ppGpp [160]. It has been observed that high levels of (p)ppGpp in non-starvation conditions rapidly inhibit growth and protein synthesis [160]. Hence, it is conceivable that this toxin acts as an unregulated alarmone synthetase in target cells to shut down their protein synthesis. Its widespread presence in several phylogenetically distant plant pathogens is consistent with the presence of a (p)ppGpp-dependent signaling pathway in plants, similar to that seen in bacteria [160]. In light of this, it appears likely that the MuF-fused versions found in the animal pathogens such as Streptococcus pneumoniae and Mannheimia haemolytica might be deployed in intra-bacterial conflict similar to the classical polymorphic toxins, rather than against the animal hosts.
Another distinct nucleotide-generating enzymatic domain, which we found in polymorphic toxins from several major bacterial lineages (Figure 10C), is the ADP-ribosyl cyclase (Tox-ARC) domain. These toxins are coupled to various delivery systems including T5SS, T6SS and T7SS. This domain has previously only been characterized in animals and generates two distinct metabolites, namely cyclic ADP ribose (cADPr) and nicotinic acid adenine dinucleotide phosphate (NAADP), respectively from NAD and NADP [161]. These two nucleotides have been shown to function as potent inducers of calcium influx via the ryanodine receptors [162]. At the same time, by channeling NAD it can also affect protein deacylation by sirtuins and other processes requiring NAD [163]. Given that polymorphic toxins with Tox-ARC domains occur in free-living bacteria, and are typically coupled with the genes for the immunity protein Imm74, it is likely that they are used in intra-specific conflict rather than against eukaryotes. Their mode of action in the bacterial context is not entirely clear: it is possible that they deplete NAD or NADP and interfere with various metabolic processes dependent on them. Alternatively, the cADPr or NAADP generated by them could have toxic consequences for the target cell, for example by interfering with NAD-utilizing processes such as RNA metabolism or DNA ligation. The bacterial Tox-ARC domains show considerably more sequence diversity than the eukaryotic counterparts and appear to have been the progenitors of two independent sets of eukaryotic representatives in animals and fungi respectively.

Non-catalytic toxins: Pore-forming and peptidoglycan-binding domains
Several classical polymorphic and PVC-SS delivered toxin proteins display unusual C-terminal predicted toxin domains that do not show any indications of being enzymes. Further analysis of these predicted toxin domains suggested that they are likely to operate via non-catalytic mechanisms. One of these, which is thus far restricted to proteobacteria, is the W-TIP domain that was named after a conserved tryptophan and TIP tripeptide motif (Figure 10F). This small toxin domain is highly hydrophobic in composition and is predicted to form two membrane-spanning helices. The first of these helices bears two absolutely conserved positively charged residues (RxxR signature), while the second bears the W-TIP motif. These features suggest that the W-TIP toxin domain might effect its cytotoxicity by forming a transmembrane pore similar to pore-forming toxins from diverse organisms [164,165]. Several PVC-SS delivered toxins also display a single annexin domain (Figure 4C); however, this domain is unlikely to be a stand-alone toxin domain as it is always followed by a further C-terminal bona fide enzymatic toxin domain (e.g. the anthrax lethal factor-like metallopeptidase and Ntox3 domains; Figure 4C). The eukaryotic annexins typically contain four tandem annexin domains and bind phospholipids, such as phosphatidylinositol (4,5)-bisphosphate (Annexin A2) and phosphatidylserine (Annexin A5), as well as components of lipid rafts such as cholesterol (Annexin A2) [166]. The eukaryotic annexins also have the unusual capability of apparently traversing cell membranes despite lacking signal peptides. Hence, it is conceivable that the annexin domains in bacterial toxins act as accessory domains that aid in the breaching of target cell membranes to facilitate the delivery of the C-terminal toxin domain.
One of the most enigmatic toxins is Ntox38 (Figure 10G), which is currently restricted to actinobacteria, and might be found in several paralogous copies per genome (e.g. 7 copies in Actinosynnema mirum and 9 copies in Saccharopolyspora spinosa). This toxin domain is usually linked to an N-terminal WXG domain by a low-complexity glycine-rich linker, suggesting that it is secreted via the T7SS. This is further supported by the frequent presence in their gene neighborhoods of a gene encoding a subtilisin-like serine peptidase associated with processing of proteins secreted via the T7SS [126]. The Ntox38 domain is just 33-43 residues in length and is predicted to adopt a simple three-stranded fold (Figure 10G). Its size and lack of potential conserved catalytic residues suggest that it is unlikely to be an enzymatic domain. It shows several conserved hydrophobic residues and an invariant C-terminal PXhhG signature (where h is a hydrophobic residue). It is one of the few toxin domains whose mode of action remains rather elusive, but is likely to involve a physical interaction with a key cellular component rather than catalytic modification. It shows a strong association with a single immunity protein, Imm56.
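The sequence features reported for Ntox38 (a domain of 33-43 residues ending in a PXhhG signature) can be expressed as a simple pattern check. The Python sketch below is illustrative only: the hydrophobic residue set, the function name and the example sequence are assumptions that do not come from this study, and a regular expression is far less sensitive than the profile-based searches used here.

```python
import re

# Assumed hydrophobic alphabet for "h"; the study does not define the exact set.
HYDROPHOBIC = "AVILMFWYC"

# PXhhG anchored at the C-terminus: P, any residue, two hydrophobic residues, G.
PXHHG_CTERM = re.compile(rf"P[A-Z][{HYDROPHOBIC}]{{2}}G$")


def has_ntox38_like_tail(domain_seq, min_len=33, max_len=43):
    """Return True if the sequence is within the reported length range
    and ends in a PXhhG signature."""
    seq = domain_seq.strip().upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    return PXHHG_CTERM.search(seq) is not None


# Hypothetical 35-residue sequence made up for illustration (not a real Ntox38).
example = "MSTKVEVRLDGTKYTFEVDGKVIRVSLNGAPDLVG"
print(has_ntox38_like_tail(example))
```

A check of this kind could only serve as a coarse pre-filter; confirming membership in the family would still require alignment-based or profile-based comparison.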
We uncovered an unusual toxin domain at the C-termini of giant toxin proteins from arthropod alphaproteobacterial and gammaproteobacterial endosymbionts such as Wolbachia and Rickettsiella grylli (Figure 10H). Homologous domains are also found at the C-termini of the latrotoxins (latrotoxin-CTD) of the black widow spider (Latrodectus species) [167]. The latrotoxins also display other architectural similarities with the above bacterial toxins in sharing N-terminal ankyrin repeats. Interestingly, the latrotoxins are not secreted in a conventional fashion, but released upon disintegration of the producing cell [167]. Upon release the latrotoxin-CTD is proteolytically cleaved off to form the mature latrotoxin [168]. Given that the latrotoxin-CTD is shared by distantly related bacterial endosymbionts, which colonize a wide range of arthropods, it appears likely that the spider latrotoxins were acquired via lateral transfer from a bacterial endosymbiont. The latrotoxin-CTD is characterized by a conserved, hydrophobic helix; hence, it is possible that it associates with the membrane and might facilitate disintegration of the producing cells in spiders. Bacterial toxins with latrotoxin-CTDs do not display any neighboring immunity protein genes; hence, it is likely that they are primarily used against the eukaryotic hosts. In this regard, it is interesting to note that the salivary gland proteins of mosquitoes have been suggested as being laterally transferred from Wolbachia [169,170]. We found that such proteins are more widely distributed across arthropods (e.g. the crustacean Daphnia pulex), and that they are related to endosymbiont toxin proteins, such as those reported above. However, in place of a C-terminal toxin domain they contain a conserved domain termed the SGS domain (for salivary gland secreted protein), which is not found in any bacterial toxin, but only in arthropods (Figure 10H, Additional File 1).
Thus, it appears that following lateral transfer of a bacterial toxin protein, the toxin domain was displaced by an arthropod-specific domain. Hence, the latrotoxin and SGS proteins could represent different examples of toxins of endosymbiotic bacteria being coopted for arthropod-specific functions.
Several toxins delivered via the PVC-SS displayed a putative toxin domain belonging to the OmpA superfamily of peptidoglycan-binding domains [171][172][173] (e.g. gi: 171059731 from Leptothrix cholodnii; Figure 4C). While several toxin polypeptides contain domains that might facilitate extracellular adhesion, including peptidoglycan-binding domains such as PGB1 and the LysM domains, the OmpA domain, unlike those, always occurred at the extreme C-terminus. This supports the inference that in these cases the OmpA domain might have a toxin function. The OmpA domains have been shown to anchor porins and the T6SS to the peptidoglycan [172][173][174]. Given that OmpA domains can bind peptide precursors for peptidoglycan biosynthesis [172], it is possible that such toxin domains might act by interfering with peptidoglycan synthesis through binding of such peptides.

Lineage-specific expansion of N-terminal domains in toxin proteins: Novel secretion/anchoring mechanisms?
The N-terminal domains of the full-length polymorphic toxins are usually good predictors of their trafficking pathways because they contain domains that are specific to a given secretory pathway (Table 1). We found another interesting feature in the N-terminal regions of certain polymorphic toxins and related proteins from endosymbionts/parasites secreted via the T2SS, which is thus far restricted to a few bacteria. This feature is characterized by the presence of lineage-specific domains that occur downstream of an N-terminal signal peptide in full-length toxins from certain organisms. The best example of this is provided by the MAFB group of polymorphic toxins found in Neisseria species (Figure 10I). Here all the full-length toxin proteins display a globular domain, the MAFB-N domain (Additional File 1; overlapping but not identical to the model defined as the domain of unknown function DUF1020 in the PFAM database), just after their signal peptide. Across different full-length toxins the MAFB-N domain is highly conserved, which is in sharp contrast to the C-terminal polymorphism in their toxin domains (Figure 10I). Furthermore, though the MAFB-N domain is strongly conserved in the genus Neisseria, it is not found outside of it. In terms of operonic organization, all full-length genes encoding MAFB-N type polymorphic toxins are accompanied by an upstream gene which encodes MAFA, a secreted protein with a lipobox, indicating that it is a lipid-anchored surface protein [175]. Like the MAFB-N domain, the MAFA domain is restricted to Neisseria and shows no polymorphism. This suggests that the conserved MAFB-N domain of these polymorphic toxins is likely to interact with the surface-anchored MAFA protein, thereby anchoring them to the cell surface. This hints that certain lineage-specific N-terminal domains might serve as a surface anchor for toxins.
A comparable situation was observed in a group of seven polymorphic toxins in Microscilla marina, which are typified by a conserved N-terminal domain downstream of their signal peptides (Microscilla-N). This conserved globular domain is currently not observed outside of this species and might again play a specific anchoring function for these polymorphic toxins. It is also conceivable that homotypic interaction between these "constant" N-terminal domains helps spatial clustering of different toxins on the cell surface.
Like Microscilla, yet another member of the bacteroidetes clade, i.e. the Acanthamoeba endosymbiont Amoebophilus asiaticus, displays a variety of effectors, which are predicted to be directed against its eukaryotic host, that are united by shared conserved N-terminal domains. We were able to identify two distinct types of such N-terminal domains that occur immediately downstream of a signal peptide and a lipobox, which we termed Amoebophilus-prodomain 1 (APD1) and 2 (APD2) respectively (Additional File 1). The presence of the lipobox prior to APD1 and APD2 suggests that these effectors do not diffuse into the host cytoplasm, but are likely to be anchored on the surface of the endosymbiont. The proteins bearing the APD1 and APD2 domains show highly conserved N-termini but extremely polymorphic C-termini, with several distinct effector domains; thus, they appear to represent a mechanistic principle similar to the MAFB-N and Microscilla toxin N-terminal domains. However, unlike the classical polymorphic toxins, where the C-terminal domains are serially variable due to displacement by alternative toxin domain cassettes, the Amoebophilus effectors with diverse C-termini are likely to be deployed in parallel at the same time [79]. Among the variable C-terminal domains of these effectors are several domains shared with the toxin domains of polymorphic toxin systems, such as: 1) papain-like peptidases of the Otu family; 2) lipase-like α/β hydrolases; 3) the EDA39C-like nucleases. Additionally, these effectors also display diverse C-terminal domains that are specifically related to the ubiquitin system, such as the F-box and U-box subunits of ubiquitin E3 ligases, SMT4/Ulp1-like desumoylating and UBCH-like deubiquitinating peptidases, and other regulatory modules such as the GIMAP-type GTPase domains, STAND NTPase domains, SecA-like helicase-related domains and SbcC-like ATPase domains [79,176,177].
This suggests that over and beyond typical toxin-like effectors, the Amoebophilus effectors also interface with the host via a wide range of catalytic activities that are typically not encountered in the polymorphic toxin systems. Indeed, the deployment of effectors interacting with the eukaryotic Ub-system is a common strategy used by several endo-symbiotic/parasitic bacteria as well as exoparasitic bacteria that deliver effectors via different secretory systems [80]. On the other hand deployment of STAND NTPases and GIMAP-type GTPases is a strategy limited to endo-symbiotic/parasitic forms. Nevertheless, the presence of the lineage-specific APD1 and APD2 domains suggests that, as in the case of the polymorphic toxin systems, these N-terminal domains might mediate surface anchoring or homotypic interactions that allow clustering of effectors to certain locations on the cell surface. Given the lineage-specific nature of this feature, it might turn out to be more widespread upon more careful analysis.

Immunity proteins
Our earlier studies had revealed that two major immunity protein superfamilies, namely SUKH and SuFu, dominate the polymorphic toxin systems [17]. The current study further corroborated this observation: systematic comparisons revealed that members of the SUKH superfamily act as immunity proteins across the greatest mechanistic and structural range of toxins. They were found as immunity proteins for toxin domains belonging to 18 distinct families of nucleases displaying eight distinct folds, three families of deaminases, DOC-like protein AMP/UMPylating enzymes, TIM-barrel aldo-keto reductase, two types of α/β hydrolases and two mechanistically distinct peptidases (Table 3). We extended the diversity of the SuFu superfamily by identifying a second, previously unknown clade of SuFu domains (Table 3, Additional File 1). These domains are extremely divergent with respect to the classical SuFu domain but could be unified with them by means of profile-profile comparisons (p = 10⁻⁶; probability 86% for matching the classical SuFu superfamily profile). Together, the two clades of SuFu domains are immunity proteins for toxins with six families of nuclease domains of the HNH/EndoVII fold, the ParB domain, Ntox7 nuclease domain, peptidase domains belonging to two unrelated folds and the glycerophosphodiester phosphodiesterase domain. Thus, the extended SuFu superfamily is second only to the SUKH superfamily in terms of the mechanistic and structural range of toxins that it can neutralize (Table 3). A key point to note is that these two superfamilies of immunity proteins work across toxins that utilize entirely unrelated biochemical mechanisms and target very distinct types of macromolecules (RNA, DNA, proteins, lipids and carbohydrates; Table 3).
This observation supports our earlier proposal that the SUKH and the SuFu superfamilies primarily function by being able to bind diverse target proteins by means of sequence variability in their respective versatile binding interfaces [17]. Thus, in a sense they parallel the use of certain highly variable but versatile binding interfaces found in domains from eukaryotic antigen receptors such as the leucine rich repeats and the immunoglobulin domain [178]. Beyond the SUKH and SuFu superfamilies, we recovered over 85 different superfamilies of immunity proteins associated with polymorphic toxin systems (Table 3). In contrast to the SUKH and the SuFu superfamilies, the majority of these are specific to only one or a few types of toxin domains (Table 3, Figure 11). For example, the Imm-barstar is specifically associated with toxins containing the barnase-like nuclease domain, and Imm39 with URI domain nucleases across practically all major bacterial lineages. Likewise, Imm35 is specifically associated only with the papain-like peptidase Tox-PL1, suggesting that it functions specifically as a peptidase inhibitor. The strong association with a single family of toxin domains indicates that several of the immunity proteins have evolved to counter only a single type of toxin. Unlike the versatile immunity proteins, these tend to strongly conserve an interface that facilitates a very specific interaction with their cognate type of toxin. Thus, we observe opposing evolutionary trajectories among the immunity proteins: a few versatile immunity proteins are selected for sequence diversification at the binding interface to cope with a structurally diverse range of toxin domains, whereas a large number of immunity proteins are selected to retain the ability to specifically interact with a single type of toxin domain across a wide phylogenetic range.
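The contrast between versatile and specific immunity proteins can be made concrete by counting how many distinct toxin domain types co-occur with each immunity family. The Python sketch below is a minimal illustration under stated assumptions: the association list is a small hand-picked subset of pairings mentioned in the text (it is not the content of Table 3), the toxin labels are coarse groupings, and the function names and the breadth cutoff are hypothetical choices.

```python
from collections import defaultdict

# Illustrative (immunity family, toxin domain) pairs drawn from examples in the text.
# This is a toy subset with coarse toxin labels, not the data set of the study.
ASSOCIATIONS = [
    ("SUKH", "nuclease"), ("SUKH", "deaminase"), ("SUKH", "DOC"),
    ("SUKH", "aldo-keto reductase"), ("SUKH", "ABhydrolase"), ("SUKH", "peptidase"),
    ("SuFu", "HNH/EndoVII nuclease"), ("SuFu", "ParB"), ("SuFu", "Ntox7"),
    ("SuFu", "peptidase"), ("SuFu", "GPDase"),
    ("Imm-barstar", "barnase-like nuclease"),
    ("Imm39", "URI nuclease"),
    ("Imm35", "Tox-PL1"),
    ("Imm35", "Tox-PL1"),  # repeated observations collapse to one toxin type
]


def toxin_breadth(pairs):
    """Map each immunity family to the number of distinct toxin types it pairs with."""
    partners = defaultdict(set)
    for immunity, toxin in pairs:
        partners[immunity].add(toxin)
    return {immunity: len(toxins) for immunity, toxins in partners.items()}


def split_by_breadth(breadth, cutoff=3):
    """Partition families into 'versatile' (>= cutoff toxin types) and 'specific'."""
    versatile = sorted(f for f, n in breadth.items() if n >= cutoff)
    specific = sorted(f for f, n in breadth.items() if n < cutoff)
    return versatile, specific


breadth = toxin_breadth(ASSOCIATIONS)
versatile, specific = split_by_breadth(breadth)
print(versatile, specific)
```

With real gene-neighborhood data the same tally would separate broadly used superfamilies from single-toxin families; the cutoff of three toxin types is arbitrary and would need to be chosen with the actual distribution in view.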
All but a few of the currently identified immunity proteins are cytoplasmic globular proteins and typically do not show relationships to any known enzymatic domains. This implies that they primarily act in the cytoplasm by directly binding to the toxin domains. Two immunity proteins (Imm-CdiI and Imm17) show a comparable architecture in being comprised of two TM helices. Unlike the other immunity proteins, these might act by preventing uptake of the toxin at the cell membrane. Likewise, a subset of the immunity proteins associated with the L,D peptidase, which is predicted to function on the cell surface, are secreted or TM proteins, consistent with the localization of the active toxin. Imm65, which shows a strict association with Tox-JAB-1, is also exceptional in being the only immunity protein in our collection that appears to be a lipoprotein anchored via its N-terminal lipobox. Imm-ARG is also exceptional in that it is the only currently known enzymatic immunity protein: it contains a catalytically active ADP-ribosylglycohydrolase domain (ARG) [148]. Given that it strictly associates with toxin ARTs of the R-S-E clade, it is likely that Imm-ARG neutralizes these toxins by reversing the ADP-ribosylation catalyzed by them.
Secondary structure analysis indicates that the majority of immunity proteins are α + β domains (64%), followed by all-α domains (25%). Interestingly, while there are over 50 different types of immunity proteins, with α + β domains being preponderant, only a few of them belong to previously characterized superfamilies of domains mediating protein-protein interactions in other sub-cellular contexts. Among these are Imm-NTF2 and Imm-NTF2-2 (NTF2 fold domain), Imm-MyosinCBD (related to the cargo-binding domain of the type VI myosins of animals), Imm-LRR (leucine-rich repeats), Imm-Ank (Ankyrin repeats) and Imm-HEAT (HEAT repeats), which display domains that are widely used in protein-protein interactions across several cellular systems (Table 3). However, unlike the SUKH or SuFu superfamilies, none of these immunity proteins with versions of previously characterized interaction domains are widely used across different toxin types in the polymorphic toxin systems. Some protein-protein interaction domains that are common in other biological systems, such as the immunoglobulin or β-propeller domains, have not yet been found among immunity proteins. This suggests that, rather than widely coopting common protein-protein interaction domains that are prominent in other sub-cellular systems, the polymorphic toxin systems have selected for their own unique set of proteins specializing in protein-protein interactions (Table 3). In the case of the SUKH and the SuFu superfamilies, evidence from gene neighborhoods and phyletic patterns suggests that they primarily function in the context of the polymorphic toxin systems and were on several occasions secondarily adapted for other protein-protein interaction functions, especially in eukaryotes and viruses [17]. Interestingly, most immunity protein superfamilies are entirely absent in archaea (Table 3).
This is consistent with the general paucity of classical polymorphic toxin systems in most archaea, though haloarchaea display functionally related PVC-SS-delivered toxin systems (see below for further discussion). These observations also indicate that the polymorphic toxin systems have provided a unique niche in bacteria for the innovation of a great variety of domains mediating distinctive protein-protein interactions, the majority of which are not utilized elsewhere. Nevertheless, at least 13 distinct types of immunity proteins have been transferred on different occasions to eukaryotes (Table 3). While some of these transfers to eukaryotes are ancient, the majority are to fungi and diverse amoeboid eukaryotes, which share micro-environments with bacteria. It would be of interest to investigate if these have been adapted for eukaryote-specific functions as observed in the case of the SUKH and SuFu superfamilies [17]. In conclusion, we suggest that a systematic structural investigation of the toxin-immunity protein interactions might offer a unique opportunity to study the evolutionary constraints acting on protein-protein interaction interfaces.

Polyimmunity loci and polyimmunity proteins
Our earlier analysis had indicated the presence of tandem arrays of genes encoding several distinct paralogous immunity proteins of the SUKH superfamily, many of which are only distantly related to each other [17]. We term these "polyimmunity loci". Such polyimmunity loci were suggested to function as potential backups that allow organisms not only to survive their own toxins but also to neutralize a range of toxins that might be delivered by non-kin strains present in the environment [17]. Further, they might provide reservoirs of immunity proteins that allow an organism to potentially "cover" any new toxin it might evolve or acquire through lateral transfer. In this study we systematically identified several new polyimmunity loci and further extended this concept to include homogeneous and heterogeneous polyimmunity loci (Figure 11A): The homogeneous polyimmunity loci are defined as those which are dominated by a single type of immunity protein, e.g. several tandem paralogs of the SUKH superfamily [18]. The most frequently found homogeneous polyimmunity loci are those containing tandem SUKH superfamily genes. In addition, Imm6, Imm11, Imm28, Imm33, Imm36 and Imm41 also form prominent homogeneous polyimmunity loci (Additional File 1). In contrast, the heterogeneous polyimmunity loci contain a wide range of structurally unrelated immunity proteins. For example, a heterogeneous polyimmunity locus from Bacteroides sp. D22 encodes 19 different immunity proteins belonging to 13 distinct superfamilies, of which the SUKH superfamily alone is represented by 6 distinct versions in this locus (Figure 11A). As such, these polyimmunity loci represent a unique type of prokaryotic gene cluster: they differ from other large prokaryotic gene clusters in concentrating genes that are effectively functionally equivalent in a certain sense, rather than encoding multiple subunits of a protein complex (e.g. ribosomal or CRISPR operons) or enzymes catalyzing successive steps of a complex pathway (e.g. the antibiotic and siderophore biosynthetic operons) [179,180].
Examination of both types of polyimmunity loci reveals several interesting features (Figure 11A and Additional File 1): 1) The immunity genes in a polyimmunity locus are never interrupted by intervening toxin genes or toxin cassettes. Thus, they are distinct from regular polymorphic toxin loci, which typically display arrays of toxins or toxin cassettes, often with an adjacent immunity protein.
2) The intergenic distance between two immunity genes in a polyimmunity locus is typically small and they are arranged in the same orientation. This implies that they might be transcribed into a single polycistronic message, from which multiple immunity proteins are synthesized at once. This appears to distinguish them from the immunity proteins located within a regular polymorphic toxin locus in which only the complete toxin gene and its adjacent immunity protein are expressed [181].
3) The polyimmunity loci show considerable differences in terms of the number and type of included immunity genes, even between strains of the same species (Figure 11A).
4) In several cases the polyimmunity loci are adjacent to genes encoding recombinases, such as the XerC/D recombinase (Additional File 1). It is conceivable that the recombination mediated by these adjacent elements might play a role in the accumulation of immunity genes at polyimmunity loci.
5) Usually organisms possess only a single polyimmunity locus. A minority of the organisms possess more than one polyimmunity locus (~13% of the organisms with polyimmunity loci).
6) Extended polyimmunity loci (i.e. those with four or more tandem immunity genes) are not found in all bacterial lineages; thus far, they are only found in certain lineages of proteobacteria, bacteroidetes, firmicutes and actinobacteria. This suggests that extended polyimmunity loci are probably selected for only in certain ecological settings (see below).
Some of the above features indeed suggest that these loci are probably under selection to provide a preemptive defensive backup against a constantly changing profile of deployed toxins in the context of frequent, recurrent organismal conflicts (see below for further details).
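The structural criteria listed above (tandem immunity genes, same orientation, short intergenic distances, no intervening toxin genes, four or more genes for an "extended" locus) can be illustrated with a minimal Python sketch. This is not the procedure used in this study: the `Gene` record, the function names, the 100-bp `max_gap`, the two-gene minimum and the 50% `dominance` cutoff for calling a locus homogeneous are all assumptions made for illustration. The sketch also does not try to exclude immunity-gene strings that trail a complete toxin gene.

```python
from collections import Counter, namedtuple

# Hypothetical gene record; "kind" is "immunity", "toxin" or "other".
Gene = namedtuple("Gene", "name start end strand kind family", defaults=[""])

def find_polyimmunity_loci(genes, max_gap=100, min_genes=2):
    """Return runs of tandem, co-oriented immunity genes.

    max_gap (bp) and min_genes are illustrative thresholds, not values
    taken from the paper.
    """
    loci, run = [], []
    for gene in sorted(genes, key=lambda g: g.start):
        extends = (
            run
            and gene.kind == "immunity"
            and gene.strand == run[-1].strand
            and gene.start - run[-1].end <= max_gap
        )
        if extends:
            run.append(gene)
            continue
        # The current run is broken by a toxin, a strand switch or a large gap.
        if len(run) >= min_genes:
            loci.append(run)
        run = [gene] if gene.kind == "immunity" else []
    if len(run) >= min_genes:
        loci.append(run)
    return loci

def describe(locus, extended_min=4, dominance=0.5):
    """Summarize a locus; the dominance cutoff is an assumed stand-in for
    'dominated by a single type of immunity protein'."""
    counts = Counter(g.family for g in locus)
    top_share = counts.most_common(1)[0][1] / len(locus)
    return {
        "size": len(locus),
        "extended": len(locus) >= extended_min,
        "composition": "homogeneous" if top_share > dominance else "heterogeneous",
    }
```

Applied to an annotated genome, `find_polyimmunity_loci` returns candidate loci that `describe` then labels as extended or not and as homogeneous or heterogeneous; any real analysis would need thresholds calibrated against curated examples.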
Comparable to the polyimmunity loci are the polyimmunity proteins, which combine multiple immunity protein domains into a single polypeptide (Figure 11B). Thus, they may be viewed as polyvalent immunity proteins that have the ability to neutralize more than one toxin simultaneously or serially. We first observed such polyimmunity proteins in the SUKH superfamily, wherein the same protein contains multiple tandem repeats of the SUKH domain [17]. Similarly, we observed that the SUKH domain might also be fused to SuFu and Imm33 (DUF2185) domains, indicating that there are polyimmunity proteins which combine structurally unrelated immunity domains in the same polypeptide. A systematic search for polyimmunity proteins revealed several additional architectures (Figure 11B). Some of the largest polyimmunity proteins combine up to 10 distinct immunity domains in a single polypeptide (e.g., gi: 160893617 from Clostridium sp. L2-50; Figure 11B). Given its prevalence as an immunity domain, it is not surprising that the SUKH domain is a common denominator in several of these polyimmunity proteins: it is combined with at least 8 structurally unrelated immunity domains in different polypeptides (Figure 11C). The other prominent domains in polyimmunity proteins are SuFu (combined with five other domains), Imm13, Imm33 and Imm-Ank (combined with four other domains), and Imm11 and Imm34 (each combined with three other domains) (Figure 11C). The most frequently found domain combinations in polyimmunity proteins with more than one type of immunity domain involve one or more of the following immunity domains: SUKH, SuFu (including SuFu-family 2), Imm-Ank, Imm5, Imm33, Imm34, Imm36, Imm66, Imm67, Imm68 and Imm69. Like the polyimmunity loci, the polyimmunity proteins are encoded in operons, which usually do not contain associated toxin genes or cassettes.
Interestingly, while polyimmunity proteins tend to be coded by small polyimmunity loci with two or three tandem immunity genes, they might not be found in the same bacteria with extended polyimmunity loci (see above), suggesting that the two are functionally related but
Figure 12. Network derived from the domain architectures of toxins. The central panel shows the network for all toxins in all species, whereas the lower panels show networks derived for major bacterial clades. The network is a directional graph with edges connecting neighboring domains in a polypeptide, in which the N-terminal domain is the source node, whereas the C-terminal domain is the target node. Edges are colored to match the source node color to illustrate the main direction of flow in the graph. Domains with similar properties are grouped together as shown.

Contextual features: Functional implications of gene neighborhoods and domain architectures
To better understand the functional aspects of the polymorphic toxins and related toxin systems in terms of genomic organization, recombination, secretion and interactions with immunity proteins, we resorted to a systematic analysis of their gene neighborhoods and of the domain architectures of toxins. For the sake of visualization, we represented the connections emerging from both these types of analysis as directed graphs: In the case of domain architectures, the nodes in the graph are the individual domains and the edges are connections between two adjacent domains in a polypeptide in the N- to C-terminal orientation. Each of the repetitive structures, such as RHS and filamentous hemagglutinin repeats, was treated as a single node (Figure 12). In the case of gene neighborhoods the nodes are individual genes or toxin cassettes and the edges indicate their neighborhood relationship in the 5' -> 3' orientation (Additional File 1).
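As a rough illustration of the graph representation described above, the following Python sketch builds a weighted directed edge list from domain architectures, with runs of a repeated domain collapsed to a single node. It is an assumed reconstruction rather than the code behind Figure 12; the function names are hypothetical, and the example architectures (including the placeholder toxin names `ToxA` and `ToxB` and the `FHA` label for filamentous hemagglutinin repeats) are invented for demonstration.

```python
from collections import Counter
from itertools import groupby

def collapse_repeats(architecture):
    """Merge consecutive copies of the same domain (e.g. RHS repeats)
    so that each repetitive structure becomes a single node."""
    return [domain for domain, _ in groupby(architecture)]

def architecture_graph(architectures):
    """Count directed edges (N-terminal domain -> C-terminal neighbor)
    over a collection of domain architectures."""
    edges = Counter()
    for arch in architectures:
        domains = collapse_repeats(arch)
        edges.update(zip(domains, domains[1:]))
    return edges

# Invented architectures, listed from N- to C-terminus.
example = [
    ["TPSASD", "FHA", "FHA", "FHA", "HINT", "ToxA"],
    ["PAAR", "RHS", "RHS", "ToxB"],
    ["PAAR", "RHS", "ToxA"],
]
edges = architecture_graph(example)
for (source, target), weight in sorted(edges.items()):
    print(f"{source} -> {target}: {weight}")
```

The same edge-counting function would apply to gene neighborhoods if each list holds gene or cassette labels in 5' -> 3' order instead of domains.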

Inferences from the gene neighborhoods
Figure 13. Length distribution for predicted complete active toxins in different bacterial clades. Complete active toxins, as against cassettes, were identified based on characteristic marker domains for each of the distinct secretory systems associated with the toxin either in the same polypeptide or in gene neighborhoods (Table 1). The topmost row shows the combined statistics for all active toxins while other panels present the breakdown of these distributions based on secretory bacterial clades. The toxin length distribution is represented as a beanplot [182] (e.g. left panel in the first row) and a raw histogram (top row, central panel) and clearly indicates the multimodal nature of toxin length. The barplot on the first row (rightmost panel) shows the frequencies of consecutive toxin and/or immunity gene pairs in these genomes. Only pairs of genes encoded by the same strand were considered. The labels indicate whether an immunity protein (I) or a toxin (T) is encoded upstream or downstream of its neighbor in putative operons, e.g. TI corresponds to a pair where an immunity gene is preceded by a toxin gene. Note that the TI (toxin -> immunity) architecture is the most frequent pair observed in all graphs except for bacteroidetes/chlorobi and firmicutes, where the presence of polyimmunity loci inflates the II category. Dashed vertical lines correspond to the median protein length for the data on each panel, and the solid vertical lines over each beanplot correspond to the median length in that secretory system alone. The axes at the right of each panel contain the number of active toxins per secretory system.
The one pervasive feature of polymorphic toxins across most gene neighborhoods was the predominance of the toxin-immunity gene (TI) order, wherein the toxin gene is to the 5' end, while the immunity gene is to the 3' end of the operon (Figure 13).
This tendency holds for both complete toxin genes encoding all the N-terminal domains and individual toxin cassettes, which only encode toxin domains. There are several implications of this gene organization: 1) The toxin is synthesized prior to the immunity protein during translation. As the toxin protein is targeted to one of the many secretion systems for delivery to the cell surface, it is unlikely to cause immediate "self-intoxication", thereby obviating the need for a premade immunity protein. This is supported by experiments with toxins exported by the T5SS, where the toxin is only activated in the target cell [183]. 2) Because polymorphism is achieved by recombining different toxin cassettes to a constant 5' gene body coding for trafficking and presentation domains, the recombination event needs not only to replace the 3' toxin cassette [17,45], but also to bring in its cognate immunity gene. This feature explains why cassettes also occur as TI pairs: On account of the TI organization of cassettes, a single recombination event at the 3' tip of the complete toxin gene can replace the existing toxin coding region with a new toxin cassette and simultaneously bring in the new immunity gene. Evidence for multiple such recombination events is presented by the genomic organization of the full toxin genes. They often have a string of multiple immunity genes at the 3' end [17]: each of these immunity genes is likely to represent a remnant of a former recombination event that replaced the tip toxin region while inserting a new immunity gene ahead of it. Thus, the lack of the need for a premade immunity protein due to outward trafficking of the toxin appears to have allowed the emergence of the TI gene order. The TI gene order in turn seems to have facilitated the emergence of polymorphism in these systems.
Indeed the widely distributed simple barnase-barstar gene pairs might represent an incipient TI gene order without notable polymorphism, whereas the barnase cassette within larger polymorphic systems represents its incorporation into the fully developed versions of these systems.
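The tally of consecutive same-strand gene pairs summarized in the Figure 13 barplot (TI, IT, II, TT) can be sketched as follows. This is an assumed, simplified version: the input format (tuples of start coordinate, strand and a one-letter kind) and the function name `count_pairs` are hypothetical, and no operon prediction or distance cutoff is applied. On the minus strand the gene with the higher coordinate is upstream, so the pair label is reversed there.

```python
from collections import Counter

def count_pairs(genes):
    """Count adjacent toxin/immunity gene pairs encoded on the same strand.

    genes: iterable of (start, strand, kind) where kind is "T" (toxin),
    "I" (immunity) or anything else for unrelated genes.
    Labels are written upstream gene first, so "TI" means a toxin gene
    followed (in the direction of transcription) by an immunity gene.
    """
    ordered = sorted(genes)
    counts = Counter()
    for (_, strand1, kind1), (_, strand2, kind2) in zip(ordered, ordered[1:]):
        if strand1 != strand2:
            continue  # only pairs encoded by the same strand are considered
        if kind1 not in {"T", "I"} or kind2 not in {"T", "I"}:
            continue
        label = kind1 + kind2 if strand1 == "+" else kind2 + kind1
        counts[label] += 1
    return counts
```

A toy genome with a plus-strand toxin followed by two immunity genes and a minus-strand toxin-immunity pair would contribute two TI pairs and one II pair, mirroring how polyimmunity loci inflate the II category.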
The gene-neighborhood graph also contains the imprint of some of the secretory systems utilized for the outward trafficking of toxins by the producing cells (Additional File 1, Table 1) [18]. The complete toxin genes trafficked via the T5SS, T6SS, T7SS and PVC-SS often contain neighboring genes whose products mediate their trafficking. In the case of the T5SS the adjacent gene typically codes for CdiB-like proteins belonging to the TpsB class of outer-membrane trafficking proteins [37]. Such gene neighborhoods are only found in proteobacteria, bacteroidetes, fusobacteria and the negativicute clade of firmicutes (e.g. Veillonella and Selenomonas) and are strong markers indicative of the use of the two-partner system (T5SS) for the extrusion of toxins. The phyletic pattern of this system suggests that it might have emerged in the proteobacteria-bacteroidetes assemblage (members of the group I bacterial division [184]) followed by transfer to a subset of group II lineages such as negativicutes and fusobacteria. This supports the hypothesis that the negativicutes have secondarily acquired a "proteobacterial"-type cell wall through lateral transfer of specific components, and not as a by-product of the sporulation system as recently proposed [185]. The T6SS, PVC-SS, and MuF-SS utilizing toxins are typically marked by the presence of genes for the injection or capsid packaging apparatus, and a recycling AAA+ ATPase in the case of the former two systems [38,39,75,82]. Several T6SS operons additionally encode a PsbP/MOG1-like protein. The gene coding for the latter protein is often adjacent to the toxin gene and is related to the photosynthetic oxygen-evolving complex protein PsbP (p = 10^-17; probability 98% in profile-profile searches) and might represent a novel subunit of the T6SS that acts as an adaptor between the secreted toxin and the injection apparatus.
The genes of toxins secreted via the T7SS are occasionally characterized by gene neighborhoods that encode additional T7SS components such as the YueA-like FtsK/HerA ATPase (the motor driving T7SS), and EsaC, which contains a bacterial version of the PH-like fold [33,186]. Toxins associated with T7SS neighborhoods are found only in firmicutes, actinobacteria and chloroflexi, suggesting that toxins with this secretory mode possibly emerged early in the diversification of the group II bacteria (Table 1).

Inferences from domain architectures
Comprehensive analysis of domain architectures of complete toxins reaffirms the results from the more restricted studies regarding the generally "tripartite organization" of the polymorphic toxins (Figure 1B): The N-terminal-most domains are related to trafficking of the toxin to the cell surface in the producing cell. The central domains, typically forming filamentous structures, are related to presentation of the toxin on the cell surface, and processing and release for delivery into the host cell. The C-terminal-most domains are the toxin domains. This architectural blueprint might be violated in certain toxins that lack the central filamentous elements; these are usually shorter secreted proteins. N-terminal modules are usually associated with the secretory pathway taken by the toxin, with specific domains uniquely characterizing different secretory pathways (Table 1; Figures 12, 13): 1) The TpsA-like secretion domain (TPSASD) defines the T5SS [37]; 2) the PVC metallopeptidase is the determinant of the PVC-SS; 3) the WXG-like helical bundle (including LXG and LDXD) domains are strictly associated with the T7SS [187]; 4) the SpvB domain with integrin-like β-propeller domains are the determinants of the TcdB/TcaC export pathway [42]; 5) the PrsW peptidase domain defines the eponymous export system. In the case of the T6SS, the VgrG module, which forms the tip of the injection apparatus [39], might be fused in certain cases to the N-terminus of the toxin protein. Although the VgrG module might also be found in the PVC-SS gene neighborhoods, it is never fused to toxins secreted via this pathway. Additionally, our current analysis indicated that the conserved PAAR motifs (named after the eponymous signature found in a subset of these domains; PFAM: PF05488) with an associated TM helix are found in toxins strictly associated with T6SS gene contexts. This suggests that the PAAR motif is a determinant for T6SS-driven export.
The PAAR motifs typically occur as pairs and each motif is predicted to form a 3-stranded element, with the second copy usually displaying conserved cysteines, histidines and an aspartate that might constitute a stabilizing metal-binding site (see Additional File 1 for alignment). Given their fixed N-terminal location in the complete toxins and their specific gene-context association with components of the T6SS, it is likely that the PAAR motif represents a signal recognized by this secretory pathway. The T2SS (general secretory pathway) is the most prevalent secretory system for polymorphic toxins (Figures 12, 13). Of the dedicated secretory systems (i.e. those other than T2SS) we found that T7SS, T6SS and T5SS are the dominant ones, accounting for 12, 11 and 10 percent respectively of the complete toxins in our collection (Figure 13). The remaining dedicated secretory systems each accounted for a smaller share of the complete toxins. With respect to the ~150 distinct types of toxin domains we identified among polymorphic toxins and related systems, other than the general secretory pathway, the T7SS, T6SS and T5SS again dominate in terms of diversity of the C-terminal toxin domains with which they are associated (Figure 12). They are combined with 45, 43 and 43 percent, respectively, of the total number of different types of toxins. Though the total number of toxin proteins delivered via the PVC-SS is much lower than that delivered by the three previously named systems, it is combined with a considerable diversity of distinct types of C-terminal toxin domains (31.5% of the total number of toxin types).
As discussed above, the two distinct positions of the processing peptidases, i.e., just prior to the toxin domain (e.g. HINT, papain-like peptidase, caspase) or at the N-terminus of the toxin protein (e.g. ZU5 and PrsW), appear to reflect two distinct functional themes in terms of autoproteolytic cleavage of the toxin protein. The HINT peptidase is found in association with T2SS, T5SS, T7SS and the TcdB/TcaC export pathway but never with the T6SS and PVC-SS (Table 1, Figure 12). This suggests that proteolytic processing by HINT and by the PVC-metallopeptidase are mutually exclusive. This supports our above-stated inference that the PVC-metallopeptidase and the HINT peptidase are functionally equivalent. It also suggests that the injection process of the T6SS probably obviates the need for autoproteolytic action in toxin release. Of the repeats constituting the central filamentous regions, the filamentous hemagglutinin repeats are found only in toxins delivered via the T5SS. In contrast, the RHS repeats are found in toxins delivered by all the different secretory systems, except the T5SS. The less common central filamentous modules, which are also promiscuous in terms of secretion systems, include the phage tail-fiber and the alpha-helical ALF repeats. The HINT peptidase domain is found in association with representatives of all these different repeat types in classical polymorphic toxins, suggesting that autoproteolytic processing to release the C-terminal toxin is a phenomenon that is independent of the type of the N-terminal stalk on which it is borne. A subset of toxin proteins from firmicutes, actinobacteria, proteobacteria and bacteroidetes are characterized by the presence of additional adhesion-related domains in their architectures (Figure 12). Most are carbohydrate or peptidoglycan binding and include the LysM, discoidin, Laminin-G, RicinB, bulb-lectin, PGB (peptidoglycan binding), CWB (cell wall binding) and SH3 domains [188][189][190].
The SH3 and laminin-G domains are usually found at the N-termini of the complete toxin proteins delivered by the T2SS and are likely to help in anchoring the toxin to the cell wall of the producing cell by binding components of the peptidoglycan or cell-surface carbohydrates. In contrast, RicinB, discoidin and bulb-lectin domains might be found either at the N-termini or embedded among the RHS repeats or close to the C-terminal toxin module. This suggests that certain versions of these domains might also be used to enhance contact with target cells. Indeed, the RHS repeats have previously been proposed to possess carbohydrate-binding ability; hence, the RHS repeats might also directly participate in the adhesive action of the long toxins with such stalks [115,191]. The architecture graph also makes it clear that the nucleic acid-targeting toxins are the most prevalent type of toxin, exceeding the peptide- and lipid-targeting toxins by a large margin (Figure 12). This is likely to be a reflection of the fact that a cell can be killed most effectively by disrupting the two key junctions in the flow of biological information, namely by disrupting the genome and by blocking translation.
Examination of the length distribution of the complete toxins reveals a multimodal distribution with peaks of decreasing magnitude (Figure 13). The first peak is around 400, the second is between 1400-1600, the third is between 2200-2400 and the fourth is between 3000-3400 residues in length. The longest toxin recorded in our set is SACTE_5178 (gi: 345002682), with multiple toxin domains, from Streptomyces sp. SirexAA-E, and 13652 amino acids in length. This suggests that while the complete toxins cover a wide length range there are certain preferred lengths. In general terms it suggests that the polymorphic toxins are of two types: 1) Stalked: those with long N-termini with multiple repetitive elements, which are likely to be used primarily in the contact-dependent mode as described for the original CDI systems [17,36]. 2) Unstalked: these toxins lack a substantial N-terminal extension and are likely to be secreted toxins that possibly act through diffusion into the environment or through directed delivery into the target cell [17]. The peaks of the distributions of the toxins delivered via the PVC-SS, T7SS and phage MuF-terminase system are in the short range and these contribute in a major way to the first peak in the overall length distribution curve (Figure 13). In the case of the T7SS, while the majority of toxins are short and likely to be unstalked, there is a smaller set of longer stalked toxins which are also delivered by this system (Figure 13). The T6SS-delivered toxins show a clear bimodal length distribution, with a shorter variety lacking stalks or fused to N-terminal HCP1 domains (Figure 13). This type contributes to the first peak seen in the overall length distribution curve. The second peak is around 1400-1500 amino acids in length (matching the second peak in the overall length distribution curve) and consists of stalked toxins with RHS repeats. This suggests that the T6SS delivers both unstalked and stalked toxins.
The former are probably directly delivered into the target cell, whereas the latter are merely placed on the cell surface and might act through the contact-dependent mode. TcdB/TcaC-delivered toxins show a peak at around 2200 amino acids and contribute to the third peak observed in the overall distribution. The T5SS-delivered toxins show a peak a little after 3000 residues and contribute to the fourth peak in the overall distribution (Figure 13). The toxins with RHS repeats show a peak in their length distribution around 1400-1600 amino acids (second peak in the overall distribution), while for the filamentous hemagglutinin repeats the peak of the length distribution is at 3000-3400 amino acids (the fourth peak in the overall distribution) (Figure 13). This indicates that the major types of stalked toxins with different kinds of repeats each have their own preferred lengths. This suggests that contact via such stalked toxins happens at a relatively constant distance from the cell surface. This in turn probably points to an optimal approach distance between neighboring cells in colonial aggregates, such as biofilms, where intra-specific competition would be expected.
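To make the notion of a multimodal length distribution concrete, the following Python sketch bins protein lengths and reports bins that are local maxima. It is only an illustration under stated assumptions: the 200-residue bin width, the `min_count` filter and the function names are hypothetical choices, not the method used to produce Figure 13 (which uses beanplots), and plateaus of equal-height bins are deliberately not reported.

```python
from collections import Counter

def length_histogram(lengths, bin_width=200):
    """Count proteins per length bin; keys are the lower bin edges."""
    return Counter((n // bin_width) * bin_width for n in lengths)

def local_peaks(hist, bin_width=200, min_count=2):
    """Return (lower, upper) bounds of bins that hold at least min_count
    proteins and are strictly higher than both neighboring bins."""
    peaks = []
    for left, count in sorted(hist.items()):
        if count < min_count:
            continue
        if count > hist.get(left - bin_width, 0) and count > hist.get(left + bin_width, 0):
            peaks.append((left, left + bin_width))
    return peaks
```

Running `local_peaks(length_histogram(lengths))` separately on the toxins of each secretory system would show which systems contribute to which mode of the combined distribution, in the spirit of the per-system comparison described above.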

Comparisons with other toxin systems
The polymorphic toxin systems show several similarities and differences with other well-studied toxin systems of bacteria involved in different levels of intra-genomic, intra-species and inter-species conflicts. We compare below the polymorphic toxin systems with several of these systems and discuss the potential significance of the similarities and differences: 1) Effectors directed at hosts and distantly related competitors: Mechanistically the polymorphic toxins and the effectors directed against hosts and distantly related competitors are closely related. These effectors are usually chromosomally encoded like classic polymorphic toxins. As seen from the above discussion (Tables 1, 2), both these systems share a large number of toxin domains, processing peptidases, and also common secretory pathways including T2SS, T5SS, T6SS, T7SS, PVC-SS and TcdB/TcaC-like export. However, the T3SS and T4SS do not appear to be used by classical polymorphic toxins, even though they are common export pathways for effectors in specific bacterial lineages [34,192]. Some of these effectors also have a structure closely resembling conventional polymorphic toxins and are only distinguished by the lack of associated genes for immunity proteins. Neighboring cassettes for standalone toxin domains are rare in these systems. However, the organization of other effector proteins sharing toxin domains with conventional polymorphic toxins might be different: the toxin domain is not necessarily located at the C-terminus and might occur internally or as a standalone protein. Additionally, these effectors also display certain toxin domains, such as those pertaining to the eukaryotic Ub-systems, that are not deployed in classical polymorphic toxin systems used in intraspecific conflict. This reflects the relative rarity or the relatively limited functional penetration of sub-cellular systems by the prokaryotic cognates of the Ub-system [126], making them less effective targets for interference.
2) Plasmid-encoded bacteriocins: The plasmid-encoded bacteriocins, such as colicins, pyocins and cloacins, conceptually resemble the classical polymorphic toxins in being deployed against closely related target cells. They also share the general architectural organization with classical polymorphic toxins: the N-terminal and central domains are deployed in trafficking, with a toxin domain at the extreme C-terminus. Likewise, these systems are also characterized by immunity proteins that help protect the producing cells [20]. Not only do their toxin domains share several mechanistic themes, such as cleaving of DNA and RNA and perforating of membranes, with the toxin domains of polymorphic toxins, but they also share certain homologous toxin domains such as the HNH, ColE3 and BECR-fold nucleases such as the ColicinD and ColicinE5 domains (Table 2). However, being on plasmids, their primary function is to enhance the fitness of the carrying plasmid. Hence, they usually do not have dedicated systems for their export and depend on inducing lysis of a subset of the producing cells [20].
3) Toxin-Antitoxin systems (Type I, II and III TA systems): These systems might be encoded either on the chromosome or on a plasmid, and resemble the polymorphic toxin systems in comprising a pair of elements with opposing activities. In the type II systems both the toxin and antitoxin are proteinaceous and interact physically with each other, thus being analogs of the polymorphic systems [22,24,28,193]. In contrast to the above-described TI order of the polymorphic toxin systems with a 3' immunity gene, in TA systems the antitoxin is typically the 5' gene [22]. These elements are primarily intra-genomic selfish elements that are selected for maintaining themselves, and on occasions providing incidental advantage to the host cell [24,28]. Thus, they do not have a need for any kind of export trafficking and delivery apparatus that are encountered in the other systems. As a consequence both the toxin and antitoxin from these systems are small proteins, typically composed of a single domain [22]. Nevertheless, certain toxin domains from the TA systems are homologous to toxin domains of polymorphic toxins. The chief examples of these are the RNases belonging to the BECR fold (see above), the RES domain, Ntox24 and DOC-like protein AMP/UMPylating enzymes. However, we currently do not have evidence for sharing of any of the metal-dependent nucleases between these two systems: the PIN domain nucleases are thus far only known from TA systems [108], whereas the REase, HNH and URI fold nucleases of the polymorphic toxin systems are not seen in the TA systems. On the whole, toxins of TA systems tend to predominantly target the genome and the RNAs of the translation apparatus [193], but those from the polymorphic toxin systems appear to have a much wider range, though even among them there is a preponderance of nucleic acid-targeting activities that target the above functions (Figure 12).
Peptidases are relatively rare in classical TA systems in comparison to the polymorphic toxins and their PVC-dependent relatives. However, in the course of this study we uncovered a previously unknown TA system, which combines a toxin peptidase of the YabG family with a distinctive antitoxin that was previously annotated as a "domain of unknown function" (DUF1021). This adds to the pool of toxin domains that are shared by these systems. Another enzymatic domain shared by the toxins of type II TA systems and polymorphic toxins is the ART domain [148]. Interestingly, in this case the immunity protein or the antitoxin in both these systems might be an enzyme that removes the ADP-ribose modification, such as the ADP-ribosyl glycohydrolase. The immunity proteins from the type II TA systems, in addition to physically binding their cognate toxins, also usually act as transcription factors that regulate the expression of the TA gene-pair via their common promoter [22]. There is currently no evidence for any immunity proteins with a transcription factor function in the polymorphic toxin systems. In the case of the type I and type III TA systems the antitoxin is a small RNA that interacts with the toxin transcript or the toxin protein, respectively [24,133]. Currently, there are no known polymorphic toxin systems with RNA regulators. It appears that the need for specific physical interactions between the toxin and antitoxin in most type II and III TA systems places certain restrictions on the types of toxin domains that can be incorporated into them: they typically are small domains that are not vastly different in size from the antitoxins.
4) Restriction-Modification systems: Like the TA systems, the R-M systems are mobile, intra-genomic selfish elements that operate in prokaryotic genomes [21]. Comparable to the cell-killing mediated by TA systems, they have means of enforcing addiction by launching restriction attacks on the cell if they are disrupted [194]. They resemble both classical polymorphic toxins and TA systems in combining a toxin (the restriction enzyme) with an antidote (the modification enzyme, typically a cytosine or adenine DNA methylase). However, unlike in those systems, the physical interaction between the modification enzyme and the restriction enzyme is not central to the counteraction of the latter's toxic properties. Rather, since they operate on DNA, the antidote action of the modification enzyme is mediated by rendering the genome resistant to the restriction enzyme by preemptively modifying it. Being purely intragenomic selfish elements, like TA systems but unlike polymorphic toxin systems, they do not have any features related to trafficking or delivery. Instead, R-M systems display elaborate adaptations that enhance their target specificity and DNA-binding and manipulation capabilities in the form of specialized DNA-binding domains and accessory subunits such as helicases and MORC ATPases [120,195,196]. Nevertheless, as noted above, R-M systems and polymorphic toxin systems appear to share several enzymatic toxin domains such as the REase, HNH, URI and ParB domains.
In conclusion, polymorphic toxin systems share certain key features with each of the other well-characterized prokaryotic toxin systems. The distinctions appear to arise from the differences in selective forces shaping each of these systems. On the whole, the greatest mechanistic diversity of toxin and immunity domains is seen in the polymorphic toxin systems, which reflects the relatively few constraints they face in terms of their targets. However, certain types of catalytic domains are preponderant across several of these systems, apparently because disruption of the genome or the translation machinery is the easiest means of killing a cell.

Genome-wide distribution of polymorphic toxin systems and ecological implications
Differences in distributions and structure of toxins and immunity proteins: Phylogenetic and ecological tendencies
To better understand the ecological significance of polymorphic toxins and related systems we systematically compared their genome-wide prevalence to organismal phylogeny. Our analysis revealed that all the major lineages of bacteria with sufficient genomic data had at least one representative coding for polymorphic toxin systems. However, the distribution of these systems between different bacterial lineages shows pronounced differences (Figures 13, 14). Among the group-I bacteria [184], polymorphic toxin systems are abundant in the proteobacteria-like clade (including acidobacteria), bacteroidetes, and the clade unifying chlamydiae, verrucomicrobia and planctomycetes, but are relatively rare in aquificae and spirochaetes. Among the group-II bacteria [184], such systems are abundant in firmicutes, actinobacteria and chloroflexi but are relatively rare in cyanobacteria and thermotogae. They are generally absent in most archaeal lineages, with the rare exception of certain methanoarchaea and haloarchaea. Of these, Methanosarcina acetivorans displays classical stalked polymorphic toxins with RHS repeats and cassettes for toxin modules and immunity proteins, just as in the cognate bacterial systems. A few other methanoarchaea display simple barnase-barstar-like systems, whereas haloarchaea like Halogeometricum borinquense display several PVC-SS delivered toxins with variable C-terminal toxin modules (Additional File 1). This general rarity of the polymorphic toxin systems in archaea is in striking contrast to the general prevalence of the toxin-antitoxin systems across archaea [22]. This distribution, with a dominant presence in most major clades of both group-I and group-II bacteria, suggests that polymorphic toxin systems could have been present in the ancestral bacterium.
However, it should be noted that these genes and cassettes are highly prone to lateral transfer, as suggested by the sporadic phyletic distribution of both toxin domains and immunity proteins [17]. Hence, the distribution of these systems might also reflect in part their secondary dispersion across diverse bacteria by lateral transfer. In support of this, it may be noted that in many organisms the polymorphic toxins are situated on hypervariable chromosomal islands that are prone to lateral transfer [197]. Nevertheless, distributions of the associated specialized secretory systems that deliver these toxins usually follow stricter phylogenetic boundaries, i.e. T5SS and T6SS occur primarily in group-I bacteria and T7SS in group-II bacteria. This suggests that there might indeed have been an ancestral presence of such polymorphic toxin systems in bacteria, which selected for different dedicated delivery systems in each lineage and diversified further as these delivery systems were fixed.
Certain patterns of distribution of polymorphic toxin systems appear to transcend phyletic boundaries (Figure 14): 1) The hyperthermophiles, which are often chemoautotrophs, from both bacteria and archaea show a strong tendency to lack such systems. 2) Likewise, the photosynthetic bacteria across different bacterial clades have a dearth of such systems (Figures 12, 14; Additional File 1). The relative underrepresentation of such systems in both these groups of organisms is not related to their genome sizes, because organisms with similar-sized genomes but other lifestyles do possess such systems. In particular, the relative rarity of such systems in cyanobacteria is striking when they are compared to other bacteria with multicellular tendencies and similarly complex signaling mechanisms [65], such as deltaproteobacteria and actinobacteria, which in contrast possess abundant arrays of polymorphic toxin systems (Figures 12, 14). While in the case of archaea it is possible that the rarity of these systems is related to their lack of bacterial-type protein uptake systems [20], it should be noted that bacterial hyperthermophiles show a similar pattern. The only exception is the firmicute Geobacillus thermoglucosidasius, which, unlike the rest, is not a classical hyperthermophile and can survive across a wide temperature range [198]. It appears that the relative rarity of such systems might be more related to the phototrophic or chemolithotrophic tendencies of these organisms. It is possible that their relative independence with respect to energy, reducing equivalents and/or carbon dioxide results in lower levels of intra-specific competition for resources.
Figure 14 Scatterplots of the number of toxins versus number of immunity proteins per genome. In scatter plots, black or gray dots in the background represent all taxa, and red or blue dots correspond to taxa belonging to the clade or ecological properties described on each plot's title. The dashed line corresponds to the diagonal (x = y) and the ellipses encircle taxa that are characterized by an excess of immunity proteins as discussed in the text.
Finally, we also observed strong phylogenetic signals in the length distributions of complete toxins: 1) The group-I bacteria with Gram-negative cell walls with outer membranes (proteobacteria and bacteroidetes) had a multimodal distribution of complete toxins, showing both unstalked toxins and stalked toxins of various modal lengths (Figure 13). This suggested that they are likely to engage in both contact-dependent inhibition as well as inhibition via secreted toxins. 2) Firmicutes, with the exception of the negativicute clade, showed a largely unimodal distribution of complete toxin lengths with a median value of 492 residues. This suggests that the firmicutes deploy their toxins either mainly via secretion or through much closer contact than in the previous group.
3) The actinobacteria show a bimodal distribution of toxin lengths (Figure 13). The first peak is around 400-500 amino acids in length and the second is around 1400-1500 amino acids. This suggests that, like proteobacteria, they use both distant contact and secretion or close contact. The use of both short secreted toxins and longer contact-dependent toxins suggests that intraspecific conflict might play out both in the context of biofilms, where contact is critical, and also in motile phases and swarming growth, where contact might be less intense. The distinction in this regard between firmicutes and the two other groups raises the question of whether certain bacterial groups might resort to such forms of conflict only under specific circumstances.

Differences in the relative numbers of toxins and immunity proteins: Implications of intra- and inter-specific conflicts
The median number of toxin domains found in organisms that possess such systems is 3, which is the same as the median number of immunity proteins found per genome (Additional File 1). The difference in the number of immunity proteins and toxin domains per organism is normally distributed with a sharp peak at 0 (Additional File 1). Furthermore, there is a positive correlation between the number of toxin domains and number of immunity proteins, with an approximately linear increase in the number of immunity proteins with increasing number of toxin cassettes (Figure 14). These observations indicate that on the whole there is a balance between the number of toxin cassettes and immunity proteins, which is consistent with the genomic organization of the polymorphic toxin loci and the principle of approximately one-to-one mapping of immunity proteins with toxins. The number of active toxins is positively correlated with the total number of toxin cassettes, suggesting that with an increase in the number of individual polymorphic toxin loci the number of toxin cassettes associated with them increases more or less linearly (Additional File 1). The median number of active toxins per organism is 1, indicating a median 1:3 ratio between active toxins and associated toxin cassettes.
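The summary statistics described above (per-genome medians, the immunity-minus-toxin difference, and the correlation between the two counts) can be sketched as follows. This is only an illustration under stated assumptions: the function name `summarize_counts`, the input layout and the example numbers are hypothetical and are not taken from Additional File 1.

```python
from math import sqrt
from statistics import median


def summarize_counts(counts):
    """Summarize per-genome (toxin_domains, immunity_proteins) count pairs.

    `counts` is a hypothetical input format: one tuple per genome.
    """
    toxins = [t for t, _ in counts]
    immunity = [i for _, i in counts]
    # Difference used in the text: immunity proteins minus toxin domains.
    diffs = [i - t for t, i in counts]

    # Pearson correlation between toxin and immunity counts.
    n = len(counts)
    mean_t = sum(toxins) / n
    mean_i = sum(immunity) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in counts)
    var_t = sum((t - mean_t) ** 2 for t in toxins)
    var_i = sum((i - mean_i) ** 2 for i in immunity)
    # The correlation is undefined if either count never varies.
    r = cov / sqrt(var_t * var_i) if var_t > 0 and var_i > 0 else None

    return {
        "median_toxins": median(toxins),
        "median_immunity": median(immunity),
        "median_difference": median(diffs),
        "pearson_r": r,
    }


# Made-up counts for five genomes, for illustration only.
example_counts = [(1, 1), (3, 3), (5, 4), (2, 3), (8, 8)]
summary = summarize_counts(example_counts)
print(summary)
```

A balanced data set of this kind gives a median difference near 0 and a correlation close to 1; genomes that fall far from the diagonal are the anomalies discussed next.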
We then studied the patterns of relative numbers of active toxins, cassettes and immunity proteins and their correlations, if any, with life-style and preferred ecosystems of the organisms. With exceptions discussed in the preceding subsection, bacteria across most well-sampled ecosystems display polymorphic toxin systems. However, we observed that a subset of organisms show strong anomalies in terms of the relative distribution of toxin domains to immunity proteins (Figure 14). We measured this anomaly using the difference between the number of immunity proteins and toxin domains and uncovered some striking ecological correlations. In general, in aquatic ecosystems we observed a strong proportionality in the number of toxin domains and immunity proteins, with roughly equal numbers of both (Figure 14). This suggests that in these niches there is a tendency for "honest" intra-specific conflict, with the polymorphic toxin systems primarily geared towards discrimination of non-kin conspecifics. Those organisms that showed a significantly greater number of toxins than immunity proteins could be grouped into two general ecological niches: 1) Pathogens: both extracellular and intracellular pathogens of animals, plants and microbial eukaryotes. We interpret the excess of toxins over immunity proteins in this group as an adaptation for pathogenesis: the toxins are primarily used against hosts, rather than for intra-specific conflict; hence, many of their toxins do not have corresponding immunity proteins. This situation is especially prominent in intracellular bacteria such as Waddlia chondrophila, Legionella and Amoebophilus asiaticus, which have a large number of toxins but hardly any immunity proteins (Additional File 1). In general, the notable absence of immunity proteins in intracellular pathogens suggests that in most cases (barring exceptions like Odyssella) they do not engage in competition with conspecifics in their distinctive niche.
In contrast, other pathogens of animals (e.g. Neisseria species), plants (e.g. Ralstonia and Pseudomonas syringae) and microbial eukaryotes (e.g. Odyssella), while showing a large number of toxins, also have a comparable number of immunity proteins. This suggests that they are likely to compete actively with conspecific rivals in the course of colonizing niches on or within their hosts. 2) Slow-growing, heterotrophic bacteria with a degree of "multicellular" organization, mainly actinobacteria and deltaproteobacteria [65]. Organisms of this group are also well-known for their production of diverse non-proteinaceous antibiotics and maintain their slow-growing life-style by inhibiting competing faster-growing bacteria [5]. Thus, we see the over-representation of toxins relative to immunity proteins in this group as being part of their weaponry deployed in inter-specific competition. Importantly, both these groups are also enriched in organisms coding for the greatest number of toxin domains in their genomes. The greatest number of toxins is seen in different Photorhabdus species, which are nematode symbionts that aid nematodes in killing their insect prey [84]. Indeed, these bacteria are not only known to kill insects with their toxins, but also compete intra- and inter-specifically with other bacteria [199]. Thus, a large number of toxin domains might be a predictor for not just pathogen-host and inter-specific conflict but also intense intra-specific competition in certain niches.
At the other end of the spectrum we found several bacteria with an overrepresentation of immunity proteins relative to toxins. Especially striking were bacteria which showed a marked paucity of toxins but had a large number of immunity proteins, typically occurring in polyimmunity loci or as polyimmunity proteins. This group of bacteria is enriched in taxa belonging to the human oral microbiome (Figure 14; Additional File 1). Interestingly, this phenomenon was observed across bacteria belonging to phylogenetically distinct clades in the human oral microbiome: this group includes representatives of bacteroidetes (Capnocytophaga gingivalis), betaproteobacteria (Eikenella corrodens), spirochetes (Treponema denticola), actinobacteria (Actinomyces sp.) and firmicutes (Streptococcus oralis) (Figure 14; Additional File 1). This indicates that the oral environment has repeatedly favored proliferation of immunity proteins relative to toxins in a subset of bacteria across different clades. We interpret this imbalance in terms of the ecology of microfilms formed in the oral environment, where several bacteria are often packed in close proximity [200]. In this situation, non-kin "cheaters" which can invade microfilms to benefit from cooperative associations with proximal organisms can accrue an increase in fitness. Hence, we propose that the excess of immunity proteins in these organisms, particularly in the form of polyimmunity loci and polyimmunity proteins, is an adaptation to evade attack from a diverse array of toxins while invading non-kin bacterial assemblages. In support of this, we observed that there is a second group of taxa from the human oral microbiome that display relatively balanced ratios of toxins and immunity proteins (Figure 14; Additional File 1). It is likely that these organisms are the targets for invasion by the lineages with excess immunity proteins.
Generalizing this observation, we propose that the presence of a large excess of immunity proteins over toxins might be a predictor for cheating behavior in invading non-kin bacterial assemblages.
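As a rough illustration of how the immunity-minus-toxin difference could be used to flag genomes, the sketch below sorts a genome into one of three groups. The function name `classify_imbalance` and the cutoff `margin=3` are assumptions made only for this example; the text does not specify a numerical threshold, and the labels merely paraphrase the patterns described above.

```python
def classify_imbalance(n_toxins, n_immunity, margin=3):
    """Label a genome by the difference between immunity and toxin counts.

    `margin` is a hypothetical cutoff, not a value given in the study.
    """
    diff = n_immunity - n_toxins
    if diff >= margin:
        # Pattern proposed in the text as a possible predictor of cheating.
        return "immunity excess"
    if diff <= -margin:
        # Pattern associated in the text with pathogens and slow-growing
        # "multicellular" heterotrophs.
        return "toxin excess"
    # Roughly one immunity protein per toxin ("honest" conflict).
    return "balanced"


# Made-up counts, for illustration only.
print(classify_imbalance(0, 12))
print(classify_imbalance(20, 1))
print(classify_imbalance(3, 3))
```

Any real application would need a cutoff justified from the observed distribution of differences rather than the arbitrary value used here.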
A distinct second group of bacteria with a large excess of immunity proteins differed from the above group in having a median or above-median number of toxins. This group was greatly enriched in bacilli from soil, such as Bacillus cereus, B. mycoides, B. thuringiensis, Brevibacillus brevis and Paenibacillus polymyxa, and in representatives of the human colonic microflora (Figure 14; Additional File 1). Even in this case, the excess of immunity proteins was typically associated with the presence of polyimmunity loci and polyimmunity proteins. Remarkably, we found that even within the same species (e.g. B. cereus and B. thuringiensis) different strains differed widely in the relative number of toxin domains to immunity proteins: some isolates had a considerable excess of immunity proteins, while others had a balanced ratio of toxin domains to immunity proteins (Figure 14; Additional File 1). This suggests that the different strains in a given species adopt two general strategies during intra-specific competition: 1) Those which participate in "honest" cooperation between kin and discrimination against non-kin. These have similar numbers of immunity proteins and toxins because they possess only as many immunity proteins as required to balance their own toxins. 2) Those which adopt the strategy of cheating by invading non-kin assemblages. These varieties could potentially shift to the second strategy, by expressing their polyimmunity loci or proteins, when there is an excess of "honest players", because in these situations cheating might become profitable. Notably, not all soil bacilli present an excess of immunity proteins over toxins, e.g. B. subtilis does not show the marked imbalance we observed in the above species. This predicts that there are likely to be differences in the social behavior of different soil bacilli, with species like B. cereus possibly engaging in a greater degree of colonial or cooperative behavior throughout their life history.
Further, the observation that the soil bacilli with an excess of immunity proteins have multiple toxins, unlike several of the above-described oral taxa which lack toxins, indicates that the context in which these groups might adopt a cheating strategy might differ. Among the oral taxa that lack toxins, it is conceivable that they have a phase in their life history where they do not engage in interactions with other bacteria. However, when they encounter target bacteria that can be invaded, they probably express their polyimmunity loci to interact with them while evading their toxins. In general terms, our findings might also explain how these organisms might escape collapse of the cheating strategy, which would happen when the numbers of cooperators are diminished. By facultatively expressing polyimmunity proteins or loci only when target cooperators are abundant and switching them off when they are absent, the deployment of the cheating strategy might be limited to advantageous circumstances.

Transfer of components of polymorphic toxins and related systems to eukaryotes and their viruses
While eukaryotes deploy a wide range of toxins, some of which share homologous domains with the polymorphic toxins and related systems, most of them do not seem to represent direct counterparts of the bacterial systems. The eukaryotic systems that come closest to the bacterial systems described herein are the fungal killer toxins such as the Kluyveromyces lactis γ-toxin and PaT secreted by Millerozyma acaciae and Debaryomyces robertsiae [201-203]. Like the bacterial polymorphic toxins, these secreted fungal toxins are primarily used in conflict with closely related non-self strains and act as endo-tRNases. However, it should be noted that they are coded by linear plasmids, which makes them similar to the classical colicin-like bacteriocins, though, unlike them, release of the fungal toxins does not entail lysis of the producing cells. These endo-tRNases currently do not have any homologs outside of fungi and were not detected in any bacterial toxin system. Nevertheless, in this study we observed that at least 13 toxin domains from polymorphic toxin systems and their relatives have been laterally transferred to fungi (Table 2). This suggests that at least a subset of these toxin domains of bacterial provenance might also be used by fungi in intra-specific conflict in a manner comparable to the above-mentioned, fungi-specific tRNases. Our earlier study of the deaminase toxins revealed that at least a subset of these, which were acquired by fungi, are probably used in intra-specific conflict, counter-selfish element defense or in phenomena related to heteroincompatibility [18]. Indeed, a major effector in the apoptosis-like heteroincompatibility process of several fungi, namely Het-C, appears to have originated from a bacterial toxin domain found in polymorphic toxin systems (see above).
The toxin domains from the bacterial systems also appear to have been acquired by animals and several other eukaryotes. At least 14 toxin domains observed in polymorphic toxin systems are also present in metazoans, whereas at least six are present in amoeboid eukaryotes belonging to the amoebozoan and heterolobosean lineages (Table 2). Experimental evidence in animals suggests that at least a subset of these are deployed in antiviral defense and apoptosis. The AID/APOBEC deaminases are notable in the former context, though it appears that their role has further expanded in animals to encompass genome mutagenesis for generating antigen receptor diversity [204]. Like the fungal Het-C, on at least two occasions in metazoans, executors of apoptosis have emerged from toxin domains derived from polymorphic toxin systems: the DNA-fragmenting nuclease CIDE (an HNH fold endonuclease domain) [114] and the pierisin-like ARTs which ADP-ribosylate DNA [205,206]. The phyletic patterns indicate that the lateral transfer of these two toxin domains happened at very different points in animal evolution: the CIDE-like nuclease was transferred close to the base of the metazoa, whereas the pierisin appears to have been transferred only into the lepidopteran insects. Indeed, several of the toxin domains that have been sporadically transferred to eukaryotes could have been incorporated as lineage-specific components of apoptosis or antiviral defense systems. Of particular interest is the animal version of the Het-C domain, which is currently known from chordates and the rotifer Adineta vaga. Like bacterial polymorphic toxins, it occurs in a cell-surface protein, which in vertebrates is encoded by the MHC class III region [207,208]. Given this architecture it is conceivable that it is deployed as a defensive toxin against fungal or bacterial pathogens.
However, in certain cases, such as the GHH domain, which was acquired by animals, the toxin is no longer retained in its catalytic form; instead the catalytically inactive form is used as an extracellular signaling molecule (i.e. Od-Oz or teneurin). As noted above, the ADP-ribosyl cyclase appears to have been acquired by both metazoa and fungi from bacterial polymorphic toxin systems. In metazoa this enzyme was recruited as a signaling enzyme (prototyped by human CD38 and CD157), which generates two nucleotide messengers cADPr and NAADP that in turn regulate the influx of calcium via the ryanodine receptor [162,163]. Thus, the origin of multiple metazoan signaling messengers can be traced back to the polymorphic toxin.
Of note is the observation that several toxin domains of the polymorphic toxin systems are shared with effectors delivered by endo-parasitic or symbiotic bacteria. Given the widespread presence of such resident bacteria in cells of animals, amoeboid eukaryotes and ciliates [78,79,209], it is probable that such effectors are a major source of several of the toxin domains transferred to eukaryotes and their viruses (which might share the host cell with the intracellular bacterial residents; Table 2). Indeed, the toxin-like domains of effectors and polymorphic toxins deployed by several intracellular bacteria, such as Wolbachia, Orientia, Rickettsia, Rickettsiella, Legionella, Odyssella, Amoebophilus, Protochlamydia and Hamiltonella, might affect host evolution at various levels. In a very direct sense, their action might play a major role in the manipulation of host behavior, reproduction, sex ratio and fitness (e.g. defense against parasitoid wasps in aphids by Hamiltonella [100,101,144]). In certain animal lineages, such as the arthropods, the pervasive presence of endosymbiotic bacteria might facilitate the routine transfer of certain toxin genes, and appears to have contributed to the toxins of the arthropods themselves, as suggested by the latrotoxins of spiders. The acquisition of certain toxin domains by the mimiviruses (Tox-MCF1-SHE and Ntox19), iridoviruses (Tox-Otu domain), and several NCLDVs (Tox-JAB-2) suggests that they might be used by these viruses to manipulate host behavior in a manner comparable to the intracellular bacteria. Similarly, several toxin domains are also encountered in bacteriophages (Table 2), suggesting that these viruses might also utilize toxin domains as a strategy to interfere with host physiology.
Certain endosymbiotic bacteria like Odyssella also contain full-fledged polymorphic toxin systems with both toxins and immunity proteins. Such endosymbionts could possibly explain the occasional acquisition of immunity protein domains by eukaryotes and their viruses (which might share the host cell with the resident bacteria; Tables 2, 3). As previously noted, the SUKH domain proteins observed in several lineages of DNA viruses appear to have originated from immunity proteins of the polymorphic toxin systems [17]. Likewise, we had shown that the SuFu immunity protein has given rise to an intracellular component of the metazoan-specific hedgehog signaling pathway [17]. Our current analysis indicated that the C-terminal cargo-binding domain that is unique to animal type VI myosins is evolutionarily related to the immunity protein Imm-MyosinVICBD [210] (p = 10^-7 in iteration 4 with JACKHMMER in a search initiated with an immunity protein gi: 332655030) that is predicted to counter certain ADP-ribosyltransferase toxins. Given that in eukaryotes the MyosinVICBD is only found in the animal lineage and in a single association, i.e. with myosin VI, it is likely that it was acquired from bacteria through transfer of a gene encoding an immunity protein. Transport of cargo by myosin VI is unique in that it is directed toward the minus ends of the actin filaments and is required for several key cellular differentiation events in eukaryotes [210]. Other than toxin domains and immunity proteins, processing components such as the HINT peptidase domain have been acquired by eukaryotes and incorporated into several distinct eukaryote- or even animal-specific regulatory systems such as the hedgehog pathway [17].
Another example of a processing peptidase from polymorphic toxin-like proteins, the ZU5 autopeptidase domain, might have also contributed to the evolution of the animal apoptosis system: two ZU5 domains are observed in PIDD, the core protein of the PIDDosome, which provides a platform for recognizing molecular patterns that are associated with loss of genomic integrity and genotoxic stress [211]. We observed that related ZU5 domains are also present in a lineage-specifically expanded group of proteins from sponges, which might have a role in defense against pathogens (Additional File 1).
On a more general note, several endosymbiotic alphaproteobacteria such as Wolbachia, Rickettsia and Odyssella closely resemble the progenitor of the mitochondrion [212]. Thus, such endosymbiotic associations point back to the very origin of the eukaryotes. Similarly, other endosymbiotic associations, such as those with chlamydiae, might have played an important role in the origin of the photosynthetic plant lineage [213,214]. Hence, it is conceivable that the origin of some of the eukaryotic systems might be related to acquisition of genes from the toxin systems of these early bacterial symbionts. We had earlier proposed that the PIN domain RNases of the eukaryotic nonsense-mediated mRNA decay system might have emerged from the prokaryotic toxin-antitoxin systems [22]. Similarly, the SUKH, Tad1/ADAR-like deaminase, the SuFu-associated HNH fold nuclease, ADP-ribosyltransferase and the ParBL1 domains might be early acquisitions from polymorphic or related secreted toxin systems of endosymbiotic bacteria, which were incorporated into various core function systems of eukaryotes [17,18]. In this context, it is tempting to suggest that the deubiquitinating peptidases such as those of the Otu clade, the ZU5 peptidase domain in the nuclear membrane protein Nup96/98, and the polyADP-ribose transferases (PARPs) might also be early acquisitions from polymorphic toxins or related effectors of the earliest endosymbionts in the associations leading to eukaryogenesis. Hence, it is conceivable that the very origin of certain features of the eukaryotic cell, and pan-eukaryotic regulatory systems such as ubiquitination and polyADP-ribosylation, might have depended on domains derived from systems used in intra- and inter-specific conflict among prokaryotes. Thus, components derived from polymorphic toxins and related systems in symbiotic or pathogenic bacteria might have been critical for more than one major evolutionary transition in eukaryotes.

Conclusions
The current work is the first comprehensive analysis of the recently discovered polymorphic toxin systems. It builds upon our two earlier studies [17,18] that first uncovered these systems and revealed that their diversity was much greater than what was suspected in initial experimental studies [44]. In this work we have systematically identified the most prevalent toxin and immunity protein domains and have classified them based on sensitive sequence and structure analysis. This work thereby provides a framework for future studies on this exciting class of toxin systems. By creating an annotated inventory of toxins and immunity proteins it allows for further biochemical characterization of these proteins. In this regard, we offer a number of clear biochemical predictions in terms of the secretory mechanisms, the mode and site of action, enzymatic activities, active sites and possible catalytic mechanisms of toxins and immunity proteins. The systematic collection of toxins also aids their investigation as potential biotechnological and therapeutic reagents, a possibility underscored by the precedent presented by several other related toxins [4,7]. The pervasive relationship of toxins involved in intraspecific conflict to those used by bacteria in interspecific conflict, such as toxins directed against hosts, is highlighted in this study. Thus, the results presented here also help in understanding the pathogenesis of numerous plant and animal pathogens, as well as the interaction between unicellular eukaryotes and their abundant intracellular bacterial residents. These findings might have considerable significance for our future understanding of the virulence of key pathogens, such as Pseudomonas aeruginosa, Legionella, and rickettsiae among other animal pathogens, and Pseudomonas syringae, Xanthomonas and Ralstonia among plant pathogens.
The toxins characterized here also provide insights regarding the biochemical basis for complex multiorganism interactions, such as the role for Hamiltonella in defense against parasitoid wasps and Photorhabdus in nematode predation of insects [84,100,101,144,199].
This study offers a platform for understanding certain key ecological aspects of bacterial interactions. Systems characterized here suggest, for the first time, possible molecular determinants for phenomena such as kin versus non-kin discrimination, cooperation and cheating, both in the context of biofilms and motile growth. The ideas presented here allow for several testable microbiological hypotheses regarding bacterial conflicts. For example, the proposal regarding cheating in diverse taxa from the oral microbiome and certain soil bacilli can be tested via relatively straightforward competition experiments. Indeed, such experiments can test our proposal that the polyimmunity loci and proteins facilitate a facultative cheating strategy in interactions between conspecifics. The systematic characterization of these loci also allows for further exploration of the rates of polymorphic transitions of toxins under different conditions and in different ecosystems. Some of these studies might have considerable bearing on human, non-human animal and plant health, because they might help explain the preferential colonization of bodily niches by certain strains as opposed to others [15,199]. This might be of considerable value in facilitation of processes such as wound healing and appropriate re-colonization of bodily niches after antibiotic therapy.
The immunity proteins from these systems also offer a means for understanding the two contrasting aspects of the evolution of protein-protein interfaces. Our earlier study had shown the versatility of the SUKH and SuFu domain immunity proteins in interacting with a diverse array of structurally and mechanistically distinct toxin domains [17]. Thus, they join the previously studied scaffolds such as the immunoglobulin domain and LRRs in vertebrate antigen receptors as models to understand how a single structural scaffold can diversify to accommodate an enormous variety in protein-protein interactions [178]. On the other hand, we have also uncovered numerous immunity proteins that are specific in terms of the toxins they counter. Furthermore, a notable majority of these immunity proteins are apparently unique to these systems. This presents them as models for the converse aspect of the evolution of interactions, i.e. how a large number of distinct domains with very specific interfaces for interaction have emerged apparently de novo in these systems. Further investigation of immunity proteins through a combination of structure determination studies and biochemical analysis would be of greatest interest in regard to the evolution of these specific protein-protein interaction capabilities.
Finally, the analysis of the diversification of components from polymorphic toxins and related systems points to a previously underappreciated evolutionary principle. Several toxin, immunity protein, structural modules and secretory components from these systems have a distinct life beyond their locus of provenance, especially in eukaryotic regulatory and defense systems. We have documented that on numerous occasions components from these systems were incorporated into regulatory systems of eukaryotes, and in many cases might have played a major role in the very origin of some of these systems [17,18]. Thus, these systems appear to be particularly rich sources to draw from for new functional innovation. We attribute this to the consequences of natural selection in systems related to inter-organismal or intra-genomic conflicts. Not surprisingly, such toxin-immunity systems have a large effect on the fitness of organisms [15,44], thereby escalating an arms race situation. This has resulted in a strong selective pressure for constant diversification of polymorphic toxins and their immunity proteins. Thus, such systems have acted as a "nursery" for innovations in the protein world. Given that such conflicts often extend to the sphere of symbiotic and parasitic interactions with eukaryotes, the latter have access to a "readymade" set of molecular innovations from such systems, which can be recruited to spur the emergence of new interactions in eukaryotic systems. This is consistent with the similar diversification seen in other systems involved in intra-genomic or inter-organismal conflict [5,127,196,215,216]. These include antibiotic biosynthesis systems, which are used in inter-specific conflict, siderophore biosynthesis systems, whose diversification helps prevent siderophore-stealing by "cheaters", and R-M and TA systems involved in intra-genomic conflict [5,21,194,217].
Indeed, our earlier studies indicated that components from each of these conflict systems have played a major role in contributing components to diverse eukaryotic regulatory systems [127,196,215,216]. Thus, organismal and genomic conflicts being the basis for major molecular innovations, which in turn might facilitate major evolutionary transitions, can be considered a general evolutionary principle.

Methods
As described in the search strategy, protein sequences corresponding to predicted toxins, trafficking, presentation, processing and immunity domains were isolated using diagnostic domain architectures and gene-neighborhood templates that were initially identified in previous studies [17,18] (Figure 1). The sequences of representatives of each of the domains from toxins, immunity proteins and associated trafficking components were then used as seeds in iterative profile searches with the PSI-BLAST [218] and JACKHMMER [219] programs, run against the non-redundant (NR) protein database of the National Center for Biotechnology Information (NCBI), to identify further homologs. A list of these search-seeds and the residue ranges for each domain is provided in Additional file 1. For most searches, which were used to report the relationships presented in this work, a cut-off e-value of 0.01 was used to assess significance. In each iteration the newly detected sequences that had e-values lower than the above cutoff were examined for being false positives and the search was continued with the same e-value threshold only if the profile was uncorrupted. The postulated relationships recovered using such iterative searches were further confirmed with other aids such as secondary structure prediction and superposition on known structures, if available. This resulted in the identification of over 250 toxin and immunity domains. Search results for these domains are provided in Additional file 1.
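The iterate-until-convergence logic of such profile searches can be illustrated with a minimal Python sketch. It is not the procedure actually run for this study: `search_round` is a hypothetical stand-in for a single PSI-BLAST or JACKHMMER iteration, `toy_round` and its similarity table are invented for illustration, and the manual inspection of new hits for false positives is not modeled. Only the 0.01 e-value cutoff is taken from the text.

```python
from typing import Callable, Dict, Set

E_VALUE_CUTOFF = 0.01  # significance threshold stated in the Methods


def iterative_search(seed: str,
                     search_round: Callable[[Set[str]], Dict[str, float]],
                     max_rounds: int = 10) -> Set[str]:
    """Grow a set of putative homologs until a round adds nothing new.

    `search_round` (hypothetical) takes the current members and returns
    a mapping {sequence_id: e_value} for that iteration.
    """
    members = {seed}
    for _ in range(max_rounds):
        hits = search_round(members)
        # Keep only hits with e-values lower than the cutoff.
        accepted = {sid for sid, e in hits.items() if e < E_VALUE_CUTOFF}
        new = accepted - members
        if not new:
            break  # converged: no new sequences below the cutoff
        members |= new
    return members


def toy_round(members: Set[str]) -> Dict[str, float]:
    """Invented similarity table standing in for a real database search."""
    table = {
        "seed": {"A": 1e-10, "X": 0.5},
        "A": {"B": 1e-4},
        "B": {"C": 0.02},
    }
    hits: Dict[str, float] = {}
    for m in members:
        for sid, e in table.get(m, {}).items():
            hits[sid] = min(e, hits.get(sid, 1.0))
    return hits


result = iterative_search("seed", toy_round)
```

In this toy run the sequences "A" and "B" are recruited in successive rounds, whereas "X" and "C" never fall below the cutoff and are left out.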
For each toxin or immunity gene, the gene neighborhood was also comprehensively analyzed using a custom Perl script of the in-house TASS package. This script uses either the PTT file (downloadable from the NCBI ftp site) or the GenBank file in the case of whole-genome shotgun sequences to extract the neighbors of a given query gene. Usually we used a cutoff of 5 genes on either side of the query. The protein sequences of all neighbors were clustered using the BLASTCLUST program (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html) to identify related sequences in gene neighborhoods. Each cluster of homologous proteins was then assigned an annotation based on the domain architecture or conserved shared domain. This allowed an initial annotation of gene neighborhoods and their grouping based on conservation of neighborhood associations. The remaining gene neighborhoods were examined for specific template patterns typical of toxin-immunity systems. In this analysis care was taken to ensure that genes were unidirectional on the same strand of DNA and shared a putative common promoter to be counted as a single operon. If they were head to head on opposite strands they were examined for potential bidirectional promoter sharing patterns.
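The neighborhood window and the same-strand criterion can be sketched as follows. This is a minimal Python illustration rather than the TASS Perl script, which is not reproduced here: the `Gene` record and both function names are hypothetical, genes are assumed to be supplied already sorted by genomic position, and the check for a shared promoter is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    strand: str  # "+" or "-"


def neighbors(genes: List[Gene], query_index: int, flank: int = 5) -> List[Gene]:
    """Return the query gene plus up to `flank` genes on either side."""
    start = max(0, query_index - flank)
    return genes[start:query_index + flank + 1]


def same_strand_run(genes: List[Gene], query_index: int) -> List[Gene]:
    """Return the contiguous block of genes on the query's strand.

    Only the unidirectionality criterion is checked here; a shared
    promoter would need additional evidence.
    """
    strand = genes[query_index].strand
    lo = query_index
    while lo > 0 and genes[lo - 1].strand == strand:
        lo -= 1
    hi = query_index
    while hi < len(genes) - 1 and genes[hi + 1].strand == strand:
        hi += 1
    return genes[lo:hi + 1]


# Invented example locus: a toxin gene followed by its immunity gene.
locus = [Gene("a", "-"), Gene("b", "+"), Gene("tox", "+"),
         Gene("imm", "+"), Gene("d", "-")]
candidate_operon = [g.name for g in same_strand_run(locus, 2)]
window = [g.name for g in neighbors(locus, 2, flank=1)]
```

Here the candidate operon around "tox" spans "b", "tox" and "imm"; the flanking genes on the opposite strand are excluded.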
Multiple sequence alignments of all domains were built by the Kalign [220], Muscle [221] and PCMA [222] programs, followed by manual adjustments on the basis of profile-profile and structural alignments. Secondary structures were predicted using the JPred [223] and PSIPred [224] programs. A comprehensive database of profiles was then constructed using these multiple alignments and was used extensively in the annotation and analysis of protein domain architectures and gene neighborhoods. For other known domains, the Pfam database [189] was used as a guide, though the profiles were augmented in several cases by addition of newly detected divergent members that were not detected by the original Pfam models. Clustering with BLASTCLUST followed by multiple sequence alignment and further sequence profile searches was used to identify other domains that were not present in the Pfam database. Signal peptides and transmembrane segments were detected using the TMHMM [225] and Phobius [226] programs. The HHpred program [227] was used for profile-profile comparisons to either unify poorly characterized families to proteins with a known structure in the PDB database or to group related families of toxins or immunity domains. Structure similarity searches were performed using the DaliLite program [228]. Phylogenetic analysis was conducted using an approximately-maximum-likelihood method implemented in the FastTree 2.1 program under default parameters [229]. Predicted lateral transfers to eukaryotes were further evaluated for false positives by ensuring they were embedded in contigs or complete chromosome sequences with other genes typical of eukaryotes, comparing exon-intron structure of the genes, studying their phyletic distribution within eukaryotes and comparing the protein distances of the predicted eukaryotic proteins (as measured by bit scores) with bacterial homologs.
Structural visualization and manipulations were performed using the VMD [230] and PyMol (http://www.pymol.org) programs. Automatic aspects of large-scale analysis of sequences, structures and genome context were performed by using the in-house TASS package, which comprises a collection of Perl scripts. Supplementary material can also be accessed at ftp://ftp.ncbi.nih.gov/pub/aravind/TOXIMM/toximDBsupplementary.html.

Reviewers' comments
Reviewer 1: Dr. Igor Zhulin (Oak Ridge National Laboratory, USA) I have conflicting views on this paper. On one hand, I have read the Introduction, the beginning of Results & Discussion (the authors lost me halfway through this section though, as it became very descriptive and I had a hard time connecting the pieces), and Conclusions with a great interest. The topic is fascinating and the amount of work that has been done is unbelievable. The authors analyzed an enormous amount of data, both published and results of their computational research, and presented not only a catalog of proteinaceous toxin systems, but a multi-scale picture of their roles in various biological processes. On the other hand, it all came at a high price of lacking necessary details regarding computational analyses and focus. I perfectly understand that presenting such a huge amount of information requires sacrifices in some areas, but I do not think that it should be in describing "experimental procedures". It is a generally accepted policy in science that procedures must be presented in a sufficient detail, so experiments can be independently reproduced. This paper, in my opinion, does not fulfill this requirement. The section "Search strategy to identify new toxins and immunity proteins", which serves the purpose of providing such details, gives only a very general description. Authors' response: We have altered the Material and Methods to provide more extensive details regarding the procedures we followed with respect to sequence and structure analysis. We do not agree with the referee's statement that experimental procedures have been sacrificed. In essence all the sequence and structure analysis was performed using publicly available programs, which have been published and are well-known in the computational biology community, if not more widely.
In the current version of the Material and Methods we describe these without omission and any reader with access to appropriate computer resources can use the same. We also disagree with the referee's allegation of the lack of sufficient information for independent reproducibility; see below for further details in this regard. Finally, the length and overall organization of this paper makes it very difficult to follow it through and the lack of page numbers is inexcusable for a manuscript that has 130 of them. Nearly each of the 38 subchapters of this paper has its own introduction and reads as a separate story. As a result, we do have an encyclopedia of polymorphic toxin systems, but its true scientific quality is hard to estimate. Personally, I would rather see much smaller pieces of this work presented in a concise way with all details of searches and analyses clearly shown. The global view that authors aimed at presenting is much better suited for review papers. Here we have a lot of original work mixed up with a review of literature: the number of references in this paper is higher than in many comprehensive reviews on similar topics. I think the quality of both original work and review suffers from this mix. The bottom line is that to me this is a paper that reaches very interesting conclusions, but which is very difficult to comprehend in its entirety and some (if not many) of its results cannot be verified (or are very difficult to verify) independently. Authors' response: We regret the inconvenience caused by the lack of page numbers, which stems from using a PDF reader which provides the page numbers as against a print version. The referee raises three basic issues which we address below. (i) Length of the article, single long versus multiple short papers: Short articles are useful when a single domain or computational observation needs to be succinctly presented.
Indeed, upon our initial discovery of these systems we published two shorter articles outlining just the details of specific aspects of them. However, upon further investigation it became clear that neither those two works nor subsequent experimental studies on these systems really do justice to the magnitude of domain diversity seen in these systems. Unlike many other systems, despite these proteins being around and accumulating in the non-redundant protein database for now more than a decade, there has been hardly any comprehensive study on them. This is testified by the rather rudimentary annotation borne by most of them in protein databases. This being the first such treatment on a long-neglected class of highly represented proteins meant a particularly long paper. Furthermore, the practical aspects of publication meant it was quite infeasible to prepare numerous separate small papers and submit each for peer-review. We realized in the course of our study that splitting the individual discoveries into multiple manuscripts would dilute the big picture emerging from these systems. With respect to shorter works being easier to read than a comprehensive manuscript such as this, we opine that it is largely a matter of taste. It may be noted that referee two, despite finding the length remarkable, commented regarding its easy readability. The apparent self-sufficiency of the sub-sections is primarily to help readers who might be more interested in one or a few of the toxin or immunity domain families, but the text has been edited to minimize redundancy. Hence there is no repetition of material between sections. (ii) Review versus original paper admixture: We disagree with the referee in saying that it is a mixture of review and original research. The "review" aspect is limited to the introduction and general conclusions, as is typical of any research paper.
It should be kept in mind that any kind of computational analysis work based on sequence/structure analysis needs to place newly identified domains in the context of what is already known in order to make new functional predictions. This is exactly what we do; this necessitates the mention of previous studies and also precedence of biochemical activities for functional inference. We do not see this as being a mixture of review with new results but merely an aspect of building a functional argument. While there are several domains and ideas presented in this study, we were particular in only emphasizing those that are novel and discovered in this study. In our calculation, ~85% of our dataset (that has about 250 toxin and immunity domains) is not found in any domain database. Those that are already present in protein domain databases like Pfam are typically listed as domains of unknown function (DUFs) and are in need of functional annotation. (iii) Reproducibility: As noted above, we do not accept the claim that our results are not reproducible. Of course, the ease of reproducibility depends entirely on the time available to one attempting it. We should emphasize that all the computational discoveries reported here use standard sequence/structure analysis techniques laid out in the Material and Methods, as is typical of a paper in this field. For those cases involving more difficult detections we explicitly mention in the paper the program used and the statistical support for the particular relationship, or the Z score cutoffs used by DaliLite for structural relationships. Since we have provided Genbank identifiers (gis) for the prototypical proteins of every group, all the remaining relationships can be reproduced by running profile searches with PSI-BLAST, HMMsearch3, JACKHMMER or HHpred on the Web or locally, either in a unidirectional or transitive fashion.
Most importantly we have provided one of the most extensive supplements for a sequence/structure analysis paper: alignments for each toxin and immunity domain have been provided; hence, obtaining starting points for reproducing searches should not pose any difficulty. The gis of all proteins under consideration are also provided along with an appropriate classification. This allows for independent verification of architectures and operonic associations. In addition to the extensive tables in the body of the article which provide details regarding active sites and phyletic patterns, the data is also provided in the supplement as searchable tables, where readers can browse the data by species, domain, operons, and pathway of secretion. We fear the referee did not peruse the extensive supplement that provides all the material for reproducing the presented analysis. In the revised version we have further improved the presentation of the supplement to improve ease of access to the alignments. We will also upload all the new alignments to protein databases such as Pfam, making the material available upon publication to facilitate easy reproduction and use of the presented results. Reviewer's response to above: I am not persuaded by authors' arguments regarding their description of "experimental procedures". Let me consider just the first paragraph of Materials and Methods, which is shown below (in italics) in its entirety and is fragmented only by my interjections.
As described in the search strategy, protein sequences corresponding to predicted toxins, trafficking, presentation, processing and immunity domains were isolated using diagnostic domain architectures and gene-neighborhood templates, that were initially identified in previous studies [17,18] (Figure 1). These domains were then used as seeds in iterative profile searches with the PSI-work of experimentalists for years to come. I do have, however, several small concerns about data presentation and some comments that have to do with the broader discussion of bacterial evolution. More specifically: Authors' response: We thank the reviewer for his positive comments and suggestions. p. 21-22: a few homologs of multidomain polymorphic bacterial toxins are purported to be present in eukaryotes (e.g. gi 321474287 in Daphnia and Tox-REase-8 in a subset of insects), and it is surmised that they have been horizontally transferred from bacteria. How do we know that these genes are indeed found in the genomes of these eukaryotes, and do not represent endosymbiont DNA or other contamination? Have the genomic contigs been assembled, do these genes display eukaryotic features -e.g., introns? Authors' response: In our analysis, we were particularly careful in eliminating false assignments of lateral transfer to eukaryotes and used several parameters to decide if the laterally transferred genes were indeed encoded by the eukaryotic species. In the simplest scenario, the presence of introns was indicative of their eukaryotic presence. For example, the gene for gi 321474287 in Daphnia contains 11 introns, whereas most Tox-REase-8 genes in insects at least contain one intron, eliminating the possibility of these genes being contaminants. Other parameters that were considered include: 1) Elimination of sequences that were identical or almost identical to bacterial sequences. 
In our dataset, none of the proteins assigned as laterally transferred showed any identities or near identities to bacterial sequences; 2) Most proteins assigned as laterally transferred to eukaryotes also showed a presence in more than one eukaryotic species, which further helps in eliminating false lateral transfer assignments. For e.g. Tox-REase-8 is present in crustaceans, insects and placozoans. Similarly, Tox-GHH domains are present in five major lineages of bacteria, while in the eukaryotes they are only found in multiple metazoan species (TCAP domains of teneurins). In response to this comment and to that made by Reviewer 3, we have explained this procedure in more detail in the Materials and Methods. p. 44-45. The gene neighborhood network shown in Figure 12: I am not sure what it is supposed to visualize. The authors state that the direction of the edges is important, i.e., it shows the 5' to 3' order of genes or protein domains; but the arrowheads are barely visible even in the pdf at magnification 250%, and will not be seen online. In any case, the edge density is so high that the main message seems to be 'anything can link to anything'. The graphs become more sparse when clade-specific connections are shown -this is more interesting, but perhaps visualization would be better if the density of connections is modeled by the edges of different thickness. Authors' response: We agree with the reviewer that the full view of the domain architectural network was too dense for a detailed view. We have now added a simplified graph next to the central graph that further combines all nodes into metanodes based on their functional type. This simplified graph gives a better view of the follow on connectivities across all toxin polypeptides. For example, it clearly shows that toxin domains detected in this study are almost always at the C-terminus of the protein.
The next several comments have to do with somewhat superficial and inconsistent discussion of relative plausibility of various evolutionary scenarios. To wit: p. 46 "The phyletic pattern of this system suggests that it might have emerged in the proteobacteria-bacteroidetes assemblage (members of the group I bacterial division [183]) followed by transfer to a subset of group II lineages such as negativicutes and fusobacteria." ---Why not the other direction, or ancestral origin followed by gene losses (especially given that these scenarios are discussed later for essentially the same phyletic vectors)? Authors' response: The above argument is based on parsimony. In this study, we notice a strict correlation between the occurrence of T5SS and the presence of an outer membrane. Most lineages from Group I bacteria (including all proteobacteria and bacteroidetes) contain an outer membrane and also components of T5SS. In contrast, most lineages of Group II bacteria contain only one membrane layer around the cell further encapsulated by a cell wall. Some exceptions include the negativicutes, which are a subset of firmicutes that have an outer membrane. Since the ancestral state of the Group I and Group II bacteria can be generally reconstructed as possessing an outer membrane in the former and containing a single membrane layer in the latter, we propose that the T5SS were laterally transferred to the negativicutes and fusobacteria. We have added additional remarks in this regard in the revised manuscript.
Referee's further response: The explanation is fine in this case, but compare it to the following point-counterpoint. p. 52-53: "This general rarity of the polymorphic toxin systems is in striking contrast to the general prevalence of the toxin-antitoxin systems across archaea [22]. This distribution, with a dominant presence in most major clades of both group-I and group-II bacteria, suggests that polymorphic toxin systems could have been present in the ancestral bacterium." ---First, what is meant by "this distribution"? My understanding is that "this distribution" includes "general rarity" of polymorphic toxins in archaea. How can rarity of a system in archaea suggest its presence in bacterial stem, as opposed to later invention in bacteria? I suspect that this is mostly unfortunate wording that should be edited. In contrast, my second concern is more fundamental: essentially, any phyletic distribution may be interpreted as 1. ancestral presence of a gene followed by gene losses, or 2. later invention in one clade followed by horizontal transfers to the other clades; or else 3. some combination of ancestral presence, losses and HGT. To turn these scenarios from mere hand waving to something supported by the evidence, one has to specify the model of gene gain and gene loss more explicitly, or to bring in some auxiliary evidence that favors one of the explanations. I do not see much of this here. Authors' response: We agree that this section was a bit unclear and we have now revised it. Similar to the previous point, the polymorphic toxin systems that we report in this study are present in all major lineages of bacteria. While there is no denial that extensive lateral transfer of these systems occurs, the presence in the ancestral bacterium with divergence mirroring the evolution of different secretion systems within the bacterial superkingdom is a parsimonious argument.
In contrast only a few archaeal "species" contain these systems, suggesting that they were probably not present in the ancestral archaeon. Parsimoniously, this suggests that the few archaeal polymorphic toxin systems were acquired from bacterial versions, because the alternative would require a large number of gene losses in different archaeal lineages. Referee's further response: In the previous exchange, the presence of a gene at the root of group I only, but not at the root of group II nor at joint root of I + II, was called "parsimonious". Now, presence at the root of all bacteria is believed to be parsimonious, when the same set of taxa is examined. What kind of parsimony is invoked in each case? (I think I can discern the answer from the next two sentences, but please correct me if I am wrong). The authors appear to understand parsimony as the explanation that requires the smaller number of events. I cannot accept this as an always-preferable explanation, when it does not matter what these events are and how they are counted; in a moderate form, however, we can use parsimony as a criterion of selecting the null hypothesis, i.e., "choose the scenario with the smallest number of events, unless the additional evidence suggests that a more complex scenario has to be considered". I think that, in this case, however, precisely such additional evidence is available in the form of evolutionary estimates of the relative rate of gene gain and gene loss: almost every estimate suggests that on average gene losses are moderately to highly more frequent than gene gains. So, unweighted parsimony will not work in these cases: a scenario with 1:1 gain-to-loss ratio will be actually making an additional assumption of a relative loss rate that is constrained to be lower than what is observed in nature. Everything is then hanging on the word "large": how large is the excess of losses in archaea, so that this makes the scenario so unlikely?
Authors' response: We agree that the general frequencies of gene loss tend to exceed those of gains. However, with respect to the toxin systems in archaea we are dealing with the following situation: The non-redundant database has representatives from over 225 completely sequenced WGS sequences. Classical polymorphic toxin-like systems are found only in about 15 of them. Thus, approximately 15 times as many archaeal genomes lack these systems as have them. Approximately more than one-third of the bacterial genomes have at least one such system. Hence, although the referee is right in pointing to the differences in the rates of loss exceeding gain, we believe our original reasoning based on the parsimony argument is a valid one. Referee's further response: This is also supported in phylogenetic trees, where the archaeal toxins or immunity domains group with particular bacterial versions. Is this true for the trees of all families, or only some? Authors' response: Barring the barnases, where the relationship is difficult to ascertain one way or another, the other toxin domains consistently show the archaeal branches embedded within the bacterial radiation. p. 53, the following sentence: "However, it should be noted that these genes and cassettes are highly prone to lateral transfer as suggested by the sporadic phyletic distribution of both toxin domains and immunity proteins [17]. Hence, the distribution of these systems might also reflect in part the secondary dispersion of such systems across diverse bacteria by lateral transfer." ---Essentially, this is the same as to say that inheritance of any genetic element may be either vertical or horizontal. So? Authors' response: While the sentence might on the surface appear trivial, it needs to be seen in light of the earlier comment on the polymorphic toxins being inferred present in the stem of the bacterial superkingdom.
While that inference can be made based on the distribution of the toxins and their corresponding secretion systems, we intended to provide a more realistic picture (the above sentences), lest it be taken that their evolutionary history was predominantly vertical since their emergence early in bacterial evolution. Referee's further response: Once again, in the exchange regarding the statement on p. 46, the inference was that a certain toxin was present in the stem of proteobacteria + bacteroidetes, but not in the stem of all bacteria. I suppose the scenarios are really different for different toxins; can this be made more explicit? Authors' response: The toxin distributions in bacteria are certainly affected by lateral transfer so we cannot be certain of the inference of a particular toxin in the common ancestor. Nevertheless, based on the differential distributions, we can tentatively propose that some of the widespread versions, such as the barnase, HNH and deaminase domain toxins, might have been present in the stems of the major bacterial clades such as those uniting the group-I bacteria or group-II bacteria. p. 53: "Certain patterns of distribution of polymorphic toxin systems appear to transcend phyletic boundaries. . . 1) the hyperthermophiles, which are often chemoautotrophs, from both bacteria and archaea show a strong tendency to lack such systems." ---this seems to be the case of multiple losses in bacteria, possibly favored by similarity in the habitats, and possibly ancestral absence in archaea. Ecological adaptations like this 'transcend phyletic boundaries' more or less by definition -is this the point? Authors' response: While adaptations directly related to an ecological niche are indeed obvious in terms of transcending phyletic boundaries, this is not necessarily the case with inter-organismal conflict systems, which do not directly relate to the ecological niche.
Since we nevertheless found correlations between these systems and ecology, we felt it would be useful to point them out. This would help in understanding the more subtle effects of the ecology of a species on its interactions with conspecifics and other organisms.
Referee's further response: The correlation has been observed between hyperthermophily and lack of polymorphic toxins. As the authors imply, this may in fact be a correlation between chemoautotrophy and lack of toxins, or is it? Which effects here are gross, and which are subtle? Could it be, for example, that hyperthermophily is generally correlated with a reduced repertoire of all kinds of secreted proteins, which would be more easily destabilized and inactivated by the adverse environment outside the cell?
Authors' response: We agree that the point raised by the referee, that temperature affects protein stability and thereby places a selective constraint on the number of toxins, could in principle be a valid alternative explanation. However, beyond certain compositional and length distribution differences, the total number of secreted and membrane proteins in hyperthermophiles does not appear to be significantly different from that in other organisms (e.g. Nilsson et al. Proteins. 2005 Sep 1;60(4):606-16.). Hence, we are not certain whether this explanation is more relevant than autotrophy, which additionally accounts for the comparable situation in photosynthetic autotrophs.
p. 56: In the case of oral microbiomes, I am not sure how some species were assigned to the 'biofilm-forming' category and others to 'cheaters'; I think that at least some species in the latter category are biofilm-forming in their own right.
Authors' response: As pure cultures, all these species are likely to form biofilms, but the oral environment is a mixed population of diverse bacterial species, and it is well known that oral biofilms are composed of mixed bacterial species (Paster BJ et al.
Bacterial diversity in human subgingival plaque, ref 198). In this context, we hypothesize that the number of toxin and immunity domains predicts how a species will interact with another one during the formation of a mixed biofilm.
Reviewer 3: Dr Frank Eisenhaber (Bioinformatics Institute, Singapore)
I agreed to be a reviewer on reading the author list, only to find out that the MS is by far the longest that I have ever seen as a reviewer in my life. Despite the initial horror and the impressive length, the text is a fine read, both as a research paper and as a review of this specific field. One would not think to shorten it by a page. The thoughts and results are plausible (there is no hope of repeating the calculations even partially). There is considerable care for detail throughout the text, figures and additional files (except for very minor things such as ref. 144 appearing incomplete). I find the generous addition of supplementary information especially notable. Possibly, this will be of greatest benefit to people creating annotation pipelines and sequence databases. For practical purposes, the authors might consider adding archives with all the individual alignments in single files and ready-made domain models in several formats such as HMMER2, HMMER3, etc. I think that the work is a welcome addition to the scientific literature.
Authors' response: We thank the reviewer for his positive comments and suggestions. A more user-friendly supplementary file is now provided with the alignments of the toxins and immunity domains as separate files in a zipped format. We will additionally upload all alignments to protein domain databases such as Pfam, so that researchers can access them more easily. Ref. 144 has been updated in the revision.