Human TIR containing proteins and conserved motif
As described previously, TIR subfamily is represented at least in 25 genes in the human genome. They can be majorly classified into three categories: Toll-like receptor proteins (TLR1, TLR2, TLR3, TLR4, TLR5, TLR6, TLR7, TLR8, TLR9, TLR10), TLR signaling adaptor proteins (MyD88, TIRAP, TRIF/TICAM1, TRAM/TICAM2, SARM1), Interleukin receptors and accessory proteins (SIGIRR, IL1AP, IRPL1, IL1R1, ILRL1, ILRL2, IL18R, I18RA, IRPL2), PIK3AP1 and BCAP.
This set of proteins were further employed as queries to search for representative sequences from other taxa (Primates, Odd-toe ungulate, Even-toe ungulate, Carnivore, Placental, Whale and dolphins, Chiropteran, Rodentia, Lagomorpha, and Insectivores) to understand the phylogenetic relationships amongst these proteins. The sequences of the 25 TIR containing genes, along with their representative sequences from different taxa, are provided in Additional File 2.
A maximum likelihood-based tree is as Additional File 3. The respective branch lengths are mentioned and bootstrap values are shown in different sizes of circles. Also, the node ID shows the representative organism from each group. Each TIR domain containing proteins is colored distinctly. The phylogeny clearly shows the varying branch length and clustering of sequences into three major groups as specified earlier. The plasma membrane-based TLR proteins (TLR1, TLR2, TLR4, TLR6, TLR10) and endosomal membrane TLR proteins cluster separately (TLR3, TLR6, TLR7, TLR8, TLR9). TLR5 is found to cluster with TLR3 which may be because of closer identity amongst them. TRIF clusters with other adaptor molecules TRAM and TIRAP as the third cluster. Interestingly, in this phylogeny, the TRIF/TICAM1 was the protein with the highest branch lengths and appears within this cluster. From the motif search, TRAM and TRIF were seen to only have Common-4 motif conserved, unlike conventional TIR domain containing proteins which have Box1, Box2, and Box3 as well. Hence, we decided to perform genome-wide sequence search of TRAM and TRIF across all available taxa.
Search for TRAM and TRIF orthologues
TRAM and TRIF play role in the MyD88 independent pathway and, are involved in IRF3 and IFN β signaling via TLR4 (involves both TRAM and TRIF) and TLR3 (only TRIF) receptors. Activation of TRIF dependent pathway also helps in dendritic cell maturation, thereby acts as a link between innate and adaptive immune responses [2]. Therefore, a query set of TRIF and TRAM proteins from 25 organisms across different orders of Mammalia were considered to search for their orthologs (please see “Methods” section).
TRIF and TRAM share higher sequence identity compared to other TIR containing proteins. So, for comparative analysis, a sequence identity comparison was done for the TIR domain of TRAM and TRIF which is shown in Fig. 2A. This shows TRAM-TIR shares around 40% sequence identity with TRIF-TIR across all Mammalia taxa. The orthologs obtained after CS-BLAST search from TRAM and TRIF queries were used to construct a subfamily-specific sequence similarity network (SSN) using ZEBRA which is shown in Fig. 2B [10]. The SSN show 10 distinct protein subfamily clusters represented by different colors. The nodes between 45 to 100% pairwise sequence identity was connected. Also, we found a pairwise sequence identity of only 31% across the TRAM-TIR (PDB ID: 2M1W) and TRIF-TIR (PDB ID: 2M1X) [11]. Additionally, Subfamily 6 and 10 were seen to be clustered together with TRAM-TIR thereby representing TRAM orthologs. Similarly, Subfamilies 1, 2, and 5 were clustering well with TRIF-TIR representing TRIF orthologs. Apart from these, Subfamily 3 and 4 were found to be scattered and Subfamily 7, 8, and 9 clustered together. Few sequences were considered as outliers and were not shown in SSN. We constructed an unrooted phylogeny from CSBLAST hits to examine the relationship between these bigger clusters. The unrooted phylogeny is shown in Fig. 2C and it shows well-separated clades diverging at the base of the tree. The basal node represents sequences from Belcher's lancelet (Branchiostoma belcheri) and Florida lancelet (Branchiostoma floridae), commonly referred as Amphioxus and is grouped together as Leptocardii. These are known to be the oldest basal chordates with MyD88-independent pathway [9]. A similar pattern of clustering is seen in the phylogeny tree and TRAM separates well and distinctly from TRIF. Further, since subfamilies 7, 8, and 9 cluster with TRIF, a detailed analysis of genes that harbour these domains were performed. Analysis of domain architecture can enable to understand gain and loss of co-existing domains and diversification of function.
Domain architecture among subfamilies
A typical human TRIF protein has TRIF-NTD, TIR_2, and RHIM domains. Here the N-terminal is used for activation of IFNβ promoter activity [12]. TIR_2 domain is involved in homo and heterotypic TIR interaction for signaling and the RHIM domain is important for NF-κB activation [13]. Whereas the TRAM protein contains only the TIR_2 domain. Additionally, TRAM’s isoform, TAG contains EMP24_GP25L along with the TIR_2 domain. This EMP24_GP25L domain is implicated in bringing the cargo forward and binding to coat protein [14].
Next, we examined the unique domain architectures and the representative taxa amongst them. A pictorial representation of the same has been shown in Fig. 3. Also, the number of sequences in each category is mentioned. Thereby with respect to Subfamily clusters as obtained in SSN analysis, Subfamily 6 and 10 includes TRAM and TAG which is a splice variant of TRAM (TRAM adaptor with the GOLD domain) [15]. Also, subfamily 10 majorly consists of hits from Aves, and in which no conventional TIR domain was found instead a RVT_1 (Reverse transcriptase domain family) domain was found. Apart from this Subfamily 1, 2, and 5 consists of TRIF sequences with well-annotated domain architecture from Mammalia, Aves, Bifurcata, Crocodylia, and Cryptodira. Amongst these Subfamily 1, 2, and 5 each of them clusters distinctively, with Subfamily 1 including sequence only from Mammalia. Whereas Subfamily 5 and 2 both have sequences from Aves and Cryptodira. Moreover Subfamily 2 also have sequences exclusively from Bifurcata and Crocodylia. Besides these, Subfamilies 3 and 4 have sequences from Chondrichthyes, Coelacanthimorpha (living fossil), and few sequences from Amphibia and Actinopteri. But most members of Actinopteri clusters together under Subfamily 7, 8, and 9. The domain architecture for Actinopteri consists of TRIF, TIR_2 domain and lacks the RHIM domain (HMM scan, inclusion e-value = 0.001).
Sequences from Leptocardii were considered as an outlier, so we looked at the secondary structure of sequences from the primitive organisms. Sequences from Leptocardii, Chondrichthyes, and Coelacanthimorpha were compared with respect to sequences from human TRAM and TRIF. Good conservation along the TIR domains can be seen in the alignment. The image is attached in Additional File 1: Fig. S3. Moreover, good secondary structure conservation hints towards ancestral homology.
Phylogeny of TRAM and TRIF orthologs
We separated the hits into TRAM, TAG, and TRIF subgroups and constructed a phylogeny using Maximum likelihood with 100 bootstraps and branch lengths.
TRIF orthologs were found across Chondrichthyes, Coelacanthimorpha, Actinopteri, Amphibia, Cryptodira, Crocodylia, Bifurcata, Aves, and Mammalia. Human TRIF has three typical and two atypical TRAF6 binding sites. The generic motif representing this site is denoted by [PxExxD/W/E/F/Y]. The motif sequence and residue positions of human TRIF protein for typical TRAF6 binding sites are PEEPPD (86–91), PEEMSW (250–255) and PVECTE (301–306), whereas the same for atypical sites are PLESSP (491–496) and PPELPS (264–269). Amongst these, the second typical TRAF6 binding site [PEEMSW (250–255)] is one of the most important sites for TRIF interaction with TRAF6 and NF-κb induction [16]. Apart from these, TRIF also has a pLxIS motif ([[NDQEKR]LxIS]) which is a phosphorylation site used for inducing IRF3 activation and RHIM interacting motif ( [[IV]Q[ILV]GxxNx[MLI]]) which is part of RHIM domain and is important for inducing apoptosis [17, 18]. Human TRIF has these motifs in following order with their residue position ~ TRAF6(86–91)~pLxIS(207–210)~TRAF6(250–255)~atypical-TRAF6(264–269)~TRAF6(301–306)~atypical-TRAF6(491–496)~RHIM motif(687–695). Amongst these, the highlighted motifs are shown in the detailed phylogeny in alignment form in the same order as they appear in human TRIF protein (Uniprot ID: Q8IUC6) [12]. Members of Mammalia, Amphibia, Cryptodira, Coelacanthimorpha shows conserved pLxIS motif and among Bifurcata, Actinopteri, Aves, Crocodylia only few organisms have pLxIS motifs. Besides these, the second typical TRAF6 binding sites (PEEMSW) is seen conserved only in Mammalia and some organisms of Aves, it may be possible that other TRAF6 binding sites may be functional in other taxa. Whereas the RHIM motif along with the RHIM domain are seen to be conserved in all taxa except for Actinopteri and Leptocardii.
We have represented motifs and domain annotation for each organism in the phylogeny. We observe that none of the members of Crocodylia taxa show TIR_2 domains with inclusion e-value (HMM scan, inclusion e-value = 0.001). Even on increasing the value to 0.1 only one of the members Crocodylus porosus, shows TIR domain at e-value = 0.048, but that was an insignificant search (Fig S6). The detailed phylogeny with motifs and domain annotations is attached as Additional File 4.
Unlike TRIF, human TRAM (Uniprot ID: Q86XR7) consists of a putative N-Myristoylation site (residue position: 2–7) that helps in its localization to the plasma membrane and is critical for TLR4 pathways in response to LPS [12, 19]. The N-Myristoylation site has a consensus motif sequence of [G{EDRKHPFYW}xx[STAGCN]{P}], which is important for signal transduction [20]. TRAM protein also has a putative TRAF6 binding motif [PxExxP] (residue position: 181–186) that helps to interact with TRAF6 and mediate activation of the inflammatory responses by TLR4 [21]. Upon observing these in the phylogeny, we found the TIR_2 domain annotation and conserved motifs were missing from Aves, suggesting they may not have a functional TRAM protein. Additionally, Aves taxa has the highest branch lengths and highest amino acid distances as shown in Additional File 5. This implicates that Aves taxa may have undergone a higher divergence and genetic changes over evolutionary time. Previously reported studies with representative sequences across taxa claims TRAM to be lost in fishes, birds, and amphibians. But interestingly by our genome-wide search, we found TRAM orthologues for Aves [22]. None of the TRAM orthologue sequences were retrieved from Actinopteri (bony fishes).We found TRAM ortholog sequence in Callorhinchus mili from Chondrichthyes (cartilaginous fish), it also shows a concentional TIR_2 domain architecture along with conserved motif. This may be the oldest organism with functional motifs conserved with TIR_2 domain. Also, amongst members of Amphibia, all the hits have conserved TIR domain along with conventional N-Myristoylation motif except for Xenopus laevis. On combining all of our observation we observe that TRAM may have diverged from TRIF and first appeared in Chondrichthyes (399 mya), was then lost in Actinopteri, and then reappeared in Amphibia (323 mya), Cryptodira, Crocodylia, Bifurcata, Aves, and Mammalia.
TRAM isoform (Uniprot ID: Q86XR7-2) with GOLD domains was also seen across one Bifurcata, one Crocodylia, a few Aves, and Mammalia taxa. All of them depicted the well-conserved domain architecture of EMP24_GP25L (primarily involved in the transport of cargo molecules from the endoplasmic reticulum to the Golgi complex [23]) and TIR_2 domain. The difference in sequences of canonical human TRAM and isoform TAG is seen at the position from 1 to 20 (MGIGKSKINSCPLSLSWGKR→MPRPGSAQRW..), this has been shown in the alignment. A similar pattern can be seen across Mammalia. TAG seems to be found mainly in Mammalia, Aves, and in a few cases of Crocodylia, and Bifurcata. Extensive genome sequencing can help recognize TAG across other organisms. An extensive phylogeny showing the presence of TAG in different taxa is shown in Fig. 4.
A representative phylogeny from each category is shown below showing the proportion of organisms. The evolutionary timeline for the divergence of TICAM is also shown below in Fig. 5 [24]. The detailed phylogeny for TRIF and TRAM is attached as Additional Files 4 and 5.
Conservation of Synteny
To focus on the orthologous relationships of TRIF and TRAM, we next examined the syntenies and check the preservation of the order of genes amongst the species and check for the neighbour genes. One representative from each taxon e.g., Homo sapiens (Mammalia), Falco peregrinus (Aves), Gekko japonicus (Bifurcata), Alligator mississippiensis (Crocodylia), Chelonia mydas (Cryptodira), Xenopus laevis (Amphibia), Callorhinchus milii (Chondrichthyes), Danio rerio (Actinopteri), Latimeria chalumnae (Coelacanthimorpha) and Branchiostoma belcheri (Leptocardii) were selected. The accession ID and domain architecture for these representatives can be seen in Additional Files 4 and 5.
There is overall good correspondence amongst the neighbours of TRAM and TRIF orthologs, except TRIF of Amphibia (lacks the gene neighbours), and TRAM of Aves (does not have a full one-to-one correspondence) as seen in Fig. 6. Interestingly, FEM1C and CDO1 like genes were also found to be neighbouring TICAM2 genes in Chondrichthyes and other TICAM2 orthologs. Due to the event of whole-genome duplication occurring at Actinopteri, two forms of TICAMs are generated. Since these duplicates are preserved across the lineages, there may be subfunctionalization or neofunctionalization [25]. To ascertain this, it will be interesting to perform functional characterization of distant orthologs from Coelacanthimorpha and Actinopteri.
Evolutionary selection pressure in TRIF and TRAM
From the previously obtained TRIF and TRAM phylogeny we observed the variation among branch lengths, and also the varied domain architecture amongst the sequences.
We were further interested to look into the selection pressure of TRIF and TRAM protein. We used the orthologues sequences and used site model from Codon substitution models (codeml) of PAML4.9 package. The implemented site models (M0, M1, M2, M3, M7, and M8) allowed ω ratio (ω = dN/dS, ratio of nonsynonymous/synonymous substitution rates) to vary among codon sites in protein. Based on the Likelihood ratio test, using the Bayes empirical Bayes (BEB) method or posterior probabilities of model M8 for 11 site classes (k = 11) along the sequence of the protein was plotted. The values from 11 site classes were grouped into two categories as ω > 1 and ω < 1. Also, another graph depicting the mean probabilities for each site was plotted in Fig. 7.
From both these sequences, one can notice that the posterior probability of ω was moreover < = 1 for sequences in domain regions. Although some positively selected sites were detected. The propensity of these positively selected sites was high and mostly amongst the non-structural regions. A list of positively selected sites with a probability above 0.95 has been provided in the Additional File 1: Figs. S4 and S5.