In silico regulatory analysis for exploring human disease progression

Background
An important goal in bioinformatics is to unravel the network of transcription factors (TFs) and their targets. This is especially important in the human genome, where many TFs are involved in disease progression. Here, classification methods are applied to identify new targets for 152 transcriptional regulators, using publicly available targets as training examples. Three types of sequence information are used: composition, conservation, and overrepresentation.
Results
Starting with 8817 TF-target interactions, we predict an additional 9333 targets for 152 TFs. Randomized classifiers make few predictions (~2/18660), indicating that our predictions for many TFs are significantly enriched for true targets. An enrichment score is calculated and used to filter new predictions. Two case studies, for the TFs OCT4 and WT1, illustrate the usefulness of our predictions:
• Many predicted OCT4 targets fall into the Wnt pathway. This is consistent with known biology, as OCT4 is developmentally related and the Wnt pathway plays a role in early development.
• Beginning with 15 known targets, 354 predictions are made for WT1. WT1 has a role in the formation of Wilms' tumor. Chromosomal regions previously implicated in Wilms' tumor by cytological evidence are statistically enriched in predicted WT1 targets. These findings may shed light on Wilms' tumor progression, suggesting that the tumor progresses either by loss of WT1 or by loss of regions harbouring its targets.
• Targets of WT1 are statistically enriched for cancer-related functions, including metastasis and apoptosis. Among the new targets are BAX and PDE4B, which may help mediate the established anti-apoptotic effects of WT1.
• Of the thirteen TFs found to co-regulate genes with WT1 (p ≤ 0.02), eight have previously been implicated in cancer. The regulatory network for WT1 targets in genomic regions relevant to Wilms' tumor is provided.
Conclusion
We have assembled a set of features for the targets of human TFs and used them to develop classifiers for the determination of new regulatory targets. Many predicted targets are consistent with the known biology of their regulators, and new targets for the Wilms' tumor regulator, WT1, are proposed. We speculate that Wilms' tumor development is mediated by chromosomal rearrangements at the locations of WT1 targets.
Reviewers
This article was reviewed by Trey Ideker, Vladimir A. Kuznetsov (nominated by Frank Eisenhaber), and Tzachi Pilpel.

Hypothesis test for determining classifier significance given cross-validation accuracy: a general calculation for WT1 data (15 known target genes)
For any classifier we assume a "background" distribution of genomic feature data in a feature space F. For WT1 we have a sample S of size 15, which may be distributed differently from the background. Assume for the moment that both distributions are normal; since all feature data are standardized to mean 0 and standard deviation 1, the background is then taken to be N(0, 1). There is then a fixed vector w that best differentiates the means of these distributions. The cross-validation accuracy for WT1 is 68%, so our question is: how likely is it that the optimal separator of the target and background distributions will achieve a 68% correct cross-validation rate? We show here that the problem becomes approximately one-dimensional.
To show that a 68% cross-validation accuracy is statistically significant even with so few positive examples, we define a hypothesis test. The null hypothesis $H_0$ is that the 15 elements were picked at random from the background distribution. Under $H_0$, therefore, there is no linear information in the feature space F that differentiates WT1 binders from non-binders. We test how likely it is under $H_0$ that the 15 positive genes yield a 68% cross-validation rate. For the sample S of 15 positives, let $\bar{x}$ denote their mean position in F. The direction of $\bar{x}$ then gives an optimal choice of the vector w that differentiates the distribution of S from the N(0, 1) distribution of the background.
Assume that the empirical distribution of S projected onto the direction $w = \bar{x}$ has approximately unit variance (as it would if it were normal). We now want the probability that the optimal separator between S and N(0, 1), restricted to the w direction, yields an empirical cross-validation rate of at least 68%. By symmetry this optimal separator lies at distance $|\bar{x}|/2$ from 0, and if the SVM finds it optimally, 68% discriminatory accuracy requires the following under $H_0$. At least 68% of the distribution of S, projected onto w, must lie at or beyond $|\bar{x}|/2$ (the location of the separator). Equivalently, by the same symmetry, the null distribution (also projected onto w) may have at most 32% of its mass beyond the separator at $|\bar{x}|/2$. This requires the classification threshold to satisfy $b \ge z_{0.32} \approx 0.47$, where $z_{0.32}$ is the point beyond which a standard normal has 32% of its mass.
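The symmetry step can be written out explicitly. The following is a sketch under the same assumptions as the text: the projected background is $Y_0 \sim N(0,1)$, the projected targets are $Y_S \sim N(|\bar{x}|, 1)$, and the separator sits at $b = |\bar{x}|/2$. The symbols $Y_0$, $Y_S$ and $\Phi$ (the standard normal CDF) are introduced here for illustration only.

```latex
\begin{align*}
P(Y_S \ge b) &= P\!\left(Z \ge -\tfrac{|\bar{x}|}{2}\right) = \Phi\!\left(\tfrac{|\bar{x}|}{2}\right),\\
P(Y_0 \ge b) &= 1 - \Phi\!\left(\tfrac{|\bar{x}|}{2}\right),\\
P(Y_S \ge b) \ge 0.68 &\iff P(Y_0 \ge b) \le 0.32
  \iff b \ge z_{0.32} \approx 0.47
  \iff |\bar{x}| \ge 0.94 .
\end{align*}
```

The two tail probabilities sum to 1. For that reason the 68% condition on the targets and the 32% condition on the null are the same statement.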
Under our assumptions, the distribution of $\bar{x}$ in the w direction is $N(0, 1/15)$, i.e. normal with standard deviation $1/\sqrt{15} \approx 0.2582$. We now need the probability under $H_0$ that the classification threshold falls at such a location, namely that $|\bar{x}| \ge 0.94$ (so that the decision threshold is $b \ge 0.47$). Under the null hypothesis, the probability that the projection of $\bar{x}$ onto w is at least 0.94 is $P(Z \ge 0.94/0.2582) = P(Z \ge 3.64) = 0.000136$, or about 1 in 7,353. It is therefore unlikely that these targets were sampled from the background distribution: the p-value under the null hypothesis of no difference in F between targets and non-targets is 0.000136. Multiplying by 152 to account for the number of TFs tested (a Bonferroni correction), we obtain a p-value of 0.0207 for such a result for any TF. Because the standard deviation of $\bar{x}$ shrinks as $1/\sqrt{N}$, any factor with more than 15 targets would achieve an even more significant score at the same accuracy.
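The arithmetic above can be reproduced with a short sketch. Like the text, it assumes a one-dimensional projection, unit-variance normal data, and a maximal-margin separator midway between the two means. It uses only Python's standard-library `statistics.NormalDist`. The function name `cv_significance` and the `threshold_decimals` option are illustrative and are not part of the original analysis; `threshold_decimals` mimics rounding the threshold to 0.47 before doubling it.

```python
from math import sqrt
from statistics import NormalDist


def cv_significance(n_pos, cv_accuracy, n_tests=1, threshold_decimals=None):
    """Approximate p-value for a cross-validation accuracy under H0.

    Sketch of the one-dimensional argument: projected background ~ N(0, 1),
    separator midway between 0 and the target mean, and the mean of n_pos
    background draws having standard deviation 1/sqrt(n_pos).
    """
    std_normal = NormalDist()  # projected background under H0
    # Threshold b leaving at most (1 - accuracy) of the null mass beyond it.
    b = std_normal.inv_cdf(cv_accuracy)
    if threshold_decimals is not None:
        b = round(b, threshold_decimals)  # optional rounding, e.g. 0.4677 -> 0.47
    required_mean = 2 * b               # separator sits at |x_bar| / 2
    sd_of_mean = 1 / sqrt(n_pos)        # CLT: SD of the mean of n_pos draws
    z_score = required_mean / sd_of_mean
    p_value = 1 - std_normal.cdf(z_score)      # one-sided upper tail
    p_corrected = min(1.0, p_value * n_tests)  # Bonferroni over n_tests TFs
    return {"b": b, "required_mean": required_mean, "z": z_score,
            "p": p_value, "p_corrected": p_corrected}


if __name__ == "__main__":
    exact = cv_significance(n_pos=15, cv_accuracy=0.68, n_tests=152)
    rounded = cv_significance(n_pos=15, cv_accuracy=0.68, n_tests=152,
                              threshold_decimals=2)
    for label, result in (("exact", exact), ("rounded", rounded)):
        print(label, {k: round(v, 6) for k, v in result.items()})
```

Rounding the threshold to 0.47, as the text does, gives z ≈ 3.64, p ≈ 0.000136 and a corrected p ≈ 0.0207. Without rounding, z ≈ 3.62 and p ≈ 1.5 × 10⁻⁴, so the conclusion does not hinge on the rounding.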
This assumes that, under $H_0$, cross-validation yields an empirical choice of w in the SVM algorithm that is always close to the optimal direction $\bar{x}$.
For at most 32% of the mass of the null distribution to fall on the positive side of the hyperplane, the hyperplane must lie at $Z \ge 0.47$. Because the SVM is a maximal-margin separator, which places the hyperplane midway between the two means, the mean $\bar{x}$ must lie at $Z \ge 0.94$.
This assumes that the data are normal and, by the Central Limit Theorem, that the standard deviation of $\bar{x}$ equals $1/\sqrt{N}$.
Dividing the required mean of $\bar{x}$ (0.94) by this standard deviation ($1/\sqrt{15} \approx 0.2582$) gives the z-score of $\bar{x}$, 3.64.
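The $1/\sqrt{N}$ assumption can be checked numerically with a small Monte Carlo sketch. The sketch assumes the projected background is exactly N(0, 1). It checks only the fixed-direction projection used in the text and does not model the SVM's data-dependent choice of w. The helper name `simulate_null_means` and the trial count are illustrative.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def simulate_null_means(n_pos=15, trials=20000, seed=0):
    """Under H0, draw n_pos projected background values from N(0, 1)
    `trials` times and return the sample mean of each draw."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [fmean(rng.gauss(0.0, 1.0) for _ in range(n_pos))
            for _ in range(trials)]


if __name__ == "__main__":
    means = simulate_null_means()
    print(f"empirical SD of the mean: {pstdev(means):.4f}")
    print(f"theoretical 1/sqrt(15):   {1 / sqrt(15):.4f}")
```

With 20,000 trials, the empirical standard deviation should agree with $1/\sqrt{15}$ to about two decimal places. Estimating the tail probability near $10^{-4}$ directly would require far more trials.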

Role of WT1 in Nervous Tissue Development and Relation to Wilms' Tumor
The set of combined targets (known and newly predicted) for WT1 is significantly enriched in several annotation categories related to the nervous system and neuron growth: transmission of nerve impulse (p = 0.0069), synaptic transmission (p = 0.013), and neurotransmitter receptor (p = 0.058). Many other genes are annotated to similar categories whose enrichment does not reach statistical significance (Additional File 6). These may still be important, since they all relate to the development or function of the nervous system. Neuronal differentiation markers have been observed in Wilms' tumor [1], indicating that some mechanism in these tumors activates nerve-cell signature genes. WT1 has been shown to be required for normal development of neurons in retinal [2] and olfactory [3] tissues. Furthermore, analysis of the developing mammalian embryo has shown the presence of WT1 in brain, tongue, and retinal tissues [4]. Surprisingly, one highly significant predicted target for WT1 is TAS1R1, which encodes a member of the T1R family of taste receptors [5,6]. This suggests that, in addition to its established roles in retinal and olfactory development, WT1 may also be involved in taste sensation. Along these lines are the potential new targets EYA1 and EYA4, members of a gene family known to be involved in kidney, eye, and ear disease [7][8][9][10][11].
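The enrichment p-values above come from an annotation enrichment analysis, but this section does not name the test used. A one-sided hypergeometric test is a common choice for this kind of analysis and is equivalent to a one-sided Fisher's exact test. The sketch below assumes that test. The function name and the counts in the usage line are hypothetical and are not the values behind the reported p-values.

```python
from math import comb


def hypergeom_upper_tail(k, M, K, n):
    """One-sided enrichment p-value P(X >= k), where X is the number of
    annotated genes among n genes drawn without replacement from a
    universe of M genes, K of which carry the annotation."""
    if not 0 <= K <= M or not 0 <= n <= M:
        raise ValueError("require 0 <= K <= M and 0 <= n <= M")
    upper = min(K, n)
    # Sum the hypergeometric probability mass from k up to its maximum.
    tail = sum(comb(K, i) * comb(M - K, n - i)
               for i in range(max(k, 0), upper + 1))
    return tail / comb(M, n)


if __name__ == "__main__":
    # Hypothetical counts, not the data behind the p-values in the text.
    print(hypergeom_upper_tail(k=12, M=20000, K=150, n=400))
```

Exact integer binomial coefficients avoid the underflow that a naive floating-point product would hit for genome-sized universes. In practice, p-values for many categories would also need a multiple-testing correction.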
Another supporting target prediction is the MTMR2 gene, which, when mutated, can cause Charcot-Marie-Tooth disease type 4B [12]. This is a demyelinating disease of the nervous system that causes sensory and motor defects. Interestingly, one of the chromosomal loci implicated in Charcot-Marie-Tooth disease is 11p15 [13], a key Wilms' tumor locus. Finally, 48 high-confidence targets are annotated as voltage-gated ion channels, as integral to the plasma membrane, or as part of a neurotrophic ligand/receptor interaction (Additional File 6). Taken together, these predictions provide new hypotheses about the role of WT1 in the nervous system and point to several genes that may be examined further to elucidate WT1's function in nervous system disease. These targets may be involved in producing many of the symptoms observed in Wilms' tumor patients. For example, patients with WAGR syndrome, which predisposes to Wilms' tumor, show mental retardation and aniridia, a defect of the iris [14][15][16][17][18]. Deafness and mental retardation have also been reported accompanying Denys-Drash syndrome [19], which likewise predisposes patients to Wilms' tumor.

Support for the Known Role of WT1 in Migration and Wnt Signalling
Recent evidence indicates that WT1 is involved in cellular migration [20], although few known targets of the TF have previously been reported to be directly involved in this process. Functional grouping of our target predictions reveals a group of 67 genes annotated to cellular adhesion, cytoskeleton, or cell motility (Additional File 5). This group includes many cadherin and contactin genes known to be involved in adhesion and migration. Notably, this set also contains WASF1, IRSP53, AFADIN, and ARHGAP6, which are all closely related to actin polymerization and associated with adherens junctions and cell migration [21][22][23][24][25][26]. Also of interest are NECTIN and α-CATENIN, core components of the adherens junction itself [24,27-29]. Regulation of these genes by WT1 may play a role in modulating cellular adhesion and metastasis in cancer.
The complex behavior of WT1 suggests that different genetic changes must take place in wild-type-WT1 versus mutant-WT1 tumors (whether sporadic or syndromic). Tumors expressing (or overexpressing) wild-type WT1 have increased resistance to cell death [30,31]. Tumors with WT1 mutations may become sensitized to apoptosis [30] and thus may accumulate compensatory mutations that activate cellular growth and proliferation. A study of WT1-mutant tumors found that 75% also carried mutations in the β-CATENIN gene [32], a known oncogene and crucial component of the Wnt signalling pathway. The Wnt pathway influences cell growth, development, migration, and adhesion. It is also often dysregulated in cancer and contains several oncogenes and tumor suppressors [33][34][35]. A tumor sensitized to apoptosis by WT1 mutation may therefore compensate for this sensitivity by carrying a mutant copy of β-CATENIN that constitutively activates Wnt signalling.
Near the plasma membrane, β-CATENIN links cadherins in adherens junctions to α-CATENIN [27,29,35-38]. As cancerous cells become metastatic, they undergo the epithelial-mesenchymal transition (EMT), a hallmark of which is dissociation of the E-cadherin/β-CATENIN/α-CATENIN complex [38,39]. This would result in loss of adherens junctions and increased cellular mobility. The disruption would also release β-CATENIN, allowing it to translocate to the nucleus, where it cooperates with the TCF/LEF complex to activate targets of the Wnt pathway [32,39].
Although WT1 might conceivably act to repress Wnt signalling, it is more likely that WT1 is a Wnt activator. The case for repression is supported by the observation that some (but not all) Wnt targets are upregulated in WT1-mutant compared with WT1-wild-type tumors [32]. However, as noted above, the activation of Wnt in WT1 mutants can be attributed to gain-of-function mutations in β-CATENIN rather than to loss of repression by WT1. Moreover, there is direct evidence that WT1 enhances Wnt signalling: expression of WNT4 is reduced in WT1 knockout cells, and induction of WT1 increases WNT4 expression [40]. A plausible model, then, is that in sporadic tumors with wild-type WT1, the TF activates Wnt directly, possibly by downregulating DVL or CTBP (new predictions, P > 0.95) or by upregulating WNT4, TCF, or PP2A (the latter two are new predictions). In syndromic tumors with mutant WT1, secondary lesions such as activating mutations in β-CATENIN ensure that the Wnt pathway remains active.
Finally, there is some evidence that WT1 may regulate both α-CATENIN and β-CATENIN. The prediction (P ≥ 0.95) that WT1 regulates α-CATENIN is intriguing, since it suggests that WT1 could directly disrupt adherens junctions by repressing this gene. Such disruption could activate Wnt signalling by freeing β-CATENIN from adherens junctions and allowing it to translocate to the nucleus. Figure A below summarizes the possible relationship between WT1 and Wnt activation, showing the possible routes to Wnt activation when WT1 is either active or inactive. There is also weaker, suggestive evidence that WT1 may bind the promoter of β-CATENIN itself, to which the SVM model assigns a score of 0.7. Closer inspection of the β-CATENIN promoter reveals 11 matches to the WT1 consensus site within 600 bp of the β-CATENIN transcriptional start site (Figure B). The true relationship between WT1 and the Wnt pathway will have to be elucidated through further experiments, but there is strong evidence that WT1 may exert regulatory control on Wnt mediators and targets.
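A consensus-site count like the one above can be obtained with a simple pattern scan. The sketch below assumes that a "match" is an exact match to an IUPAC consensus string such as GNGNGGGNG, one of the WT1 consensus variants cited in the Figure B caption. It also assumes matches are counted on both strands with overlaps allowed. The text does not state these choices, and the function names and toy sequences are hypothetical. The actual promoter sequence is not reproduced here.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a zero-width lookahead pattern so
    that overlapping sites are all reported."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=" + body + ")")


def count_sites(seq, consensus, both_strands=True):
    """Count consensus matches in seq, optionally on the reverse strand too."""
    seq = seq.upper()
    pattern = consensus_to_regex(consensus)
    hits = sum(1 for _ in pattern.finditer(seq))
    if both_strands:
        reverse_complement = seq.translate(COMPLEMENT)[::-1]
        hits += sum(1 for _ in pattern.finditer(reverse_complement))
    return hits


if __name__ == "__main__":
    toy_promoter = "ttGAGAGGGAGcaGCGTGGGCGCtt"  # hypothetical toy sequence
    print(count_sites(toy_promoter, "GNGNGGGNG"))
```

With this scheme, a palindromic site is counted once per strand. Restricting the scan to a 600 bp window would also require the promoter coordinates relative to the transcriptional start site, which this sketch does not model.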

Figure A - Pathways to Wnt activation in Wilms' tumor
The path to possible Wnt activation differs depending on the state of WT1. If wild-type WT1 is present, as in sporadic tumors, WT1 may activate Wnt directly by affecting key Wnt genes such as WNT4, TCF, and α-catenin. If WT1 is inactivated, as in many syndromic tumors, secondary mutations such as activating mutations in β-catenin cause Wnt activation.

Figure B
… [41], green: GNGNGGGNG [42], blue: GNGNGGGNGNS [43]. The citations refer to the papers in which the binding sites have been established.