From: Systematic analysis of somatic mutations driving cancer: uncovering functional protein regions in disease development

Outline of the method. For a given gene, all local somatic mutations are collected from the COSMIC database; mutations from hypermutated samples (see Methods) and mutations overlapping low-complexity regions are discarded. A seed region in the corresponding protein sequence is then selected and assessed for significant enrichment of mutations relative to the expected random distribution, using a one-sided Fisher’s exact test. If the seed region is significant (p-value < 0.01), its boundaries are moved in either direction to locally maximize significance. This is repeated for all possible seed regions of 7, 10 and 30 residues in length. Once all seed regions have been evaluated, any optimized regions that overlap are merged. For an exhaustive description of the algorithm, see Additional file 1.
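The steps outlined above can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is described in Additional file 1), and several details are assumptions made for illustration only: the input is a hypothetical list `counts` of already-filtered mutation counts per residue; the 2x2 table compares mutations inside and outside the region against residues inside and outside the region; boundaries are moved greedily, one residue at a time; and the function names (`fisher_greater`, `region_p`, `optimize`, `find_regions`) are invented here.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test p-value for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(row1, x) * comb(row2, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / total


def region_p(counts, start, end):
    """Enrichment p-value for residues [start, end); the table layout is an assumption."""
    inside = sum(counts[start:end])
    outside = sum(counts) - inside
    length = end - start
    return fisher_greater(inside, outside, length, len(counts) - length)


def optimize(counts, start, end):
    """Greedily shift either boundary by one residue while the p-value decreases."""
    best = region_p(counts, start, end)
    improved = True
    while improved:
        improved = False
        candidates = ((start - 1, end), (start + 1, end),
                      (start, end - 1), (start, end + 1))
        for s, e in candidates:
            if 0 <= s < e <= len(counts):
                p = region_p(counts, s, e)
                if p < best:
                    start, end, best, improved = s, e, p, True
    return start, end


def find_regions(counts, seed_lengths=(7, 10, 30), alpha=0.01):
    """Scan all seeds, optimize the significant ones, then merge overlapping regions."""
    found = []
    for k in seed_lengths:
        for s in range(len(counts) - k + 1):
            if region_p(counts, s, s + k) < alpha:
                found.append(optimize(counts, s, s + k))
    merged = []
    for s, e in sorted(found):
        if merged and s < merged[-1][1]:  # overlaps the previously kept region
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

Calling `find_regions(counts)` on a per-residue count list returns half-open `(start, end)` intervals in 0-based residue coordinates. The greedy boundary search is only one possible way to "locally maximize significance"; the published procedure may differ in both the search strategy and the construction of the contingency table.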

