Common pathways and functional profiles reveal underlying patterns in Breast, Kidney and Lung cancers

Background
Cancer is a major health problem that presents high heterogeneity. In this work we explore omics data from Breast, Kidney and Lung cancers at different levels, such as signaling pathways, functions and miRNAs, as part of the CAMDA 2019 Hi-Res Cancer Data Integration Challenge. Our goal is to find common functional patterns which give rise to the generic microenvironment in these cancers, contributing to a better understanding of cancer pathogenesis and to a possible clinical translation in further studies.

Results
After a tumor versus normal tissue comparison of the signaling pathways and cell functions, we found 828 subpathways, 912 Gene Ontology terms and 91 Uniprot keywords commonly significant in the three studied tumors. Interestingly, these features are able to classify tumor samples into subgroups with different survival times, and to predict tumor state and tissue of origin through machine learning techniques. We also found cancer-specific alternative activation subpathways, such as the ones activating STAT5A in the ErbB signaling pathway. The evaluation of miRNAs shows the role of miRNAs such as mir-184 and mir-206 as regulators of many cancer pathways, and their value in prognosis.

Conclusions
The study of the common functional and pathway activities of different cancers is an interesting approach to understanding the molecular mechanisms of the tumoral process regardless of the tissue of origin. The existence of platforms such as the CAMDA challenges provides the opportunity to share knowledge and improve future scientific research and clinical practice.

Supplementary Information
The online version contains supplementary material available at 10.1186/s13062-021-00293-8.

specific characteristics of each tumor. Also, the diverse tissues or even cell types of origin define a set of specific features that characterize each cancer. However, distinct types of cancer, despite their different origins and derived specific features, share a list of common underlying patterns known as hallmarks [2]. These hallmarks are a set of basic capabilities that the cell must acquire to become cancerous, and include features such as the evasion of growth suppressors, the activation of tissue invasion and metastasis, or the resistance to cell death mechanisms. As cancer research advances, new biological capabilities emerge as hallmarks, such as the evasion of immune destruction and the reprogramming of cellular energy metabolism [3].
In recent years, mainly due to the decreasing cost of sequencing techniques and the popularization of bioinformatics methods, the analysis of genomic and transcriptomic data from patients has become an increasingly common approach to the study of cancer [4,5]. It is widely known that specific mutations are able to induce cancer [4] and that many genes and microRNAs are deregulated in tumor tissues [5]. The study of the genomic characteristics of each cancer has led to the establishment of diverse subtypes, defined by the different mutations they include and showing very heterogeneous drug response and survival [6]. To address the differences between these subtypes, many bioinformatic approaches have been developed, such as CancerSubtypes, an R package that gathers the main computational methods for cancer subtype identification and analysis [7].
In the past, many of the approaches followed in Bioinformatics to analyze cancer data were based on the analysis of individual genes. However, genes do not work alone, but in combination with many other genes in what can be understood as a network of interactions. In particular, signaling pathways emerge as an important cell mechanism which allows cells to respond to external stimuli, and they are at the core of many cell deregulation patterns which lead to different diseases [8,9].
The study of signaling pathways in the context of cancer has proven to give interesting results both in determining the mechanisms of the diseases and in the prediction of survival time [10,11]. The tool Hipathia [12] has recently demonstrated its ability to detect differentially activated subpathways while keeping a very low False Positive Rate, outperforming many other signaling pathway tools [13].
With the massive amount of data generated every day around the world as a result of cancer analysis, one of the main aims of cancer bioinformatics is to manage and integrate this information in an understandable way. Cancer data clouds, such as The Cancer Genome Atlas (TCGA) [14] or The Cancer Imaging Archive (TCIA) [15], are among the resources that allow researchers to manage large numbers of datasets and to keep track of changes. Integration of the existing data has also become necessary to interpret the results from a global perspective.
In this context, the CAMDA 2019 Hi-Res Cancer Data Integration Challenge aims to develop and demonstrate novel methods for gaining novel biological insights or improving support for Precision Medicine. In this work we aim to establish common patterns at the signaling pathways level which, beyond specific differences due to the tissue of origin, give rise to the generic microenvironment of cancer in breast, lung and kidney tissues. At the same time, we aim to determine the cancer-specific signaling pathways which define the particular features of each disease.

Tumor vs. normal tissue comparisons
The datasets were downloaded and processed as described in Methods, and tumor and normal samples were compared in terms of the activity levels of the analyzed pathways and functions. The comparison of the activities at the subpathway and functional levels between tumor and healthy tissue samples returned the number of significant up- and down-activated features in each project shown in Table 1.
Interestingly, when analyzing differential expression of the genes involved in those paths, specific cancer patterns arise. As an example, Figure 3A (top) shows the boxplots representing the distribution of the subpathway AMPK signaling pathway: CCNA2 (the subpathway from the KEGG AMPK signaling pathway with effector protein CCNA2) in tumor and normal samples for each of the projects. A common pattern of up-activation is clear. Figure 3B shows the Hipathia visualization for the same subpathway for the tumor vs.
On the other hand, with respect to downregulated GO terms, we find certain functions, such as glucose homeostasis and DNA repair processes, that could be related to the hallmarks deregulation of cell energetics and genome instability and mutation, respectively. Also, a great number of downregulated GO terms are related to the ion levels of the cell, such as sodium export and import from cell, response to calcium ion, regulation of delayed rectifier potassium channel activity or the regulation of intracellular pH. Varying levels of different kinds of ions are often related to changes in the expression levels of ion channel proteins, which can help to identify different kinds of cancer and their severity [17].

Table 1. Number of significant results per project and feature
With respect to the common Uniprot keywords, significant functions include Mitosis (up-activated, see Figure 3A, bottom), Growth arrest (down-activated), Lipid degradation (down-activated), Calcium transport (down-activated), Porin (up-activated) and Chromosome partition (up-activated), which can be related, respectively, to the hallmarks Enabling replicative immortality, Evading growth suppressors, Deregulating cellular energetics, Sustaining proliferative signaling, Activating invasion and metastasis, and Genome instability and mutation.
A notable number of cancer-specific features also arise from the analysis. Specifically, Figure 4A shows the number of specific subpathways and functions for each of the projects. KIRC appears to be the most distinctive cancer, with the greatest number of specific subpathways and functions differentially activated.
Specific subpathways related to KIRC include the up-activated Hippo signaling pathway

Survival-related pathways and functions
After applying the survival pipeline explained in Methods, we found the number of pathways and functions related to survival depicted in Table 2. Unfortunately, we found no common survival-related features significant in all three cancers at the same time. However, a number of survival-related features common to two of the three cancers were found: 31 subpathways, 6 Gene Ontology functions and 3 Uniprot keywords. Figure 4B shows the number of survival-related paths shared by each pair of cancer projects by means of an UpSet plot [18].

Table 2. Number of significant survival-related features per project

Among the pairwise common survival-related paths we find the subpathway AMPK signaling pathway: CCNA2, which was commonly up-regulated across the three cancers (see Section Common & Specific features and Figure 3). This subpathway has been significantly related to survival in KIRC and LUAD. In both cancers, a higher activity of this pathway is related to a poorer outcome, and a lower activity of the pathway is linked to a better outcome.
subsequently normalized with TMM normalization [19] and log transformed.

Pathway & functional level computation
The matrix of normalized gene expression was scaled between 0 and 1, and transformed to a matrix of pathway activation values by means of the Hipathia Bioconductor package [12]. This methodology computes a score representing the activity of each of the analyzed effector subpathways from the gene expression data by means of an iterative algorithm.
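The scaling step mentioned above can be illustrated with a minimal sketch. It assumes a global min-max rescaling over the whole genes x samples matrix; the text does not state whether the scaling was applied to the whole matrix or gene by gene, so this choice, the function name `scale_matrix` and the toy values are illustrative assumptions. The sketch does not reproduce the Hipathia algorithm itself.

```python
def scale_matrix(matrix):
    """Rescale all values of a genes x samples matrix into [0, 1].

    Assumption: one global minimum and maximum are used for the
    whole matrix (not one per gene).
    """
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant matrix: avoid division by zero.
        return [[0.0 for _ in row] for row in matrix]
    span = hi - lo
    return [[(v - lo) / span for v in row] for row in matrix]


# Toy normalized expression values: 2 hypothetical genes x 3 samples.
expr = [[2.0, 4.0, 6.0], [0.0, 10.0, 5.0]]
scaled = scale_matrix(expr)
```

After this step every value lies between 0 and 1, which is the input range described for the pathway activity computation.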
An effector subpathway (from now on, subpathway) includes any node in a path ending in a particular effector protein, and determines the joint activity arriving at it.
Hipathia uses the information from the Kyoto Encyclopedia of Genes and Genomes (KEGG)

Survival-related pathways and functions
For each analyzed feature, samples were divided into three groups: the 20% of samples with the highest activity, the 20% of samples with the lowest activity and the remaining 60% of samples. The function survdiff from the survival R package [23] was applied to each feature; it returns a Chi-squared statistic from which a p-value is calculated. The FDR method [22] was used as above to correct for multiple testing effects. Kaplan-Meier curves [24] were plotted to visualize survival differences among the defined groups.
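The grouping and the multiple testing correction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the analysis was run in R with survdiff, which is not reproduced here, and the sketch assumes that the FDR method corresponds to the Benjamini-Hochberg procedure. The function names, sample identifiers and p-values are hypothetical.

```python
def split_by_activity(activity, fraction=0.2):
    """Label each sample 'low', 'mid' or 'high' by its activity rank.

    `activity` maps sample id -> activity value of one feature.
    The lowest and highest `fraction` of samples form the extreme groups.
    """
    n = len(activity)
    k = int(n * fraction)  # size of each extreme group, rounded down
    order = sorted(activity, key=activity.get)  # ascending activity
    groups = {sample: "mid" for sample in order}
    for sample in order[:k]:
        groups[sample] = "low"
    for sample in order[n - k:]:
        groups[sample] = "high"
    return groups


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy activity values of one feature in ten hypothetical samples.
activity = {f"s{i}": i / 10 for i in range(10)}
groups = split_by_activity(activity)

# Toy p-values, one per hypothetical feature.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In the workflow described, the group labels would be passed with the survival times to the log-rank test, and the resulting per-feature p-values would then be corrected together.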
Pairwise common survival-related features were established by selecting those with a significant p-value in two different projects at the same time. UpSet plots [18] representing the number of overlapping survival-related pathways or functions were created with the UpSetR package [25].
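The selection of pairwise common features amounts to intersecting the sets of significant features of each pair of projects. A minimal sketch follows; the function name `pairwise_common` and the feature names are hypothetical, and the sets only mimic the situation reported in Results, where features are shared by pairs of cancers but none by all three.

```python
from itertools import combinations


def pairwise_common(significant):
    """Map each pair of projects to the features significant in both.

    `significant` maps project name -> set of significant feature names.
    """
    return {
        (a, b): significant[a] & significant[b]
        for a, b in combinations(sorted(significant), 2)
    }


# Hypothetical survival-related features per project.
significant = {
    "BRCA": {"path_A", "path_B"},
    "KIRC": {"path_B", "path_C"},
    "LUAD": {"path_C", "path_D"},
}
common = pairwise_common(significant)
in_all_three = set.intersection(*significant.values())
```

The sizes of these intersections are the quantities that an UpSet plot displays as bars.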

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets analysed during the current study are available in the GDC Data Portal repository, http://gdc.cancer.gov.

Competing interests
The authors declare that they have no competing interests.

Funding
This study was co-funded by the European

Heatmaps of effector subpathway activity. Samples and paths were ordered following the results of an unsupervised clustering. Left: BRCA cancer data.